Summary Essay on “Web Pages, Text Types, and Linguistic Features: Some Issues” by Marina Santini
In her article “Web Pages, Text Types, and Linguistic Features: Some Issues,”
Marina Santini argues that the web page should be considered a new type of
document, one that is more complex than a paper document. A single web page
can contain several texts with different communicative functions. For
instance, one page can be divided into several parts, and those parts are
organized by links. Various blocks of text scattered around the main document,
such as navigation buttons, menus, ads, and search bars, serve as links
connecting one document to others. Unlike a paper document, a web page can
lose its specific linguistic and textual characteristics because of its visual
structure. The author therefore investigates the text typology of web pages
on the basis of linguistic features, more specifically text types. Santini
chooses two well-established studies, by Biber (Multidimensional Analysis,
2004) and Werlich (1976), to learn whether the text types proposed in those
studies are suitable for and applicable to web pages. Biber’s
Multidimensional Analysis relies on an inductive statistical approach based on
factor analysis and cluster analysis, and it focuses only on linguistic
features (lexical, morphological, and syntactic classes). The analysis results
in four dimensions: personal involved narration, persuasive-argumentative
discourse, advice, and abstract-technical discourse. Werlich, by contrast,
identified five text types: narration, description, exposition, argumentation,
and instruction. The author also adds two broad text types to her study:
nominal vs. verbal. NLP tools were used to convert the web pages from HTML
into ASCII format.
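The HTML-to-ASCII conversion step mentioned above can be illustrated with a minimal sketch. It uses only Python's standard `html.parser` module and is an assumption for illustration, not the tool Santini actually used; the names `TextExtractor` and `html_to_ascii` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores tags, scripts, and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_ascii(html):
    """Hypothetical helper: strip markup and keep ASCII characters only."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Drop non-ASCII characters to mimic an ASCII-only output format.
    return text.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    page = '<p>Hello <b>world</b></p><img src="logo.png" alt="Brand">'
    print(html_to_ascii(page))
```

Note that the `<img>` element contributes nothing to the output: any wording that exists only as pixels inside an image is invisible to a converter of this kind.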
In her study, Santini finds that more than 50% of the web pages fall under
the nominal text type. This means that web pages offer fewer linguistic
features, which makes them less likely to fit the established text types. She
identifies six issues behind this poor fit: elements of text coded as images,
headings, lists, proper nouns, tabular text, and mixed text. The first issue
arises when text elements of a web page are coded as images embedded in the
HTML page, because they are lost when the page is converted into its ASCII
version (text without pictures). Santini admits that a solution to this
problem is hard to find. The second issue occurs when the tools fail to
detect a heading because it is treated as part of a sentence rather than as
an independent unit. Using the HTML heading tags <h#> can solve this issue.
As for lists, the issue lies in stylometric measurements such as average
sentence length, because list items are by nature semantically incomplete and
do not end with punctuation. The solution is to treat the <li> tag as an
artificial sentence boundary. The fourth issue concerns proper nouns, which
can be found on almost every web page, usually as lists of names or personal
details. Unfortunately, the NLP tools are of no use in this case. The fifth
issue, tabular text, is difficult to analyze from a linguistic standpoint.
The last issue is mixed text: the strings of text surrounding the main body
of a web page that are semantically separate from it and provide only
additional information to the reader. Such a page contains at least three
text types: a comment (the main article), an informational list (the
headlines on the right side), and an index (the items on the left). The
author concludes that these issues have no easy solution. The textuality of
web pages also does not make it any easier for automatic extraction
applications, such as NLP tools, to interpret them. The author acknowledges
that the same problems may affect other similar tools used in similar
studies. Thus, further discussion and investigation are needed.
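The fixes for headings and lists described above, where <h#> and <li> tags act as artificial sentence boundaries, can be sketched as follows. This is only an illustration under stated assumptions: the names `BoundaryExtractor`, `split_sentences`, and `average_sentence_length` are hypothetical, the punctuation-based splitter is deliberately naive, and none of it reproduces the tools used in Santini's study.

```python
import re
from html.parser import HTMLParser

# Tags whose start and end are treated as artificial sentence boundaries.
BOUNDARY_TAGS = {"li", "h1", "h2", "h3", "h4", "h5", "h6"}


class BoundaryExtractor(HTMLParser):
    """Splits page text into units at heading and list-item tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.units = []     # finished text units
        self._buffer = []   # text of the unit currently being built

    def flush(self):
        # Normalize whitespace and store the unit if it is not empty.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.units.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BOUNDARY_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BOUNDARY_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def split_sentences(html):
    """Return sentences, honoring both tag boundaries and punctuation."""
    parser = BoundaryExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # store any text left after the last boundary tag
    sentences = []
    for unit in parser.units:
        # Naive split after ., ! or ? followed by whitespace.
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", unit) if s)
    return sentences


def average_sentence_length(sentences):
    """Average number of whitespace-separated words per sentence."""
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


if __name__ == "__main__":
    page = (
        "<h2>Our Team</h2>"
        "<ul><li>Anna Smith</li><li>Ben Jones</li></ul>"
        "<p>We build tools. We test them.</p>"
    )
    found = split_sentences(page)
    print(found)
    print(average_sentence_length(found))
```

Without the tag boundaries, the heading and both list items would run into the following sentence as one long unit, which would inflate the average sentence length.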
wynne ert , february 2012