Tikalon Blog is now in archive mode.
Readability and Word Length
September 14, 2012
As students know, some things are harder to read than others. The spectrum of writing extends from the single-syllable words in the Dick and Jane books of my youth to the heady reading found in college textbooks and articles in scholarly journals.
In one memorable episode of The Bob Newhart Show, Bob's dentist friend, Jerry, writes a children's book. He submits his manuscript as many pages with one word per page. His rejection letter arrives in the same format, one word per page.
Many word processors have lexicographic analysis functions, such as word count, which is an important metric for student submissions. They also have a readability analysis designed to estimate the target audience for your text. The most popular of these is the Flesch-Kincaid readability test,[1] which presents its results as either a percentage reading ease or a grade level index; viz.,
Flesch Reading Ease = 206.835 - (1.015*(words/sentences)) - (84.6*(syllables/words))
Flesch-Kincaid Grade = (0.39*(words/sentences)) + (11.8*(syllables/words)) - 15.59,
in which words, sentences and syllables are the total counts of these objects in the manuscript. The grade level is designed to track school grade levels in the US. The reading ease corresponds to text being understood by a particular age group (90-100 = 11-year-olds, 60-70 = 13- to 15-year-olds, and 0-30 = college graduates).
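As a minimal sketch, the two formulas can be written as a pair of C functions. The function names and the use of `long` counts are illustrative assumptions, not taken from the article's own program, and the reading-ease constant is the 206.835 of the standard published formula.

```c
#include <assert.h>

/* Flesch Reading Ease from raw counts.
 * Names and signatures are hypothetical, chosen for illustration. */
double flesch_reading_ease(long words, long sentences, long syllables)
{
    assert(words > 0 && sentences > 0);   /* avoid division by zero */
    return 206.835
         - 1.015 * ((double)words / sentences)
         - 84.6  * ((double)syllables / words);
}

/* Flesch-Kincaid Grade Level from the same three counts. */
double flesch_kincaid_grade(long words, long sentences, long syllables)
{
    assert(words > 0 && sentences > 0);   /* avoid division by zero */
    return 0.39 * ((double)words / sentences)
         + 11.8 * ((double)syllables / words)
         - 15.59;
}
```

For a hypothetical text of 100 words, 5 sentences and 150 syllables, these give a reading ease of about 59.6 and a grade level of about 9.9.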
Writer's angst.
Teachers and publishers alike are happy that modern electronics have eliminated the chicken scratching they were once forced to decipher.
A manuscript page of À la recherche du temps perdu (Remembrance of Things Past) by Marcel Proust.
(Not to be confused with "La vie et l'époque de Frank Perdue."[2])
(Via Wikimedia Commons.)
When my children were in elementary school and high school, we would apply the grade test to their school reports to see how they fared. The object there was to make the grade level as high as possible. The purpose of these tests is actually the opposite. Development of the grade level test was funded by the US military to ensure that its training materials and maintenance manuals were understood. It's also used by some publishers to "dumb-down" their content to make it more salable. Next time you're in the supermarket, scan the tabloids at the checkout.
I don't try to dumb-down anything in this blog, but its reading level is not that extreme. A recent, relatively low-tech article (Work, September 3, 2012) has a Flesch Reading Ease of 62% and a Flesch-Kincaid Grade Level of 8.9. The previous, more technical article (Harder than Diamond, August 31, 2012) scores 49.9% and grade 9.5. There's no reason why your high-schooler shouldn't be reading this blog!
These scores were calculated by a C language program I developed just for this purpose. You can grab the source code here. Looking at the above formulas, you would think that such a program is easy to write. Counting words and sentences is somewhat easy, but the syllable count is the hard part. An extreme program might use a dictionary for this, but many of the words in this blog would not be found there. Instead, we use a simple method that's accurate enough for our purpose.
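The article doesn't say how its program counts words and sentences, so the following is only one plausible sketch. It assumes that a word is a run of letters (with apostrophes allowed inside it) and that a sentence ends at a run of '.', '!' or '?' characters. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count words, assuming a word starts at a letter and continues
 * through letters and apostrophes (so "Bob's" is one word). */
long count_words(const char *text)
{
    long n = 0;
    int in_word = 0;

    assert(text != NULL);
    for (; *text; text++) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c)) {
            if (!in_word)
                n++;            /* first letter of a new word */
            in_word = 1;
        } else if (c != '\'') {
            in_word = 0;        /* anything but an apostrophe ends the word */
        }
    }
    return n;
}

/* Count sentences, assuming each run of '.', '!' or '?' ends one.
 * Abbreviations and decimal numbers (e.g. "49.9") are overcounted. */
long count_sentences(const char *text)
{
    long n = 0;
    int in_terminator = 0;

    assert(text != NULL);
    for (; *text; text++) {
        int is_term = (*text == '.' || *text == '!' || *text == '?');
        if (is_term && !in_terminator)
            n++;                /* first mark of a terminator run */
        in_terminator = is_term;
    }
    return n;
}
```

Collapsing runs of terminators keeps an ellipsis or "?!" from being counted as several sentences, but this sketch makes no attempt to handle abbreviations.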
The vowels (a, e, i, o, u, y) are the key. The number of syllables in a word is almost always equal to the number of vowels, with two conditions. When vowels appear in pairs (diphthongs), they have a single sound, so we eliminate any vowel that follows another. Also, there are certain silent endings that must be addressed. We simply eliminate -e, -es and -ed from our count. This syllable count is not 100% accurate, but how accurate are the readability scores themselves? All scientists know that approximation is allowed in certain cases.
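The vowel rules just described might be sketched in C as follows. This is not the article's actual source code. The function names are hypothetical, and the floor of one syllable for any non-empty word (so that "the" doesn't count as zero) is an added assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* The vowels used by the heuristic: a, e, i, o, u, y. */
static int is_vowel(int c)
{
    c = tolower(c);
    return c == 'a' || c == 'e' || c == 'i' ||
           c == 'o' || c == 'u' || c == 'y';
}

/* Estimate the syllables in one word by counting vowel groups
 * after dropping a silent -e, -es or -ed ending. */
int count_syllables(const char *word)
{
    size_t len, i;
    int count = 0;
    int prev_vowel = 0;

    assert(word != NULL);
    len = strlen(word);

    /* Drop a silent ending: -es or -ed (two letters), else -e (one). */
    if (len >= 2 && tolower((unsigned char)word[len - 2]) == 'e' &&
        (tolower((unsigned char)word[len - 1]) == 's' ||
         tolower((unsigned char)word[len - 1]) == 'd'))
        len -= 2;
    else if (len >= 1 && tolower((unsigned char)word[len - 1]) == 'e')
        len -= 1;

    /* A vowel that follows another vowel is not counted. */
    for (i = 0; i < len; i++) {
        int v = is_vowel((unsigned char)word[i]);
        if (v && !prev_vowel)
            count++;
        prev_vowel = v;
    }

    /* Assumption: every non-empty word has at least one syllable. */
    if (count == 0 && word[0] != '\0')
        count = 1;
    return count;
}
```

Under these rules "dentist" counts as 2 syllables and "scored" as 1. The sketch also miscounts some words ("pages" comes out as 1), in keeping with the point that the method is not 100% accurate.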
As can be seen in the readability formulas, the number of syllables per word is the most important factor. This is no surprise to children who complain about "big words," so word length is an important linguistic concept. An article about word length has recently been posted on the arXiv preprint server.[3]
The authors used the Google Books corpus to analyze the temporal trend in word length. I wrote about linguistic analysis using Google Books and the Google Ngram Viewer in two previous articles (Culturomics, January 13, 2011 and Word Extinction, August 17, 2011). Their results are shown in the graph below.
Trend in word length.
Blue=common text; green=fiction; red = British English; aqua = American English.
(arXiv Preprint Server, fig. 1 of ref. 3.)[3]
Note the recent "dumbing-down" of American English. The authors of the arXiv paper associate the decrease in average word length with a shifting political environment. I prefer my dumbing-down hypothesis. Word length is an easily understood concept, but linguistics can get into more complicated areas, as another just-published paper demonstrates.[4-5]
References:
1. J. Peter Kincaid, Richard Braby and John E. Mears, "Electronic authoring and delivery of technical information," Journal of Instructional Development, vol. 11, no. 2 (June, 1988), pp. 8-13.
2. "The Life and Times of Frank Perdue" (Unpublished).
3. Vladimir V. Bochkarev, Anna V. Shevlyakova and Valery D. Solovyev, "Average word length dynamics as indicator of cultural changes in society," arXiv Preprint Server, August 30, 2012.
4. "How language change sneaks in," Linguistic Society of America Press Release, September 4, 2012 (PDF File).
5. Hendrik De Smet, "The Course of Actualization," Preprint of Language paper, to appear, September, 2012 (PDF File).
6. Phonics on the Web.