Monday, January 11, 2016

How to learn ten thousand words

This is a companion post to Word frequency and incidental learning

While there are countless words in the English language, knowledge of the 3000 to 5000 most frequent word families will yield lexical coverage of 95-98%, depending on the type of content. A vocabulary of this size provides a good basis for comprehension and language use. It also provides a good basis for further, comfortable vocabulary acquisition through extensive reading and listening, which is the core mantra of the comprehensible input crowd (i.e. Krashen et al.).
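To make "lexical coverage" concrete, here is a minimal Python sketch. It counts plain word forms rather than word families, and the sample sentence and function names are made up for illustration, so treat it as a toy and not as the method used in the research.

```python
from collections import Counter


def top_n_words(tokens, n):
    """Return the n most frequent word forms in a list of tokens."""
    return {word for word, _ in Counter(tokens).most_common(n)}


def lexical_coverage(tokens, known_words):
    """Share of running words (tokens) that belong to known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


if __name__ == "__main__":
    # Hypothetical toy text; a real estimate would need a large corpus
    # and a word-family list instead of raw word forms.
    sample = "the cat sat on the mat and the dog sat on the cat".split()
    known = top_n_words(sample, 4)
    print(f"Coverage with the 4 most frequent forms: {lexical_coverage(sample, known):.0%}")
```

Coverage of 95-98% means that only two to five running words in every hundred are unknown.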

How do we go about developing a core vocabulary of some 3000 word families? Pick up any decent comprehensive course like the Assimil series and it will likely include the bulk of the required vocabulary. People occasionally "pick up" languages. If you're lucky, your previous linguistic knowledge will provide plenty of cognates. Speakers of multiple languages may, with little or no study of their new target language, possess a passive knowledge of cognates rivaling that of native speakers. Not everyone is equally successful at recognizing and exploiting cognates, but we'll leave that for another time.

Successful language learners learn the bulk of their vocabulary through repeated encounters in different contexts. Words and their shades of meaning are learned gradually. According to some published research, in order to have a high probability of learning a new word from context, you need to encounter it between five and twenty times*.

In his aptly named paper How much input do you need to learn 10,000 words?, Nation suggested that a learner needs to meet around 3,000,000 words in order to learn the most frequent 9000 word families in English.

According to the statistical analysis table developed by Dr. Rob Waring:
  • To meet all the 3000 most frequent words in English 1, 5, 10 and 20 times, you’d need to read or otherwise meet 47,300, 236,700, 473,000, and 947,000 words, respectively.
  • To meet all the 5000 most frequent words in English 1, 5, 10 and 20 times, you’d need to read 132,100, 661,000, 1,321,000, and 2,642,000 words.
  • To meet all the 10,000 most frequent words in English 1, 5 and 10 times, you’d need to read 632,000, 3,164,000 and 6,328,947 words, respectively.
In order to meet the most frequent 10,000 words 20 times, you will need to read or otherwise meet 12,657,895 running words, the equivalent of approximately 100 books the length of Pride and Prejudice. At 140 words per minute, you would hear 12.7 million words after 1507 hours of listening to audiobooks.
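The book and audiobook equivalents are simple division, and a short Python sketch makes the arithmetic easy to rerun with your own numbers. The constants are the figures quoted in this post plus the Amazon text stats word count for Pride and Prejudice (120,528 words); the function names are only illustrative.

```python
# Figures quoted in the post; adjust them to your own reading material.
WORDS_FOR_20_MEETINGS = 12_657_895   # running words to meet the top 10,000 words 20 times
PRIDE_AND_PREJUDICE_WORDS = 120_528  # Amazon text stats word count
LISTENING_WPM = 140                  # audiobook speed assumed in the post


def books_needed(running_words, words_per_book):
    """How many books of a given length add up to running_words."""
    return running_words / words_per_book


def listening_hours(running_words, words_per_minute):
    """Hours of audio needed to hear running_words at a steady pace."""
    return running_words / words_per_minute / 60


if __name__ == "__main__":
    books = books_needed(WORDS_FOR_20_MEETINGS, PRIDE_AND_PREJUDICE_WORDS)
    hours = listening_hours(WORDS_FOR_20_MEETINGS, LISTENING_WPM)
    print(f"About {books:.0f} books or {hours:.0f} hours of listening")
```

Swapping in a shorter book, or a faster narrator, shows how quickly the totals shift.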

I see a bit of a discrepancy here. According to Nation, 3.0 million words are sufficient for 12 repetitions at the "9th 1000 word level". According to Waring, in order to meet the 10,000th word 5 times, one needs to read or listen to approximately 3.2 million running words. This is likely due to the fact that Nation uses "word families" as a reference point, which inherently increases the number of repetitions (and which is also mentioned in his paper). While Nation's methodology yields higher repetition counts, it presumes a knowledge of morphology and excludes cases of polysemy.

Different types of language material may have very different word frequencies. Per 1 million words, the word "Dear" occurs 1284 times in the Dickens corpus, 54 times in the Brown corpus (a compendium of a variety of sources), and 223 times in Subtlexus (Movies and TV shows). "Me" occurs 9,242 times in Subtlexus vs 1,183 in Brown. See other comparisons here and here. Based on the foregoing, it may be concluded that a listening and reading strategy involving substantial blocks of different types of language content may lead to a wider vocabulary and long-term vocabulary retention.
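To see what these differences mean in practice, the per-million figures can be turned into a rough estimate of how much text you would need to get through to meet a word a given number of times. The Python sketch below assumes the word is spread evenly through the text, which real texts do not guarantee, so the results are ballpark figures only. The function name is illustrative.

```python
import math

# Occurrences per 1 million words, as quoted in the post.
PER_MILLION = {
    "dear": {"Dickens": 1284, "Brown": 54, "Subtlexus": 223},
    "me": {"Subtlexus": 9242, "Brown": 1183},
}


def words_to_meet(per_million, encounters):
    """Expected running words needed to meet a word `encounters` times,
    assuming (unrealistically) an even spread through the text."""
    return math.ceil(encounters * 1_000_000 / per_million)


if __name__ == "__main__":
    for word, corpora in PER_MILLION.items():
        for corpus, rate in corpora.items():
            needed = words_to_meet(rate, 20)
            print(f'"{word}" in {corpus}: about {needed:,} words for 20 encounters')
```

By this estimate, twenty encounters with "dear" take only a few chapters of Dickens but several books' worth of Brown-style mixed text.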

*Over the course of the past 20 years, the suggested number of exposures needed to retain a word has increased from an initial 5-6 to the current recommendation of 15-20. Some recent research suggests that incidental acquisition of vocabulary can happen "extremely fast even with complete beginners in a FL" with "as little as two exposures to new words." Moreover, the researchers found that "the impact of exposure was not constant across number of exposures, but rather decreased following the initial encounters." See the first link.

How much input do you need to learn 10,000 words? by Paul Nation
The Role of Repeated Exposure to Multimodal Input in Incidental Acquisition of Foreign Language Vocabulary by Marie-Josée Bisson, Walter J. B. van Heuven, Kathy Conklin and Richard J. Tunney
Words are Learned Incrementally over Multiple Exposures by Steven A Stahl
Why should we build up a Start-up vocabulary quickly? by Rob Waring
The inescapable case for extensive reading by Rob Waring
Lexical Threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension by Batia Laufer and Geke C Ravenhorst-Kalovski
Vocabulary Size, Text Coverage and Word Lists by Paul Nation and Robert Waring
At what rate do learners learn and retain new vocabulary from reading a graded reader? by Rob Waring and Misako Takaki
Vocabulary Demands of Television Programs by Stuart Webb and Michael P. H. Rodgers
Effect of Frequency and Idiomaticity on Second Language Reading Comprehension by Ron Martinez

Sunday, January 10, 2016

The average book length

An average English-language literary novel has around 80,000-100,000 words (depending on genre) and 5,000-7,000 sentences. An average 5.5 x 8.5 inch trade book may have approximately 300 words per page, and a 6 x 9 inch book may have 350 words per page. The smaller, cheaper mass market paperbacks (usually 4.125 x 6.75 inches) pack on average 250 words per printed page, but longer books may pack more than 350 words per printed page.
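These averages make it easy to guess a page count from a word count. A rough Python sketch follows; the format labels and the function name are made up for illustration, and the words-per-page values are just the averages quoted above, so real editions will vary.

```python
import math

# Average words per printed page, as quoted above (rough figures).
WORDS_PER_PAGE = {
    "trade 5.5x8.5": 300,
    "trade 6x9": 350,
    "mass market": 250,
}


def estimated_pages(word_count, book_format):
    """Estimate the printed page count for a word count and book format."""
    return math.ceil(word_count / WORDS_PER_PAGE[book_format])


if __name__ == "__main__":
    for book_format in WORDS_PER_PAGE:
        pages = estimated_pages(100_000, book_format)
        print(f"100,000 words, {book_format}: about {pages} pages")
```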

The median length for all Amazon books is about 64,000 words. Here are some book length stats from Amazon's text stats, collected a couple of years ago.

Pride and Prejudice
Words: 120,528 (29% have more)
Sentences: 5,878 (35% have more)
Words per Sentence: 20.5 (42% have more)

Effi Briest (German)
Words: 98,455
Sentences: 5,501
Words per Sentence: 17.9

Anna Karenina (translated)
Words: 334,108 (1% have more)
Sentences: 19,345 (2% have more)
Words per Sentence: 17.3

James Clavell's Shogun
Words: 428,978
Sentences: 43,647
Words per Sentence: 9.8

Animal Farm
29,966 words (75% of books have more words)

Ethan Frome
30,191 words (75% of books have more words)

The Crying of Lot 49
46,573 words (64% of books have more words)

47,192 words (64% of books have more words)

Lord of the Flies
62,481 words (51% of books have more words)

Brave New World
64,531 words (50% of books have more words)

The Adventures of Tom Sawyer
70,570 words (45% of books have more words)

Portnoy’s Complaint
78,535 words (41% of books have more words)

112,473 words (21% of books have more words)

Madame Bovary
117,963 words (18% of books have more words)

Mansfield Park
159,344 words (9% of books have more words)

209,117 words (4% of books have more words)

262,869 words (2% of books have more words)

310,593 words (2% of books have more words)

War and Peace
544,406 words (0% of books have more words)

The European Reading Challenge

January 1, 2016 to January 31, 2017

Hosted by Rose City Reader.

I intend to read five books by five different European authors in their native languages. For good measure, I have also entered the Classics Challenge 2016.

Thursday, January 7, 2016

Senator Says Maryland's Italian State Motto Is Sexist

Senator Bryan Simonaire (R-Anne Arundel) has filed a bill that would update the nearly 400-year-old phrase “Fatti maschii, parole femine”, which literally translates as “manly deeds, womanly words”, to something more gender-neutral. Senator Simonaire proposes to change the official state motto to the English “Strong deeds, gentle words,” which is also Maryland's current officially cited translation of the motto.

Maryland is the only US state with a motto in Italian. In 1993 a move to change the English translation of Maryland's Italian motto from "Manly Deeds, Womanly Words" to "Strong Deeds, Gentle Words" passed a House committee but never made it to the House floor. In 2001, the official translation was changed to the aforementioned gentler, politically correct version.

The Washington Post chimes in:

How a ‘sexist’ quote from 16th-century pope became Maryland’s state motto

The Maryland motto is sexist in any language (Opinion)

See also No longer manly, state seal uses gentle words (The Baltimore Sun, January 12, 2001).