How to learn ten thousand words

While there are countless words in the English language, knowing the 3000 to 5000 most frequent word families will yield lexical coverage of 95-98%, depending on the language content. A vocabulary of this size provides a good basis for comprehension and language use. It also provides a good basis for comfortable further vocabulary acquisition through extensive reading and listening, which is the core mantra of the comprehensible input crowd (i.e. Krashen et al.).

How do we go about developing a core vocabulary of some 3000 word families? Pick up any decent comprehensive course like the Assimil series and it will likely include the bulk of the required vocabulary. People occasionally "pick up" languages. If you're lucky, your previous linguistic knowledge will provide plenty of cognates. Speakers of multiple languages may, with little or no prior involvement in their new target language, possess a passive knowledge of cognates rivaling that of native speakers. Not everyone is equally successful at recognizing and exploiting cognates, but we'll leave that for another time.

Successful language learners will learn the bulk of their vocabulary through repeated encounters in different contexts. Words and their shades of meaning are learned gradually. According to some published research, you need to encounter a new word between five and twenty times in order to have a high probability of learning it from context*.

In his paper aptly named "How much input do you need to learn 10,000 words?", Nation suggested that a learner needs to meet around 3,000,000 words in order to learn the most frequent 9000 word families in English.

According to the statistical analysis table developed by Dr. Rob Waring:
  • To meet all the 3000 most frequent words in English 1, 5, 10 and 20 times, you’d need to read or otherwise meet 47,300, 236,700, 473,000, and 947,000 words, respectively.
  • To meet all the 5000 most frequent words in English 1, 5, 10 and 20 times, you’d need to read 132,100, 661,000, 1,321,000, and 2,642,000 words.
  • To meet all the 10,000 most frequent words in English 1, 5 and 10 times, you’d need to read 632,000, 3,164,000 and 6,328,947 words, respectively.
In order to meet the most frequent 10,000 words 20 times, you will need to read or otherwise meet 12,657,895 running words, the equivalent of approximately 100 books the length of Pride and Prejudice. At 140 words per minute, you would hear 12.7 million words after 1507 hours of listening to audiobooks.
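The arithmetic behind those last two figures is easy to check. Here is a minimal sketch in Python: the running-word count and the listening speed come from the paragraph above, while the length of Pride and Prejudice (about 122,000 words) is my own assumed round figure, and the function names are just illustrative.

```python
# Rough listening time and "book equivalents" for a given amount of input.
# RUNNING_WORDS and WORDS_PER_MINUTE are taken from the text above;
# BOOK_LENGTH_WORDS is an assumption (approximate length of Pride and Prejudice).

RUNNING_WORDS = 12_657_895    # input needed to meet the top 10,000 words 20 times
WORDS_PER_MINUTE = 140        # audiobook listening speed used in the text
BOOK_LENGTH_WORDS = 122_000   # assumed, not from the source


def listening_hours(running_words: int, words_per_minute: float) -> float:
    """Hours needed to hear `running_words` at a steady `words_per_minute`."""
    return running_words / words_per_minute / 60


def book_equivalents(running_words: int, book_length_words: int) -> float:
    """How many books of the given length add up to `running_words`."""
    return running_words / book_length_words


hours = listening_hours(RUNNING_WORDS, WORDS_PER_MINUTE)
books = book_equivalents(RUNNING_WORDS, BOOK_LENGTH_WORDS)
print(f"{hours:.0f} hours of listening, roughly {books:.0f} books")
```

With the assumed book length this prints 1507 hours and roughly 104 books, which is in line with the "approximately 100 books" estimate. Swap in your own reading or listening speed to see how the totals change.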

I see a bit of a discrepancy here. According to Nation, 3.0 million words are sufficient for 12 repetitions at the "9th 1000 word level". According to Waring, in order to meet the 10,000th word 5 times one needs to read or listen to approximately 3.2 million running words. This is likely due to the fact that Nation uses "word families" as a reference point, which inherently increases the number of repetitions (and which is also mentioned in his paper). While Nation's methodology allows for higher repetition counts, it presumes a knowledge of morphology and excludes cases of polysemy.

Different types of language material may have very different word frequencies. Per 1 million words, the word "dear" occurs 1,284 times in the Dickens corpus, 54 times in the Brown corpus (a compendium of a variety of sources), and 223 times in SUBTLEX-US (movies and TV shows). "Me" occurs 9,242 times in SUBTLEX-US vs 1,183 times in Brown. See other comparisons here and here. Based on the foregoing, it may be concluded that a listening and reading strategy involving substantial blocks of different types of language content may lead to a wider vocabulary and better long-term vocabulary retention.
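To get a feel for what these frequency gaps mean in practice, the per-million counts can be turned into the amount of input you would expect to need for a given number of encounters. The sketch below is only an illustration: it uses the counts quoted above, and it makes the simplifying assumption (mine, not the researchers') that a word is spread evenly through the material, which real texts do not guarantee. The dictionary keys and the function name are hypothetical.

```python
# Expected input needed to meet a word a given number of times,
# derived from its frequency per million running words.
# Simplifying assumption: the word is spread evenly through the material.

PER_MILLION = {
    "dear": {"dickens": 1284, "brown": 54, "subtlex_us": 223},
    "me": {"subtlex_us": 9242, "brown": 1183},
}


def words_needed(freq_per_million: float, encounters: int) -> float:
    """Running words you would expect to read to meet the word `encounters` times."""
    return encounters * 1_000_000 / freq_per_million


for word, corpora in PER_MILLION.items():
    for corpus, freq in corpora.items():
        needed = words_needed(freq, 20)
        print(f"{word!r} in {corpus}: about {needed:,.0f} words for 20 encounters")
```

Under that assumption, 20 encounters with "dear" take roughly 15,600 words of Dickens but about 370,000 words of Brown-type text, which is why the mix of material matters so much.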

*Over the course of the past 20 years, the suggested number of exposures needed to retain a word has increased from an initial 5-6 to the current recommendation of 15-20. Some recent research suggests that incidental acquisition of vocabulary can happen "extremely fast even with complete beginners in a FL" with "as little as two exposures to new words." Moreover, the researchers found that "the impact of exposure was not constant across number of exposures, but rather decreased following the initial encounters." See the first link.

