While there are countless words in the English language, knowledge of the 3,000 to 5,000 most frequent word families yields lexical coverage of 95-98%, depending on the type of content. This vocabulary size provides a solid basis for comprehension and language use. It also makes further vocabulary acquisition comfortable through extensive reading and listening, the approach championed by proponents of comprehensible input (e.g. Krashen).
How do we go about developing a core vocabulary of some 3,000 word families? Pick up any decent comprehensive course, such as the Assimil series, and it will likely cover the bulk of the required vocabulary. Some people simply "pick up" languages. If you're lucky, your previous linguistic knowledge will supply plenty of cognates: speakers of multiple languages may possess a passive knowledge of cognates rivaling that of native speakers, with little or no prior exposure to their new target language. Not everyone is equally successful at recognizing and exploiting cognates, but we'll leave that for another time.
Successful language learners acquire the bulk of their vocabulary through repeated encounters in different contexts. Words and their shades of meaning are learned gradually. According to some published research, to have a high probability of learning a new word from context you need to encounter it between five and twenty times*.
- To meet each of the 3,000 most frequent words in English 1, 5, 10, and 20 times, you'd need to read or otherwise encounter 47,300, 236,700, 473,000, and 947,000 words, respectively.
- To meet each of the 5,000 most frequent words in English 1, 5, 10, and 20 times, you'd need to read 132,100, 661,000, 1,321,000, and 2,642,000 words.
- To meet each of the 10,000 most frequent words in English 1, 5, and 10 times, you'd need to read 632,000, 3,164,000, and 6,328,947 words, respectively.
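Figures like these can be sanity-checked against a Zipf-style frequency model. The sketch below is illustrative only: it assumes a vocabulary of 50,000 word families and pure Zipfian frequencies, neither of which is taken from the studies cited here, so its numbers only approximate the ones above.

```python
import math

def words_needed(rank, exposures, vocab_size=50_000):
    """Estimate the running words of text needed for the word at a given
    frequency rank to be expected `exposures` times, assuming word
    frequencies follow Zipf's law over `vocab_size` word families."""
    # Under Zipf's law, P(word of rank r) = (1/r) / H_V,
    # where H_V is the harmonic number over the whole vocabulary.
    h_v = sum(1.0 / r for r in range(1, vocab_size + 1))
    p = 1.0 / (rank * h_v)
    # Expected count after reading T words is T * p, so T = exposures / p.
    return math.ceil(exposures / p)

for k in (1, 5, 10, 20):
    print(f"rank 3000, {k} exposure(s): ~{words_needed(3000, k):,} running words")
```

Under these assumptions, one expected exposure at rank 3,000 requires on the order of tens of thousands of running words, the same order of magnitude as the figures above; the point is the linear scaling with the number of exposures, not the exact totals.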
Different types of language material can have very different word frequencies. Per million words, "dear" occurs 1,284 times in the Dickens corpus, 54 times in the Brown corpus (a compendium of a variety of sources), and 223 times in SUBTLEXus (movies and TV shows); "me" occurs 9,242 times in SUBTLEXus vs. 1,183 in Brown. See other comparisons here and here. It follows that a listening and reading strategy built around substantial blocks of different types of language content may lead to both a wider vocabulary and better long-term retention.
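The per-million rates quoted above are a simple normalization that makes counts from corpora of different sizes comparable. A minimal sketch, using made-up corpus sizes rather than the actual Dickens, Brown, or SUBTLEXus token counts:

```python
def per_million(raw_count, corpus_tokens):
    """Convert a raw occurrence count into a rate per million tokens,
    so counts from corpora of different sizes can be compared."""
    return raw_count / corpus_tokens * 1_000_000

# Hypothetical: 57 hits in a 250,000-token corpus vs 110 hits in a
# 1,000,000-token corpus -- the smaller corpus has the higher rate.
print(per_million(57, 250_000))     # 228.0 per million
print(per_million(110, 1_000_000))  # 110.0 per million
```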
*Over the past 20 years, the suggested number of exposures needed to retain a word has risen from the initial 5-6 to the current recommendation of 15-20. Some recent research suggests that incidental acquisition of vocabulary can happen "extremely fast even with complete beginners in a FL" with "as little as two exposures to new words." Moreover, the researchers found that "the impact of exposure was not constant across number of exposures, but rather decreased following the initial encounters." See the first link below.
The Role of Repeated Exposure to Multimodal Input in Incidental Acquisition of Foreign Language Vocabulary by Marie-Josée Bisson, Walter J. B. van Heuven, Kathy Conklin, and Richard J. Tunney
Words are Learned Incrementally over Multiple Exposures by Steven A. Stahl
Why Should We Build Up a Start-up Vocabulary Quickly? by Rob Waring
The Inescapable Case for Extensive Reading by Rob Waring
Lexical Threshold Revisited: Lexical Text Coverage, Learners' Vocabulary Size and Reading Comprehension by Batia Laufer and Geke C. Ravenhorst-Kalovski
Vocabulary Size, Text Coverage and Word Lists by Paul Nation and Robert Waring
At What Rate Do Learners Learn and Retain New Vocabulary from Reading a Graded Reader? by Rob Waring and Misako Takaki
Vocabulary Demands of Television Programs by Stuart Webb and Michael P. H. Rodgers
Effect of Frequency and Idiomaticity on Second Language Reading Comprehension by Ron Martinez