Monday, April 27, 2009

word frequency and incidental learning

The most frequent 4,000 word families from the BNC provide 95% coverage of new texts, which translates into “adequate comprehension” (one unknown word in every 20, or roughly one every two lines) for “some learners” (Hu and Nation). Most learners, however, do not reach adequate comprehension even with 95% coverage: for most, 98% coverage was necessary to achieve adequate comprehension of fiction. For reading to be considered a pleasurable activity, some researchers (Hirsh and Nation, 1992) suggest that 98-99% coverage may be necessary (one unknown word in every 50-100 running words). 7,000 words are needed for 98% coverage (Nation, 2006).

Word coverage and unknown-word density
80%: 1 unknown word in 5
90%: 1 in 10 (about one per line)
95%: 1 in 20 (about one every two lines)
98%: 1 in 50 (about eight per 400-word page)
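The coverage figures above follow from simple arithmetic: at coverage c, the share of unknown running words is 1 - c, so on average one word in every 1/(1 - c) is unknown. Here is a minimal Python sketch of that conversion. The function names are hypothetical. The 400-word page comes from the figures above, and the sketch assumes unknown words are spread evenly through the text, which real texts do not guarantee.

```python
def unknown_per_running_words(coverage: float, running_words: int) -> float:
    """Expected number of unknown tokens in a stretch of text at a given coverage.

    Assumes unknown words are spread evenly through the text.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    # Round to hide floating-point noise such as 8.000000000000007.
    return round((1.0 - coverage) * running_words, 6)


def one_unknown_in(coverage: float) -> float:
    """Average gap between unknown words, e.g. 0.95 -> one in every 20."""
    if coverage >= 1.0:
        return float("inf")
    return round(1.0 / (1.0 - coverage), 6)


for c in (0.80, 0.90, 0.95, 0.98):
    print(f"{c:.0%}: 1 in {one_unknown_in(c):g}, "
          f"{unknown_per_running_words(c, 400):g} per 400-word page")
```

The loop reproduces the list: 98% coverage works out to 1 unknown word in 50, or 8 on a 400-word page.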

A collection of excerpts regarding vocabulary acquisition:

“The results showed that knowledge of the most frequent 3,000 word families plus proper nouns and marginal words provided 95.76% coverage, and knowledge of the most frequent 6,000 word families plus proper nouns and marginal words provided 98.15% coverage of movies. Both American and British movies reached 95% coverage at the 3,000 word level. However, American movies reached 98% coverage at the 6,000 word level while British movies reached 98% coverage at the 7,000 word level. The vocabulary size necessary to reach 95% coverage of the different genres ranged from 3,000 to 4,000 word families plus proper nouns and marginal words, and 5,000 to 10,000 word families plus proper nouns and marginal words to reach 98% coverage.”

The Lexical Coverage of Movies
Stuart Webb and Michael P. H. Rodgers

“A corpus of one million words would probably have over 60,000 instances of the word ‘the’ but is unlikely to include any of the following: gastronomic, plagiarism, incoherent, reassuring, preach, all of which have a frequency rating of well under one-hit-per-million-words, yet could hardly be described as obscure.”

HLT Magazine
"My one's bigger than your one"

“The source text consisted of three months (approximately 5 million words) of Le Monde.

Number of sentences: 167,359
Number of words (total): 4,244,810

Less than 20% of the distinct words account for over 95% of all word occurrences. In fact, 40% (about 35,000 words) occurred only once in the text, and 60% of the words appeared at most 3 times. This effect is even more pronounced for syllables, where the roughly 20% most common syllables account for 98% of all syllable occurrences.”
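Both statistics in this excerpt come from a frequency count: coverage by the top share of word types, and the share of types that occur only once. Here is a minimal Python sketch of how such numbers can be computed. It assumes the text has already been split into word tokens. The Le Monde study's own tokenization, and whether it grouped word forms into families, are not described here, so this version counts raw word forms. The function names are hypothetical.

```python
from collections import Counter


def coverage_of_top_types(tokens: list[str], top_fraction: float) -> float:
    """Share of all tokens covered by the most frequent `top_fraction` of word types."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    n_types = max(1, round(len(counts) * top_fraction))
    covered = sum(freq for _, freq in counts.most_common(n_types))
    return covered / len(tokens)


def hapax_share(tokens: list[str]) -> float:
    """Share of distinct word types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for freq in counts.values() if freq == 1) / len(counts)


tokens = "the cat saw the dog and the bird".split()
print(coverage_of_top_types(tokens, 0.2))  # 0.375 (the top type, "the", covers 3 of 8 tokens)
print(hapax_share(tokens))
```

On a real corpus the same two calls, with `top_fraction=0.2`, would give the kind of "fewer than 20% of types cover 95% of tokens" and "40% occur only once" figures quoted above.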

“Given this enormous amount of material, you might expect to find a lot of frequent idioms. If so, you would be disappointed. Simpson and Mendis found only 8 idioms that occurred more than 10 times (ranging from 10-17 times) in their corpus of nearly 2 million words/197 hours. Another 107 occur 1.2-2.4 times per million words. Liu, with an even larger corpus (roughly 6 million words) and a more generous definition, found only 47 items with a frequency of 50 or more tokens per million words. Another 107 had a frequency of 11-49 per million words and the other 148 had a frequency of 2-19 per million words. That’s a total of only 302 idioms, which strikes me as not only a relatively limited number, but also a very teachable number. The lack of many common idioms makes the task of teaching idioms both easier and harder. It is easier because we can focus our teaching on those idioms that are fairly frequent.”

The effect of frequency of occurrence on incidental word learning.

“It should also be pointed out that the volume of text that would need to be read to meet an unknown word increases with reading ability level. This is because rarer words are met less frequently and thus more text has to be read to meet an unknown word the required number of times. This also has implications for the amount of text that needs to be read.”
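The point about rarer words can be made concrete. Suppose a word occurs f times per million running words and its occurrences are spread evenly. This is a strong simplification, since real words cluster in particular texts. Under that assumption, a reader needs about n × 1,000,000 / f running words to meet the word n times. The function name, the target of 10 meetings, and the example frequencies below are illustrative assumptions; the excerpt itself gives no specific figures.

```python
def running_words_needed(per_million: float, meetings: int) -> float:
    """Expected running words to read before meeting a word `meetings` times.

    Idealized: assumes occurrences are spread evenly at `per_million`
    per million running words, which real, "bursty" word use does not satisfy.
    """
    if per_million <= 0:
        raise ValueError("frequency must be positive")
    return meetings * 1_000_000 / per_million


# Each tenfold drop in frequency means ten times as much reading
# for the same number of meetings.
for freq in (100, 10, 1):
    print(freq, running_words_needed(freq, meetings=10))
```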

Beyond A Clockwork Orange: Acquiring Second Language Vocabulary through Reading

“The frequency of words in the language as a whole was also investigated; Brown (1993) found overall frequency to be a better predictor of incidental vocabulary growth than frequency in the specific texts her subjects read. The third explanatory variable was learner vocabulary size. It was assumed that knowing more words would assure better global comprehension of the text and, as a result, more incidental word acquisition. Laufer (1989, 1992) found evidence of a strong relationship between measures of learner vocabulary size and text comprehension.”

The Mayor of Casterbridge listening/reading experiment:

"Unfortunately, the experimental support for incidental vocabulary acquisition through reading in a second language is weak and plagued by methodological flaws..."

"The first study claiming to show that second language vocabulary learning occurs incidentally through reading is a well known experiment by Saragi, Nation and Meister (1978). They tested native speakers of English who had read Anthony Burgess's A Clockwork Orange on their understanding of many of the Russian-based slang words that occur in the novel. They found that the subjects were able to correctly identify the meanings of most of these nadsat words on a surprise multiple-choice test, especially the frequently occurring ones. But it seems strange to equate the circumstances of this study with second language learning. Here, native speakers of English used contexts which they must have fully understood to infer, for example, that droog meant friend; but making such connections is probably much harder for readers in a foreign language for whom many words in the context may be unknown or only partially known.

The mean number of words subjects acquired in the experiment was 68.4, amounting to about three quarters of the 90 words tested. But replications of this study with second language learners have not managed to reproduce these impressive results (see Table 1 below)... Dupuy and Krashen (1993) report a larger gain of almost seven words, but this higher than usual result may have little to do with reading since their experiment also involved viewing a video..."

"The [Mayor of Casterbridge] novel is one of a series of simplified classics published by Nelson for learners of English who know approximately 2,000 base words." ... "21,232 words of the simplified Mayor of Casterbridge text: subjects followed along in their books while the entire text was read aloud in class by the teacher... The remaining 34 [students] appeared to be absorbed by the story of secret love, dissolution and remorse, and tears were shed for the mayor when he met his lonely death at the end... The knowledge gain of five of the 23 means that about 22 per cent of the words that could have been learned were learned; in other words, there was an average pick-up rate of about one new word in every five..."

"Laufer (1982, 1989) claims that readers need a sight recognition of at least 95 percent of the words in a text for it to be comprehensible enough for meanings of unknown words to be inferred.”

"As far as implications for vocabulary learning are concerned, the experiment makes a stronger case for incidental acquisition than was made in the earlier Clockwork Orange replication studies. Subjects who read a full-length book recognized the meanings of new words at a higher rate than in previous studies with shorter texts, and built associations between new words as well... Cobb (1997) found that encountering new words in multiple contexts resulted in a deeper, more transferable knowledge of words than the usual strategy of studying short definitions."

"...But even though it may be possible to develop better resources for incidental learning, the study suggests that extensive reading is not a very effective way for learners who have a mean vocabulary size of around 3000 words to expand their lexicons... In brief, the experiment indicates that teachers of low intermediate learners of English can expect vocabulary growth from reading a simplified novel to be small and far from universal… In the last two decades, it has often been assumed that incidental acquisition was a sufficient strategy to take care of learners' lexical needs, to the point that explicit vocabulary instruction effectively disappeared from many coursebooks and vocabulary acquisition became "a neglected aspect of language learning" (Meara 1980:221). The present study suggests that the power of incidental acquisition may have been overestimated.

...Nagy, Herman and Anderson (1985) propose that for children learning English as their first language, school reading can account for the acquisition of thousands of new words each year. Even though the incidental pick-up rate was found to be low, large gains occur, they argue, because children encounter millions of words annually. But this is hardly applicable to beginning second language learners; for the subjects of this study, encountering one million words would entail reading fifty graded readers the size of The Mayor of Casterbridge: a worthy but unattainable goal for most learners at this level.”
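Two figures from the Mayor of Casterbridge excerpts can be checked with back-of-the-envelope arithmetic. The first is the mean gain of five of the 23 learnable words, about 22 per cent. The second is the claim that one million running words means roughly fifty readers the length of the 21,232-word simplified text. The helper names below are hypothetical; this is a quick check of the quoted numbers, not the study's own analysis.

```python
import math


def readers_needed(target_running_words: int, words_per_reader: int) -> int:
    """Whole graded readers required to reach a target number of running words."""
    if words_per_reader <= 0:
        raise ValueError("words_per_reader must be positive")
    return math.ceil(target_running_words / words_per_reader)


def pickup_rate_percent(words_gained: float, words_learnable: float) -> float:
    """Share of the learnable target words that were actually learned, in percent."""
    return 100 * words_gained / words_learnable


print(readers_needed(1_000_000, 21_232))   # 48
print(round(pickup_rate_percent(5, 23)))   # 22
```

The exact count is 48 readers, which the quoted passage rounds to "fifty".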

“The results of this study point to several things. Firstly, the data support the notion that words can be learned incidentally from context. However, these data suggest that few new words appear to be learned from this type of reading, and half of those that are learned are soon lost.... Assuming an optimistic scenario in which reading fifty novels per year was possible ...even if yearly gains increased marginally with increased vocabulary size, it would take many years to acquire incidentally the 5,000 most frequent word families of English, the figure which has been proposed as the minimum knowledge base needed for learners of English to be able to infer the meanings of new words they encounter in normal, unsimplified texts (Hirsh & Nation 1992, Laufer 1989)...

That is not to say that low intermediate learners should never read, but that teaching decisions should be based on an adequate account of what they can gain from their reading. Through reading extensively, they will probably enrich their knowledge of the words they already know, increase lexical access speeds, build network linkages between words, and more, but as this study has shown, only a few new words will be acquired. Therefore, it seems clear that in the early stages of their second language acquisition, learners should direct a considerable portion of their energies to using intentional strategies to learn high frequency vocabulary, in preparation for the day when they will know enough words and can read in enough volume for more substantial incidental benefits to accrue.”

Incidental vocabulary acquisition from reading, reading-while-listening, and listening to stories

"The results showed that new words could be learned incidentally in all 3 modes, but that most words were not learned. Items occurring more frequently in the text were more likely to be learned and were more resistant to decay. The data demonstrated that, on average, when subjects were tested by unprompted recall, the meaning of only 1 of the 28 items met in either of the reading modes and the meaning of none of the items met in the listening-only mode, would be retained after 3 months...

...The subjects, it seems, displayed a critical lack of familiarity with spoken English. As they listened to the story, they had to pay constant attention to a stream of speech whose speed they could not control. Because they were incapable of processing the phonological information as fast as the stream of speech, they may have failed to recognize many of the spoken forms of words that they already knew in their written forms."

Current Research and Practice in Teaching Vocabulary

Alan Hunt and David Beglar

“In the long run, most words in both first and second languages are probably learned incidentally, through extensive reading and listening (Nagy, Herman, & Anderson, 1985). Several recent studies have confirmed that incidental L2 vocabulary learning through reading does occur (Chun & Plass, 1996; Day, Omura, & Hiramatsu, 1991; Hulstijn, Hollander & Greidanus, 1996; Knight, 1994; Zimmerman, 1997). Although most research concentrates on reading, extensive listening can also increase vocabulary learning (Elley, 1989). Nagy, Herman, & Anderson (1985) concluded that (for native speakers of English) learning vocabulary from context is a gradual process, estimating that, given a single exposure to an unfamiliar word, there was about a 10% chance of learning its meaning from context. Likewise, L2 learners can be expected to require many exposures to a word in context before understanding its meaning... The incidental learning of vocabulary may eventually account for a majority of advanced learners' vocabulary; however, intentional learning through instruction also significantly contributes to vocabulary development (Nation, 1990; Paribakht & Wesche, 1996; Zimmerman, 1997). Explicit instruction is particularly essential for beginning students whose lack of vocabulary limits their reading ability. Coady (1997b) calls this the beginner's paradox. He wonders how beginners can "learn enough words to learn vocabulary through extensive reading when they do not know enough words to read well" (p. 229). His solution is to have students supplement their extensive reading with study of the 3,000 most frequent words until the words' form and meaning become automatically recognized (i.e., "sight vocabulary"). The first stage in teaching these 3,000 words commonly begins with word-pairs in which an L2 word is matched with an L1 translation...
Translation has a necessary and useful role for L2 learning, but it can hinder learners' progress if it is used to the exclusion of L2-based techniques. Prince (1996) found that both "advanced" and "weaker" learners could recall more newly learned words using L1 translations than using L2 context. However, "weaker" learners were less able to transfer knowledge learned from translation into an L2 context. Prince claims that weaker learners require more time when using an L2 context as they have less developed L2 networks and are slower to use syntactic information... ‘Understanding of a word acquired from meeting it in context in extensive reading is “fragile knowledge”, and may not be internalized long-term if there are no further encounters with it; but it is still useful’... Vocabulary lists can be an effective way to quickly learn word-pair translations (Nation, 1990). However, it is more effective to use vocabulary cards, because learners can control the order in which they study the words (Atkinson, 1972). Also, additional information can easily be added to the cards. When teaching unfamiliar vocabulary, teachers need to consider the following:

1. Learners need to do more than just see the form (Channell, 1988). They need to hear the pronunciation and practice saying the word aloud as well (Ellis & Beaton, 1993; Fay and Cutler, 1977; Siebert, 1927). The syllable structure and stress pattern of the word are important because they are two ways in which words are stored in memory (Fay and Cutler, 1977).
2. Start by learning semantically unrelated words. Also avoid learning words with similar forms (Nation, 1990) and closely related meanings (Higa, 1963; Tinkham, 1993) at the same time... Likewise, words with similar, opposite, or closely associated (e.g., types of fruit, family members) meanings may interfere with one another if they are studied at the same time.
3. It is more effective to study words regularly over several short sessions than to study them for one or two longer sessions. As most forgetting occurs immediately after initial exposure to the word (Pimsleur, 1967), repetition and review should take place almost immediately after studying a word for the first time.
4. Study 5-7 words at a time, dividing larger numbers of words into smaller groups.
5. Use activities like the keyword technique to promote deeper mental processing and better retention (Craik and Lockhart, 1972). Associating a visual image with a word helps learners remember the word.”
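Points 3 and 4 above describe a procedure: split a word list into small groups, and review each group soon after first study and then again over several short sessions. Here is a minimal Python sketch of both steps. The function names are hypothetical. The expanding schedule (a first review after 5 minutes, with each later gap five times longer) is an illustrative assumption, not a value given by Hunt and Beglar.

```python
import math
from datetime import timedelta


def study_groups(words: list[str], max_size: int = 7) -> list[list[str]]:
    """Split a word list into evenly sized groups of at most `max_size` words."""
    if not words:
        return []
    n_groups = math.ceil(len(words) / max_size)
    base, extra = divmod(len(words), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more word
        groups.append(words[start:start + size])
        start += size
    return groups


def review_offsets(first_gap: timedelta = timedelta(minutes=5),
                   factor: float = 5.0, reviews: int = 5) -> list[timedelta]:
    """Hypothetical expanding schedule: review soon after study, then at widening gaps.

    Returns each review's offset from the end of the first study session.
    """
    offsets, gap, total = [], first_gap, timedelta(0)
    for _ in range(reviews):
        total += gap
        offsets.append(total)
        gap *= factor
    return offsets


words = [f"word{i}" for i in range(20)]
print([len(g) for g in study_groups(words)])  # [7, 7, 6]
print(review_offsets()[:3])
```

Splitting evenly avoids ending up with a small leftover group. Very short lists (for example, 8 words) still produce groups smaller than five, so in practice a teacher might simply keep such a list as one group.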

“Provide opportunities for developing fluency with known vocabulary.
Fluency building activities recycle already known words in familiar grammatical and organizational patterns so that students can focus on recognizing or using words without hesitation.”

At what rate do learners learn and retain new vocabulary from reading a graded reader?
Rob Waring

"The results show that words can be learned incidentally but that most of the words were not learned. More frequent words were more likely to be learned and were more resistant to decay. The data suggest that, on average, the meaning of only one of the 25 items will be remembered after three months, and the meaning of none of the items that were met fewer than eight times will be remembered three months later. The data thus suggest that very little new vocabulary is retained from reading one graded reader, and that a massive amount of graded reading is needed to build new vocabulary...

...This suggests that it is far more difficult to pick up words from listening-only than from either the reading-only or reading-while-listening modes. There was, however, no significant difference between reading-only and reading-while-listening modes.

…This suggests that meanings are lost faster than the other types of word knowledge tested here.”

Number of meetings needed to learn a word

“As we saw in the introduction, previous estimates of the number of times it takes to learn a word from reading varied considerably. It is clear from this research that it is very difficult to pin a number on this age-old question. It seems much more complex than a simple single figure. From the results of this experiment, it seems that to have a 50% chance of recognizing a word form again three months later, learners have to meet the word at least eight times. The same could be said for prompted recognition. However, for unprompted form-meaning recognition (i.e., word learning) there is only a 10% to 15% chance that the word's meaning will be remembered after three months even if it was met more than 18 times. If the word was met fewer than 5 times, the chance is next to zero. This is rather disappointing because it suggests that we do not learn a lot of new words from our reading even with a 96% coverage rate. There are several reasons why this might be so. Firstly, the learners are presumably focused on comprehending and enjoying the story rather than on the words themselves. The words were not made explicit by bolding or highlighting them in any way, as is the case in natural reading. Because of this, the learners are not being forced to notice them and their awareness of the words is not being raised. Some recent research has suggested that noticing a form is an essential step in word learning (Schmidt, 1990)... Thirdly, the reason for the low vocabulary retention rate may simply have been that there were too few chances to learn the words. As we have seen, it takes much more than one meeting of a word to learn it from reading. Moreover, even words met more than fifteen times in the text still have only a 40% chance of being learned. This seems to suggest that it would take well over 20 or even 30 meetings for most of those words to be learned.”

“A number of studies have shown that second language learners acquire vocabulary through reading, but only relatively small amounts. However, most of these studies used only short texts, measured only the acquisition of meaning, and did not credit partial learning of words.”

“The results showed that knowledge of 65% of the target words was enhanced in some way, for a pickup rate of about 1 of every 1.5 words tested. Spelling was strongly enhanced, even from a small number of exposures. Meaning and grammatical knowledge were also enhanced, but not to the same extent. Overall, the study indicates that more vocabulary acquisition is possible from extensive reading than previous studies have suggested.”

“There is no frequency point where meaning acquisition is assured, but by about 10+ exposures, there does seem to be a discernible rise in the learning rate. However, even after 20+ exposures, the meaning of some words eluded G, echoing Grabe and Stoller's (1997) point that some words simply seem hard to learn.”

“As a whole, the results are consistent with those of Schmitt (1998), who found that it is possible for L2 learners to have other kinds of word knowledge without having acquired knowledge of the word's meaning.”

“...the role of frequency of occurrence in the texts in the enhancement of the three types of word knowledge... As mentioned before, it seems that spelling knowledge can be gained with even a few exposures. Meaning does not seem to be as affected by frequency as much as one might expect, with 2-19 text occurrences yielding uptake rates ranging between 16-36% when we take the nouns and verbs together. Only at the extremes of frequency do we see a noticeable effect. Single encounters produced hardly any learning of meaning at all (3.4%), while it took 20+ occurrences to lead to a noticeable increase in uptake rates (60%). Only in the case of grammar (when articles and prepositions are considered together) was there a relatively steady increase of learning along the frequency scale. Overall, only when words were seen twenty or more times was there a good chance of all three word knowledge facets being enhanced.”
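The uptake rates above are grouped by how often each target word occurred in the text: a single occurrence, 2-19 occurrences, and 20 or more. Here is a minimal Python sketch of that kind of banding. It assumes one record per target word, holding the word's text occurrences and whether its meaning was learned. The band edges follow the excerpt. The function names and the sample records are hypothetical, and the study's own scoring of spelling, meaning, and grammar knowledge is not reproduced here.

```python
from collections import defaultdict

# (label, lowest occurrence count, highest occurrence count or None for open-ended)
BANDS = [("1", 1, 1), ("2-19", 2, 19), ("20+", 20, None)]


def band_of(occurrences: int) -> str:
    """Label of the frequency band that an occurrence count falls into."""
    for label, low, high in BANDS:
        if occurrences >= low and (high is None or occurrences <= high):
            return label
    raise ValueError("occurrences must be at least 1")


def uptake_by_band(results: list[tuple[int, bool]]) -> dict[str, float]:
    """Share of words learned in each band; bands with no words are omitted."""
    learned: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for occurrences, was_learned in results:
        band = band_of(occurrences)
        total[band] += 1
        learned[band] += int(was_learned)
    return {label: learned[label] / total[label]
            for label, _, _ in BANDS if total[label]}


# Hypothetical records: (text occurrences, meaning learned?)
sample = [(1, False), (1, False), (5, True), (12, False), (25, True), (30, True)]
print(uptake_by_band(sample))
```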

“Chun and Plass' (1996) study of American university students learning German found that unfamiliar words were most efficiently learned when both pictures and text were available for students. This was more effective than text alone or combining text and video, possibly because learners can control the length of time spent viewing the pictures.”

Beyond raw frequency: Incidental vocabulary acquisition in extensive reading:

“However, words of lower frequency were better learned than words of higher frequency when the meanings of the lower frequency words were crucial for meaning comprehension.”

"...a richer sense of a word is learned through contextualized input. Furthermore, the incidental acquirer not only acquires word meanings but also increases his or her chances to get a feel for collocations and colligations that are not easily learned by learners of English as a foreign language (Bahns & Eldaw, 1993); therefore, learning can be facilitated by repeated exposure to words that go together (cf. Lewis, 1993; Nattinger & DeCarrico, 1992, for the importance of learning lexical phrases)...

“It does not seem feasible to define a number of exposures that is sufficient for successful acquisition, such as at least 10 exposures (Saragi et al., 1978) or 5–16 exposures (Nation, 1990). As Henriksen (1999, p. 314) pointed out, word acquisition seems to be able to range ‘over continua of lexical knowledge’ from partial recognition knowledge to productive use ability, depending on how many and what kinds of exposures are needed for successful acquisition. The observation that some words that do not appear frequently, but are nevertheless acquired and retained, apparently because they are salient and significant to a story, is highly interesting. We suggest that the rate of incidental vocabulary learning is not simply related to the raw frequency of specific words in the language. We further propose that learning is a consequence of noticing and the conscious learning of words that are important in the narrative (Schmidt, 2001).”


“Another reason, as Larsen-Freeman (2002) and Ellis (2002b) point out, is that if second language learning were simply a matter of acquiring the most frequently occurring patterns of the target language (TL), then English language learners (ELLs) would be proficient in their uses of the definite and indefinite articles, the most frequently occurring free morphemes in English. This, of course, is not the case. It is clear that the frequency of input is not the only factor involved in learning a second language; however, we believe it plays a significant role. Ultimately, we hope to show that high-frequency constructions provide more exemplars for L2 learners to make generalizations than low-frequency constructions and that this directly relates to the number and kind of L2 learner errors.”

“ELLs will tend to produce more errors with low frequency constructions…”



Keith said...

incidental learning? Now post something on accidental learning. :)

T-Pain said...

Thank you very much for this compilation... it seems that few have really closely looked at frequency and the research on it... this post is virtually a treasure trove.

cathy said...

I'm nowhere near 98% comprehension for French, but I know I am happy if I at least understand something every now and then! But the day I do hit 98%, I'll be absolutely ecstatic and will devour books like crazy!

languagefixation said...

To me, this all points strongly at adopting a method like Khatsumoto's AJATT strategies. Reading is great, but if you find something really interesting then stick it in your SRS and you're guaranteed that you'll never forget it. This is where you can get an optimal amount of exposure to that sentence, over and over again.

This is also a way that we can supplement our TV watching. Yes, it's easier to get words from the context while watching TV, but you also need some reminders or you'll forget it in a few weeks if it's a less common word. Stick interesting stuff in your SRS :)