Tuesday, April 20, 2010

Migraine Gives Woman Foreign Accent

Tim Hewage, Sky News Online

A woman from Devon has begun speaking with a Chinese accent after suffering severe migraines.

Thirty-five-year-old Sarah Colwill puts the startling change down to an extremely rare medical condition known as Foreign Accent Syndrome (FAS).

"I knew I sounded different but I didn't know how much and people said I sounded a bit Chinese.

"Then I had another attack and when the ambulance crew arrived they said I definitely sounded Chinese."

The rare disorder is thought to be caused by strokes and brain injuries and causes sufferers to lose the ability to talk in their native accent.

There have been an estimated 60 recorded cases of FAS since it was first identified in the 1940s.

"It is not the kind of problem that there are any easy generalisations about."

Sufferers can develop an accent without ever having been exposed to it: the change in speech patterns caused by a brain injury produces lengthened syllables, altered pitch or mispronounced sounds...

Experts believe FAS is triggered following a stroke or head injury, when tiny areas of the left side of the brain linked with language, pitch and speech patterns are damaged.

The result is often a drawing out or clipping of the vowels that mimic the accent of a particular country, even though the sufferer may have had limited exposure to that accent.

One of the first reported cases was in 1941 when a Norwegian woman developed a German accent after being hit by bomb shrapnel during an air raid.

As a result, she was shunned by her community, who falsely believed she was a German spy.

In 2006 Linda Walker, 60, woke from a stroke to find that her Geordie accent had been transformed into a Jamaican one.

Saturday, April 17, 2010

Multitasking brain

Multitasking Brain Divides And Conquers, To A Point

April 15, 2010 | Jon Hamilton | National Public Radio

Our brains are set up to do two things at once, but not three, a French team reports in the journal Science.

The researchers reached that conclusion after studying an area of the brain involved in goals and rewards. Their experiment tested people's abilities to accomplish up to three mental tasks at the same time. The tasks involved matching letters in different ways, and for incentive, participants were paid up to a euro for doing a task perfectly.

When volunteers were doing just one task, there was activity in goal-oriented areas of both frontal lobes, says Etienne Koechlin, a professor at the Ecole Normale Superieure in Paris. That suggested that the two sides of the brain were working together to get the job done, he says.

But when people took on a second task, the lobes divided their responsibilities. "Each frontal lobe was pursuing its own goal," Koechlin says.

The lobe on the left side of the brain focused on the first task, while the lobe on the right focused on the second. When the researchers offered a greater reward for a task being supervised by one side of the brain, the amount of activity on that side increased accordingly.

Brain Maxes Out At Multitasking

But the brain has only two frontal lobes, suggesting there might be a limit to the number of goals and rewards it can handle. So the team decided to do another experiment.

They offered people rewards to do three things at once.

And when people started a third task, one of the original goals disappeared from their brains, Koechlin says. Also people slowed down and made many more mistakes. That suggests that our frontal lobes "can't maintain more than two tasks," Koechlin says.

The evidence that the brain assigns one task to each side of the brain is "very surprising," says Rene Marois, a neuroscientist at Vanderbilt University.

He says the findings, if they hold up, have implications for people trying to do more than two things at a time.

For example, Marois says, someone who is writing a report might be able to take on a second task, like checking e-mail, without losing their train of thought. But if that e-mail asked for a decision about something, that would amount to a third task, and the brain would be overwhelmed, he says.

High Stakes Could Improve Performance

David Meyer, who studies multitasking at the University of Michigan, says he doesn't think the study shows it's impossible to keep three tasks in mind.

"In the real world, there are life and death matters which hinge on exactly what happens with multitasking, which certainly wasn't the case with this study," Meyer says.

The frontal lobes of the brain might respond differently if the reward was survival, instead of the equivalent of a couple of dollars, he says. Meyer is also puzzled by what he sees as a disconnect between what was happening in people's brains and what they actually did.

For example, offering people more money increased their brain activity quite a bit. But the extra brain activity didn't make people much faster or more accurate at multitasking, Meyer says.

"The effects of these motivational manipulations on the behavior were extremely small for the most part," he says. "At most only a few percentage points."

That could mean studies need to use more powerful rewards, Meyer says. Or, he says, it could mean that no reward or internal goal can make us very good at doing even two tasks at once.

Wednesday, April 7, 2010

Neural predictors of auditory word learning

The present fMRI study aimed to identify neurofunctional predictors of auditory word learning. Twenty-four native Chinese speakers were trained to learn a logographic artificial language (LAL) for 2 weeks and their behavioral performance was recorded. Participants were also scanned before and after the training while performing a passive listening task. Results showed that, compared to 'poor' learners (those whose performance was below average during the training), 'good' (i.e. above-average) learners showed more activation in the left MTG/STS and less activation in the right IFG during the pretraining scan. These results confirmed the hypothesis that preexisting individual differences in neural activity can predict efficiency in learning words in a new language.

Tuesday, April 6, 2010

The inescapable case for extensive reading

Rob Waring, Notre Dame Seishin University, Okayama, Japan



In his article, Dr. Rob Waring discusses the necessity of Extensive Reading and Extensive Listening in all language programs. The article reviews recent vocabulary research and shows that learners need to meet massive amounts of language to learn not only single words but also their collocations, register and so forth. The article demonstrates that neither intentional learning nor course books (especially linear-based ones) can cover the vast volume of text the learners need to meet without Extensive Reading. He shows that learners need to gain their own sense of the language, and that this cannot come from learning discrete language points alone; rather it must, and can only, come from massive exposure in tandem with course books.

This paper puts forward the idea that graded reading, or extensive reading, is a completely indispensable part of any language program, if not all language programs.

1. The amount of language to be learnt
Let us first look at the vocabulary. We know from vocabulary research that English is made up of a very few extremely common words which comprise the bulk of the language. In written text, we know that about 2000 word families cover about 85-90% of the running words in general texts and that 50% of any text will be function words (Nation 2001). We also know that to read a native novel, a newspaper or a magazine with 98% vocabulary coverage, a learner would need to know about 8000 word families. But how should these words be learnt? And what do we mean by “learning”?

One of the few things language researchers can agree about is that learners can learn words from reading provided the reading is comprehensible. They may though, disagree over the uptake rates and types of texts to be used. Determining uptake rates is a vital component in the overall picture of vocabulary learning because these rates affect how much text learners need to meet, and over what time period the learning should take place.

One of the main factors affecting learnability is the ratio of unknown to known words in a text. The denser a text is (the more unknown words it has), the less likely it is that incidental learning can occur. Liu Na and Nation (1985) and Hu and Nation (1999) suggest the optimal known-word coverage rate is about 95-99% for there to be a good chance that learning can take place.

Laufer (1989), Nation (2001) and many others have shown that unless we have about 98-99% coverage of the other words in the text, the chance that an unknown word will be learnt is minimal. This means there should be at most one new word in 50, or one in 100, for the right conditions for incidental vocabulary learning. The coverage figures for learning from listening appear to need to be even higher due to the transitory nature of listening (Brown – Waring – Donkaewbua 2008).
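The coverage figures translate directly into how often an unknown word turns up. A minimal sketch of the arithmetic (the function name and script are ours, for illustration only):

```python
def unknown_word_spacing(coverage):
    """Average number of running words between unknown words,
    given the proportion of running words the learner knows."""
    return 1 / (1 - coverage)

# 98% known-word coverage -> roughly one unknown word in every 50
print(round(unknown_word_spacing(0.98)))  # 50
# 99% coverage -> roughly one unknown word in every 100
print(round(unknown_word_spacing(0.99)))  # 100
```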

Uptake rates also depend on the opportunities for learning, that is, the number of times an unknown word appears in a given text and how closely spaced those occurrences are, so that knowledge can be retained in memory before it is lost. It is pertinent to look at the opportunity that learners have for learning from natural text, because this can tell us how words are spaced in the language. Moreover, these data, combined with the uptake rates stated above, can help us determine whether incidental learning of vocabulary from reading is efficient enough to be a major vocabulary learning strategy.

Table 1 shows the frequency at which words occur in a 50 million word sub-corpus (both written and spoken) of the British National Corpus (BNC) of English.

The most frequent word in English (the) covers 5.839% of any general English text (i.e. it occurs once in every 17 words) (see (1) in the table). The 2000th most frequent word in English covers 0.00432% of any general English text and occurs once every 23,103 words (2). Note that when the learner meets the 2000th most frequent word in English, this means that all the previous 1999 words have also been met at least once.

If we set the uptake threshold at which a word becomes “learnt” at 10 recurrences, 85,329 words need to be read to “learn” all of the 1000 most frequent words in English (3). To “learn” all of the 500 most frequent words in English at an uptake threshold of 20 times, 80,732 words need to be read (4), and 2.6 million words need to be read to meet the most frequent 5000 words at 20 recurrences (5).
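The arithmetic behind these figures can be sketched as follows; the coverage percentages are the ones quoted above, while the helper function is ours (the small difference from the 23,103 figure quoted in the text comes from rounding in the source):

```python
def words_needed(coverage_percent, meetings):
    """Running words of text needed, on average, for a word with the
    given coverage (as % of running text) to recur `meetings` times."""
    occurs_once_every = 100 / coverage_percent
    return occurs_once_every * meetings

# 'the' covers 5.839% of running text: once in every ~17 words
print(round(100 / 5.839))                 # 17
# the 2000th most frequent word (0.00432% coverage): once every ~23,000 words
print(round(100 / 0.00432))               # 23148
# meeting that word 10 times therefore needs roughly 230,000 running words
print(round(words_needed(0.00432, 10)))   # 231481
```

This makes visible why the volume of text needed rises so steeply: each step down the frequency list multiplies the spacing between meetings.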

Many researchers argue that learners can build a huge vocabulary simply from reading. However, even at the 10-meeting recurrence rate for learning to occur, Table 1 clearly shows that a huge amount of text needs to be met to facilitate the incidental learning of vocabulary from reading. It also shows that as one’s vocabulary level increases, there is a huge increase in the amount of text that needs to be read in order to meet unknown words, because each new or partially-learnt word is met more and more infrequently.

Considerable evidence (e.g. Nation 2001, Waring – Takaki 2003) suggests that our brains do not learn things all in one go; we are destined to forget much of what we learn, and recent knowledge in particular is quite fragile. We also tend to pick up complex things like language in small incremental pieces rather than in whole chunks. We know, for example, that it takes between 10 and 30, or even 50 or more, receptive meetings of an average word for its form (spelling or sound) to be connected to its meaning (Waring, forthcoming).

The BNC data in Table 1 are for word families based on type. In other words, the data state that meeting any of the family members 20 times (use, then uselessness, then user) means the whole family will be learnt after those 20 meetings. This is obviously a gross simplification, as many derivations are easy to learn (wind/windy or teach/teacher), whereas others are complex and late-acquired (govern/ungovernable or excuse/inexcusable). Moreover, the analysis does not account for polywords, nor the thousands of lexical chunks and set phrases such as I’d rather not; If it were up to me, I’d…; We got a quick bite to eat; What’s the matter?; The best thing to do is … and so on. Nor does it take into account polysemy (multiple meaning senses of words), phrasal verbs, idioms and metaphor, because the analysis was done by type. All these need to be learnt in addition to the single words.

Table 1 also does not take into account the volume of text needed to learn the collocations and colligations either. If we assume that the learning of a meaning and its form is a precondition for the learning of its collocations (we need to know calm and sea to know the collocation calm sea), we can conclude that these ‘deeper’ aspects of the learning of a word will take far longer than just learning the word as a single unit i.e. its form-meaning connection only. But how many collocations does each word have, on average? Here is a sample of some of the main collocations and colligations for the very common word idea (taken from Hill – Lewis 1997).

Verb collocations of Idea. e.g. abandon an idea

abandon, absorb, accept, adjust to, advocate, amplify, advance, back, be against, be committed/dedicated/drawn to, be obsessed with, be struck by, borrow, cherish, clarify, cling to, come out/up with, confirm, conjure up, consider, contemplate, convey, debate, debunk, defend, demonstrate, develop, deny, dismiss, dispel, disprove, distort, drop …

These are just a small part of the verb collocations and colligations of one word – idea – and most of them have not even been listed: the list above only goes up to the letter d, and there are about 100 more! And that does not count the adjective uses (e.g. an abstract idea, an appealing idea, an arresting idea and so on), of which there are also several dozen. Not all words have this number of collocational partners, and no one would suggest that learners need to know them all. Learners do, however, need to know a good proportion of these to even approach native-like control and fluency over a given word and its collocations; thus the vocabulary task becomes even more arduous than that painted in Table 1.

The density of a text is a property of the learner, not of the text itself: a given text could be easy for one learner but impossibly hard for another. The above clearly suggests that EFL learners who are trying to read fluently (extensively) and who have not yet reached an advanced level (i.e. they know fewer than 5000 word families) should meet language which has been controlled and simplified, so they are not overwhelmed by dense texts that prevent them from reading fluently. L1 texts (especially literary texts) are typically very dense lexically, which makes them difficult to read and learn from, and almost impossible to read fluently, for all but the most highly advanced learners of English. Reading native texts that contain a high proportion of unknown words makes the reading slow and intensive and turns the reading task into a linguistic (study) one rather than one for building fluency. This is not necessarily bad, but learners should be aware that unless they read a lot, they will not have the opportunity to meet the unknown words they need to strengthen their partially-known vocabularies. Therefore, EFL learners would need to use graded readers initially to even out the density issues by systematizing the vocabulary load. Only when learners can cope with more advanced texts should they be exposed to them. Nevertheless, the volume of text that needs to be met is immense and far beyond that of most normal courses. What this means is that far more than the one book a week at the learner’s level recommended by Nation and Wang (1999) will be required.

Unless the volume of reading is increased, it is likely that any partial knowledge of a given word will be lost from memory especially as each individual occurrence of words above this level appears so randomly and unpredictably in ungraded text. These data together suggest that it is unlikely much learning will occur from only reading above the 3000 word level unless several thousands of words are read per day.

To this point we have examined the vocabulary task at hand. If we now turn to the grammar, we can see a similarly massive task ahead of our learners. Consider the present perfect tense: it appears in various guises and masks many forms, and unlike words it is abstract and cannot be 'seen' on the page in the same way, which makes it even harder to acquire.

The tense appears with differing subjects and objects, as both yes/no and wh- question forms, in the negative as well as declarative. It can be active or passive, continuous or simple, with have or has and that does not count the myriad regular and irregular past participle forms and the short answer forms. There are about 75 different possible variations of the form of the present perfect tense – and that does not count the different uses...
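The combinatorial explosion can be sketched with a toy enumeration. This factorization is ours, not the article's exact accounting; the article's figure of about 75 also counts short-answer forms, contractions and other variants:

```python
from itertools import product

# Dimensions along which present perfect surface frames vary
agreement = ["have", "has"]
aspect = ["simple", "continuous"]
voice = ["active", "passive"]
clause_type = ["declarative", "negative", "yes/no question", "wh- question"]

frames = list(product(agreement, aspect, voice, clause_type))
print(len(frames))  # 32 frames, before short answers and
                    # irregular participles are even counted
```

Even this crude grid shows how a single "tense" fragments into dozens of surface patterns the learner must meet.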

We have a fairly good idea about uptake rates for words, but what about grammatical features? It is sad to say that, after an exhaustive search for the uptake rates of grammatical features, it appears that in the whole history of language research there are no data at all. None. This is amazing given that the vast majority of language courses taught today have a grammatical focus, at least in part. How can we, as an industry, create courses and write learning materials without at least some idea of how frequently grammatical items need to be met for learning to occur? That said, it is clear that it typically takes several years after learners have been introduced to a language feature before they feel comfortable enough with it to start to use it at all, let alone correctly.

The above would seem to be a damning indictment on the benefit of incidental learning from fluent reading because it could be said that the time expended on the reading might be more fruitfully spent on intentional learning.

Indeed, recent research (Nozaki 2007) has shown that direct, intentional learning of vocabulary is faster than incidental learning (i.e. from reading)... Nozaki found that words met on word cards were not only learnt 16 times faster (in words per hour of study) but were also retained longer than words learnt incidentally from reading.

Additionally, a case study of a learner by Mukoyama (2004) showed that 30 minutes a day of learning Korean-Japanese word pairs for 30 days led to 640 words being attempted and partially learnt. At the end of the 30 days, 468 words had been learnt (all words were tested by L1-L2 translation); two months later 395 words were still known, and at 7 months 310 words were retained, all without any further meetings. These two studies together clearly show the power of intentional learning over incidental learning.
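The retention pattern in Mukoyama's case study is easy to restate as percentages; the raw figures are those quoted above, and the script is our illustration:

```python
attempted = 640        # word pairs attempted over 30 days
learnt_day_30 = 468    # known at the end of the study
known_2_months = 395   # still known two months later
known_7_months = 310   # still known at seven months

print(f"learnt by day 30:      {learnt_day_30 / attempted:.0%}")       # 73%
print(f"retained at 2 months:  {known_2_months / learnt_day_30:.0%}")  # 84%
print(f"retained at 7 months:  {known_7_months / learnt_day_30:.0%}")  # 66%
```

Roughly two thirds of the intentionally learnt words survived seven months without any further meetings, which is the point the comparison with incidental learning turns on.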

One might easily conclude from the above that we should not ask learners to learn vocabulary incidentally from reading, but rather adopt a systematic and intensive approach to direct vocabulary learning such as with word cards. One might even go further to conclude that by doing so, learners would not need to “waste” time reading, because they can learn faster from intentional learning and free up valuable class / learning time. However, this would be a grave mistake and a fundamentally flawed conclusion because language learning is far more complex than the extremely simplistic picture given above.

To really know a word well, learners need to know not only its meanings and spellings, but also the nuances of its meanings, its register, whether it is more commonly used in speaking or writing, which discourse categories it is usually found in, and its collocations and colligations, among many other things. The above studies see words as single stand-alone objects rather than as words that co-exist and are co-learnt (and forgotten) with other words. They vastly underestimate what might be learnt because they only look at a partial, though very important, picture of word learning – the learning of single meanings.

Given the rather slow rate at which vocabulary is learnt incidentally from reading, one might be tempted to suggest that the multiple meanings, colligations, collocations, register, pragmatic values and so forth could be learnt intentionally. While this may be possible in theory and even in practice, we then have to ask: where is the material to do this with? Where are the books that systematically teach this “deeper” vocabulary knowledge and recycle it dozens or hundreds of times beyond the form-meaning relationship (collocation etc.), even for the 1000 most frequent words? A few books exist, but they offer little more than a random selection of a choice few collocations, whereas, as we have seen, learners need vastly more. In short, these materials do not exist. Even if they did, it would take a monumental amount of motivation to plough through such books intentionally, and I suspect few, if any, learners have this stamina.

No learner has the time to methodically go through and learn all of the above. No course book, or course, can possibly hope to teach even a tiny fraction of it. There is too much to do. But our course books were not designed to teach all of this. Let us look at what course books and courses are typically designed to do. Our course books concentrate on introducing new language items, with each appearing in new units or lessons, with new topics all the time.

The structure of course books, and of linear courses in general, shows us that they are not concerned with deepening knowledge of a given form, only with introducing it or giving minimal practice beyond a token review unit or test. They do not concentrate on the revisiting, recycling and revising necessary for acquisition. The assumption underlying most courses and course books is that our learners have “met” or “done” a feature now, so we do not need to go back to it and can move on. Adopting this default view of language teaching (the “teaching equals learning” assumption implicit in these materials) is a massive mistake if that is all we do, because it undersells what our learners need – which is massive practice with the things taught in course books, under the right conditions.

These data suggest that course books do not, and by their very design cannot, provide the recycling of vocabulary needed for acquisition. This should not in any way be seen as an attack on course books. Course books are very useful and powerful, but because of their design they can only do half the job. They are good at introducing new language features in a linear way, but are not good at recycling this language and are poor at building depth of knowledge. If learners only use course books and intensive reading books, they will not be able to pick up their own sense of how the language works until very late in their learning careers (i.e. until they have met the language enough times).

This, we can suspect, is one of the reasons teachers and learners alike complain that even after several years of English education, many learners can score 100% on grammar and reading tests yet can hardly say a word in anything other than faltering English. The reason should now be clear. Simply put, they did not meet enough language to fully learn what they were taught. Their knowledge is abstract, and stays abstract, because it was taught abstractly: course books and courses tend to break the language down into teachable units. This atomistic knowledge is useful for tests of discrete knowledge (e.g. selecting a tense from choices or completing gap fills) because the knowledge was learnt discretely, which allows learners to do well on discrete-point tests. However, because their knowledge is held discretely, it is no wonder that when learners are called upon to use it in speaking or writing, they do not know how to put their discrete knowledge together fluently.

So, how are learners going to deepen their knowledge if they do not have time to learn these things intentionally, and our course books do not revisit the features and words they teach? Where is the recycling of language we need for real learning? The answer lies with graded or extensive reading used in tandem with a taught course. The two must work together: the course book introduces and gives minimal practice in the language features and vocabulary, while the reading of graded readers consolidates, strengthens and deepens that knowledge.

Therefore, to gain fluent control over the language, learners must also meet these items in real contexts to see how they work and fit together. In other words, learners must get a “sense” or “feeling” for how the language works. This sense can only come from meeting the language very often and seeing it at work in actual use (i.e. from reading or listening). This depth of knowledge gives learners the language awareness and confidence to feel comfortable enough with the language to speak or write. And this exposure comes from graded readers and extensive reading and extensive listening.

An oft-asked question is “Why can’t my learners speak? They’ve been learning English for years now. I teach them things, but they just don’t use them. It’s so frustrating.” Learners will only speak when they are ready to. That is, they will speak once they feel comfortable enough using the language feature or word.

But where does this comfort come from? It comes from experience with the language. The more times learners meet a word, a phrase or a grammatical feature, the more chance it has to enter their comfort zone and the greater the chance it becomes available for production. It is no wonder, then, that research into extensive reading shows gains in speaking from extensive reading alone (e.g. Mason – Krashen 1997).

One-Year Brain Atrophy Evident in Healthy Aging

An accurate description of changes in the brain in healthy aging is needed to understand the basis of age-related changes in cognitive function. Cross-sectional magnetic resonance imaging (MRI) studies suggest thinning of the cerebral cortex, volumetric reductions of most subcortical structures, and ventricular expansion. However, there is a paucity of detailed longitudinal studies to support the cross-sectional findings. In the present study, 142 healthy elderly participants (60–91 years of age) were followed with repeated MRI, and were compared with 122 patients with mild to moderate Alzheimer's disease (AD). Volume changes were measured across the entire cortex and in 48 regions of interest. Cortical reductions in the healthy elderly were extensive after only 1 year, especially evident in temporal and prefrontal cortices, where annual decline was 0.5%. All subcortical and ventricular regions except caudate nucleus and the fourth ventricle changed significantly over 1 year. Some of the atrophy occurred in areas vulnerable to AD, while other changes were observed in areas less characteristic of the disease in early stages. This suggests that the changes are not primarily driven by degenerative processes associated with AD, although it is likely that preclinical changes associated with AD are superposed on changes due to normal aging in some subjects, especially in the temporal lobes. Finally, atrophy was found to accelerate with increasing age, and this was especially prominent in areas vulnerable to AD. Thus, it is possible that the accelerating atrophy with increasing age is due to preclinical AD.

How many words do we need? Corpora, frequency and language learning

This is a good continuation of my previous posts.

Corpus size, word frequency and extensive reading

Corpora comparison by frequency

Word frequency and incidental learning

Native speaker language input

The inescapable case for extensive reading

The nicest thing about it is that I didn't have to work on this one myself.

In linguistics a language corpus is a machine-searchable collection of examples of written and spoken language use. Corpus linguistics aims to discover patterns of authentic language use through analysis of actual usage. A good corpus will give a snapshot of modern language use. This will eventually result in new, comprehensive corpus-based reference grammars, textbooks and dictionaries. Teachers may improvise their own corpora for particular purposes. Yawn. Anyway, that's a short intro. I'm mostly looking at corpora as adequate, accessible and entertaining snapshots of language use. I am concerned with word frequency as a factor in practical vocabulary and grammar acquisition. So, a corpus as an entertaining snapshot of language, large enough to provide enough repetitions of the most frequent items across a variety of fields and facilitate natural acquisition. Or should we grab a dictionary and a grammar and plough through a smaller sample? Or maybe both? Obviously, learning something involves internalizing it; you do not want to simply end up with a passive version of a native speaker's active vocabulary. What's the number of repetitions for active vocabulary? Is actual language use more important than input at this stage?

But I digress.

A note on corpus and lexicon size
Johanna Nichols, UC Berkeley



How much material is needed for minimal, normal, or optimal documentation of a language? How large should a text corpus be?

Cheng 2000, 2002 (and earlier works) shows that, for both English and Chinese, any given author uses a maximum of about 4000-8000 lemmatized words. A book of about 100,000 running words of text reaches this maximum in some cases. Cheng quotes Francis & Kucera 1982 to the effect that the Brown corpus (1,000,000 running words) contains over 30,000 lemmatized words, of which just under 6000 occur more than 5 times.

Cheng's interpretation is that 8000 words is about the maximum for an individual's actively known vocabulary. Of the authors he has surveyed only Shakespeare reaches an exceptional 10,000 words. Other sources cite vocabulary sizes ranging up to 80,000 words for the individual; but this is passive vocabulary, some of it known only in the sense of being understood in context. Cheng 2000 shows that, over time from 93BC to the present, the size of Chinese dictionaries increases regularly but the size of the individual author's vocabulary remains at a constant 4000-8000 characters.

Cheng's results suggest a measure of adequacy for lexical documentation: it should reach the range of an individual's active vocabulary, and it should be compiled from extensive enough materials to include the entire active vocabulary for at least one good speaker and preferably several good speakers.

I note that a fair amount of Cheng's English corpus appears to be literature for young adults. The sources with the higher lemma figures are writers writing for a full-fledged audience, e.g. Mark Twain; nonfiction writers appear mostly to fall in here. So Cheng's figure of 8000 may be a minimum, in that inclusion of more varied genres would almost certainly expand it. Also, the figure probably includes distinctly fewer technical terms than the average user knows actively. Finally, what Cheng surveys in this paper is not a given author's whole oeuvre but just one large work or (e.g. for Mark Twain) a collection of short works. That said, evidently one needs close to 100,000 words per individual to have any chance of capturing that individual's entire active lexical range, which works out to about 17 real-time hours of recorded speech.

Running words of text

The Uppsala corpus of Russian (1,000,000 words) yields zero or near-zero frequencies of some morphological forms. Timberlake 2004:6 searched for the two attested instrumental singular forms of Russian tysjacha 'thousand' and got zero returns for the less common one, while searching the entire Internet returned thousands of hits for each.

Therefore, a desideratum for corpora to be used for close syntactic work would be at least a million words, preferably at least ten million. Ten thousand or even fewer will suffice to attest the basic patterns. However, anything at all - even just a few sentences - is enormously valuable.

Monson et al. 2004 measure the rate at which new wordforms show up in running text and find that, for the polysynthetic language Mapudungun, there is no fall-off in the steep rate of increase of wordforms even after 1,000,000 running words. In the Spanish translation of this corpus, by contrast, the rate is much flatter.
Thus it appears that the number of words of running text needed for work on inflection and other morphology varies with the inflectional complexity of a language.
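Monson et al.'s measurement can be sketched as a type-growth curve: after each block of running words, record how many distinct wordforms have been seen so far. A curve that keeps climbing steeply (as for Mapudungun) signals high inflectional complexity; a flatter one (as for the Spanish translation) signals less. The function name here is my own.

```python
def type_growth(tokens, step=1000):
    """Return (running_words, distinct_wordforms_seen) checkpoints every
    `step` tokens; a curve that never flattens means new inflected forms
    keep appearing no matter how much running text is added."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, 1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve
```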

4. Real-time value of corpus sizes

Based on the Berkeley Ingush corpus... and on the corpus size reported by Monson et al. 2004, I calculate that an hour of transcribed recorded speech contains about 6000 words. (The figure might be somewhat higher for languages with less inflected and therefore shorter words.)

(Comment: this matches my estimates for some TV sitcoms)

Transcribed recorded hours needed at this rate for various corpus sizes:

1 million words (Brown size): 170 hours
10 million words: 1,700 hours
100 million words (BNC size): 17,000 hours
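The conversion behind these figures is simply corpus size divided by the ~6000 words per transcribed hour estimated above (the table's 170/1,700/17,000 round the quotients up slightly):

```python
WORDS_PER_HOUR = 6000  # estimated yield of one hour of transcribed recorded speech

def recorded_hours(corpus_words, words_per_hour=WORDS_PER_HOUR):
    """Hours of recorded speech needed to yield a corpus of the given size."""
    return corpus_words / words_per_hour

for size in (1_000_000, 10_000_000, 100_000_000):
    print(f"{size:>11,} running words ~ {recorded_hours(size):,.0f} hours")
```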

Recommended corpus sizes in running words.

Figures recommended here are for quality recordings, transcribed, glossed, and adequately commented -- that is, provided with fluent speaker judgments on the meaning of the material and the identity of the lexical items, and additional judgments on the kind of question that is likely to arise as a linguist works on the material.

Minimal documentation: Something like 1000 clauses excluding those with the most common verb (if any verb is substantially more common than others, as 'be' is in medieval Slavic texts). To be safe, 2000 clauses (this more than provides for excluding the most common verb). This would be several thousand to ten thousand running words. This appears to be minimally adequate for capturing major inflectional categories and major clause types, in moderately synthetic languages; for a highly synthetic or polysynthetic language more material is needed.

Basic documentation: About 100,000 running words, which appears to be the threshold figure adequate for capturing the typical good speaker's overall active vocabulary.
Good documentation: A million-word corpus. 150-200 hours of good-quality recorded text, up to about 20 hours per speaker, from a variety of speakers on a variety of topics in a variety of genres.

At 20 hours/speaker this is 10 speakers. Also, by Cheng's criteria, 100,000 words/speaker is 10 speakers for a million-word corpus. In reality, though, it is highly desirable to get more than 10 speakers (and also highly desirable to get the full 20 hours or 100,000 words from each of several speakers).

Excellent documentation: At least an order of magnitude larger than good; i.e. at least 10,000,000 words (1500-2000 recorded hours).

Full documentation: The sobering examples of the research experiences of Timberlake and Ruppenhofer (mentioned above) show that even 100,000,000 words is at least an order of magnitude too small to capture phenomena that, though of low frequency, are in the competence of ordinary native speakers. That would represent at least 20,000 recorded hours, and it is too low by an order of magnitude.

Assuming that a typical speaker hears speech for about 8 hours per day, the typical exposure is around 3000 hours per year. Assuming that full ordinary linguistic competence (i.e. not highly educated competence but ordinary adult lexical competence) is reached by one's mid-twenties, that would represent 75,000 hours. For written languages, add to that some unknown amount representing reading. Extraordinary linguistic competence -- that of a genius like Shakespeare or a highly educated modern reader -- requires wide reading, attentive listening to a wide range of selected good speakers, and a good memory.
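The exposure arithmetic in this paragraph, made explicit (the 3000 figure is the text's rounding of 8 x 365 = 2,920):

```python
hours_per_day = 8
days_per_year = 365
years_to_adulthood = 25  # "full ordinary competence by one's mid-twenties"

annual_exposure = hours_per_day * days_per_year  # 2,920 hours, rounded to ~3,000 in the text
lifetime_exposure = 3000 * years_to_adulthood    # 75,000 hours of heard speech
```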

On these various criteria it would take well over a billion (a thousand million) running words, and over 100,000 carefully chosen recorded hours, to just begin to approach the lifetime exposure of a good young adult speaker. Unfortunately, field documentation cannot hope to reach these levels.

Sunday, April 4, 2010

Same Language Subtitling Indian TV Literacy Initiative

I ran into this a couple of days ago and then forgot about it. Thanks for the reminder, Greg. This could be a great resource for anyone studying Hindi or other Indian languages.

Let a Billion Readers Bloom: Same Language Subtitling (SLS) on Television for Mass Literacy

Reading Out of the “Idiot Box”: Same-Language Subtitling on Television in India

Clinton Global Initiative: 'Same Language Subtitling' on TV for Mass Literacy in India

Saturday, April 3, 2010

Foreign-Language Acquisition by Watching Subtitled Television Programs

Foreign-Language Acquisition by Watching Subtitled Television Programs

by Géry d’Ydewalle


The visual image (not including the subtitle) and the sequence of events in the movie typically provide abundant information, which sometimes makes either understanding the spoken language or reading the subtitle superfluous. Moreover, it has been claimed that people unconsciously lipread to a certain extent.

Moreover, the time spent in processing the subtitle did not change when reading the subtitle was made either more important for understanding the program (by switching off the soundtrack) or less compelling (when the subject knows the foreign language very well). Therefore, it was concluded that reading the subtitle at its onset presentation is more or less obligatory; it is unaffected by major contextual factors such as the availability of the soundtrack and important episodic characteristics of actions in the movie.

In Experiment 1, American subjects watched an American movie with English soundtrack and subtitles. Despite their lack of familiarity with subtitles, they spent considerable time in the subtitled area. Accordingly, subtitle reading cannot be due to habit formation from long-term experience. In Experiment 2, a movie in Dutch with Dutch subtitles was shown to Dutch-speaking subjects. They also looked extensively at the subtitles, suggesting that reading the subtitles is preferred because of efficiency in following and understanding the movie.

Although the attention pattern of fourth- and sixth-grade children did not differ from the pattern of adults, the pattern of second-grade children depended largely on the movie shown. For example, second-grade children watched a subtitled “Garfield” (a heavily verbally loaded cartoon) exactly as adults did, but they did not read the subtitles in “Popeye” (an action-oriented cartoon). This suggests that reading subtitles is not yet completely compulsory for young children, although they are well able to read them (as evidenced by their behavior when watching “Garfield”).

The Dutch-speaking subjects were divided into four conditions: a Dutch film, a German film, a Dutch news broadcast, and a German news broadcast, all provided with Dutch subtitles. The results can be summarized as follows. With news broadcasts, subjects had a greater need for subtitles: they started to look at the subtitles at a faster pace and read them for longer periods, even when the spoken news broadcast was in their own language. Elderly people complain more about subtitles than other age groups.

...with longer subtitles, younger people looked longer at the subtitle than the older people. As younger people read faster than older people and therefore finish reading earlier, younger people start re-reading the subtitles and therefore linger longer on the subtitles.


While so far it is clear that reading the subtitles does occur, and switching the attention from the visual image to reading the subtitles happens to be effortless and almost automatic, the next question is whether the soundtrack is also processed to a certain extent simultaneously.

In Sohl (1989), with adults, a double-task technique was used. Apart from watching a television program, the subjects had to react to flashing lights (plus a sound beep) as fast as possible. The reaction time to the flashing lights was taken as a measure of the amount of processing devoted to the first task, which was the viewing of a television program.

The results showed that the presence of subtitles consumes resources, and independently, the presence of voice also slows down the reaction times. The slowest reaction times with adults were obtained whenever both a speaker and a subtitle were present, which suggests that the adult participants do make an effort to follow the speech.

Since both subtitles (in the native language) and soundtrack (in the foreign language) are processed almost in parallel, there may be language acquisition in such a context. Simultaneous viewing of the subtitles and listening to the soundtrack may be a factor in language acquisition.

(Comment: In this scenario the foreign-language voice, the most difficult component, receives the least amount of attention. Adults do make an effort to follow the speech, however, partly because they're aware that subtitles do not tell the whole story and partly because the brain is wired to pay attention to human speech. The thread of the story is perceived in the mother tongue, and the foreign language serves as a backup for additional information and a mood-setting background. The most easily discernible elements in this situation would be individual words and phrases. Learning occurs; it is beneficial but incomplete and inefficient. One important factor missing from everyday viewing activities, however, is conscious language analysis.)

The standard condition is of course when the foreign language is in the soundtrack and the mother language in the subtitle; reversed subtitling refers to the condition where the mother language is in the soundtrack and the subtitles are in the foreign language. The adult participants were shown subtitled cartoons about 15 min long; immediately thereafter, foreign language acquisition was tested. The findings established beyond doubt that there is considerable incidental language acquisition simply by watching a short subtitled movie.

(Comment: People are able to pick up more vocabulary reading foreign subtitles because they're proficient readers and they do not have to struggle with foreign sounds. They're following a simplified version of the story and reading it with their own inner voice. This defeats the main purpose of watching a foreign movie. Following the same line of reasoning it is far more efficient to read parallel texts.)

Surprisingly, there was not necessarily less foreign language acquisition when the foreign and mother languages were vastly different.

The study showed real but limited foreign-language acquisition by children watching a subtitled movie... There was not more acquisition by the children in the present study than by the adults in the former studies, and again, acquisition was largely restricted to the vocabulary.

German is obviously more similar to Dutch and also sounds more familiar to us (the Dutch) than Swedish. Recognizing spoken words and sentences in such an unfamiliar language might just be too hard for children. Adults performed significantly better than children in the standard condition... In contrast to children, who showed an overall poor performance, they seemed to possess the mental processing capacity required to attend to both information channels, at least partly or alternately.


With the next two experiments we wanted to investigate under which circumstances foreign-language acquisition is most likely to occur, and whether or not children are at an advantage when it comes to acquiring a foreign language in such an informal way as watching a television program.

(Comment: The participants watched a subtitled television program for a very short period of time. Both circumstances are worth pointing out.)

Some participants in the second experiment ...were explicitly instructed to pay attention to the foreign-language soundtrack, and to the endings of the words especially, in order to explore in what way a movie could help in acquiring the grammar of the foreign language.

The beneficial nature of explicit language learning is a matter of dispute. Ellis and Laporte (1997) distinguished implicit learning and explicit-selective learning, the former proceeding incidentally and unconsciously, and the latter intentionally and consciously through searching for hidden rules in the data.

All performance averages in conditions without advance rule presentation were at, or close to chance level, either with or without movie presentation. When the rules were presented in advance (explicit rules), performance on items where the explicit rules were to be applied was best when no movie had been watched.

According to a strong interface view, explicit learning is of great importance in further acquiring new, implicit rules. In Experiment 2, there was no such evidence...

Grammar, contrary to vocabulary, may be too complicated to acquire from a rather short movie presentation. Pienemann... pointed out that large mental or grammatical complexity could prevent rules from being learned through simple presentation of the language. Sustained attention and sufficient motivation are necessary and basic ingredients for foreign-language grammar learning to occur, even in real-life situations.

According to Reber... the most appropriate instructions for more complex and non-salient rules would be incidental, but intentional for less complex and salient rules. Moreover, acquiring less salient rules incidentally could require exercise instead of merely observation. Possibly, a sequence of several movies, spread over a longer period of time, could solve both problems and provide conclusive evidence that vocabulary acquisition due to subtitled television programs is supplemented with grammar acquisition.

Incidental Foreign-Language Acquisition by Children Watching Subtitled Television Programs

Incidental Foreign-Language Acquisition by Children Watching Subtitled Television Programs

Abstract: Previous research on adults has demonstrated incidental foreign-language acquisition by watching subtitled television programs in a foreign language. Based on these findings and the literature about the sensitive period for language acquisition, we expected the acquisition to be larger with children. A short subtitled cartoon was presented to Dutch-speaking children (8–12 years old). We varied the channel in which the foreign and native languages were presented (sound track and subtitles); we also looked at the effects of the existing knowledge of the foreign language (due to formal teaching at school) and the linguistic similarity between the native and the foreign language (using Danish and French as foreign languages). We obtained real but limited foreign-language acquisition and in contrast to the sensitive language-acquisition hypothesis, the learning of the children was not superior to that of adults investigated in prior studies. The acquisition here does not profit from the more formal language learning at school. Contrary to the adults, the children tend to acquire more when the foreign language is in the sound track than in the subtitles.

Children's vocabulary acquisition in a foreign language through watching subtitled television programs at home

I wonder if the results were affected by the fact that the children were apparently assigned a task and were actively analyzing the language.

Children's vocabulary acquisition in a foreign language through watching subtitled television programs at home


Subtitled television programs seem to provide a rich context for foreign language acquisition. Moreover, viewers are generally quite motivated to understand what is shown and said on television. The present study investigated whether children in Grades 4 and 6 (N = 246) learn English words through watching a television program with an English soundtrack and Dutch subtitles. Children were randomly assigned to one of three experimental conditions: (a) watching an English television program with Dutch subtitles, (b) watching the same English program without subtitles, and (c) watching a Dutch television program (control). The study was carried out using a 15-min documentary about grizzly bears. Vocabulary acquisition and recognition of English words were highest in the subtitled condition, indicating that Dutch elementary school children can incidentally acquire vocabulary in a foreign language through watching subtitled television programs.

Educational Technology Research and Development
Springer Boston
Volume 47, Number 1 / March, 1999

Foreign Subtitles Help and Native Language Subtitles Harm

Foreign Subtitles Help but Native-Language Subtitles Harm Foreign Speech Perception

Holger Mitterer1*, James M. McQueen1,2

1 Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, 2 Behavioural Science Institute and Donders Institute for Brain, Cognition & Behaviour, Centre for Cognition, Radboud University Nijmegen, Nijmegen, The Netherlands


Understanding foreign speech is difficult, in part because of unusual mappings between sounds and words. It is known that listeners in their native language can use lexical knowledge (about how words ought to sound) to learn how to interpret unusual speech-sounds. We therefore investigated whether subtitles, which provide lexical information, support perceptual learning about foreign speech. Dutch participants, unfamiliar with Scottish and Australian regional accents of English, watched Scottish or Australian English videos with Dutch, English or no subtitles, and then repeated audio fragments of both accents. Repetition of novel fragments was worse after Dutch-subtitle exposure but better after English-subtitle exposure. Native-language subtitles appear to create lexical interference, but foreign-language subtitles assist speech learning by indicating which words (and hence sounds) are being spoken.


'Imagine an American listener, fluent in Mexican Spanish, watching El Laberinto del fauno [Pan's Labyrinth]. She may have considerable difficulty understanding the European Spanish if she is unfamiliar with that language variety. How might she be able to cope better?'

'We argue here that subtitles can help. Critically, the subtitles should be in Spanish, not English. This is because subtitles in the language of the film indicate which words are being spoken, and so can boost speech learning about foreign speech sounds.'

"Perceptual learning studies show that speech processing in the listener's native language can be retuned by lexical knowledge."

"listeners were using lexical knowledge to retune phonetic perception."

"Native-language subtitles help recognition of previously heard words but harm recognition of new words; (3) Foreign-language subtitles improve repetition of previously heard and new words, the latter demonstrating lexically-guided retuning of perception."

"We asked two questions. First, we tested whether audiovisual exposure allows listeners to adapt to an unfamiliar foreign accent. Second, we asked whether subtitles can influence this process. Our results show that this kind of adaptation is possible, and that subtitles which match the foreign spoken language help adaptation while subtitles in the listener's native language hinder adaptation.

The differences between the experimental and the control conditions speak to our first question. They show that listeners can adapt to an unfamiliar regional accent in a second language after only brief audiovisual exposure."

"Two points follow from the conclusion that the subtitle effects reflect lexically-guided retuning of perceptual categories. First, it would appear that lexically-guided learning can occur with real speech in a naturalistic setting. The phenomenon seems not to be restricted to the psycholinguistic laboratory. Second, this conclusion is consistent with the claim that lexically-guided retuning contributes to the way native listeners adapt to foreign-accented speech [9]. Importantly, however, the present findings show for the first time that this kind of perceptual learning is not restricted to native listening: It also occurs in second-language listening.

Our demonstration of perceptual learning about speech sounds in a second language has implications for both theory and practice in second-language acquisition. It has been suggested that certain aspects of language acquisition are fundamentally different in a second as opposed to a first language [28], [29]. With respect to speech recognition, however, the same perceptual-learning mechanism appears to apply in first- and second-language processing."

"As we used real subtitles, our results also have practical implications. Although the use of real subtitles meant that the listeners did not get a word-by-word transcription of the dialogue, it allows us to generalize our results to visual media exposure outside the laboratory. It appears that the largest benefit from this kind of real-world exposure, in the recognition of regional accents in a second language, comes from the use of subtitles in that language. But foreign-language subtitles are not what television viewers and filmgoers are familiar with. In many European countries (e.g., Germany) there is considerable public concern about international comparisons of scholarly achievements [e.g., 32]. Yet viewers are denied access to foreign-language speech, even on publicly-financed television programs. Instead, foreign languages are dubbed. In countries which use subtitles instead of dubbing (e.g., the Netherlands), only native-language subtitles are available, so again listeners are denied potential benefits in speech learning. Native-language subtitles are obviously essential for listeners who do not already speak a second language, and may thus be the only practical solution in cinemas. With the advent of digital television broadcasting, however, it is now possible to broadcast multiple audio channels and multiple types of subtitles. We suggest that it is now time to exploit these possibilities. Individuals can already take matters into their own hands, however. It is often possible to select foreign subtitles on commercial DVDs. So if, for example, an American speaker of Mexican Spanish wants to improve her understanding of European Spanish, we suggest that she should watch some DVDs of European Spanish films with Spanish subtitles."

Alcohol, inhibitions and pronunciation

In an experiment Guiora et al. (1972) attempted to mitigate the empathy level of their subjects by administering increasing amounts of alcohol. They found that the subjects’ pronunciation of the target language improved and then decreased as they drank increasing amounts of alcohol. Dull and Scovel (1972) demonstrated that very small amounts of alcohol (1 ounce) improved pronunciation in half of the subjects.

"Replications of this experiment were subsequently carried out using hypnosis and valium. In the first case - reported in Schumann et al. (1978) - it was found that deeply hypnotised subjects performed significantly better than less hypnotised subjects on an L2 pronunciation test, which may indicate that a willingness "to let go" is a good predictor of L2 pronunciation performance....

The valium study - reported in Guiora et al. (1980) - produced less clear-cut results, but did yield further evidence that relaxation, anxiety reduction and disinhibition, in this instance brought about by the administration of a psychotropic drug, were associated with improved L2 pronunciation."

Language acquisition: the age factor By David Michael Singleton, Lisa Ryan

Don't drink and drive and don't try to prove or disprove this by experimenting with alcohol. The point is that you need to relax, get into it and "let go".

Thursday, April 1, 2010

Watching TV ads is good for your brain

Watching TV ads is good for your brain

Commercial world enhances wellbeing of kids
The Best of the Best - Cadbury adverts

Watching television adverts is good for your brain as taking a break in the middle of programs improves concentration, according to new research from the Journal of Consumer Research.

As TV bosses plan to put even more adverts on the box, the research reveals that commercial interruptions often enhance our enjoyment of watching TV.

Even non-commercial interruptions had the same positive effect as commercials.

Leif Nelson, an assistant professor of marketing at the University of California, San Diego, and a co-author of the new research, said, “The punch line is that commercials make TV programs more enjoyable to watch. Even bad commercials.

“When I tell people this, they just kind of stare at me, in disbelief. The findings are simultaneously implausible and empirically coherent.”

Ads such as Cadbury's 'Gorilla' and Sony Bravia's 'Paint' have proved that consumers enjoy entertaining ads, and furthermore, are willing to pass them on to their friends.

Advertisers are also now making more of a point of telling us the exact time and program in which a campaign breaks. This week, Guinness announced it would be rerunning some of its most iconic and popular ads from the past 80 years, one of which breaks next Friday (9 March).

The research from the Journal of Consumer Research attributes the findings to a behavioral trait called adaptation.

Adaptation predicts that even positive experiences become less enjoyable over time. The study claims that the longer viewers continuously watch a TV program, the less intensely they enjoy the experience.

Therefore, an annoying set of commercials will make the show more enjoyable to the viewer once it comes back on.

Your Brain on Google: Patterns of Cerebral Activation during Internet Searching

Your Brain on Google: Patterns of Cerebral Activation during Internet Searching

Small, Gary W. M.D.; Moody, Teena D. Ph.D.; Siddarth, Prabha Ph.D.; Bookheimer, Susan Y. Ph.D.


Objective: Previous research suggests that engaging in mentally stimulating tasks may improve brain health and cognitive abilities. Using computer search engines to find information on the Internet has become a frequent daily activity of people at any age, including middle-aged and older adults. As a preliminary means of exploring the possible influence of Internet experience on brain activation patterns, the authors performed functional magnetic resonance imaging (MRI) of the brain in older persons during search engine use and explored whether prior search engine experience was associated with the pattern of brain activation during Internet use.

Design: Cross-sectional, exploratory observational study

Participants: The authors studied 24 subjects (age, 55-76 years) who were neurologically normal, of whom 12 had minimal Internet search engine experience (Net Naive group) and 12 had more extensive experience (Net Savvy group). The mean age and level of education were similar in the two groups.

Measurements: Patterns of brain activation during functional MRI scanning were determined while subjects performed a novel Internet search task, or a control task of reading text on a computer screen formatted to simulate the prototypic layout of a printed book, where the content was matched in all respects, in comparison with a nontext control task.

Results: The text reading task activated brain regions controlling language, reading, memory, and visual abilities, including left inferior frontal, temporal, posterior cingulate, parietal, and occipital regions, and both the magnitude and the extent of brain activation were similar in the Net Naive and Net Savvy groups. During the Internet search task, the Net Naive group showed an activation pattern similar to that of their text reading task, whereas the Net Savvy group demonstrated significant increases in signal intensity in additional regions controlling decision making, complex reasoning, and vision, including the frontal pole, anterior temporal region, anterior and posterior cingulate, and hippocampus. Internet searching was associated with a more than twofold increase in the extent of activation in the major regional clusters in the Net Savvy group compared with the Net Naive group (21,782 versus 8,646 total activated voxels).

Conclusion: Although the present findings must be interpreted cautiously in light of the exploratory design of this study, they suggest that Internet searching may engage a greater extent of neural circuitry not activated while reading text pages but only in people with prior computer and Internet search experience. These observations suggest that in middle-aged and older adults, prior experience with Internet searching may alter the brain's responsiveness in neural circuits controlling decision making and complex reasoning.

Teachers on fossilization

I have been looking for any PRACTICAL methods for dealing with 'fossilization.' I am already aware of most of the theory behind it, but I have not come across any activities that try to deal with it. Most linguists seem to think adults with fossilization problems can't be helped. Any suggestions?




"I have never read any literature about 'fossilisation', but I do encounter this phenomenon daily in my job."

(Comment: Good for you, Roger)

What's to do?

(Uh, shouldn't you sort of stop right now? But hey, I've never read anything about the common cold and yet I know how to blow my nose. Let it rip, Roger!)

It is a phenomenon of the learner not being aware of how different their English (or any SL) is.

(No it's not that easy, Roger. Yes, sometimes they are not aware of particular issues. They are however often aware, or were aware but can't help it.)

In my view, they must first of all learn to identify their own fossilised pronunciation and what distinguishes it from standard English.

(German class: eek eek eek! Identified. He knows. He's still doing it. And what if the student can't hear the more subtle differences?)

My advice to Chinese speakers of English is to read aloud a well-rehearsed text and to tape-record it and to listen to it later.

(Now, sure, why not? Just don't repeat it too often.)

I also think they need to speak the language with each other, in order to become aware of how Chinese speakers mispronounce English, so as to be able to appreciate a native English speaker's English.

("Appreciate?" You snooty bastard. Yeah, nonnatives speaking to nonnatives in a foreign language. It's been done and they're doing it daily all over the globe. It does NOT work. It can only reinforce the phenomenon.)

Of course, ideally some changes to how English is being taught in China would take care of many problems we encounter here.

(What about Korea? And JARmany?)

For instance chorussing seems to be a major contributing factor to fossilisation!

(It contributes to the uniformity of Chinese English. In a way it's helpful, lol.).

Also the habit of Chinese to read aloud for mere oral practice seems to be rather counterproductive - students often don't know how to pronounce in the first place.

(So how would their talking to each other or listening to their own recording help?)

They don't learn to seek phonetic information from dictionaries either - they simply ask a teacher to say things aloud.

(That's actually a very good idea. Teacher quality is not their fault.)


Hello Maurice:

I agree with Roger that it is important to help students become aware of their mistakes and the places where their language may have fossilized. They will never be able to change set learned patterns until they know what it is that they must change.


Keen observations, moonlight!

(It's noonlite, Roger)

I can only say what I have observed in Chinese classrooms, and here, unfortunately, neither Chinese teachers nor their students ever learn to become aware of faulty speech patterns! The prime reason is that the "communitarian" approach to teaching instills in the individual learner no such concept as awareness or responsibility for one's own progress.

(now that's a stinker!)

Read the rest at eslcafe.com