Saturday, July 18, 2009

Some unconventional language learning approaches

“I knew someone who learned French in an unconventional way,” writes Abderrezak Amara of the University of Mostaganem (Algeria). “He was an elderly neighbor who, by using French every day in real communication with the French speakers present in Algeria during the colonial era, managed to internalize the language and ended up handling it without any difficulty. His work as a vegetable seller, his conversations with customers and his daily contact with the other French merchants, in the market halls and at the market, allowed him to acquire first a professional competence and then a linguistic know-how tied to commercial jargon.” A fine example of learning French by immersion! That immersion can also be helped along by school. This was the case for Marina, daughter of Maria Cristina Coelho, a Brazilian who came to live in Nantes under the France-Brazil cooperation agreements: “She was almost 4. She arrived just in time for the start of the school year. Every evening she came home happy, understanding nothing of French except the gestures and good intentions of the nursery school teachers. And we left France with a daughter who spoke perfect French, learned entirely by ear!”

As background noise

The complete opposite case: Soline Vaillant, a language lecturer in Split (Croatia), tells us about “someone who learned French using the Assimil method exclusively. His level is astonishing. He spent all his evenings listening to the Assimil cassettes while doing something else at the same time. He says this ‘background noise’ seeps into the unconscious. He also devotes about fifteen minutes a day to reading a lesson and studying it.” A somewhat similar case is reported by Marie-Thérèse Barrès of the Université de Toulouse Le Mirail: “I had a student in my class who had learned French with nothing but a dictionary! She had the words; all she had to do was say them... She was an Afghan political refugee. Once in France, she made an effort to imitate and to communicate. In class she showed such motivation that she learned very quickly.” Solitary reading comes up often in the replies. Horst-Juergen Herbert, of Braunschweig (Germany), describes his own case: “For the past year I have been trying to learn French by reading the newspaper Le Monde on the internet. I am 50 and I love French, a language I could not learn in my youth because I lived behind the Iron Curtain and had to learn Russian. As I also speak Romanian and a little Spanish, I have no great difficulty reading French. It is fascinating entertainment and not too expensive.” Monica Bustamante, of Valladolid (Spain), tells us about her husband: “He is, and has been from a very young age, passionate about music. Without knowing a single word of French, he subscribed to magazines (first Rock&Folk, then Les Inrockuptibles) and, little by little, he came to understand almost everything in articles that are difficult for me, a French teacher. His goal, reading comprehension, has been reached, and he did it self-taught!”
But there is more to life than reading! “I know two people who learned French... by watching TV, thanks to the images and the support of the story (fiction) or the message (advertising),” writes Mireio Pradel. “An Italian classmate, and a young immigrant wife from the Maghreb, somewhat shut in, who made the most of her isolation to learn the language of her host country.”

Learning Slovak

And what if the roles were reversed? That is what happened to Magali Boursier, of the SCAC in Bratislava, Slovakia: “I had no grounding in Slovak when I arrived. I now have a decent conversational level, despite having taken very few classes. As it happens, it was during the ‘francophone evenings’ that I made the most progress in Slovak. The principle was simple: bring together, over a drink, people who wanted to speak French. I kept the beer mats on which Slovak words were scribbled…” The decisive step was a friendship: the wife of a French-speaking Slovak, who was taking French classes. “Since she was making progress in French, I decided to reach out to her and make progress in Slovak: friendship works miracles…” And she concludes: “Learn languages! Go out and meet other people's languages! Enrich yourselves!”

Jean-Claude Demari


Note

1. The question asked on www.fdlm.org was: “Do you know someone who learned French in an unconventional way? And how?”

Le français dans le monde is the journal of the International Federation of Teachers of French (Fédération internationale des professeurs de français).

Two children from Oran learned German thanks to TV

Barely ten years old, Adel Bendaha is a child who speaks fluent German. His little sister Sara, 6, has acquired the same ability by imitating him: handling with ease a language as complex as that of Goethe. This gift has made Adel the attraction of the whole city of Oran. Yet his parents are Algerians who never learned German. This ease, which allows Adel to correct the pronunciation of some adults who lived in Germany for a long time, is explained by his father as an “addiction” to the (children's) programs of the German channels, the only ones he was able to pick up a few years ago with his satellite dish because of a technical quirk. “At first I understood nothing of what was being said in the programs on those channels, but later I was able to grasp the meaning of the words and then build sentences,” says Adel. When he noticed his son using German words, his father briefly thought he was “possessed” and even wanted to take him to a “taleb”. Naturally, Adel supports the Mannschaft, the German national team, which he hopes to see crowned champion of Euro 2008...




"What children say they learn from television

Children say they watch TV to pass the time, to be entertained, to keep informed and to learn. Learn what? The children who took part in this survey were asked to name three things they had learned from television. Their answers revealed “knowledge” of several kinds: skills, information, knowledge, attitudes and behaviors. For example, the children said they had learned to skate, make paper animals, sew, cook, draw, etc. Some new Quebecers (néo-Québécois) said they had learned French from television (and some French-speaking Quebecers, English). Children referred to current affairs such as the sponsorship scandal or the war in Iraq; to intellectual interests such as animal life, the movement of the planets, and language (new and complicated words); to commendable acts such as being kind to others, making sad people laugh, or not repeating the mistakes others have made. Based on the tastes the children expressed, it is possible to sketch different viewer profiles: the athlete, the handyman, the artist, etc., and even the rebel, who learned “swear words” and “violent stuff”, or the show-off, who claims either to have learned everything from television or to have learned nothing at all because it is “for babies” or “girl stuff”…"


Wednesday, July 1, 2009

Native speaker language input

Pre-teen to young adult yearly language input estimate:

TV: 4.5 hours, 27,000 words per day, 9,828,000 words per year
(the equivalent of 98 books)

Other screen time: 4 hours

2 hours of reading, movies, surfing, working etc., counted as reading at 150 wpm: 18,000 words per day, 6,552,000 words per year.

The other two hours are subtracted for video games and other non-productive screen time, allowing for the fact that some video games do contain spoken language. This seems fair, especially since the remaining two hours are counted as reading. Pre-teens play computer and video games 3 hours per day, more than teens (2 hours per day) or young adults (1 hour per day). Screens are also used for viewing pictures etc.

Studying: 2 hours at 150 wpm, 18,000 words per day, 6,552,000 words per year

Verbal communication: 30,000 words per day, about 11,000,000 words per year (your daily life is 5 hours of sitcom)

Total: Approximately 90,000 - 100,000 words per day or 34-36 million words of input per year
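
A quick check of the arithmetic behind these totals, as a minimal Python sketch. The 364-day year (52 weeks of 7 days) is an assumption inferred from the yearly figures above, the 6,000 words per TV hour is this blog's own rough estimate from the "All's well in TV land" post, and the variable names are made up for illustration.

```python
# Assumptions: 364-day year (inferred from the yearly totals above),
# 6,000 spoken words per hour of TV (the blog's own rough estimate).
DAYS_PER_YEAR = 364
TV_WORDS_PER_HOUR = 6_000
READING_WPM = 150

daily_words = {
    "tv": int(4.5 * TV_WORDS_PER_HOUR),        # 4.5 hours of TV
    "screen_reading": 2 * 60 * READING_WPM,    # 2 hours counted as reading
    "studying": 2 * 60 * READING_WPM,          # 2 hours of studying
    "conversation": 30_000,                    # verbal communication
}

daily_total = sum(daily_words.values())
yearly_total = daily_total * DAYS_PER_YEAR

for source, words in daily_words.items():
    print(f"{source}: {words:,} per day, {words * DAYS_PER_YEAR:,} per year")
print(f"total: {daily_total:,} per day, {yearly_total:,} per year")
```

With these inputs the components add up to 93,000 words per day and just under 34 million per year, at the low end of the range quoted above.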


Approximate native learning path through input

YRS 1-3 10,000,000 words (max. 30,000,000)

YRS 4-7
TV 35,000,000 words
verbal communication 15,000,000 words
Total: 50,000,000 words

YRS 8-10
TV 26,208,000 words
reading 5,000,000 words (assuming that 5th-grade children actually do read about 5,300 words per day)
verbal communication 10,000,000
Total: 41,000,000 words

Yrs 1-10 easily over 100,000,000 words (1,000 books)

YRS 11-17 250,000,000 words

Total 350,000,000 words (the running text of 3,500 average books)
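
The running totals can be re-added the same way. This sketch only sums the stage figures listed above; the 100,000 words per book is an assumption implied by "100,000,000 words (1,000 books)", and the names are illustrative.

```python
WORDS_PER_BOOK = 100_000  # assumption implied by "100,000,000 words (1,000 books)"

# Stage figures copied from the list above.
stages = {
    "years 1-3": 10_000_000,
    "years 4-7": 35_000_000 + 15_000_000,               # TV + conversation
    "years 8-10": 26_208_000 + 5_000_000 + 10_000_000,  # TV + reading + conversation
    "years 11-17": 250_000_000,
}

running = 0
for label, words in stages.items():
    running += words
    books = running // WORDS_PER_BOOK
    print(f"{label}: +{words:,} words, cumulative {running:,} (about {books:,} books)")
```

The unrounded sum comes out slightly above the 350,000,000 quoted, because the post rounds the years 8-10 subtotal down to 41,000,000.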

Sunday, June 28, 2009

All's well in TV land

A while back we took a glance at how much information is out there (the information exaflood). Now let's take a look at how this information is consumed. Since I like to write here about language learning, let's call it native-level language input. How much "input" do native speakers get, and how much do they speak? Later we'll see if we can work out an input-to-fluency ratio.

TV and other types of electronic infotainment

It is a well-known fact that the average American spends over 4 hours per day watching television.

The global average daily TV viewing time in 2003: three hours and 39 minutes. TV fiction or drama is traditionally the world's most popular type of show (2003: 41 percent), but it slipped four points from 2002, losing ground to entertainment programs (games, variety, music, reality and theatre shows).

Number of spoken words per sitcom/drama: around 6,000 per 60 minutes of pure programming (no credits, bare script without actor instructions). This is my own estimate, based on one 22-minute sitcom and one 50-minute drama.

“Viewers in Japan remain the world's top TV watchers, with a viewing time of four hours and 29 minutes per person per day, just ahead of the United States where time spent in front of the box was four hours and 25 minutes.”

USA! USA! We can do it, people!

British people watch TV for 148 minutes a day, which comes to about 900 hours a year or 2,944 days over a lifetime. This average is spread across the entire lifetime. Source: The Human Footprint, a Channel Four documentary.

Where’s my Visine?

The average American adult spends 8 1/2 hours a day staring into all types of screens.

“…the observers recorded — in 10-second increments — consumer exposure to visual content presented on any of four categories of screens: traditional television (including live TV as well as DVD/VCR and DVR playback); computer (including Web use, e-mail, instant messaging and stored or streaming video); mobile devices such as a Blackberry or iPhone (including Web use, text messaging and mobile video); and "all other screens" (including display screens in out-of-home environments, in-cinema movies and other messaging and even GPS navigation units).
All told, the VCM study generated data covering more than three-quarters of a million minutes or a total of 952 observed days. This is the largest and most extensive observational study of media usage ever conducted. the VCM study found the average for all other age groups to be "strikingly similar" at roughly 8 1/2 hours”

College students may be watching less TV but they’re still glued to a screen. They are also "multitasking". Haha

“The findings confirm other recent reports concluding that college students are heavy online users. For instance, earlier this month Alloy Media + Marketing reported that students spend 3.5 hours a day e-mailing, instant messaging and Web surfing, and 6.5 hours a week on social networking sites. Burst also found that a large number of students spend minimal time with either the TV or the radio. Around 30 percent of respondents reported spending less than three hours per week watching television, while almost half--46 percent--said they devote less than three hours a week to radio listening. What's more, many college students who watch TV or listen to the radio are multi-tasking at the time. Around 64 percent of respondents report using the computer when viewing TV, while around 60 percent use a computer while the radio is on.”

The average college student spends a very small percentage of his or her time in class or studying.

"Students spend 1.7 hours in class per day, on average, and another
1.6 hours studying. The rest of the day - when they're not working
(2.6 hours) or sleeping (6.8 hours) - is up for grabs. Essentially,
that leaves 11.3 hours for students to party (or do whatever else
college students do). So what do they do? More than anything, they surf the Net. Fully 99 percent of college students go online at least a few times per week; 90 percent do so daily. ”

TV and children

Number of TV commercials viewed by American children a year: 20,000
Age by which children can develop brand loyalty: 2
Time per day that TV is on in an average US home: 7 hours, 40 minutes
Hours of TV watching per week shown to negatively affect academic achievement: 10+
Hours per year the average American youth spends in school: 900
Hours per year the average American youth watches television: 1023

Music: Americans spend 3.7 hours per week listening to music on CDs/iPods
Source: PubTrack Consumer


Speaking

A five-year-old child’s typical verbal output is 10,000 – 15,000 words per day of grammatically correct, meaningful communication, drawing on a vocabulary of about 5,000 words.

The most comprehensive study of this kind involving college students found that both men and women used an average of 16,000 words each day or about 15 words per waking minute, assuming a person sleeps 7 hours.

The researchers also pointed out that there are "very large individual differences around this mean," or average. For instance, one of the most talkative males spewed out 47,000 words a day (nearly 1 per second) compared with just more than 500 daily words for the least talkative male.
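
To put these daily counts on a common scale, here is a small sketch that converts words per day into words per waking minute. The 7 hours of sleep comes from the text above; the function name is made up for illustration.

```python
def words_per_waking_minute(words_per_day, sleep_hours=7):
    """Average speaking rate over waking time, given a daily word count."""
    waking_minutes = (24 - sleep_hours) * 60
    return words_per_day / waking_minutes

for label, words in [("average", 16_000), ("most talkative", 47_000), ("least talkative", 500)]:
    per_minute = words_per_waking_minute(words)
    print(f"{label}: {per_minute:.1f} words per minute, {per_minute / 60:.2f} per second")
```

The average works out to a little under 16 words per waking minute, and the most talkative speaker to roughly three quarters of a word per second.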

Keep in mind that the “silent type” college student speaking about 500 words per day is also a native speaker.

British people speak on average 4,300 words a day, more for women, less for men. This average figure is spread across the entire lifetime:

“The vocabulary of the average UK citizen is just 25,000 words, only 4% of the 616,000 words in the English Oxford Dictionary. We speak on average 4,300 words a day - more for women (6,400-8,000), less for men (2,000-4,000). That's 123,205,750 words in a lifetime.” If we compiled a book from all the words spoken in a lifetime, it would add up to some 5 volumes. Not nearly as impressive as the student chatterbox.

This information is from a transcript of “Human footprint” a Channel Four (British) documentary. There is also a US National Geographic version but I don't know if it is as detailed.
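
The documentary's per-day and lifetime figures can be cross-checked against each other. A rough sketch, assuming a 365-day year; the implied lifespan is derived here and is not stated in the transcript. The TV figures are the ones quoted from the same documentary in the TV section above.

```python
DAYS_PER_YEAR = 365  # assumption

# Speaking: 4,300 words a day, 123,205,750 words in a lifetime.
words_per_day = 4_300
lifetime_words = 123_205_750
implied_years = lifetime_words / words_per_day / DAYS_PER_YEAR

# TV: 148 minutes a day, quoted as 900 hours a year or 2944 days.
tv_minutes_per_day = 148
tv_hours_per_year = tv_minutes_per_day * DAYS_PER_YEAR / 60
tv_lifetime_days = tv_hours_per_year * implied_years / 24

print(f"implied lifespan: {implied_years} years")
print(f"TV: {tv_hours_per_year:.0f} hours per year, {tv_lifetime_days:.0f} days per lifetime")
```

Both sets of numbers appear to rest on the same lifespan of about 78.5 years.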

Women vs men (somewhat humorous, most figures apparently not based on solid or at least traceable research)

20,000/7,000; 30,000/15,000; 7,000/2,000; 30,000/12,000; 50,000/25,000; 25,000/12,000

Hannah and Tim experiment

Hannah: 12,329 words
Tim: 11,279 words

Hannah "accidentally" turned off her recorder for two hours so her real total could be 14,000. A likely story.

The average American spends at least 622 minutes a month on the cell phone, which works out to roughly 20 minutes of cell phone use per day. Minorities speak more.

Reading

Britons will read 533 books (8 books per year) and 2,455 newspapers during their lifetimes. This will consume some 24 trees per person. Some 3% of people in the U.K. can't read, 40% choose not to read, and more households own two cars than two novels! (The Human Footprint).

Americans spend 3.9 hours per week reading books. Some forty-five percent of Americans over the age of 13 read a book in 2008 and one in three of them were over the age of 55, according to Bowker.

Americans also spend four hours per week reading newspapers/magazines.

These figures vary. One newspaper headline: “3 out of 4 Americans read books each year.” Another: “One in four Americans read no books last year”.

According to 2007 US Census Bureau figures, the average US adult spent

65 days in front of the TV
41 days listening to the radio
A little over a week on the Internet
A week reading a daily newspaper
A week listening to recorded music

Per day:

4.27 hours watching TV
2.7 hours listening to the radio
And roughly 30 minutes per day each for surfing, reading a newspaper and listening to music
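
The per-day figures follow from the per-year ones by simple division. A minimal sketch, assuming a 365-day year and that "a week" means seven full 24-hour days; the function name is illustrative.

```python
def hours_per_day(full_days_per_year, days_in_year=365):
    """Convert 'N full 24-hour days per year' into average hours per day."""
    return full_days_per_year * 24 / days_in_year

yearly_days = {"TV": 65, "radio": 41, "one week of anything": 7}

for activity, days in yearly_days.items():
    hours = hours_per_day(days)
    print(f"{activity}: {hours:.2f} hours ({hours * 60:.0f} minutes) per day")
```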

The Internet statistic seems to contradict some other published research. The likely reason is that places like Bowker or Nielsen track “nationally representative” panels of U.S. adult men, women and teens. In other words, if you’re reading this you’re also more likely to be “wired” and “representative”.

Listening

Now, this is a tough one. Obviously TV, radio etc. needs to be counted but what about daily verbal communication? Some good figures can be found about children:

According to some studies, parents should speak at least 17,000 words per day to children before age 3 in order to ensure academic success:

This "groundbreaking" 1995 study apparently caught on with the rest of us in 2008 thanks to ABC News.


"By age 3, children from privileged families have heard 30 million more words than children from underprivileged families. Longitudinal data on 42 families examined what accounted for enormous differences in rates of vocabulary growth. Children turned out to be like their parents in stature, activity level, vocabulary resources, and language and interaction styles. Follow-up data indicated that the 3-year-old measures of accomplishment predicted third grade school achievement."

Hart and Risley (2003)

Number of words heard at home per hour by 1- and 2-year-olds learning to talk:

low-income child: 620
middle-income child: 1,250
high-income child: 2,150

Number of words heard by age 3:

low-income child: 10 million
middle-income child: 20 million
high-income child: 30 million

Source: Hart & Risley, 1995. Meaningful Differences in the Everyday Experiences of Young Children
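
The hourly rates and the by-age-3 totals listed above are roughly consistent with each other if one assumes about 14 waking hours a day. That number of hours is an assumption made here for illustration and is not stated in the figures above; applying the hourly rate from birth is also a simplification, since the rates were measured for 1- and 2-year-olds.

```python
WAKING_HOURS_PER_DAY = 14  # assumption, not given in the figures above
DAYS_PER_YEAR = 365
YEARS = 3

# Words heard at home per hour, from the list above.
words_per_hour = {"low-income": 620, "middle-income": 1_250, "high-income": 2_150}

def words_heard(rate_per_hour, years=YEARS):
    """Cumulative words heard at a constant hourly rate."""
    return rate_per_hour * WAKING_HOURS_PER_DAY * DAYS_PER_YEAR * years

for group, rate in words_per_hour.items():
    print(f"{group}: about {words_heard(rate) / 1e6:.1f} million words by age {YEARS}")
```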

The New York Times article What It Takes to Make a Student has more on this:

"They found ... that vocabulary growth differed sharply by class and that the gap between the classes opened early. By age 3, children whose parents were professionals had vocabularies of about 1,100 words, and children whose parents were on welfare had vocabularies of about 525 words. The children’s I.Q.’s correlated closely to their vocabularies. The average I.Q. among the professional children was 117, and the welfare children had an average I.Q. of 79...

When Hart and Risley then addressed the question of just what caused those variations, the answer they arrived at was startling. By comparing the vocabulary scores with their observations of each child’s home life, they were able to conclude that the size of each child’s vocabulary correlated most closely to one simple factor: the number of words the parents spoke to the child. That varied greatly across the homes they visited, and again, it varied by class. In the professional homes, parents directed an average of 487 “utterances” — anything from a one-word command to a full soliloquy — to their children each hour. In welfare homes, the children heard 178 utterances per hour. What’s more, the kinds of words and statements that children heard varied by class. The most basic difference was in the number of “discouragements” a child heard — prohibitions and words of disapproval — compared with the number of encouragements, or words of praise and approval. By age 3, the average child of a professional heard about 500,000 encouragements and 80,000 discouragements. For the welfare children, the situation was reversed: they heard, on average, about 75,000 encouragements and 200,000 discouragements. Hart and Risley found that as the number of words a child heard increased, the complexity of that language increased as well. As conversation moved beyond simple instructions, it blossomed into discussions of the past and future, of feelings, of abstractions, of the way one thing causes another — all of which stimulated intellectual development."

More about this here

Parent anxiety has helped create some curious products:

Device Counts Amount of Baby Talk a Day
Experts Say Children Should Hear 25 Million Words By 4 Years Old

Lena, created by Infotorture er, Infoture, Inc., is a "verbal pedometer" that helps parents gauge how vocal they are with their child by measuring the number of distinct words a baby hears.

A little bit about vocabulary development in small children

By age 3, a child is forming simple sentences, mastering grammar, and experiencing a “vocabulary explosion” that will result, by age 6, in a lexicon of more than 10,000 words.

Ross A. Thompson, Developmental Psychologist, University of California, Davis - National Scientific Council on the Developing Child.

“Children continue to learn new words throughout the preschool period and into the school years. Anglin (1993) has estimated that, on average, children’s vocabularies grow from 11,000 words at age 6 to 20,000 words at age 8 to 40,000 words at the age of 10 years….

Towards the end of the second year of life children begin to acquire morphology. The acquisition of morphological knowledge marks an important milestone in language development because it is an essential component of mastering grammatical rules and also in the development of vocabulary.”

Developmental psychology by Margaret Harris, George Butterworth

Friday, June 26, 2009

The TV "method" or how I learned Italian

...while watching TV. The “TV method” explained. How I learned Italian from (seemingly) "incomprehensible" input. Prompted by some other blogs that mostly use DVDs and therefore seriously corrupt the methodology. Haha.

How I learned German using a similar approach: link

This eventually resulted in a very advanced fluency in Italian. The “method” consisted in consuming massive amounts of TV programming during summer holidays. The student was a 5-year-old kid. After about three summers I remember being able to claim that I completely understood a show, after several more I could be described as very advanced. My language skills actually blossomed (and wilted) several times before I even started high school.

I saw my first cartoon in Italian when I was five. I was alone and bored and I remember playing with an old-fashioned TV tuner. The antenna was weak and pointed in the wrong direction, but I managed to catch an Italian-language cartoon about a family of bears. I believe it was a Japanese cartoon, perhaps Orso Misha, perhaps something else. It blew my mind. I had to see more. That’s the last I saw of it for a while. My dad’s apartment was facing the sea. I remember begging him later to point the antenna directly towards Italy. I spent a lot of time at my dad’s fine-tuning the channels and watching cartoons on an old black-and-white TV. The picture was not great but the sound was usually excellent. I remember he once got angry when I replaced all his local channels. I also remember him teasing me about watching something I could not understand and how I proved him wrong. This has helped me to roughly pinpoint the time when I was able to follow Italian programming without any great difficulty. I was around 9 years old and I remember watching "La Principessa Zaffiro" and "Kimba" in Italian. I remember the story line and plot points. I remember being able to watch it with full understanding. It therefore took about three, maybe four summers, or some 12 months of intensive TV watching, to reach this point. I did go to the beach and play with friends etc. but I still managed to spend about 8 hours per day watching Italian TV. It took me about 3,000 hours to achieve excellent passive understanding of Italian. That's the equivalent of about 18 million words of written text or some 180 books.
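
For anyone who wants to redo the back-of-the-envelope numbers, here is a minimal sketch. It assumes 30-day months, the 6,000 words per TV hour estimated in the "All's well in TV land" post, and 100,000 words per average book; all three are assumptions, and the names are illustrative.

```python
HOURS_PER_DAY = 8
DAYS = 12 * 30               # "some 12 months", assuming 30-day months
WORDS_PER_TV_HOUR = 6_000    # assumption: the blog's own rough estimate
WORDS_PER_BOOK = 100_000     # assumption: an "average book"

hours = HOURS_PER_DAY * DAYS  # 2,880 hours, which the post rounds to about 3,000

def tv_hours_to_words(tv_hours):
    """Estimated number of spoken words heard in a given number of TV hours."""
    return tv_hours * WORDS_PER_TV_HOUR

words = tv_hours_to_words(3_000)
print(f"{hours:,} hours watched, rounded to 3,000")
print(f"{words:,} words, about {words // WORDS_PER_BOOK} books")
```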

When I was 10 or thereabouts I started bustling around antennas and disassembling amplifiers. I needed my entertainment. Now maybe a scientifically oriented person can tell me why I wasn't able to watch terrestrial TV during most of the rest of the year, even on very nice days. Refraction? If that had been possible, I would have truly gone native. How do I account for some 8-9 months of not doing anything Italian-related? How fast would my progress have been? I had massive exposure but also very long periods without doing anything related to a particular language.

My mom taught Italian and French in high school. I think it’s fair to mention this but as far as Italian is concerned, this only meant that I had access to dictionaries and a few interesting magazines. She was a Francophile and she insisted on teaching me French. I used her Italian dictionaries only a few times. I do remember looking up a few things and getting a kick out of it. For some reason I still remember looking up “basil”. I remember reading a few articles about sharks, about a 19th century brigand and a girl who later became a saint (she was awfully pretty). I was 10-11 when I discovered magazines but my interest did not last. I do remember underlining a few words and looking them up. I also remember playing with an old encyclopedic Italian dictionary, looking at the pictures and reading randomly. I also remember listening to Italian radio a few times when I was especially bored during the winter months but I quickly lost interest.

The types of TV programming I consumed:

Movies: American movies, Italian movies, other foreign movies (sci-fi, horror, comedy etc.)
Series, Italian and foreign: crime, sitcoms, drama etc. Little House on the Prairie, The Dukes of Hazzard, Battlestar Galactica, Piovra…

Japanese cartoons about: fishing, giant robots, Japanese history, basketball, golf, football, baseball, judo, aliens destroying Japan, aliens falling in love with earthlings, earthlings falling in love with all sorts of things, Buddhism, imaginary competitions, student life, animals, insects, daily routine (sitcom), historical drama, romance … AND all sorts of other stuff - normal and weird.

US cartoons
Brazilian telenovelas
Documentaries about nature, history, outer space
News, local news (mafia stuff)
Teletext (later on)
TV Shows (talk, comedy)

Commercials galore, commercials, commercials, commercials. I think I learned the basics through commercials. Commercials and reruns are a sort of evil spaced repetition. No escape: the same commercials would hound you on many channels simultaneously (especially if you were following cartoons on local networks).

My weakest point: speaking. The reason is obvious. I never had any opportunity to speak. I was never completely “fluent” in the sense of effortless native proficiency. My Italian “performance” if staged well and prepared in advance could have been described as near-native fairly early.

I was able to study Italian in high school. I usually did my homework five minutes before the class. I never prepared for any exams. This resulted in one embarrassing situation after prolonged illness but I always ended up with a final A. This was the time when I discovered satellite TV and English and German language programming. I neglected Italian but I never abandoned it. My skills were very broad but declining and if someone probed a particular area they could certainly find some shortcomings.

My high school leaving exam paper (yeah, there is such a thing) was in Italian and I defended it in Italian. I chose a famous Italian historical novel :) The smiley is for people who know what I’m talking about. I didn’t have any problems reading it.

I chose to study Italian at university. Unlike some Italian courses in the US, this was serious stuff. I never spoke with anyone – except for fake conversations in class. Most of my university exams were straight A’s. This obviously included oral exams. The few B’s were due to non-linguistic circumstances.

I never had to learn grammar in order to pass a language test. I was required to read and write a lot and this is where I benefited tremendously.

It's been seven years since I did anything remotely challenging that involved Italian.

Saturday, June 20, 2009

Writing in a language that's not one's own

French-language authors whose mother tongue is not French. In no particular order:

Casanova, Madame de Noailles (Romanian), Anna Moi (Vietnamese), François Cheng, Tierno Monembo, Aki Shimazaki (Japanese), Seymus Dagtekin (a Turkish Kurd), Samuel Beckett (Irish), Julien Green, Eugène Ionesco (Romanian, mother French), Milan Kundera (Czech), Camara Laye, Léopold Senghor (Senegalese), Cioran (Romanian), Tristan Tzara (Romanian), Elie Wiesel, Atiq Rahimi (from Afghanistan), Eduardo Manet (Cuban), Brina Svit (Slovenian), David I. Grossvogel (American), Tahar Ben Jelloun, Amin Maalouf, Jonathan Littell (American), Hector Bianciotti (Argentinian), Silvia Baron Supervielle (Argentinian), Vassilis Alexakis (Greek), Andrei Makine (Russian), Anne Weber (German), Bjorn Larsson (Swedish), Ying Chen (Chinese), Fouad Laroui (born in Morocco, based in the Netherlands, writing in French and Dutch), Andrei Vieru (Russian-Romanian), Arthur Adamov (Russian, of Armenian origin), Henri Troyat (Lev Tarassov), Michel Del Castillo (Spanish, father French), Julia Kristeva (Bulgarian), Oscar Wilde (one play – Salomé).

The Phenomenon of Authors Whose First Language Isn’t French Writing In French

English-language authors whose mother tongue is not English.

Achebe, Chinua
Arlen, Michael
Asimov, Isaac
Bellow, Saul
Brodsky, Joseph
Bronowski, Jacob
Broumas, Olga
Budrys, Algis
Codrescu, Andrei
Conrad, Joseph
Dinesen, Isak
Heym, Stefan
Ishiguro, Kazuo
Kakuzo, Okakura
Kerouac, Jack
Kingston, Maxine Hong
Koestler, Arthur
Kosinski, Jerzy
Lewis, Saunders
Limonov, Eddie
Lin Yu-tang
Lowe, Adolph
Lundwall, Sam
Malinowski, Bronislaw
Milosz, Czeslaw
Mukherjee, Bharati
Nabokov, Vladimir
Narayan, R. K.
Nin, Anais
Rand, Ayn
Sabatini, Rafael
Seth, Vikram
Skvorecky, Josef
Smirnov, Yakov
Soyinka, Wole
Stoppard, Tom
van Gulik, Robert
Vizinczey, Stephen
Wertenbaker, Timberlake
Wongar, Banumbir
Zukofsky, Louis

German

Some winners of the Adelbert von Chamisso Prize, a German award to foreign writers recognized for their contribution to German culture.

Adel Karasholi (Syrian), Galsan Tschinag (Mongolian), Yoko Tawada (Japanese), Maria Cecilia Barbetta (Argentinian), Asfa-Wossen Asserate, Franco Biondi, Gino Chiellino, Zehra Cirak (Turkish), György Dalos, Dante Andrea Franzetti, Zsuzsanna Gahse, Yüksel Pazarkaya, Ilma Rakusa, Luo Lingyuan (Chinese), Tzveta Sofronieva (Bulgarian), Michael Stavaric, Saša Stanišić.

Monday, April 27, 2009

Word frequency and incidental learning

The most frequent 4,000 word families from the BNC provide 95% coverage of new texts, which translates into “adequate comprehension” (one unknown word in 20, or about one every 2 lines) for “some learners” (Hu and Nation). Most learners, however, do not reach adequate comprehension even with 95% coverage: for most, 98% coverage was necessary to achieve adequate comprehension of fiction. For reading to be a pleasurable activity, some researchers (Hirsh and Nation, 1992) suggest that 98-99% coverage may be necessary (one unknown word in every 50-100 running words).

80% coverage: 1 unknown word in 5
90% coverage: 1 in 10 (about one per line)
95% coverage: 1 in 20 (about one every 2 lines)
98% coverage: 1 in 50 (eight unknown words per 400-word page)
7,000 words are needed for 98% coverage (Nation, 2006).
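
The "1 in N" figures follow directly from the coverage percentage. A minimal sketch of the conversion; the function names and the 400-word page are illustrative, the page size being the one used in the list above.

```python
def unknown_word_interval(coverage_percent):
    """Running words per unknown word at a given percent coverage."""
    return 100 / (100 - coverage_percent)

def unknown_words_per_page(coverage_percent, words_per_page=400):
    """Expected number of unknown words on a page of the given length."""
    return words_per_page * (100 - coverage_percent) / 100

for coverage in (80, 90, 95, 98, 99):
    interval = unknown_word_interval(coverage)
    per_page = unknown_words_per_page(coverage)
    print(f"{coverage}% coverage: 1 unknown word in {interval:.0f}, {per_page:.0f} per 400-word page")
```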

A collection of excerpts regarding word frequency and vocabulary acquisition through incidental learning

“The results showed that knowledge of the most frequent 3,000 word families plus proper nouns and marginal words provided 95.76% coverage, and knowledge of the most frequent 6,000 word families plus proper nouns and marginal words provided 98.15% coverage of movies. Both American and British movies reached 95% coverage at the 3,000 word level. However, American movies reached 98% coverage at the 6,000 word level while British movies reached 98% coverage at the 7,000 word level. The vocabulary size necessary to reach 95% coverage of the different genres ranged from 3,000 to 4,000 word families plus proper nouns and marginal words, and 5,000 to 10,000 word families plus proper nouns and marginal words to reach 98% coverage.”

The Lexical Coverage of Movies
Stuart Webb and Michael P. H. Rodgers
Applied Linguistics (subscription required)

“A corpus of one million words would probably have over 60,000 instances of the word the but is unlikely to include any of the following: gastronomic, plagiarism, incoherent, reassuring, preach all of which have a frequency rating of well under one-hit-per-million-words, yet could hardly be described as obscure.”

HLT Magazine

“The source text consisted of three months (approximately 5 million words) of Le Monde

#sentences 167,359
#words (total) 4,244,810

Less than 20% of the distinct words account for over 95% of all word occurrences. In fact, 40% (about 35,000 words) occurred only once in the text, and 60% of the words appeared at most 3 times. This effect is even more pronounced for syllables, where the roughly 20% most common syllables account for 98% of all syllable occurrences.”

http://www.limsi.fr/~lamel/euro91.pdf
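
The two statistics in this excerpt (how many distinct words are needed to cover a given share of all occurrences, and what share of distinct words occur only once) can be computed from any tokenized text. The sketch below is a minimal illustration on a toy sentence, not a reconstruction of the Le Monde study; the function names and the whitespace tokenization are assumptions.

```python
from collections import Counter


def types_needed_for_coverage(tokens, target=0.95):
    """Return (n, total_types): the n most frequent distinct words
    whose occurrences together reach `target` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    running = 0
    for n_types, (_word, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running >= target * total:
            return n_types, len(counts)
    return len(counts), len(counts)


def hapax_share(tokens):
    """Share of distinct words that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy data only; a real corpus would need proper tokenization.
toy_tokens = "the cat saw the dog and the dog saw the bird".split()
print(types_needed_for_coverage(toy_tokens, target=0.5))
print(hapax_share(toy_tokens))
```

On a real corpus the same two functions would give the kind of figures quoted above, although the exact values depend heavily on how words are tokenized and whether inflected forms are merged.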

“Given this enormous amount of material, you might expect to find a lot of frequent idioms. If so, you would be disappointed. Simpson and Mendis found only 8 idioms that occurred more than 10 times (ranging from 10-17 times) in their corpus of nearly 2 million words/197 hours. Another 107 occur 1.2-2.4 times per million words. Liu, with an even larger corpora (roughly 6 million words) and a more generous definition, found only 47 items with a frequency of 50 or more tokens per million words. Another 107 had a frequency of 11-49 per million words and the other 148 had a frequency of 2-19 per million words. That’s a total of only 302 idioms, which strikes me as not only a relatively limited number, but also a very teachable number. The lack of many common idioms, makes the task of teaching idioms both easier and harder. It is easier because we can focus our teaching on those idioms that are fairly frequent”.

http://www.nystesol.org/pub/idiom_archive/idiom_summer2005.html

The effect of frequency of occurrence on incidental word learning.

“It should also be pointed out that the volume of text that would need to be read to meet an unknown word increases with reading ability level. This is because rarer words are met less frequently and thus more text has to be read to meet an unknown word the required number of times. This also has implications for the amount of text that needs to be read.”

http://nflrc.hawaii.edu/rfl/October2003/waring/waring.html

“The frequency of words in the language as a whole was also investigated; Brown (1993) found overall frequency to be a better predictor of incidental vocabulary growth than frequency in the specific texts her subjects read. The third explanatory variable was learner vocabulary size. It was assumed that knowing more words would assure better global comprehension of the text and, as a result, more incidental word acquisition. Laufer (1989, 1992) found evidence of a strong relationship between measures of learner vocabulary size and text comprehension.”

Beyond A Clockwork Orange: Acquiring Second Language Vocabulary through Reading

The Mayor of Casterbridge listening/reading experiment

"Unfortunately, the experimental support for incidental vocabulary acquisition through reading in a second language is weak and plagued by methodological flaws..."

"The first study claiming to show that second language vocabulary learning occurs incidentally through reading is a well-known experiment by Saragi, Nation and Meister (1978). They tested native speakers of English who had read Anthony Burgess's A Clockwork Orange on their understanding of many of the Russian-based slang words that occur in the novel. They found that the subjects were able to correctly identify the meanings of most of these nadsat words on a surprise multiple-choice test, especially the frequently occurring ones. But it seems strange to equate the circumstances of this study with second language learning. Here, native speakers of English used contexts which they must have fully understood to infer, for example, that droog meant friend; but making such connections is probably much harder for readers in a foreign language for whom many words in the context may be unknown or only partially known.

The mean number of words subjects acquired in the experiment was 68.4, amounting to about three quarters of the 90 words tested. But replications of this study with second language learners have not managed to reproduce these impressive results (see Table 1 below). For instance, Pitts, White and Krashen (1989) report a mean score of just two nadsat words correctly identified after subjects read A Clockwork Orange for an hour and took a test on 30 items. Other studies using a Clockwork methodology (Day, Omura & Hiramatsu 1991, Hulstijn 1992) report similar gains of just one, two or three words. Dupuy and Krashen (1993) report a larger gain of almost seven words, but this higher than usual result may have little to do with reading since their experiment also involved viewing a video..."

"The (Mayor of Casterbridge) novel is one of a series of simplified classics published by Nelson for learners of English who know approximately 2000 basewords." ... "21,232 words of the simplified Mayor of Casterbridge text: subjects followed along in their books while the entire text was read aloud in class by the teacher... The remaining 34 (students) appeared to be absorbed by the story of secret love, dissolution and remorse, and tears were shed for the mayor when he met his lonely death at the end...The knowledge gain of five of the 23 means that about 22 per cent of the words that could have been learned were learned; in other words, there was an average pick-up rate of about one new word in every five...To examine the relationship between the number of times a word appeared in The Mayor of Casterbridge and the extent to which that word was learned through reading, each of the 45 words in the experiment was assigned a frequency rating and a learning gain score. Frequency ratings, which were determined by the computer analysis discussed above, ranged from 2 to 17 occurrences... Generally, the text frequency data suggest that sizable learning gains can be expected to occur consistently for items that are repeated eight times or more. With fewer than eight repetitions, growth is much less predictable and the role of other factors becomes more apparent..."

"Laufer (1982, 1989) claims that readers need a sight recognition of at least 95 percent of the words in a text for it to be comprehensible enough for meanings of unknown words to be inferred.”

"As far as implications for vocabulary learning are concerned, the experiment makes a stronger case for incidental acquisition than was made in the earlier Clockwork Orange replication studies. Subjects who read a full-length book recognized the meanings of new words at a higher rate than in previous studies with shorter texts, and built associations between new words as well... Cobb (1997) found that encountering new words in multiple contexts resulted in a deeper, more transferable knowledge of words than the usual strategy of studying short definitions.

"...But even though it may be possible to develop better resources for incidental learning, the study suggests that extensive reading is not a very effective way for learners who have a mean vocabulary size of around 3000 words to expand their lexicons. After completing the whole 21,000-word book, the subjects in the experiment managed to recognize meanings of an average of only five new words and to make new associations between just three. Also, learning was never fully guaranteed; even with items that occurred eight times or more, gains averaged around 50 percent. In other words, after reading an entire novel and encountering a word many times, only half of the learners who did not already know the word were able to recognize a correct definition in a multiple choice format. In brief, the experiment indicates that teachers of low intermediate learners of English can expect vocabulary growth from reading a simplified novel to be small and far from universal… In the last two decades, it has often been assumed that incidental acquisition was a sufficient strategy to take care of learner's lexical needs, to the point that explicit vocabulary instruction effectively disappeared from many coursebooks and vocabulary acquisition became "a neglected aspect of language learning" (Meara 1980:221). The present study suggests that the power of incidental acquisition may have been overestimated. The findings support Meara's (1988) argument that since reading in a second language takes a great deal of time, few learners are able to read in sufficient volume to make it the vocabulary enriching experience it has proved to be for first language learners. Nagy, Herman and Anderson (1985) propose that for children learning English as their first language, school reading can account for the acquisition of thousands of new words each year.
Even though the incidental pick-up rate was found to be low, large gains occur, they argue, because children encounter millions of words annually. But this is hardly applicable to beginning second language learners; for the subjects of this study, encountering one million words would entail reading fifty graded readers the size of The Mayor of Casterbridge - a worthy but unattainable goal for most learners at this level.”
“The results of this study point to several things. Firstly, the data support the notion that words can be learned incidentally from context. However, these data suggest that few new words appear to be learned from this type of reading, and half of those that are learned are soon lost....Assuming an optimistic scenario in which reading fifty novels per year was possible, at the rate of five words per novel established in this study, annual gain would amount to only 250 words. At this rate, even if yearly gains increased marginally with increased vocabulary size, it would take many years to acquire incidentally the 5,000 most frequent word families of English, the figure which has been proposed as the minimum knowledge base needed for learners of English to be able to infer the meanings of new words they encounter in normal, unsimplified texts (Hirsh & Nation 1992, Laufer 1989)... Since most learners have a limited amount of time to devote to second language acquisition, vocabulary growth needs to proceed more rapidly. For learners at the level of the subjects in this experiment, it seems likely that an efficient way to reach the point of lexical independence is through explicit and systematic instruction that focuses on high-frequency vocabulary, a recommendation made repeatedly by Nation (1990). That is not to say that low intermediate learners should never read, but that teaching decisions should be based on an adequate account of what they can gain from their reading. Through reading extensively, they will probably enrich their knowledge of the words they already know, increase lexical access speeds, build network linkages between words, and more, but as this study has shown, only a few new words will be acquired.
Therefore, it seems clear that in the early stages of their second language acquisition, learners should direct a considerable portion of their energies to using intentional strategies to learn high frequency vocabulary, in preparation for the day when they will know enough words and can read in enough volume for more substantial incidental benefits to accrue.”

http://www.er.uqam.ca/nobel/r21270/cv/Casterbridge.html
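
The back-of-the-envelope reasoning in this excerpt can be laid out as a short calculation. The inputs are the figures quoted above (a 21,000-word graded reader, five new words per book, fifty books per year, a starting vocabulary of about 3,000 and a target of 5,000 word families). The constant rate of gain is a simplifying assumption made for this sketch, and the names are illustrative.

```python
import math

BOOK_WORDS = 21_000        # approximate length of the simplified novel
NEW_WORDS_PER_BOOK = 5     # mean gain reported in the study
BOOKS_PER_YEAR = 50        # the "optimistic scenario" in the excerpt


def books_needed(running_words, book_words=BOOK_WORDS):
    """Graded readers of this size needed to meet a number of running words."""
    return math.ceil(running_words / book_words)


def years_to_target(current_size, target_size, gain_per_year):
    """Years needed at a constant annual gain (a simplifying assumption)."""
    return (target_size - current_size) / gain_per_year


annual_gain = NEW_WORDS_PER_BOOK * BOOKS_PER_YEAR
print("Books to meet one million running words:", books_needed(1_000_000))
print("Annual gain at fifty books per year:", annual_gain)
print("Years from 3,000 to 5,000 word families:",
      years_to_target(3_000, 5_000, annual_gain))
```

The book count comes out slightly under the "fifty graded readers" mentioned in the excerpt, which reads as a rounded figure.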

Incidental vocabulary acquisition from reading, reading-while-listening, and listening to stories

"The results showed that new words could be learned incidentally in all 3 modes, but that most words were not learned. Items occurring more frequently in the text were more likely to be learned and were more resistant to decay. The data demonstrated that, on average, when subjects were tested by unprompted recall, the meaning of only 1 of the 28 items met in either of the reading modes and the meaning of none of the items met in the listening-only mode, would be retained after 3 months...

...The subjects, it seems, displayed a critical lack of familiarity with spoken English. As they listened to the story, they had to pay constant attention to a stream of speech whose speed they could not control. Because they were incapable of processing the phonological information as fast as the stream of speech, they may have failed to recognize many of the spoken forms of words that they already knew in their written forms."

http://nflrc.hawaii.edu/rfl/October2008/brown/brown.pdf

“In the long run, most words in both first and second languages are probably learned incidentally, through extensive reading and listening (Nagy, Herman, & Anderson, 1985). Several recent studies have confirmed that incidental L2 vocabulary learning through reading does occur (Chun & Plass 1996; Day, Omura, & Hiramatsu, 1991; Hulstijn, Hollander & Greidanus, 1996; Knight, 1994; Zimmerman, 1997). Although most research concentrates on reading, extensive listening can also increase vocabulary learning (Elley, 1989). Nagy, Herman, & Anderson (1985) concluded that (for native speakers of English) learning vocabulary from context is a gradual process, estimating that, given a single exposure to an unfamiliar word, there was about a 10% chance of learning its meaning from context. Likewise, L2 learners can be expected to require many exposures to a word in context before understanding its meaning...The incidental learning of vocabulary may eventually account for a majority of advanced learners' vocabulary; however, intentional learning through instruction also significantly contributes to vocabulary development (Nation, 1990; Paribakht & Wesche, 1996; Zimmerman, 1997). Explicit instruction is particularly essential for beginning students whose lack of vocabulary limits their reading ability. Coady (1997b) calls this the beginner's paradox. He wonders how beginners can "learn enough words to learn vocabulary through extensive reading when they do not know enough words to read well" (p. 229). His solution is to have students supplement their extensive reading with study of the 3,000 most frequent words until the words' form and meaning become automatically recognized (i.e., "sight vocabulary"). The first stage in teaching these 3,000 words commonly begins with word-pairs in which an L2 word is matched with an L1 translation... 
Translation has a necessary and useful role for L2 learning, but it can hinder learners' progress if it is used to the exclusion of L2-based techniques. Prince (1996) found that both "advanced" and "weaker" learners could recall more newly learned words using L1 translations than using L2 context. However, "weaker" learners were less able to transfer knowledge learned from translation into an L2 context. Prince claims that weaker learners require more time when using an L2 context as they have less developed L2 networks and are slower to use syntactic information... “Understanding of a word acquired from meeting it in context in extensive reading is ‘fragile knowledge’, and may not be internalized longterm if there are no further encounters with it; but it is still useful...Vocabulary lists can be an effective way to quickly learn word-pair translations (Nation, 1990). However, it is more effective to use vocabulary cards, because learners can control the order in which they study the words (Atkinson, 1972). Also, additional information can easily be added to the cards. When teaching unfamiliar vocabulary, teachers need to consider the following:

1. Learners need to do more than just see the form (Channell, 1988). They need to hear the pronunciation and practice saying the word aloud as well (Ellis & Beaton, 1993; Fay and Cutler, 1977; Siebert, 1927). The syllable structure and stress pattern of the word are important because they are two ways in which words are stored in memory (Fay and Cutler, 1977).
2. Start by learning semantically unrelated words. Also avoid learning words with similar forms (Nation, 1990) and closely related meanings (Higa, 1963; Tinkham, 1993) at the same time... Likewise, words with similar, opposite, or closely associated (e.g., types of fruit, family members) meanings may interfere with one another if they are studied at the same time.
3. It is more effective to study words regularly over several short sessions than to study them for one or two longer sessions. As most forgetting occurs immediately after initial exposure to the word (Pimsleur, 1967), repetition and review should take place almost immediately after studying a word for the first time.
4. Study 5-7 words at a time, dividing larger numbers of words into smaller groups.
5. Use activities like the keyword technique to promote deeper mental processing and better retention (Craik and Lockhart, 1972). Associating a visual image with a word helps learners remember the word.”
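
Point 4 above (study a handful of words at a time) is easy to mechanize. The sketch below splits a word list into consecutive groups of at most seven, sized as evenly as possible. The function name and the evening-out rule are choices made for this illustration, not anything prescribed by the source, and the code does not check whether words in a group are semantically related (point 2).

```python
def study_groups(words, max_size=7):
    """Split `words` into consecutive groups of at most `max_size` items,
    keeping group sizes as even as possible."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    n = len(words)
    if n == 0:
        return []
    n_groups = -(-n // max_size)          # ceiling division
    base, extra = divmod(n, n_groups)     # the first `extra` groups get one more
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(words[start:start + size])
        start += size
    return groups


# Hypothetical word list: 20 placeholder items.
word_list = [f"word{i}" for i in range(1, 21)]
for group in study_groups(word_list):
    print(len(group), group)
```

For very short lists (eight words, for example) no split keeps every group within 5-7 items, so the function simply keeps groups at or below the maximum.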

“Provide opportunities for developing fluency with known vocabulary.
Fluency building activities recycle already known words in familiar grammatical and organizational patterns so that students can focus on recognizing or using words without hesitation.”

http://www.jalt-publications.org/tlt/files/98/jan/hunt.html

WHAT DOES FREQUENCY HAVE TO DO WITH GRAMMAR

“Another reason, as Larsen-Freeman (2002) and Ellis (2002b) point out, is that if second language learning were simply a matter of acquiring the most frequently occurring patterns of target language (TL), then English language learners (ELLs) would be proficient in their uses of the definite and indefinite articles, the most frequently occurring free morphemes in English. This, of course, is not the case. It is clear that the frequency of input is not the only factor involved in learning a second language; however, we believe it plays a significant role. Ultimately, we hope to show that high-frequency constructions provide more exemplars for L2 learners to make generalizations than low-frequency constructions and that this directly relates to the number and kind of L2 learner errors.”

“ELLs will tend to produce more errors with low frequency constructions…”

http://w3.coh.arizona.edu/AWP/AWP14/AWP14%5BSchwartz%5D.pdf

"The results show that words can be learned incidentally but that most of the words were not learned. More frequent words were more likely to be learned and were more resistant to decay. The data suggest that, on average, the meaning of only one of the 25 items will be remembered after three months, and the meaning of none of the items that were met fewer than eight times will be remembered three months later. The data thus suggest that very little new vocabulary is retained from reading one graded reader, and that a massive amount of graded reading is needed to build new vocabulary...

...This suggests that it is far more difficult to pick up words from listening-only than from either the reading-only or reading-while-listening modes. There was, however, no significant difference between reading-only and reading-while-listening modes.

…This suggests that meanings are lost faster than the other types of word knowledge tested here.”

Number of meetings needed to learn a word

“As we saw in the introduction, previous estimates of the number of times it takes to learn a word from reading varied considerably. It is clear from this research that it is very difficult to pin a number on this age-old question. It seems much more complex than a simple single figure. From the results of this experiment, it seems that to have a 50% chance of recognizing a word form again three months later, learners have to meet the word at least eight times. Similar results could be said for prompted recognition. However, for unprompted form-meaning recognition (i.e., word learning) there is only a 10% to 15% chance that the word's meaning will be remembered after three months even if it was met more than 18 times. If the word was met fewer than 5 times, the chance is next to zero. This is rather disappointing because it suggests that we do not learn a lot of new words from our reading even with a 96% coverage rate. There are several reasons why this might be so. Firstly, the learners are presumably focused on comprehending and enjoying the story rather than on the words themselves. The words were not made explicit by bolding or highlighting the words in any way, as is the case in natural reading. Because of this, the learners are not being forced to notice them and their awareness of the words is not being raised. Some recent research has suggested the noticing of a form is an essential step in word learning (Schmidt, 1990)... Thirdly, the reason for low vocabulary rate retention may have simply been that there were too few chances to learn the words. As we have seen, it takes much more than one meeting of a word to learn it from reading. Moreover, even words met more than fifteen times in the text still have only a 40% chance of being learned. This seems to suggest that it would take well over 20 or even 30 meetings for most of those words to be learned.”

http://nflrc.hawaii.edu/rfl/October2003/waring/waring.html
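
One way to get a feel for these meeting counts is a deliberately crude model in which every meeting with a word gives the same, independent chance of learning it. This model is an assumption made purely for illustration and is not taken from any of the studies quoted here. The 10% per-exposure value is the estimate attributed earlier to Nagy, Herman and Anderson (1985) for native speakers.

```python
import math


def chance_after(per_meeting, meetings):
    """Chance of having learned a word after `meetings` independent meetings."""
    return 1 - (1 - per_meeting) ** meetings


def meetings_needed(per_meeting, target):
    """Smallest number of meetings that reaches `target` under the same model."""
    return math.ceil(math.log(1 - target) / math.log(1 - per_meeting))


def implied_per_meeting(observed_chance, meetings):
    """Per-meeting chance that would produce `observed_chance` after `meetings`."""
    return 1 - (1 - observed_chance) ** (1 / meetings)


print(f"Chance after 8 meetings at 10% each: {chance_after(0.10, 8):.2f}")
print("Meetings for a 50% chance at 10% each:", meetings_needed(0.10, 0.50))
print(f"Per-meeting chance implied by 15% after 18 meetings: "
      f"{implied_per_meeting(0.15, 18):.3f}")
```

Under this model, the 10% to 15% retention of meaning after 18 or more meetings would correspond to a per-meeting chance far below 10%, which is consistent with the excerpt's conclusion that delayed recall of meaning is much harder than the single-exposure estimate suggests.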


“A number of studies have shown that second language learners acquire vocabulary through reading, but only relatively small amounts. However, most of these studies used only short texts, measured only the acquisition of meaning, and did not credit partial learning of words.”

“The results showed that knowledge of 65% of the target words was enhanced in some way, for a pickup rate of about 1 of every 1.5 words tested. Spelling was strongly enhanced, even from a small number of exposures. Meaning and grammatical knowledge were also enhanced, but not to the same extent. Overall, the study indicates that more vocabulary acquisition is possible from extensive reading than previous studies have suggested.”

“There is no frequency point where meaning acquisition is assured, but by about 10+ exposures, there does seem to be a discernable rise in the learning rate. However, even after 20+ exposures, the meaning of some words eluded G, echoing Grabe and Stoller's (1997) point that some words simply seem hard to learn.”

“As a whole, the results are consistent with those of Schmitt (1998), who found that it is possible for L2 learners to have other kinds of word knowledge without having acquired knowledge of the word's meaning.”

“...the role of frequency of occurrence in the texts in the enhancement of the three types of word knowledge... As mentioned before, it seems that spelling knowledge can be gained with even a few exposures. Meaning does not seem to be as affected by frequency as much as one might expect, with 2-19 text occurrences yielding uptake rates ranging between 16-36% when we take the nouns and verbs together. Only at the extremes of frequency do we see a noticeable effect. Single encounters produced hardly any learning of meaning at all (3.4%), while it took 20+ occurrences to lead to a noticeable increase in uptake rates (60%). Only in the case of grammar (when articles and prepositions are considered together) was there a relatively steady increase of learning along the frequency scale. Overall, only when words were seen twenty or more times was there a good chance of all three word knowledge facets being enhanced.”

http://nflrc.hawaii.edu/rfl/April2006/pigada/pigada.html

“Chun and Plass' (1996) study of American university students learning German found that unfamiliar words were most efficiently learned when both pictures and text were available for students. This was more effective than text alone or combining text and video, possibly because learners can control the length of time spent viewing the pictures.”

http://www.jalt-publications.org/tlt/files/98/jan/hunt.html


Beyond raw frequency: Incidental vocabulary acquisition in extensive reading:

“However, words of lower frequency were better learned than words of higher frequency when the meanings of the lower frequency words were crucial for meaning comprehension.”

"...a richer sense of a word is learned through contextualized input. Furthermore, the incidental acquirer not only acquires word meanings but also increases his or her chances to get a feel for collocations and colligations that are not easily learned by learners of English as a foreign language (Bahns & Eldaw, 1993); therefore, learning can be facilitated by repeated exposure to words that go together (cf. Lewis, 1993; Nattinger & DeCarrico, 1992, for the importance of learning lexical phrases)...

“It does not seem feasible to define a number of exposures that is sufficient for successful acquisition, such as at least 10 exposures (Saragi et al., 1978) or 5–16 exposures (Nation, 1990). As Henriksen (1999, p. 314) pointed out, word acquisition seems to be able to range “over continua of lexical knowledge” from partial recognition knowledge to productive use ability, depending on how many and what kinds of exposures are needed for successful acquisition. The observation that some words that do not appear frequently, but are nevertheless acquired and retained, apparently because they are salient and significant to a story, is highly interesting. We suggest that the rate of incidental vocabulary learning is not simply related to the raw frequency of specific words in the language. We further propose that learning is a consequence of noticing and the conscious learning of words that are important in the narrative (Schmidt, 2001).”

http://nflrc.hawaii.edu/rfl/October2008/kweon/kweon.pdf

corpora comparison by frequency

The Brown University Standard Corpus of Present-Day American English (Brown Corpus) was compiled by Henry Kucera and W. Nelson Francis at Brown University as a general corpus in 1961. The corpus contains 1,014,312 words sampled from 15 text categories, including: press (politics, sports, culture, financial; 44 texts); editorial (letters to the editor, etc.); theatre and book reviews; religious texts; skills and hobbies; “popular lore” (48 texts); biography and memoirs (75 texts); government documents (30 texts); learned (natural science, medicine, math, humanities, technology; 80 texts); general fiction (29 texts); mystery and detective fiction (24 texts); adventure and western (29 texts); romance and love story (29 texts); humor (9 texts). The Brown Corpus is made up of 500 texts of about 2000 words each. The first American Heritage Dictionary (1969) was based on the Brown Corpus. This was the first dictionary to be compiled using evidence gleaned from corpus linguistics.

"The" constitutes nearly 7% of the Brown Corpus. About half of the total vocabulary of about 50,000 words are words that occur only once in the corpus.

NCFWD - a corpus of nineteenth-century fiction written between 1830 and 1870 (approximately 2.2 million words)

The Dickens Corpus – some 4.6 million running words

NCFWD and Dickens corpus data taken from: Investigating Dickens’ style by Masahiro Hori.

SUBTLEXUS: compiled by Brysbaert & New on the basis of American subtitles. A corpus of 8,388 films and television episodes with a total of 51 million running words (16.1M from television series, 14.3M from films before 1990, and 20.6M from films after 1990).
USA films from 1900-1990 (2046 files)
USA films from 1990-2007 (3218 files)
USA television series (4575 files)

There are 4,554 examples of gentleman in the Dickens Corpus (4.6 million words), 825 in the NCFWD (2.2 million words), 2,777 in the entire Cobuild (200,000,000 words) and 2,135 in SUBTLEXUS. Per million:

Dickens: 968
NCFWD: 375
Cobuild: 13.9
SUBTLEXUS: 42
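
The per-million figures are raw counts divided by corpus size, which is what makes corpora of very different sizes comparable. A minimal sketch, using the counts and the rounded corpus sizes quoted above (the function name is illustrative):

```python
def per_million(count, corpus_words):
    """Normalize a raw count to occurrences per million running words."""
    return count / corpus_words * 1_000_000


# (raw count of "gentleman", corpus size in running words), as quoted above
gentleman = {
    "Dickens": (4_554, 4_600_000),
    "NCFWD": (825, 2_200_000),
    "Cobuild": (2_777, 200_000_000),
    "SUBTLEXUS": (2_135, 51_000_000),
}

for corpus, (count, size) in gentleman.items():
    print(f"{corpus}: {per_million(count, size):.1f} per million")
```

With the rounded 4.6 million figure, the Dickens rate comes out nearer 990 than the 968 listed. The listed value may have been computed from a more precise word count, so the gap should not be read as a correction.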

Dickens Oliver Twist: 332
Thackeray's Vanity Fair: 269
Jane Austen’s Emma 36
Bronte sisters
39 in Jane Eyre
13 in Wuthering Heights
22 in Agnes Grey

19th>20th century sign o’ the times

DICKENS, NCFWD, BROWN and SUBTLEXUS compared (Frequency per million words)
Brown: frequency rank number in parentheses

Man
Dickens: 2037
NCFWD: 1587
Brown: 1210 (no. 81)
SUBTLEXUS: 1099

Old
Dickens: 1973
NCFWD: 1335
Brown: 660 (no. 140)
SUBTLEXUS: 609

Hand
Dickens: 1289
NCFWD: 871
Brown: 431
SUBTLEXUS: 280

Head
Dickens: 1212
NCFWD: 616
Brown: 404 (no. 201)
SUBTLEXUS: 371

Face
Dickens: 1075
NCFWD: 765
Brown: 371 (no. 245)
SUBTLEXUS: 289

Eyes
Dickens: 985
NCFWD: 816
Brown: 401 (no. 214)
SUBTLEXUS: 221

Dear
Dickens: 1284
NCFWD: 790
Brown: 54 (no. 2040)
SUBTLEXUS: 223

Life
Dickens: 711
NCFWD: 854
Brown: 715 (no. 127)
SUBTLEXUS: 797

Room
Dickens: 954
NCFWD: 981
Brown: 384 (no. 232)
SUBTLEXUS: 440

Lady
Dickens: 834
NCFWD: 1284
Brown: 80 (no. 1328)
SUBTLEXUS: 217

Another
Dickens: 829
NCFWD: 566
Brown: 684 (no. 133)
SUBTLEXUS: 509

Night
Dickens: 1079
NCFWD: 649
Brown: 411 (no. 209)
SUBTLEXUS: 866

Door
Dickens: 986
NCFWD: 614
Brown: 312 (no. 295)
SUBTLEXUS: 292

Boy
Dickens: 563
NCFWD: 333
Brown: 242 (no. 384)
SUBTLEXUS: 530

Manner
Dickens: 547
NCFWD: 285
Brown: 124 (no. 831)
SUBTLEXUS: 12

Child
Dickens: 538
NCFWD: 338
Brown: 213 (no. 435)
SUBTLEXUS: 158

Seemed
Dickens: 535
NCFWD: 569
Brown: 332 (no. 274)
SUBTLEXUS: 54

Yet
Dickens: 590
NCFWD: 864
Brown: 419 (no. 202)
SUBTLEXUS: 342

Let
Dickens: 656
NCFWD: 726
Brown: 384 (no. 231)
SUBTLEXUS: 2,419

Done
Dickens: 656
NCFWD: 597
Brown: 320 (no. 283)
SUBTLEXUS: 485

Half
Dickens: 618
NCFWD: 580
Brown: 275 (no. 337)
SUBTLEXUS: 199

People
Dickens: 592
NCFWD: 668
Brown: 847 (no. 106)
SUBTLEXUS: 1103

Love
Dickens: 420
NCFWD: 775
Brown: 232 (no. 397)
SUBTLEXUS: 1,115

Only
Dickens: 978
NCFWD: 1502
Brown: 1747 (no. 62)
SUBTLEXUS: 1084

Returned
Dickens: 846
NCFWD 264
Brown: 115 (return: 180)
SUBTLEXUS: 25 (return: 92)

Replied
Dickens: 823
NCFWD: 299
Brown: 57 (reply: 42)
SUBTLEXUS: 1 (reply: 5)

Slowly
Dickens: 178
NCFWD: 117
Brown 115 (no.900) slow: 60 (no.1817)
SUBTLEXUS: 25 slow: 76

Softly
Dickens: 101
NCFWD: 36
Brown: 31 (no. 3425) Soft: 62
SUBTLEXUS: 5 Soft: 1126

Easily
Dickens: 100
NCFWD: 79
Brown 106 (no. 981) Easy: 125
SUBTLEXUS: 23 Easy: 266

Gradually
Dickens: 94
NCFWD: 49
Brown: 51 (no. 2125)

Quickly
Dickens: 92
NCFWD: 70
Brown: 89 (no.1169) Quick: 68
SUBTLEXUS: 57 Quick: 109

Hastily
Dickens: 87
NCFWD: 45
Brown: n/a not in the top 5,000 (less than 19)
SUBTLEXUS: 1 (haste: 2)

Gently
Dickens: 83
NCFWD: 59
Brown: 31 (no.3441) Gentle: 27
SUBTLEXUS: 9 Gentle: 17

Quietly
Dickens: 78
NCFWD: 85
Brown: 48 (no.2250) Quiet: 76
SUBTLEXUS: 12 Quiet: 117

Carefully
Dickens: 65
NCFWD: 56
Brown: 87 (no.1213) Careful: 62 care: 162
SUBTLEXUS: 24 Careful: 109 Care: 485

Heartily
Dickens: 54
NCFWD: 26
Brown: not in the top 5,000
SUBTLEXUS: 1

Steadily
Dickens: 47
NCFWD: 19
Brown: 22 (no 4499) Steady: 41
SUBTLEXUS: 1 Steady: 23

Frequently
Dickens: 42
NCFWD: 52
Brown: 91 (no.1146) Frequent: 34
SUBTLEXUS: 3 Frequent: 2

Thoughtfully
Dickens: 39
NCFWD: 5
Brown: not in the top 5,000; neither is thoughtful (less than 19)
SUBTLEXUS: 1 Thoughtful: 8

Eagerly
Dickens: 37
NCFWD: 49
Brown: not in the top 5,000 Eager: 27 (no. 3772)
SUBTLEXUS: 1 Eager: 7

Freely
Dickens: 35
NCFWD: 24
Brown: 22 (no 4476) Free: 260 (no.358)
SUBTLEXUS: 4 Free: 178

Happily
Dickens: 32
NCFWD: 27
Brown: 20 (no. 4836) Happy: 98 (no. 1069)
SUBTLEXUS: 10 Happy: 333

Cheerfully
Dickens: 32
NCFWD: 18
Brown: not in the top 5,000; neither is cheerful
SUBTLEXUS: 1 Cheerful: 4

Sharply
Dickens: 31
NCFWD: 25
Brown: 38 (no.2827) Sharp: 72
SUBTLEXUS: 1 Sharp: 24

Silently
Dickens: 30
NCFWD: 30
Brown: not in the top 5,000 Silent: 49 (no. 2229)

Seriously
Dickens: 27
NCFWD: 45
Brown: 46 (no.2368) Serious: 116 (no.883)

Angrily
Dickens: 26
NCFWD: 12
Brown: not in the top 5,000. Angry: 45 (no.2430)
SUBTLEXUS: 0.4 Angry: 59

Sternly
Dickens: 26
NCFWD: 12
Brown: not in the top 5,000 Stern: 23 (no. 4295)
SUBTLEXUS: 0.1 Stern: 6

Timidly
Dickens: 26
NCFWD: 19
Brown: not in the top 5,000 (neither is “timid”)
SUBTLEXUS: 0.1 Timid: 2


SUBTLEXUS vs Brown

This 7,979 vs 5,146
Now 3202 vs 1314
Be 5746 vs 6376
Was 5654 vs 9815
Been 1737 vs 2473
In 9,773 vs 21,345
Out 3865 vs 2096
Me 9,242 vs 1183
My 6763 vs 1319
Mine 251 vs 59
Can 5,247 vs 1,772
Could 1629 vs 1599
Should 1062 vs 888
Will 2124 vs 2244
Would 1768 vs 2715
There 4348 vs 2725
But 4,418 vs 4381
By 1340 vs 5307
He 7,637 vs 9,542
Him 3484 vs 2619
So 4244 vs 1985
Go 3793 vs 626
Goes 217 vs 89
Going 2123 vs 399
Went 411 vs 507
Gone 297 vs 195
Like 3,999 vs 1290
Likes 76 vs 20
Liked 79 vs 58
How 3056 vs 836
If 3541 vs 2199
Just 4,749 vs 872
Get 4583 vs 749
Gets 223 vs 66
Got 3306 vs 482
Gotten 54 vs n/a (less than 19)
Had 1676 vs 5,131
Come 3141 vs 630
Comes 229 vs 137
Came 464 vs 622
Coming 527 vs 174
They 4102 vs 3619
See 2557 vs 772
Saw 403 vs 352
Seen 385 vs 279
Time 1959 vs 1601
Let 2419 vs 384
Did 2341 vs 1044
From 2039 vs 4370
Want 2759 vs 329
Wants 307 vs 71
Wanted 502 vs 226
Think 2691 vs 433
Thinks 103 vs 23
Thought 809 vs 516
Thinking 281 vs 145
Take 1891 vs 611
Took 342 vs 426
Taken 281 vs 139
Look 1947 vs 399
Looks 311 vs 78
Looked 121 vs 361
Some 1727 vs 1617
Then 1490 vs 1377
Why 2248 vs 404
Where 1830 vs 938
Too 1372 vs 833
More 1299 vs 2216
Down 1490 vs 895
Yes 1997 vs 144
Tell 1724 vs less than 19
Little 1446 vs 831
Thing 1088 vs 333
Mean 1244 vs 199
Said 1109 vs 1961
Sure 1100 vs 264
First 840 vs 1361
Put 829 vs 437
Please 1101 vs 62
Mexico 31 vs 19
Wildlife 2 vs 19
victims 23 vs 19
Father 555 vs 183
Mother 480 vs 216
English 74 vs 195
hasn't 91 vs 20
Tuesday 24 vs 59
January 7 vs 53
Halloween 13 vs n/a
Keith 0 vs 21
Economical 0.33 vs 22
Arrested 35 vs 19
Run 350 vs 217
Court 101 vs 230
Office 204 vs 255
Planet 39 vs 21
Planets 4 vs 22
Political 22 vs 258
Theoretical 2 vs 21
sixty 5 vs 21
Troops 19.3 vs 53
College 85 vs 267
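
One way to read the SUBTLEXUS vs Brown pairs above is as a ratio: how many times more (or less) frequent a word is in the subtitle corpus than in Brown. The sketch below is only an illustration. It assumes the two columns are on a comparable scale (for example, occurrences per million words), which the list does not state explicitly. The names PAIRS, log2_ratio and rank_by_skew are made up for this example; the figures are copied from the list.

```python
import math

# (SUBTLEXUS, Brown) figures copied from the list above.
# Assumption: both numbers are on a comparable scale (e.g. per million words).
PAIRS = {
    "me": (9242, 1183),
    "please": (1101, 62),
    "yes": (1997, 144),
    "but": (4418, 4381),
    "by": (1340, 5307),
    "political": (22, 258),
}


def log2_ratio(subtlexus: float, brown: float) -> float:
    """Positive: more frequent in SUBTLEXUS; negative: more frequent in Brown."""
    return math.log2(subtlexus / brown)


def rank_by_skew(pairs):
    """Return (word, log2 ratio) tuples, most subtitle-skewed first."""
    scored = [(word, log2_ratio(s, b)) for word, (s, b) in pairs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for word, score in rank_by_skew(PAIRS):
        print(f"{word:10s} {score:+.2f}")
```

A score near zero (as for "but") means the word is about equally common in both lists. A score of +1 means twice as frequent in SUBTLEXUS, and -1 means twice as frequent in Brown. Entries given only as "less than 19" or "n/a" have no exact figure and would need to be left out of such a calculation.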