Can the rate of lexical acquisition from reading be increased? An experiment in reading French with a suite of on-line resources.



Tom Cobb

Université du Québec à Montréal


Chris Greaves

Polytechnic University of Hong Kong


Marlise Horst

Concordia University


Please cite as translated chapter from P. Raymond & C. Cornaire (2001), Regards sur la didactique des langues secondes. (pp. 133-153). Montréal: Éditions logique. [WWW Pre-publication; translation from French.]



The field of second language (L2) reading has always seemed a good opportunity to demonstrate the value of computers in language learning, especially since the Internet has shown its capacity to multiply the amount and variety of texts and contexts available to language learners and teachers. However, the usefulness of computers to the development of reading ability remains largely an argument in principle.


In pre-Internet days, computer programs for the development of reading ability followed mainly a skill-development model. The reading skills targeted included tracking pronoun reference, finding main ideas in paragraphs, inferring word meanings from context, and so on--the type of skills that could be coded in multiple-choice questions following a brief text. However, there was little evidence that such skills transferred positively to the reading of full length texts, as indeed there had not been when such exercises were done on paper. There was even occasional evidence that they transferred negatively (Oppenheimer, 1997). The reading research, whether L1 or L2, was rarely brought to bear on the development of such courseware, as may not be surprising given its often inconclusive nature. However, one reasonably clear finding in the research literature is that the development of reading ability depends on the learner logging some volume, whether pages or screens, which was not really possible under the isolated skills approach where there was "too little to read" (Reinking & Bridwell-Bowles, 1991).


But things have changed, at least in some ways, with the arrival of the Internet. "Too little to read" is no longer the problem it was. A standard web-based reading activity nowadays is for learners to search from site to site for some specific piece of information, such as, 'How old was Napoleon when he was exiled to Elba?' It is assumed this search will pass through numerous reading opportunities en route to the final disclosure. However, it is not clear that the type of reading needed for such activities, unless carefully provisioned, involves more than scanning at best, string-matching at worst. Some task reduction may be an inevitable part of reading on the web, given that the Internet is a vast repository of authentic text, most of it beyond the unaided reach of most second language learners. In other words, 'too little to read' has been replaced by 'too much to read,' but with a similar reduction in the quality of the text-learner interaction that can be expected.


In this chapter we aim to provide an alternative to these two approaches to computer-assisted reading. The model we propose is 'resource-assisted reading of extensive authentic documents' or R-READ. The idea is that learners will be able to Read Extended and Authentic Documents with comprehension if they are aided by a carefully chosen suite of helper Resources of a kind that are becoming increasingly available on the Internet or capable of Internet delivery. The hope is that R-READ will allow learners to log the volume they need to clear the threshold to independent reading in their L2. This account of R-READ is grounded in research findings, and it involves the development and testing of a preliminary hands-on implementation--but it begins in the personal experience of one of the authors.



1  A linguistic consultant


Almost 20 years ago, a monolingual Manitoba Anglophone had to pass a French translation test to finish a Master's degree. With weak high school French, Tom wondered where a quick reading ability in the other national language was going to come from. A bilingual friend proposed that anyone can cross the first and biggest hurdle into a second language by somehow managing to read a complete book in it, any decent book being a microcosm of the language as a whole, by looking up every word, parsing the syntax, and so on--or in this case by using her as an expert reading partner (decoder, explainer, pronouncer, hypothesis confirmer and denier). Together with this resource person Tom embarked on several readings of Voltaire's Candide, stopping for discussion and note-taking a good deal in the beginning but noticeably less within just a few hours. By the end of the first reading all the words seemed familiar, and by the end of five days and three readings he was ready for the translation test. On a subsequent visit to Paris, Tom reported finding that much of the ambient signage and advertising could be worked out on the basis of what had been learned. He had apparently crossed the threshold to independent learning.


However, not everyone is fortunate enough to know a resource person who is willing and able at the moment they have the time and motivation to make their move for a second language. But if having such a resource person is so effective for the learner, is there some way to recreate a resource person in a tutorial computer program? This idea was a fantasy until recently, but now the Internet has the bandwidth, distribution, and quality of resources, including streaming audio, to make such a program possible. In this chapter we describe such a Web-based reading expert, outline the theoretical basis for this approach in the reading literature, and then present a case study of a learner using the program.



2  An electronic linguistic consultant: De Maupassant by R-READ


The reading experience provided for learners of French is a parallel version and further development of one that was developed previously for students learning English. For the English site, the Jack London novel Call of the Wild was chosen, for the following reasons: it is a novel rich in vocabulary, with appeal to children as well as adults, has extensive but not excessive length, is out of copyright, is available on the Internet as a text file for download, and has several good cassette read-aloud renditions. The operation of the site is quite simple. Readers can read and/or listen to Call of the Wild in the sequence they choose. When reading, they can pause the sound recording and click on any word in the text that they are curious about, and a concordance is generated showing all the rest of the word's occurrences throughout the novel. From the concordance frame, readers can then click up a dictionary definition provided by Princeton University's Wordnet site. If readers wish to make a note about any word, they can type or cut and paste any of this information into a personal database which they can later retrieve and print or save to their hard disk. The site can be visited at An alternative version of this implementation may also be seen at, which also features adaptations of Alice in Wonderland and other works implemented using the same model.


The French parallel site is based on de Maupassant's Boule de Suif, and the interactions envisaged are identical. The construction of the site was straightforward in that many of the technical challenges had already been dealt with in constructing the English site. However, the French site presented other challenges. First, there are far fewer French novels in text file format ready for download on the Internet, and fewer still with a corresponding sound recording whether on-line or off (and none at all that could be found in the case of French-Canadian authors). Second, there are far fewer on-line dictionaries to choose from than there are in English, and the French ones that exist tend to be both unsuitable for learners and difficult to access by command line from an outside site. The choice of Boule de Suif was something of a compromise, in that the text, while extensive but not excessively long (13,418 words) and very rich in vocabulary, is neither particularly modern nor suitable for children (dealing with sexual predation and middle class hypocrisy). However, the choice of a de Maupassant story solved one major problem observed with Call of the Wild. This was that readers often clicked for further contexts for a word they were trying to work out the meaning of, only to find that the word occurred only once in the entire text and hence there were no other contexts. Indeed, research by Horst (2000) found that for texts of intermediate size (5,000 to 15,000 words), between 5 and 10 per cent of the lexis is one-off. Kucera (1982) determined that it is precisely these least used words in a text that often carry the most meaning. The solution to this problem was to generate the concordances from the entire de Maupassant oeuvre of more than 1 million words, which had fortuitously been made available by Thierry Selva, a Belgian colleague. The concordance engine runs slightly more slowly with such a volume of text to search through, but to date there has been no case of a word appearing only once in the entire corpus. Figure 1 gives screen shots of the Boule de Suif website, displaying the main interactions it proposes.
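The one-off problem described above can be illustrated with a minimal sketch. It is not the code behind the site; the tokenizer, the function names, and the choice to express the share of one-off words as a proportion of running words are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters (accented letters included), lowercased.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split a text into lowercase word forms."""
    return [w.lower() for w in WORD_RE.findall(text)]


def one_off_words(story):
    """Word forms that occur exactly once in the story."""
    counts = Counter(tokenize(story))
    return sorted(w for w, n in counts.items() if n == 1)


def one_off_share(story):
    """One-off words as a proportion of the running words (an assumption)."""
    tokens = tokenize(story)
    if not tokens:
        return 0.0
    return len(one_off_words(story)) / len(tokens)


def still_one_off(story, corpus):
    """One-off words of the story that gain no extra context from the corpus."""
    corpus_counts = Counter(tokenize(corpus))
    return [w for w in one_off_words(story) if corpus_counts[w] <= 1]
```

Run on a single story, `one_off_words` lists the items for which a story-only concordance would offer no further contexts; `still_one_off` shows which of them remain unsupported once a larger corpus by the same author is searched instead.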



Figure 1
  The main interactions proposed by R-Read



The text is selected from the black sidebar on the left, with a dramatic recitation of the text if desired (either text or sound can be selected without the other, so for example the text could be heard before it was read). Any interesting word (such as lambeaux in the figure) can be clicked upon to produce a concordance in the lower window, which as mentioned is drawn from the entire corpus of de Maupassant texts. Also generated with the concordance is a link in the upper right corner to a bilingual learners' dictionary, which will take the learner either to the exact target word or else to a list of words in the alphabetical vicinity of the target word. These interactions have produced the computer screen as it appears in Figure 1.
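For readers unfamiliar with concordances, the kind of output produced by a click on a word can be sketched as follows. This is a simplified keyword-in-context routine under stated assumptions, not the engine used on the site: the function name, the fixed context width, and the whole-word matching rule are all hypothetical.

```python
import re


def concordance(corpus, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    # Collapse line breaks and runs of spaces so each context is one line.
    text = " ".join(corpus.split())
    # Assumed matching rule: whole words only, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines
```

Calling `concordance(corpus, "lambeaux")` on a plain-text corpus returns the lines with the keyword aligned in a central column, which is the layout that lets a learner scan several contexts at a glance.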


Three features of the proposed interaction may not be readily noticeable. First, a classic problem with click-on dictionaries is that the word clicked on is a plural or other variant which, when sent to the dictionary's search engine, returns a "Not Found." In a paper dictionary, the search for chats brings the reader into the vicinity of chat, and the problem does not occur. The expensive solution to this problem on-line is to lemmatize the search process, so that all the morphologies for each word family are grouped together. The cheap solution and the one adopted by Coffey, the creator of the dictionary accessed here, is to present instead of "Not Found" all the words in the vicinity of the search word (char, chat, chaud) on the assumption that learners will recognize one of them as the base form of the word they are looking for.  This is basically a simulation of what happens when looking up a word in a paper dictionary, where you can see the rest of the page. In Figure 1, the learner has clicked on lambeaux and been sent to a list including lambeau, which when clicked produces the requested information. Second, the keywords in the middle of the concordance lines are hypertext links, which when clicked expand the amount of context to roughly the size of a small paragraph. Third, the site links to a historical backgrounder on the Franco-Prussian War of 1870, the setting of the events in the story.
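The "cheap solution" described above can be sketched as a lookup in a sorted headword list that falls back on the alphabetical neighbours of the search word. This is an illustration only, not Coffey's implementation; the function name, the `span` parameter, and the plain code-point sort order are assumptions.

```python
from bisect import bisect_left


def look_up(word, headwords, span=1):
    """Look a word up in a sorted headword list.

    Returns (True, [word]) on an exact match; otherwise returns
    (False, neighbours), the headwords on either side of the place
    where the word would stand, as on the page of a paper dictionary.
    """
    # headwords must be sorted with the same ordering that bisect uses
    # (plain string comparison here, which is an assumption about accents).
    i = bisect_left(headwords, word)
    if i < len(headwords) and headwords[i] == word:
        return True, [word]
    return False, headwords[max(0, i - span):i + span]
```

With a headword list containing chat and chaud, a search for chats finds no exact entry and returns those two neighbours, leaving the learner to recognize chat as the base form.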


When learners have found a word and related information that they wish to keep, they can record it in a database that they access by clicking on "Lexique Utilisateur" in the top left corner. This database is the e-quivalent of those little notebooks that language learners love so much to write their new words in, except that this is neater, contains more and richer information, and occupies much less of their time (since it can be assembled from the text, the concordances, or the dictionary on a copy-paste basis). This database can be viewed for all learners' entries or for just those of the current user, and can be downloaded for assembly into a personal glossary or lexicon which can be further assembled, edited, and sorted using Excel on the learner's machine.  Figure 2 shows an example of the database. In this case "Tom" has requested just his own entries, but with "Tout Voir" he can see the entries of all users of the site.
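A minimal sketch of such a notebook is given below, assuming each entry records the user, the word, a pasted context, and a pasted definition. The field names and functions are hypothetical and do not describe the site's actual database; the export is written as CSV because that format opens directly in Excel.

```python
import csv
import io

# Assumed fields for one notebook entry.
FIELDS = ["user", "word", "context", "definition"]


def add_entry(notebook, user, word, context="", definition=""):
    """Append one entry to the shared notebook (a list of dicts)."""
    notebook.append(
        {"user": user, "word": word, "context": context, "definition": definition}
    )


def entries_for(notebook, user=None):
    """All entries, or only those of one user when a name is given."""
    return [e for e in notebook if user is None or e["user"] == user]


def to_csv(entries):
    """Serialize entries as CSV text, sorted alphabetically by word."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(sorted(entries, key=lambda e: e["word"]))
    return buffer.getvalue()
```

Here `entries_for(notebook, "Tom")` corresponds to viewing one's own entries, and `entries_for(notebook)` to the "Tout Voir" view of all users' entries.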





Figure 2 Electronic vocabulary notebook


So here is a website where students who want to read an extended French text can do so, and in addition listen along, see all instances of interesting words or phrases ever composed by the same author, look up words in a learner's dictionary, and keep a record of interesting findings. But, one might ask, are all of these options mainly amusements, or is there any sign that learning will be aided by them in any substantial way?


3  Research base


It often seems that the development of computerized language learning materials is dominated by either commercial software companies or CALL hobbyists, who both have the complexities of ever changing technologies to keep up with, which may limit their interest in the additional complexities of language acquisition (LA) research. Still, over the years CALL and LA research have come together on occasion, usually where acquisition scholars have undertaken studies with media implications or opined on the media implications of their other findings. These convergences will be outlined briefly, categorized according to the main interactions offered on the Boule de Suif site. The main research of interest concerns lexical acquisition and handling, which is uncontroversially the beginning reader's major hurdle.


3.1  Listen and Read


Stanovich is one of the foremost investigators of first language (L1) reading problems. In his well known paper on Matthew Effects (Stanovich, 1986) he ventured some remarks on the type of CALL programs he thought might aid learners having difficulty learning to read. He cited some rather old software (Draper & Moeller, 1971, which is almost certainly not in existence any longer) which had produced very strong learning effects simply by giving learners the opportunity to click on words in a reading text and hear them spoken. The idea was that in the L1 situation, many words are not recognized in writing that are in fact known in speech. This would normally be the case less often in L2, where new words other than very high frequency items are more likely to be met in text before speech. Still, L2 research supports a strong role for reading and listening along. Lightbown (1992) looked into the acquisition of English of young Francophone New Brunswickers who had not been provided with classroom instruction but instead had read and listened to cassettes of self selected materials at their own pace. The surprise finding was that learners (at least in the early stages of L2 acquisition) seem to gain as much from reading and listening as they do from being in a classroom. The irony of this finding is that it comes at a time when many schools are abandoning their facilities for listening to cassettes in favour of computer labs. Fortunately, new Internet technologies like streaming audio can make a listening lab out of any computer lab, and at the same time deliver additional advantages like on-line dictionaries.


3.2  Concordance


A major limitation on the instructional use of authentic texts is that learners apparently are much less able to infer the meanings of new words from context (Laufer & Sim, 1985; Haynes, 1983; Huckin, Haynes & Coady, 1991) than was once believed (e.g., by Smith, 1971; Goodman, 1976; Krashen, 1989, and their many followers). However, research by Cobb (1997; 1999) has shown that contextual inference can be substantially supported by multiplying the number of contexts available for a given word with the aid of a computer. The specific program which does this is called a concordance, which assembles all the contexts available for a given word or phrase throughout a text or corpus. The support for learning is as follows: when several contexts are available, many of them will be opaque, but one or more of them is likely to have the mix of linguistic and semantic support that provides the learning conditions needed by a particular learner to build an initial stable representation for a new word. If learners can be persuaded to examine several contexts, they will make better inferences than if they merely examined one. In other words, we propose concordances as a means to computer-aided contextual inference. Another benefit of concordances is that beginners need to meet words with some frequency if they are to learn them (Zahar, Cobb, & Spada, in preparation), more frequently than is actually possible for them without some artificial means of boosting the number of encounters.


3.3  Database


Another of Cobb's (1997) findings is that the enormous time that learners are willing to spend writing down lexical entries while reading can be made more efficient through the judicious use of the computer, with look-up and write-down time redeployed to reading more and hence meeting more new words and old words more often. Further, it was found that this computerization could facilitate collaborative use of lexical look-ups, and that the prospect of having their entries seen by others encouraged learners to spend more time sifting through concordances for good examples of words.


3.4  Dictionary


While much research casts doubt on the value of dictionary work for beginning readers in an L2, there is no point in trying to prevent the use of dictionaries. One can merely try to encourage the use of decent dictionaries that learners can comprehend and yet which do not encourage them in the belief that all L1-L2 mappings are one to one (Bland, Noblitt, Armstrong & Gray's, 1990, "naïve lexical hypothesis"). We should also try to ensure that concordance work precedes dictionary work, following the constructivist principle that learning is more about building generalized knowledge than about receiving it (Cobb, 1999). This sequence has recently received support from a study by Fraser (1999), who found that contextual inference combined with dictionary look-up supported more lexical acquisition than either alone, but also that the sequence of these strategies was important: attempted inference first, dictionary confirmation second, is the more effective sequence. To her finding we would add computer-aided contextual inference first, dictionary second.


3.5  Click-on Interface


An important study of resource based or "instructionally enhanced" reading has recently been published by Hulstijn, Hollander, & Greidanus (1996). One of the findings of this study is that many learners who would be aided by lexical lookups while reading nevertheless do not take the time to do so, presumably out of some form of laziness, but that they will do so if the lookup is made sufficiently easy. The click-on resources in the Boule de Suif website could hardly be easier, and yet type-in resources are also available for more sophisticated searches or for testing hypotheses about French not directly stimulated by the immediate text.


3.6  Approach


The resource-based approach exemplified in the Boule de Suif site is one of three main approaches to treating the lexical demands of L2 reading. One approach is that outlined by Krashen (1989), Nagy (1997), and their followers, to the effect that reading itself will teach learners all the words they need to know to be able to read. This approach has run into problems, as already mentioned. The approach at the other extreme is the direct pre-teaching of vocabulary learners will need in order to read particular types of texts successfully, for example by working their way through the wordlists that Nation and colleagues (Nation, 1990; Nation & Waring, 1997; Sutarsyah, Nation & Kennedy, 1994) have identified as comprising the vast majority of lexis in average texts. Between these two approaches and not really excluding either is vocabulary enhanced reading (Hulstijn, Hollander, & Greidanus, 1996), where learners are left to make their own way through texts but it is assumed they will need support resources to do so successfully. R-READ, resource assisted reading of extended authentic documents, is intended to be a substantial test of this middle approach.



4  Pilot test of R-READ


4.1  Background and experimental method


To investigate the usefulness of concordancing with easy access to full concordances, dictionary definitions, and easy data storage, we asked a learner of French to use the experimental materials. The research question of this pilot case study was this:


How do the vocabulary learning results of reading with the online tools described above compare to the results of reading without these tools?


The baseline for comparison to "normal" unassisted reading comes from a series of case studies by Horst (2000). In one of these, she investigated the amounts of new vocabulary learners acquired through reading texts resembling Boule de Suif in that they were nineteenth-century literary classics. R, an adult intermediate learner of German, agreed not to consult a dictionary while he read a German novella. A few days later he rated his knowledge of 300 words that occurred in the story only once. When he rated his knowledge of these target words again a few days later, the difference was modest. After reading the 9500-word literary text (which took about three hours), R rated only five more words "definitely known" than he had on the pretest. Thus we can conclude that he had learned about two words per hour of unassisted reading.


Our test of the computerized lexical resources follows the same design that Horst used in this and a series of similar experiments. This time the participant was J, an adult intermediate learner of French. Six weeks before reading the R-Read version of Boule de Suif, J was pre-tested on 400 words that occurred only once in the text, assigning each word a knowledge rating according to the following scheme (Horst & Meara, 1999):


0 = I don't know what this word means

1 = I am not sure what this word means

2 = I think I know what this word means

3 = I definitely know what this word means


She assigned a 0 (“don't know”) rating to 180 words, so there was clearly ample opportunity for new learning through use of the experimental materials.
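The way individual ratings on this four-point scheme are collapsed into the three reporting categories (not known, unsure, known) can be sketched as follows. The function and the sample ratings are invented for illustration and are not J's data.

```python
from collections import Counter

# Reporting categories: ratings 1 and 2 are pooled as "unsure".
CATEGORY = {
    0: "0 (not known)",
    1: "1 or 2 (unsure)",
    2: "1 or 2 (unsure)",
    3: "3 (known)",
}
ORDER = ["0 (not known)", "1 or 2 (unsure)", "3 (known)"]


def tally(ratings):
    """Count target words per reporting category.

    ratings maps each target word to a self-rating from 0 to 3.
    """
    counts = Counter(CATEGORY[r] for r in ratings.values())
    return {label: counts.get(label, 0) for label in ORDER}


# Invented ratings for four words, for illustration only.
sample_ratings = {"lambeaux": 0, "suif": 1, "diligence": 2, "neige": 3}
print(tally(sample_ratings))
```

Applying the same tally to the pretest and to each posttest gives one column of counts per test, so that movement of words between categories can be followed across readings.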


After a brief training session, J began reading Boule de Suif following the prescribed procedure of clicking on unknown words and looking at contexts provided by the concordance. In most cases, she requested dictionary definitions as well. Progress was marginally slower than normal or unassisted reading; it took her about six hours to complete the entire 14,500-word text (compared to R’s three hours for 9,500 words).


4.2  Results


Three days after completing Boule de Suif, J rated her knowledge of the 400 target words. The number of words rated 3 (definitely known) amounted to 137, a 59-word increase over her pretest total of 78 words definitely known. Since J had spent about six hours using the program, we can conclude that about ten new words per hour had entered the “definitely known” category, considerably more than R’s two words per hour.
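The arithmetic behind the two rates can be checked with a few lines. The figures are those reported for the first reading (five words in about three hours for R; 137 minus 78 words in about six hours for J); the function name is ours.

```python
def words_per_hour(words_gained, hours):
    """New 'definitely known' words per hour of reading."""
    return words_gained / hours


# R, unassisted: five more words rated "definitely known" after about three hours.
r_rate = words_per_hour(5, 3)

# J, assisted: 137 known at posttest 1 against 78 at pretest, in about six hours.
j_rate = words_per_hour(137 - 78, 6)

print(f"R (unassisted): {r_rate:.1f} words per hour")
print(f"J (assisted):   {j_rate:.1f} words per hour")
print(f"Ratio J/R:      {j_rate / r_rate:.1f}")
```

The unrounded values are roughly 1.7 and 9.8 words per hour, which the text rounds to two and ten. Since the reading times are themselves approximate, the ratio between them should be read as an order of magnitude rather than a precise figure.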


A week later, J read Boule de Suif for a second time, this time using the sound option (the pace of the oral narrative proved to be too fast the first time around). Again, she rated her knowledge of the target words a few days after completing the story. Then, seven days later, there followed yet another round of reading, listening and testing. J spent about four hours using the materials in each of these later sessions; thus the total number of hours amounted to 14 (6 + 4 + 4). The numbers of words assigned to the various knowledge categories after each reading are shown below in Table 1.


Table 1

Word knowledge ratings before reading and after each of three readings

Rating             Pretest   Posttest 1   Posttest 2   Posttest 3
0 (not known)      180
1 or 2 (unsure)
3 (known)          78        137                       202

Table 1 shows that by the end of the experiment, J "definitely knew" 202 words, up 124 from her starting point of 78. The table also confirms growth in other ways: the figures show that the number of words rated 0 (not known) decreased substantially over the course of the experiment, and that many unknown words became partially known.


The unassisted reader of German, R, had also read his novella repeatedly, and this allows for convenient comparison to the results of J's three sessions with the experimental materials. The results for both learners after three rounds of reading are shown in Table 2. Their pretest starting points are very similar: both participants rated 45% of the targets unknown at the outset, and were also similar in their ratings of known words at the outset (27% of targets for J, 20% for R). After three reading sessions, the advantage for the assisted process seems clear. The number of words rated “definitely known” has remained constant in the case of R, but has increased dramatically in the case of J. Although R eventually learned dozens of new words over the course of several additional readings, his initial progress was slow, and even after ten exposures his responses were less accurate than J's. At the end of the repeated readings (four repetitions in the case of J, ten in the case of R), both participants were asked to provide translation equivalents for items they had rated "definitely known." About 94% of J's responses were judged to be accurate, while R identified correct translation equivalents in only 77% of cases.



Table 2

Percentage of targets in each category at outset and after three readings, assisted and unassisted

                   Results for R (unassisted)   Results for J (assisted)
                   n=300 words                  n=400 words
Rating             Outset    3rd posttest       Outset    3rd posttest
0 (not known)      45%                          45%
1 or 2 (unsure)
3 (known)




The comparison between R’s and J’s rates of vocabulary acquisition seems to confirm the usefulness of the R-READ approach, and to indicate a middle way in vocabulary growth through reading. Resource-based reading seems able to render irrelevant the choice between incidental acquisition and direct vocabulary instruction. The pace and accuracy of J’s acquisition are reminiscent of the best results of direct instruction (Nation, 1982), but with the enjoyability and possibly deeper learning that come with meeting words in rich contexts.


Both readers started their readings with 45 per cent of target words unknown in their respective texts. Working from unadorned context, R managed to reduce his number of unknown words by only 7 per cent, while J reduced hers by 38 per cent. At the “definitely known” end, R did not manage to increase his holdings in this category at all with three readings, while J increased hers by 250 per cent. Further, on the translation post-test, a greater number of J’s answers were correct after three readings than R’s were after 10 readings.


None of this is surprising in itself, except for the fact that the time invested to achieve the greater learning was only marginally greater, thanks to the effective use of on-line tools.


Vocabulary acquisition from reading has always been a major problem in the development of literacy in a second language. Contexts are rich but unreliable, definitions are precise but incomprehensible, and the number of words to be acquired is daunting. Resource-assisted reading seems a promising approach to making vocabulary acquisition through reading possible and even efficient. And given the expected increase in the use of the World Wide Web in coming years, it may even be an approach that can reach many of the people who want and need it, providing an effective “linguistic consultant” for those not lucky enough to know one.







Bland, S., Noblitt, J., Armstrong, S., & Gray, G. (1990). The naive lexical hypothesis: Evidence from computer assisted learning. Modern Language Journal 74, 440-450.


Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System 25, 301-315.


Cobb, T. (1999). Applying constructivism: A test for the learner-as-scientist. Educational Technology Research and Development, 47 (3), 15-31.


Coffey, N. (2000). On-line French-English and English-French dictionary. Available at  .


Draper, A.G. & Moeller, G.H. (1971). We think with words (therefore, to improve thinking, teach vocabulary). Phi Delta Kappan 52, 482-484.


Fraser, C. (1999). Lexical processing strategy use and vocabulary learning through reading. Studies in Second Language Acquisition, 21 (2), 225-241.


Goodman, K.S. (1973). Psycholinguistic universals in the reading process. In F. Smith (Ed.), Psycholinguistics and reading (pp. 21-29). New York: Holt, Rinehart & Winston.


Haynes, M. (1983). Patterns and perils of guessing in second language reading. In J. Handscombe, R.A. Orem, & B.P. Taylor (Eds.), On TESOL '83: The question of control (pp. 163-176). Washington DC: TESOL.


Horst, M. (2000). Text encounters of the frequent kind: Learning L2 vocabulary from reading. Unpublished PhD dissertation, University of Wales (UK), Swansea.


Horst, M., & Meara, P. (1999). Test of a model for predicting second language lexical growth through reading. Canadian Modern Language Review, 56 (2), 308-328.


Huckin, T., Haynes, M., & Coady, J. (1991). Second language reading and vocabulary learning. Norwood, NJ: Ablex.


Hulstijn, J. H., Hollander, M., & Greidanus, T. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. Modern Language Journal, 80, 327-339.


Krashen, S. (1989). We acquire vocabulary and spelling by reading: Additional evidence for the input hypothesis. Modern Language Journal 73, 440-464.


Kucera, H. (1982). The mathematics of language. In The American Heritage Dictionary, Second College Edition. Boston: Houghton Mifflin.


Laufer, B. & Sim, D. (1985). Taking the easy way out: Non-use and misuse of clues in EFL reading. English Teaching Forum, April, 7-10.


Lightbown, P.M. (1992). Can they do it themselves: A comprehension-based ESL course for young children. In R. Courchêne, J. St-John, C. Thérien, & J. Glidden (Eds.), Comprehension-based language teaching: Current trends. Ottawa: University of Ottawa Press.


Nagy, W. (1997). On the role of context in first- and second-language vocabulary learning. In Schmitt, N. & McCarthy, M. (Eds.) Vocabulary: Description, acquisition, pedagogy (pp. 64-83). Cambridge: Cambridge University Press.


Nation, P. (1982). Beginning to learn foreign vocabulary: A review of the research. RELC Journal, 13 (1), 14-36.


Nation, P. (1990). Teaching & Learning Vocabulary. Rowley, MA: Newbury House.


Nation, P., & Waring, R. (1997). Vocabulary size, text coverage, and word lists. In Schmitt, N. & McCarthy, M. (Eds.) Vocabulary: Description, acquisition, pedagogy (pp. 6-19). Cambridge: Cambridge University Press.


Oppenheimer, T. (1997). The computer delusion. Atlantic Monthly, July. [On-line]. Available:


Reinking, D., & Bridwell-Bowles, L. (1991). Computers in reading and writing. In R. Barr, M.L. Kamil, P. Rosenthal, & P.D. Pearson (Eds.), Handbook of reading research, Vol II (pp. 310-340). New York: Longmans.


Selva, T. [On-line]. Maupassant par les textes. Available:


Smith, F. (1971). Understanding reading: A psycholinguistic analysis of reading and learning to read. New York: Holt, Rinehart, & Winston.


Stanovich, K. (1986). Matthew effects in reading: Some consequences of individual differences in the acquisition of literacy. Reading Research Quarterly, 21 (4), 360-406.


Sutarsyah, C., Nation, P. & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based study. RELC Journal, 25 (2), 34-50.


Zahar, R., Cobb, T., & Spada, N. (In press). Conditions of second language vocabulary acquisition. (To appear in Canadian Modern Language Review, 2001).