Reading Academic English: Carrying Learners Across the Lexical Threshold

WWW Pre-publication. Submitted January 1999 as chapter in John Flowerdew and Andrew Peacock (Eds.), The English for Academic Purposes Curriculum, for Cambridge University Press.

Tom Cobb
Département de Linguistique
Université du Québec à Montréal

Marlise Horst
TESL Centre
Concordia University
Montreal, Quebec, Canada

The ESP reading problem

With the growth of English as the lingua franca of work and study, many non-English speakers find themselves needing to attain some level of proficiency in English in order to function in jobs or courses. However, they may have limited time to devote to language learning, and little interest in knowing English outside the work or study context. Responding to these circumstances, English for Specific Purposes (ESP) curriculum designers have attempted to reduce the time frame of learning through domain targeting. They attempt to identify and teach the lexis, syntax, functions, and discourse patterns most commonly used in a domain (for chemistry students, test tubes, passive voice, clarification requests, and lab reports). This approach has given waiters, tour guides and airline pilots enough English to function in their domains after relatively short periods in the classroom. But it runs into complications when the specific purpose is to read extended texts in a professional or academic domain.

It now seems clear that the cross-domain generalities of English (pronoun system, verb tenses, basic vocabulary, etc.) can be introduced and practiced within a subset of the language. Simple reading tasks such as understanding signs and instructions can be undertaken knowing only the English used in a particular job or profession. But does this hold true for reading longer texts? Consider the position of the learner who knows the grammar of English and the technical terms of a domain: text analysis shows that these terms are typically rather few (Flowerdew 1993a), roughly 5% of tokens (Nation, 1990). Function or grammar words account for roughly another 40%, so this learner knows about 45% of the tokens in a typical text, leaving a density of unknown words of 55%, about one word in two. Is knowing half the words an adequate basis for reading comprehension?

Reading as Guessing

Until recently, it was widely assumed that L2 reading could proceed from a minimal knowledge base, at least in the sense of linguistic knowledge. Early post-behaviourist analyses of L1 reading (e.g., Goodman's, 1967, "guessing game" theory) emphasized the naturalness of reading, and assigned a major role to inference and prediction drawing on general world knowledge. Decoding, vocabulary size, syntactic distinctions, or other aspects of linguistic knowledge previously emphasized were now assumed to play a minor role. When asked if his analysis applied equally to L2 reading, Goodman (1973) pronounced the theory "universal." It was imported into ESL methodologies by leaders in the field, including Coady (1979), and quickly assumed a dominant position at the expense of other analyses of the reading process (Weir & Urquhart, 1998). The implications of guessing theory for ESP reading seemed clear: If guessing is a major part of reading, then knowing half the words in a text should be enough to get the guessing under way.

But there were two problems with guessing theory. First, there was little evidence for it and strong evidence against it (Gleitman & Rozin, 1973; Stanovich, 1980). Second, the theory was probably harmless enough in L1, where children, whatever their teachers' theories, made their guesses from a well-developed linguistic knowledge base. But if L2 readers were not taught vocabulary and syntax, then they really were guessing as they read, from whatever world knowledge they happened to possess. If L2 readers turned out not to be very good at the guessing game, reading theorists knew why: poor L2 readers had never learned to read naturally in their L1. Hence the remedy for poor L2 reading was to instruct learners in the high-level skills they should have developed reading their first languages.

Course designers in the 1980s apparently believed that many L2 readers needed such remediation, since course books were mainly devoted to skills like guessing words in context and finding main ideas in paragraphs (Bernhardt, 1990), with no evidence that learners were unable to exercise these skills adequately in L1 (Bernhardt, 1991), or that problems exercising them in L2 were not simply caused by lack of L2 vocabulary and syntax. The skills approach to L2 reading relied on dubious assumptions, but without detailed comparisons of L1 and L2 reading outcomes, it was difficult to refute. The sudden arrival and lengthy retreat of the guessing approach to L2 reading are reviewed by Grabe (1991).

Rethinking the problem

In the context of the increasing demand world-wide for English academic reading skills, and the common experience of English instructors world-wide that "most students fail to read adequately," Alderson (1984: 1) articulated the central question about L2 reading as follows: Is weak L2 reading a language problem or a reading problem? If it is a reading problem, then poor readers are poor readers in any language and there is little an English course can do for them other than remediate their general reading skills. If it is a language problem, then it stems from missing L2 knowledge related to reading (lexis, syntax, discourse patterning) that an L2 reading course could help students acquire and learn to use.

Alderson concluded that while it is possible for weak L2 reading to stem from weak L1 reading, it is more likely to stem from inadequate L2 knowledge, and furthermore that strong L1 reading is no guarantee of strong L2 reading. Evidence for this included the fact that good L1 readers do not invariably become good L2 readers even after lengthy exposure to L2 texts (Clarke, 1979; Cooper, 1984). Therefore, some form of focused instruction seems necessary to turn good L1 readers into good L2 readers, something more than exposure to L2 texts aided by inferences from world knowledge. But what form of instruction? Alderson argued, setting the research agenda for the following decade, that the next step was to specify kinds and amounts of knowledge that constitute a "threshold of linguistic competence" (Cummins, 1979) or "critical mass" (Grabe, 1986) that allow the transfer of L1 skills to L2.

The question for ESP learners and their course designers was then as follows: What kinds of linguistic knowledge underpin L2 reading, and how much of it must learners acquire to read academic texts in their disciplines? (Or, perhaps how little, given that English is a means to an end.)

Locating thresholds

Alderson's (1984) call for thresholds attracted the interest of ESL researchers. Empirical confirmation continued to pile up showing that reading skills do not magically transfer from one language to another, particularly where two writing systems are involved (Koda, 1988). Alderson's hunch that weak L2 reading is normally a language problem was confirmed in several multiple regression analyses, including one by Bossers (1991) in which L2 knowledge level predicted L2 reading level four times better than did L1 reading level at all but advanced levels. The nature of the linguistic knowledge affecting reading skill has been specified as knowledge-based ability rather than knowledge per se, for example the large, well connected lexicon underpinning rapid lexical access (Segalowitz, Poulsen, & Komoda, 1991).

Progress has also been made on defining pedagogically usable thresholds. Cummins (1979) believed that research would identify distinct thresholds for different kinds of language knowledge and language tasks. For the task of academic reading, the main knowledge type of interest is lexical. Word knowledge is the key ingredient in successful reading in both L1 (Freebody & Anderson, 1981) and L2 (Cooper, 1984), contributing more to L2 academic reading success than other kinds of linguistic knowledge including syntax (Saville-Troike, 1984).

The search for a lexical threshold to reading proceeded in two complementary directions, comparing comprehension measures to either the absolute number of words learners know or else to the proportion of tokens they know in a particular text. Looking at absolute knowledge, Laufer (1992) compared Israeli university students' recognition vocabulary sizes to their reading comprehension scores, and found that minimal comprehension correlated reliably with knowing the 3000 most frequent words of English. Looking at proportion of words known in a particular text, Hirsh and Nation (1992) determined that an unsimplified text can be comprehended when 95% of tokens are known, or there is approximately one unknown word per two printed lines. But is knowing the 3000 most frequent words the same as knowing 95% of the words in particular texts? Clearly, one frequency based threshold would be more pedagogically useful than a separate threshold for every text.

Nation and colleagues have worked long and hard to show that 3000 words account for 95% of tokens in most texts, provided they are the right 3000 words. Starting from the interesting discovery that the 2000 most frequent word families of English reliably account for roughly 80% of tokens in a text in any domain (Carroll, Davies & Richman, 1971), Nation (1990) argues that reading in English depends on learners knowing these 2000 words (like ache, admire, accuse, and advise), which are most accessible in the form of West's (1953) General Service List (GSL). But the GSL itself hardly constitutes the entire lexical threshold, since 80% of words known (two words unknown per printed line) is far from 95% (one per two lines). However, attempts to close this gap by simply moving down the frequency list yield diminishing returns shortly after the 2000 mark (the next 1000 words merely add another 3-4%, and so on).

In a research project begun long before the call for thresholds, independent research groups in several developing countries noticed that the GSL alone did not empower students with adequate reading ability, and so searched for a specification of the additional words that would prepare learners to read academic texts in English. Xue and Nation (1984) put this research together, producing the University Word List (UWL), an 800-family list of words found in academic texts across disciplines, which when added to the GSL reliably account for 90% of tokens. Coverage at this level has been confirmed by analysis of text corpora in a variety of disciplines using the computer program VocabProfile (Hwang & Nation, 1994).
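The kind of coverage figure that VocabProfile reports is, in principle, a simple token count. The following sketch shows the basic calculation; the mini word list and sentence are invented for illustration, standing in for the GSL and UWL families and a real corpus:

```python
# Minimal VocabProfile-style coverage calculation (illustrative only;
# the word list below is a stand-in, not West's actual GSL).
import re

def coverage(text, word_list):
    """Return the proportion of tokens in `text` found in `word_list`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    known = sum(1 for t in tokens if t in word_list)
    return known / len(tokens) if tokens else 0.0

# Hypothetical high-frequency list; a real profile would use the
# 2000 GSL families plus the 800 UWL families.
gsl_sample = {"the", "a", "of", "in", "is", "test", "result", "show", "and"}

text = "The result of the test is in the data and the chart"
print(round(coverage(text, gsl_sample), 2))   # → 0.83
```

A full profiler would also group inflected forms into word families and report separate coverage bands for the GSL, the UWL, and domain lists, but the underlying arithmetic is the one shown here.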

Sutarsyah, Nation and Kennedy (1994) argue that the GSL and UWL are the minimum lexical knowledge base for reading in any academic domain. These two lists constitute a general English for Academic Purposes (EAP) vocabulary syllabus that takes a learner to the outer edge of reading in a specific domain. At that point, an ESP vocabulary course would take over, whose syllabus could be identified by extracting GSL and UWL terms from a corpus of domain texts, leaving a residue of terms which characterize the domain. In the case of economics, this residue amounts to 450 word families, accounting for a further 5% of tokens. In other words, systematic study of these three lists--containing just over 3000 words--brings the learner to the 95% mark. For the remaining 5% of low frequency terms that inevitably crop up in any domain, often carrying crucial information (Kucera, 1982), learners have crossed "the threshold where they can start to learn from context" (Nation, 1997:11) as they do in reading their first languages.

So here is a strong hypothesis about the location of a lexical threshold, the point in L2 acquisition where learners will access their L1 reading skills. However, locating a threshold and carrying learners across it are two different things. The discussion to this point is about a syllabus not a method, about what to teach not how. As Sutarsyah et al. conclude, "there is a need for courses that focus on the vocabulary of these two important lists" (p. 48).

Can 3000 words be taught?

Designing a course to introduce learners to 3000 words represents a challenge, and few courses attempt to do it. If the normal pace of classroom acquisition is about 550 words per year (Milton & Meara, 1995), then learning 3000 words is a labour of some five years. Course books focus almost exclusively on the first 1000 words of English (Meara, 1993), and techniques to accelerate the pace of lexical growth have usually proven unsuccessful. For example, looking up lists of words in a dictionary has been shown to produce inert knowledge that "does not increase comprehension of text containing the instructed words" (Nagy, 1997: 73). Useful lexical knowledge which transfers to the comprehension of novel texts seems to depend on meeting new words in rich natural contexts (Mezynski, 1983). However, given the haphazard nature of acquisition from context (Haynes, 1993; Laufer & Sim, 1985), this type of learning requires a lengthy period of time (Nagy, Herman & Anderson, 1985), well beyond the time normally allotted for an EAP course.

Thus L2 vocabulary acquisition is beset by a logical problem, as noted by researchers over the years. Carroll (1964) expressed a wish that some form of vocabulary instruction could be devised that would mimic the effects of natural contextual learning except more efficiently. Krashen (1989: 450) echoed the sentiment 25 years later: "vocabulary teaching methods that attempt to do what reading does--give the student a complete knowledge of the word--are not efficient, and those that are efficient result in superficial knowledge". In Pinker's (1989) terms, it is a learnability paradox: you learn words by meeting them in natural contexts, but to make sense of the contexts you need words.

The next section will revisit these same issues closer to the ground.

ESP in Oman

One of the main test-beds for ESP concepts in 1975-1985 was the Arab Middle East, particularly the Gulf states. This was so for a number of reasons (Swales, 1984). In brief, sudden economic development fueled by oil and the need to quickly produce an educated middle class through training in English led to a number of interesting experiments in domain-targeted curriculum design. There was an ambiance of "no time to lose" that fit well with notions of accelerated learning.

One promising experiment in ESP unfolded at Sultan Qaboos University (SQU) in Oman (Adams Smith, 1984; Cobb, 1996; Flowerdew, 1993b; Holes, 1985) where the approach was to integrate English language and content instruction. Coming on-stream a decade after other countries in the area, Oman was in a position to profit from educational experiments conducted elsewhere. A curriculum development team with experience in similar projects in the region was assembled (Beretta & Davies, 1985; Scott & Scott, 1984; Stevens & Millmore, 1987) a year before the University opened in 1986. When classes started, English language instructors taught first year students in classic ESP fashion; they attended and followed up physics lectures, prepared students for the language of their chemistry experiments, and attended biology labs to help cut up frogs and write up results. Advanced courses were developed for engineering and medicine. If an ESP approach to academic study was ever going to succeed, it should have succeeded in Oman.

But language learning at SQU in its first decade was hardly a great success. One problem was that students arrived from secondary school much less proficient in English than course planners had expected. Tested on entry with the IELTS, most students were several bands below entry level for a British university (Flowerdew, 1993b). On the bright side, this was a real test of "ESP from the beginning" that would demonstrate whether it was possible to achieve competence in a foreign language through domain targeting, by-passing the lengthy route through general English.

The Omani students' main weakness was in reading English texts. Between 1987 and 1990, the scientific texts used in both language and content courses were continually simplified and shortened so the students would have some hope of comprehending them (Flowerdew, 1993b). The main source of the problem appeared to be the students' lack of vocabulary. Arden-Close (1993: 251) observed chemistry lectures extensively in this period, and reported that the professors saw the students' language problems "as almost exclusively vocabulary problems". Flowerdew (1992) observed that content-course lecturers spent an inordinate amount of class time explaining the meanings of scientific words.

However, it was not only scientific words the students did not know. Arden-Close's (1993) research indicates where the main vocabulary problems lay. He observed numerous interchanges in which science lecturers unversed in language issues attempted to communicate with students. In one discussion, a chemistry lecturer backs up further and further in a search for common ground. Attempting to get across the idea of a "carbon fluoride bond," he tries a succession of analogies: teflon pans, a tug of war, an assembly line, to no avail. Apparently, pan, war, and line (all in the GSL) were simply unknown. In another discussion, a biology lecturer discussing "hybridization" seeks an everyday example in dogs, switches to the local case of goats, realizes he does not know the names of the different breeds in Oman, and finally resorts to mixing colours. But alas, "a lot of them don't know their colours yet" (p. 258). There was apparently no common lexical ground to retreat to.

Support for this anecdotal evidence was provided by testing the recognition vocabulary sizes of first-year students with Nation's (1990) Levels Test starting in 1993. After a year of study, students typically had recognition knowledge of about 900 words at the 2000-word (GSL) level, fewer than half. Given that 80% of the words in any text derive from this level, these students faced more than one unknown word in two.

Students at SQU may have been starting from an unusually weak position, yet there is other evidence from ESP ventures in the developing world that students' main vocabulary problems are at the general not technical level. English scientific terms are often already known in the first language, as concepts awaiting new labels or even loan-words. If unknown, they are often inferable from diagrams, glossed and emphasised in lectures, and have stable meanings from one context to another. None of this is true of the high frequency words that scaffold the technical terms. Problems with general or sub-technical terms have been identified in Malaysia (Cooper, 1984), Bahrain (Robinson, 1989), Papua New Guinea (Marshall & Gilmour, 1993), and Israel (Cohen, Glasman, Rosenbaum-Cohen, Ferrara, & Fine, 1988). In the Israeli study, subjects translated both technical and general terms from a text, with 85% success for technical terms but only 32% for sub-technical terms (from the second thousand words of the GSL: perceive, pattern, efficient; or from the UWL: assertion, variable, diversity).

After six years of ESP-from-the-beginning, the University decided to shift policy and give the students a year of general English before starting their academic subjects. The first step was to find a suitable placement test to establish their level of general English. It was now recognized that an instrument measuring very elementary levels was required. The choice fell on the Preliminary Test of English (PET), Cambridge University's (1990) most basic proficiency test, which confirmed that the students' level was indeed very low (mainly Bands 1 and 2). It was decided also to use the PET as an exit test, which the students were required to pass at a fairly high level (Band 4) before beginning their academic subjects. To meet this objective the students would study general English for one year using such coursebooks as Headway (Soars & Soars, 1991), with supplementary vocabulary work in A Way With Words (Redman & Ellis, 1991)--general English coursebooks rather than EAP texts. But after a year, many students still had difficulty with the PET, especially its reading section.

The students' needs had been correctly identified at a more basic level, but the courses chosen to address them made no claim to meet the urgent needs of academic learners. Typically, the students would get through one coursebook in a three-volume set designed for a more leisurely pace of learning. In terms of vocabulary, the words introduced were few and not always the most useful; no coverage of the full vocabulary of general English was attempted. However, this vocabulary can be targeted no less than that of a domain, by focusing on the GSL and UWL. And the contexts in which these words are presented need not be the shopping and dating experiences of Headway; words like pan, war, and line can be presented in subject-area contexts (as the chemistry lecturer attempted to do with his carbon fluoride bonds).

The vocabulary of general English can be targeted, but delivering it is another matter. To review, 3000 words are far more than are normally learned by students in a year, and far more than can be easily contextualized by course writers, especially since stable learning requires meeting each word at least five times (as determined by Saragi, Nation, & Meister, 1978), or eight times (according to an on-site study by Horst, Cobb & Meara, 1998). Abbreviating the process by learning word lists and translation equivalents results in static knowledge unlikely to increase comprehension of novel texts. The remainder of this chapter describes the development of an instructional strategy to tackle these problems, which is to insert an EAP vocabulary course into the general course described above. Its objective is to introduce academic learners to large numbers of general English words in authentic contexts, so that they remember the words and can interpret their meanings in novel contexts. The course is a corpus-based lexical tutoring system: corpus analysis had helped frame the problem of ESP reading, so it seemed plausible that corpus-based tutoring might help solve it.

Design and test of a lexical tutor

Students were predisposed to participate in an intensive vocabulary acquisition experiment. They knew the test they needed to pass was based on a word list, the 2387-word Cambridge Lexicon (Hindmarsh, 1980), roughly equivalent to the GSL. They were aware of how few words they knew, assessing vocabulary weakness to be their main problem with English. Checks were undertaken to be sure that devoting class time to experimental vocabulary acquisition would actually address the students' immediate needs. Four versions of the PET were checked against the Hindmarsh list with VocabProfile, and found to exploit the list quite fully while containing few off-list items. Also checked against the list were the students' general English course books, none of which included even half the words on this list, even in courses three volumes long (see Cobb, 1996, for details; Meara, 1993, for a similar finding).

System design

One way to ensure that students systematically encounter hundreds of words in context and at an accelerated pace is to present the words in a computer concordance. A concordance program linked to a suitable corpus can present large numbers of words to students in ways that escape the learning paradox outlined above. First, a concordance program is essentially a word list in context, and so might blend the efficiency of list targeting with the richness of multicontextual learning. Second, a concordance makes the five-encounters requirement simple to verify. Third, a concordance might overcome the unreliability of contextual learning, since with a number of contexts available learners can search for ones that make sense, doing in minutes what takes years in natural exposure.
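The core of such a program is a keyword-in-context (KWIC) routine: find every occurrence of a target word in the corpus and display it centred in a fixed window of surrounding text. The sketch below is illustrative only, not the PET·2000 code; the tiny corpus is invented:

```python
def kwic(corpus, keyword, width=30):
    """Return keyword-in-context lines: each occurrence of `keyword`
    centred in a window of `width` characters on either side."""
    lines = []
    text = " ".join(corpus.split())       # normalise whitespace
    lower = text.lower()
    key = keyword.lower()
    start = lower.find(key)
    while start != -1:
        left = text[max(0, start - width):start]
        right = text[start + len(key):start + len(key) + width]
        lines.append(f"{left:>{width}} [{keyword}] {right}")
        start = lower.find(key, start + 1)
    return lines

corpus = ("The line of goats crossed the road. "
          "Draw a line under the answer. "
          "Wait in line at the bank.")
for line in kwic(corpus, "line"):
    print(line)
```

Even this minimal version shows why the format helps a learner past the guessing paradox: several short contexts for the same word appear at once, so an opaque context can simply be skipped in favour of a transparent one.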

On the other hand, a concordance is difficult to read; its texts are reduced in size and coherence; there is little opportunity for the normal flow of natural reading; the on-screen text-to-space ratio breaks established standards of text design; and there is no guarantee that the transfer of learning that has been established for meeting multiple contexts on paper (Mezynski, 1983) is replicable on a computer screen.

Several attempts to interest SQU students in learner-oriented concordances such as Microconcord (Oxford, 1993) were met with indifference. However, two studies had shown these students getting useful information from concordances, one a gap-filling activity where they worked with concordance print-outs (Stevens, 1991), the other a text-manipulation activity where they accessed concordances on-line (Cobb, 1997). It was concluded that the lack of interest in concordancing derived from difficult texts and unwieldy interfaces while the medium itself had benefits. Thus work was begun on the design of a corpus and interface for learners with an elementary level of linguistic and computational knowledge.

Corpus development involved scanning and collecting texts from the students' language course, particularly those dealing with business, and eliminating lexical items that did not fall within the PET list (involving some rewriting). Then, the PET list was checked against the corpus to ensure that every word appeared at least five times (involving additional rewriting). The result was a 50,000-word collection of texts, many of which the students were already familiar with. Corpus-building procedures are further discussed in Cobb (1996).
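The five-occurrence criterion is the kind of requirement that is easy to verify mechanically: count every token in the corpus and flag target words that fall short. A sketch, with an invented target list and corpus:

```python
# Check that every target word meets a minimum-occurrence criterion
# (hypothetical data; the real check ran the PET list against the
# 50,000-word course corpus).
import re
from collections import Counter

def under_threshold(corpus, target_words, minimum=5):
    """Return target words appearing fewer than `minimum` times in `corpus`."""
    counts = Counter(re.findall(r"[a-z]+", corpus.lower()))
    return sorted(w for w in target_words if counts[w] < minimum)

corpus = "bank " * 6 + "trade " * 5 + "profit " * 2
targets = {"bank", "trade", "profit", "export"}
print(under_threshold(corpus, targets))   # → ['export', 'profit']
```

Words returned by such a check are the ones needing additional contexts written into the corpus, which is what the rewriting step described above amounted to.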

Interface development followed directly from observations of students' efforts to use Microconcord. Keyboard entry clearly posed an obstacle, most of the students being poor typists and spellers. The concordance output did not make different kinds of textual information visually salient, and the interface did not make it clear what learners were supposed to do in a concordancing session--which words to investigate, or what to do with lexical information they had assembled. It became clear that a concordance interface for the early stages of vocabulary acquisition should do the following: eliminate keyboard entry; make the to-be-learned words themselves the starting point for searches; distinguish different kinds of textual information visually; and give learners a clear purpose for examining several contexts for each word.

These desiderata were realized as follows: The concordance was mouse driven, eliminating keyboard problems. Lists of to-be-learned words were linked to the interface, so that clicking on them drove the concordance searches, and different types of information appeared in different fonts and colours. Motivation for looking at several contexts was piggybacked to the desire for hard copy: students collected and sent words to a linked database for print-out at the end of concordancing sessions, with at least one example sentence from the corpus for each word (a stipulation coded into the program). The assumption was that learners would look through concordance lines for a comprehensible example sentence for their print-outs, rather than selecting one that made no sense. The software platform chosen for this project was Apple's HyperTalk, the search engine at its centre was developed by Zimmerman (1988), and interfaces were designed by Cobb (1996). The program was called PET·2000.

Using PET·2000

The students' reading course was expanded to include the vocabulary module. The entire 2,387-word list was installed in the concordance interface, and the students were assigned roughly 200 words from the list to be learned every week (e.g., all the words starting with "C"). The list was available only on computer, and one lab-hour per week was set aside for this work for 12 weeks. About 20 of each week's words were randomly selected for weekly classroom quizzes. The students' task was to look through the 200 new words each week, decide which ones they did not know, send these to their databases with one or more examples from the corpus, add definitions if desired in English or Arabic, and print up the session's work as an installment in a growing personal glossary. Students added an average of 100 words (SD=14.5) per session. The fiction was that learners were lexicographers constructing dictionaries, following the constructivist principle of modeling learners on experts (Cobb, in press). The learner-lexicographers worked individually or in groups. They reported normally checking several contexts before selecting one for their database, and the user tracking routine confirmed that fewer than half the examples selected were simply the first one listed in the concordance.

Testing the tutor

The learning effects of PET·2000 were assessed at a point when the students had been learning English for about five months. Four intact groups at two proficiency levels were randomly selected for testing (participants had already been assigned to groups randomly by the institution). The first level consisted of two groups (n=17 each) of lower intermediate students (with an average vocabulary size of 1200 words). The second consisted of two groups (n=12 each) of intermediate students (with an average of 1500 words). At each level, groups were randomly assigned to experimental and control conditions. Experimental group participants used the concordance software as described above to search through contexts. Control group participants used a modified version of the software to send items with no examples to their databases for subsequent annotation with L1 translations from (off-line) dictionaries. All groups spent roughly the same amount of time on their PET·2000 work, 45 minutes per student per week for 12 weeks, with low and non-systematic variance according to the program's time records.


Participants were pre- and post-tested using a two-part measure. The first part was the Levels Test at the 2000 level, a test of basic meaning recognition, which asks students to select brief definitions for 18 randomly selected GSL words (also appearing on the PET list). The second was a gapped passage, which asked examinees to fit 15 supplied words from the same list to gaps in two novel GSL-constrained texts of about 250 words each. The two parts were intended to assess two kinds of lexical knowledge: definitional knowledge of decontextualized words, and the more complex knowledge required to integrate a learned word into an extended novel context. At pre-test, class means were statistically equal on both measures within each level.

This test was given two weeks after the beginning of the training period, in March 1995, and then again two weeks after the end. The pretest was given after training had already begun to allow a technology habituation period; since learning rate and volume were issues, it was important to measure learning only when the procedure was functioning smoothly. Thus any learning gains measured by the test were produced in a period of only two months. No feedback was given after either testing session, and there was no indication that participants remembered the test in any detail when they encountered it a second time. The hypothesis was that all students would make substantial but similar gains on the definitional measure, but only concordancers would make significant gains on the novel text measure.

The study followed a repeated measures factorial design, the factors being 2 (Treatments) x 2 (Skill Dimensions) x 2 (Levels) at two points in time (pre- and post-training). Pre-test and post-test scores on each measure were entered into a separate repeated measures analysis of variance (ANOVA), with test scores as the dependent variable, and level and treatment as independent variables. If the experimental prediction was borne out, there should be a significant time-treatment interaction on the novel text measure but not on the definitional measure.
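In a two-group pre/post design of this kind, the time-treatment interaction amounts to a difference in gain scores (post-test minus pre-test) between treatments. The logic can be sketched as follows; the scores are invented for illustration, not the study data:

```python
# Gain-score view of a time-by-treatment interaction (hypothetical scores).
from statistics import mean

def gain_scores(pre, post):
    """Per-participant gains: post-test minus pre-test."""
    return [b - a for a, b in zip(pre, post)]

# Invented novel-text scores for two small groups
control_pre, control_post = [60, 62, 58, 61], [63, 64, 60, 65]
concord_pre, concord_post = [61, 59, 63, 60], [74, 72, 75, 71]

control_gain = gain_scores(control_pre, control_post)
concord_gain = gain_scores(concord_pre, concord_post)

# An interaction shows up as a difference in mean gain between groups;
# with equal gains, both groups improve but the treatment adds nothing.
print(mean(control_gain), mean(concord_gain))   # → 2.75 12.25
```

The ANOVA adds the machinery for testing whether such a difference in mean gain is larger than chance variation would produce, but the quantity being tested is the one computed here.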


On the definitional measure, there was a significant main effect for time, F(1,54)=6.74, p<.05, showing mean posttest scores (75.91) were higher than pretest (69.53) by about 6.4%. However, this effect was unrelated to treatment. On the novel text measure, there was a similar but larger main effect for time, F(1,54)=19.48, p<.001, showing mean posttest scores (74.03) were higher than pretest (64.84) by almost 10%. On this measure there was also a significant time-treatment interaction, F(1,54)=6.24, p<.05, showing differential contributions to the gain by treatment condition. Table 1 shows the components of this finding, indicating significant differences as established by a post hoc Tukey HSD test of multiple comparisons, q(1,56)=10.69, p<.001.

Table 1
Mean pre-test, post-test, and gain scores by task, condition, and level

                    Definitions Task                           Novel Text Task
              Control             Experimental           Control             Experimental
              Pre    Post  Gain   Pre    Post  Gain      Pre    Post  Gain   Pre    Post  Gain

Lower Intermediate
  M           65.24  71.94 6.70*  65.53  74.40 8.87*     60.24  62.76 2.52   60.65  74.12 13.47**
  SD          15.38  13.41        12.40  13.81           19.33  17.08        17.80  15.00

Upper Intermediate
  M           75.17  79.58 4.41*  75.67  79.92 4.25*     71.42  77.08 5.66   70.75  86.83 16.08**
  SD          11.18  12.23        10.80  12.00           12.14  10.66        12.35   8.90

*p<.05  **p<.001

At the lower intermediate level, both groups made significant gains on the Levels Test, the control group gaining about 7% (representing 140 new words) and the experimental group about 9% (180 words); the difference between the groups was not significant. Nonetheless, a gain of 9% at the 2000 level represents 180 words learned in a period of two months, or 1,080 if the rate were sustained for a year. This is roughly double the 550-word annual baseline found by Milton and Meara (1995), and reinforces the longstanding claim of both Meara (1980) and Nation (1982) that learners in language courses are often lexically underchallenged. Of course, learning large numbers of words quickly would be of little use if the knowledge did not transfer to a novel context, so it is encouraging that concordancers gained 13.47% on the novel text measure as well.
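The annualized projection is simple arithmetic; the sketch below makes the underlying assumption explicit (a constant learning rate across six two-month periods):

```python
# Back-of-envelope check of the annualized vocabulary gain
level_size = 2000      # words at the 2000-frequency level
gain_pct = 9           # experimental group's two-month gain, in percent
baseline = 550         # annual gain reported by Milton and Meara (1995)

words_learned = level_size * gain_pct / 100   # 180.0 words in two months
per_year = words_learned * 6                  # 1080.0 words if sustained
ratio = per_year / baseline                   # ~1.96, i.e. roughly double
print(words_learned, per_year, round(ratio, 2))
```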

The upper intermediate students made smaller gains on the Levels Test, probably because with 75% of words known at pre-test there were few opportunities for further definitional learning at the 2000 level. However, there were still opportunities for other types of learning. Over the two-month training period, concordancers gained 16% on the transfer measure.

Complementing these quantitative measures were observational and anecdotal impressions from both students and content area instructors. In the following year, when PET·2000 graduates had begun working on the UWL, a content area instructor wrote a letter to the Language Centre commending the staff for whatever they were doing that had, for the first time, enabled students to read their economics texts.

Further work

With corpus-based tutoring shown to be an effective means of accelerating EAP vocabulary growth, the next step is to build the UWL into the interface, attached to a second or expanded corpus; following that, domain-specific wordlists and corpora can be incorporated into the system. Work is under way to develop a purpose-built UWL corpus using the principles discussed above, i.e., eliminating any terms beyond the GSL and UWL, and ensuring that all 800 terms of the UWL are represented at least five times.
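The two screening principles, rejecting off-list vocabulary and checking that each UWL item occurs at least five times, can be sketched as follows. The word lists and sample text here are tiny hypothetical stand-ins for the actual GSL and UWL:

```python
# Sketch of the corpus-screening step: flag off-list tokens and count
# coverage of UWL items. The word sets below are invented stand-ins.
import re
from collections import Counter

gsl = {"the", "of", "a", "is", "in", "and", "to", "economy"}
uwl = {"data", "analyse", "hypothesis"}

corpus = "The data in the economy is a hypothesis. Analyse the data."
tokens = re.findall(r"[a-z]+", corpus.lower())

# Principle 1: no token may fall outside GSL + UWL
off_list = [t for t in tokens if t not in gsl | uwl]

# Principle 2: every UWL item should occur at least five times
uwl_counts = Counter(t for t in tokens if t in uwl)
under_represented = {w for w in uwl if uwl_counts[w] < 5}

print("off-list:", off_list)
print("UWL items needing more occurrences:", sorted(under_represented))
```

Run over candidate texts, the off-list check flags passages needing simplification, while the occurrence counts show which UWL items still need more contexts added to the corpus.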

The long term objective is to produce a set of wordlists and corpora, possibly with Internet delivery, that will allow a student anywhere to locate and cross the lexical threshold into L2 reading in a profession or subject with the smallest possible delay.

References
Adams Smith, D. (1984). Planning a university language centre in Oman: Problems & proposals. In Swales, J. & Mustafa, H., (Eds.), English for specific purposes in the Arab world. Birmingham: University of Aston Language Studies Unit.

Alderson, J.C. (1984). Reading in a foreign language: A reading problem or a language problem? In J.A. Alderson & A.H. Urquhart (Eds.), Reading in a foreign language (pp. 1-27). London: Longman.

Beretta, A., & Davies, A. (1985). Evaluation of the Bangalore project. English Language Teaching Journal, 39, 121-127.

Bernhardt, E.B. (1990). A content analysis of "methods texts" for the teaching of second language reading. Paper presented at the National Reading conference, Miami, FL.

Bernhardt, E.B. (1991). A psycholinguistic perspective on second language literacy. In J.H. Hulstijn & J.F. Matter (Eds.), Reading in Two Languages, AILA Review 8, 31-44.

Bossers, B. (1991). On thresholds, ceilings and short-circuits: The relation between L1 reading, L2 reading, and L2 knowledge. In J.H. Hulstijn & J.F. Matter (Eds.), Reading in Two Languages, AILA Review 8, 45-60.

Cambridge University. (1990). Preliminary English Test: Vocabulary list. Local Examinations Syndicate: International examinations.

Carroll, J.B. (1964). Words, meanings, & concepts. Harvard Educational Review, 34, 178-202.

Carroll, J.B., Davies, P., & Richman, H. (1971). Word frequency book. New York: Houghton Mifflin.

Clarke, M.A. (1979). Reading in Spanish and English: Evidence from adult ESL students. Language Learning, 29 (1), 121-150.

Coady, J. (1979). A psycholinguistic model of the ESL reader. In R. Mackay, B. Barkman, & R.R. Jordan (Eds.), Reading in a second language (pp. 5-12). Rowley, MA: Newbury House.

Cobb, T. (in press). Applied constructivism: A test for the learner-as-scientist. Educational Technology Research & Development.

Cobb, T. (1997). Is there any measurable learning from hands-on concordancing? System 25, 301-315.

Cobb, T. (1996). From concord to lexicon: Development and test of a corpus-based lexical tutor. Unpublished doctoral dissertation. Concordia University, Montreal.

Cohen, A., Glasman, H., Rosenbaum-Cohen, P.R., Ferrara, J., & Fine, J. (1988). Reading English for specialized purposes: Discourse analysis and the use of student informants. In P.L. Carrell, J. Devine, & D. Eskey (Eds.), Interactive approaches to second language reading (pp. 152-167). Cambridge University Press.

Cooper, M. (1984). Linguistic competence of practised and unpractised non-native speakers of English. In J.A. Alderson & A.H. Urquhart (Eds.), Reading in a foreign language (pp. 122-138). London: Longman.

Cummins, J. (1979a). Cognitive/academic language proficiency, linguistic interdependence, the optimal age question, & some other matters. Working Papers on Bilingualism, 18, 197-205.

Flowerdew, J. (1992). Definitions in science lectures. Applied Linguistics, 13, 202-21.

Flowerdew, J. (1993a). Concordancing as a tool in course design. System, 21, 231-244.

Flowerdew, J. (1993b). Content-based language instruction in a tertiary setting. English for Specific Purposes, 12, 121-138.

Freebody, P. & Anderson, R.C. (1981). Effects of vocabulary difficulty, text cohesion, and schema availability on reading comprehension. Technical Report No. 225. Urbana, IL: University of Illinois Center for the Study of Reading.

Gleitman, L.R., & Rozin, P. (1973). Teaching reading by use of a syllabary. Reading Research Quarterly, 8, 447-483.

Goodman, K.S. (1967). Reading: A psycholinguistic guessing game. Journal of the Reading Specialist, 6, 126-135.

Goodman, K.S. (1973). Psycholinguistic universals in the reading process. In F. Smith (Ed.), Psycholinguistics and reading (pp. 21-29). New York: Holt, Rinehart, & Winston.

Grabe, W. (1986). The transition from theory to practice in teaching reading. In Dubin, F., Eskey, D.E., & Grabe, W. (Eds.) Teaching second language reading for academic purposes (pp. 25-48). Reading, MA: Addison-Wesley.

Grabe, W. (1991). Current developments in second language reading research. TESOL Quarterly, 25, 375-406.

Haynes, M. (1993). Patterns and perils of guessing in second language reading. In Huckin, T., Haynes, M., & Coady, J. (Eds.), Second language reading and vocabulary learning (pp. 46-62). Norwood, NJ: Ablex.

Hindmarsh, R. (1980). Cambridge English Lexicon. London: Cambridge University Press.

Hirsh, D., & Nation, P. (1992). What vocabulary size is needed to read unsimplified texts for pleasure? Reading in a Foreign Language, 8, 689-696.

Horst, M., Cobb, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11 (2).

Hwang, K., & Nation, P. (1994). VocabProfile: Vocabulary analysis software. English Language Institute, Victoria University of Wellington, New Zealand.

Koda, K. (1988). Cognitive processes in second-language reading: Transfer of L1 reading skills and strategies. Second Language Research, 4, 133-156.

Krashen, S.D. (1989). We acquire vocabulary and spelling by reading: Additional evidence for the input hypothesis. The Modern Language Journal, 73, 440-464.

Kucera, H. (1982). The mathematics of language. In The American Heritage Dictionary. Second College Edition. Boston: Houghton Mifflin.

Laufer, B. (1992). How much lexis is necessary for reading comprehension? In P.J. Arnaud & H. Béjoint (Eds.), Vocabulary and applied linguistics (pp. 126-132). London: MacMillan.

Laufer, B., & Sim, D.D. (1985). Taking the easy way out: Non-use and misuse of clues in EFL reading. English Teaching Forum, April, 7-10.

Marshall, S., & Gilmour, M. (1993). Lexical knowledge and reading comprehension in Papua New Guinea. English for Special Purposes Journal, 13, 69-81.

Meara, P. (1980). Vocabulary acquisition: A neglected aspect of language learning. Language Teaching and Linguistics: Abstracts, 13, 221-246.

Meara, P. (1993). Tintin and the world service: A look at lexical environments. IATEFL: Annual Conference Report, 32-37.

Mezynski, K. (1983). Issues concerning the acquisition of knowledge: Effects of vocabulary training on reading comprehension. Review of Educational Research, 53, 253-279.

Milton, J., & Meara, P. (1995). How periods abroad affect vocabulary growth in a foreign language. ITL Review of Applied Linguistics, 107/108, 17-34.

Nagy, W. (1997). On the role of context in first- and second-language vocabulary learning. In Schmitt, N., & McCarthy, M. (Eds.) Vocabulary: Description, acquisition, pedagogy (pp. 64-83). New York: Cambridge University Press.

Nagy, W.E., Herman, P.A., & Anderson, R.C. (1985). Learning words from context. Reading Research Quarterly, 20, 233-253.

Nation, P. (1982). Beginning to learn foreign vocabulary: A review of the research. RELC Journal, 13 (1), 14-36.

Nation, P. (1990). Teaching and learning vocabulary. New York: Newbury House.

Nation, P. (1997). Vocabulary size, text coverage & word lists. In Schmitt, N., & McCarthy, M. (Eds.) Vocabulary: Description, acquisition, pedagogy (pp. 6-19). New York: Cambridge University Press.

Oxford English Software. (1993). MicroConcord corpus collections. Oxford: Oxford University Press.

Pinker, S. (1989). Learnability & cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.

Redman, S., & Ellis, R. (1991). A Way with words: Vocabulary development activities for learners of English, Vols. 1-4. Cambridge University Press.

Robinson, P.J. (1989). A rich view of lexical competence. ELT Journal, 43 (4), 274-282.

Saragi, T., Nation, I.S.P., & Meister, G.F. (1978). Vocabulary learning and reading. System, 6, 72-78.

Saville-Troike, M. (1984). What really matters in second language learning for academic achievement? TESOL Quarterly, 18, 199-219.

Scott, H. & Scott, J. (1984). ESP & Rubic's cube: Three dimensions in course design & materials writing. In Swales, J. & Mustafa, H., (Eds.), English for specific purposes in the Arab world. Birmingham: University of Aston Language Studies Unit.

Segalowitz, N., Poulsen, C., & Komoda, M. (1991). Lower level components of reading skill in higher level bilinguals: Implications for reading instruction. In J.H. Hulstijn & J.F. Matter (Eds.), Reading in Two Languages, AILA Review 8, 15-30.

Soars, J., & Soars, L. (1991). Headway (Vols. 1, 2 & 3). London: Oxford University Press.

Stanovich, K.E. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32-71.

Stevens, V. & Millmore, S. (1987). Text Tanglers. Stonybrook, NY: Research Design Associates. Computer program.

Stevens, V. (1991). Concordance-based vocabulary exercises: A viable alternative to gap-fillers. In T. Johns & P. King (Eds.) Classroom concordancing: English Language Research Journal, 4 (pp. 47-63). University of Birmingham: Centre for English Language Studies.

Sutarsyah, C., Nation, P., & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based case study. RELC Journal, 25 (2), 34-50.

Swales, J. (1984). A review of ESP in the Arab world 1977-1983: Trends, developments, and retrenchments. In Swales, J. & Mustafa, H., (Eds.), English for specific purposes in the Arab world (pp. 9-21). Birmingham: University of Aston Language Studies Unit.

Weir, C.J., & Urquhart, A.H. (1998). Reading in a second language: Process, product and practice. London: Longman.

West, M. (1953). A general service list of English words. London: Longman, Green & Co.

Xue, G., & Nation, P. (1984). A university word list. Language Learning and Communication, 3 (2), 215-229.

Zimmerman, M. (1988). Texas indexer/browser, v. 0.27. Silver Spring, Maryland.