The following study looks at what is involved in importing an external, international language test into a local educational institution in the Middle East area. It is argued that the problems often accompanying such an exercise are entirely predictable, if one looks closely at the task demands of the test, the position of the testees, and the learning resources available. Computational techniques will be proposed, first for analysing task demands and resources, and then for bridging the resource gap between learner and task.
Several Gulf universities have recently begun experimenting with external tests of English language proficiency. At Sultan Qaboos University, we have used Cambridge University's Preliminary English Test ('The PET') as a placement measure since September 1991. The PET is an integrated two-hour test at about lower intermediate level, and its object is to place a testee in one of four 'bands,' Band 1 being low and Band 4 high. Now plans are under way to use the PET as an exit measure as well in several colleges. The College of Commerce and Economics (CCE) has been running a trial of this entry-plus-exit idea since its inception in September 1993, and several other colleges may soon follow suit. At CCE, students' progress through their English training depends exclusively on their PET scores, and students can be expelled from the College, at least in principle, for slow or no progress. Band 4 within a year, three terms, regardless of entry point, is the official policy. Until Band 4 is cleared, a student must take intensive non-credit English courses.
This rather experimental use of the PET has produced high failure rates and related stresses. Some examples: Several students who joined the CCE program in September 1993 have still not cleared the Band 4 requirement in five terms. In the PET testing session of December 1994, only 35% of Band 3 students moved to Band 4. Less obvious but more worrying, of the roughly 60 CCE students who moved to Band 4 in the last year, only 23 - once again, about 35% - achieved a Band 4 level in the crucial reading component of the PET, the rest managing an overall Band 4 through their listening and writing scores.
These problems have several sources that anyone can identify. The PET forces our students to work harder than they are accustomed to. PET-related courses are all non-credit, at least at CCE, and after Bands 1 and 2 they compete for time with credit courses. The PET assumes a familiarity with British culture that some of our students do not have or wish to have. The PET is skills and comprehension based, whereas the students' previous language study has been memory based. And so on.
But in spite of the extent of the problems, and of the anecdotal references to their origins, no one that I know of has attempted what is known in educational technology as a 'task analysis' - a detailed comparison of students, tasks, and resources - for all or some part of the problem. As instructors, we have supposedly been helping students prepare for the PET, but with few specific ideas about what the test actually demands of them, whether they are in any position to do well in it, how well, in how much time, or following what sort of instruction.
In response to this lack of information, I have begun a task analysis of the PET in the CCE context.
At present my analysis focuses on only one task dimension, the vocabulary knowledge required by the PET. Using tools and concepts from computational linguistics, I have compared the vocabulary sizes of our students, the vocabulary coverage of our courses, and the vocabulary base of the PET. Initial indications suggest mismatches in all three directions. In a nutshell, we have been taking students with about 400 words of English, exposing them to about 350 more, and then measuring their accomplishment against a test drawing from 2,400 words. In other words, there is a discoverable and possibly treatable reason for our students' weak PET performance.
First a disclaimer: the notions 'vocabulary knowledge' and 'vocabulary size' raise some well-known conundrums, such as 'What is a word?' (Goulden et al, 1990), and 'What is it to know a word?' (Nagy et al, 1985). But with disparities of the magnitude I have described, the fine points may not be the most immediate issue.
An analysis of the overall PET task could usefully be undertaken along any of several dimensions - vocabulary, syntax, discourse structure, culture knowledge, or possibly others. Vocabulary, however, seems a good place to start. First, vocabulary knowledge in any case overlaps a little with syntax knowledge and a lot with culture knowledge. Second, the biggest problem our students have is poor reading comprehension, and reading is strongly correlated with vocabulary knowledge. Third, a vocabulary study is feasible given the information and resources available.
Weak reading comprehension, as against weak listening comprehension or weak writing ability, is the main source of difficulty for our students in the PET. While the causes of weak reading are clearly numerous, the single most robust finding in reading research historically is that the lion's share of comprehension variance is accounted for by variance in vocabulary knowledge. This finding may seem obvious; what makes it interesting is that the correlation of comprehension with vocabulary size is much higher than any of the seeming contenders, such as comprehension with culture knowledge, or comprehension with syntax knowledge. This finding is the main idea behind the recent 'lexical syllabus' movement in EFL. Anderson and Freebody (1981) provide an extensive summary of L1 work in this area; and a study by Doyle, Champagne and Segalowitz (1978) makes the case for vocabulary in a specifically L2 context. A disclaimer: none of these writers argue for the sufficiency of vocabulary in language comprehension, merely for its comparative necessity.
A vocabulary study is feasible because we have access to the explicit vocabulary requirement of the PET, the advertised vocabulary offering of several courses, and we can get approximate information about the vocabulary knowledge of our students.
One reason a vocabulary study is feasible is that the PET is based on a philosophy of vocabulary control and coverage, i.e. on a frequency-based wordlist aiming to represent the 'lexical core' of English. The University of Cambridge Local Examinations Syndicate claim that every test in their PET series is based on a publicly available wordlist, the Cambridge English Lexicon. They state that although a test question may present unspecified quantities of lexical realia, 'the answers to all questions in PET never depend solely on a knowledge of the meaning of any word outside the lexicon', and that 'words not contained in the list will be largely irrelevant to the answering of the questions on the realia.'
But is the PET wordlist itself the 'real' lexical core of English? Checking a set of coursebooks against one of the many wordlists in existence would be an empty exercise if the particular wordlist itself had no claim to legitimacy other than that it might get students over an arbitrary hurdle. It will be useful to look first at the background to the core-wordlist idea and then at the Cambridge Lexicon.
Two key implications for vocabulary instruction emerge from three decades of research in computational linguistics. First, a rather small number of words, and conveniently a more or less instructable number, account for a rather large proportion of everyday English text and speech. Several major studies settle on the finding that about 2,000 high frequency words account for about 80% of normal discourse with a sharp drop-off thereafter. Nation (1990, p.17) discusses this body of research, and offers the following representative output from a large-corpus analysis:
Table 1. Word frequencies, based on a count of 5 million running words
Different words | Percent of average text |
86,741 | 100 % |
43,831 | 99.0 |
5,000 | 89.4 |
3,000 | 85.2 |
2,000 | 81.3 |
1,000 | 49.0 |
10 | 23.7 |
(Carroll, Davies, and Richman, 1971, cited in Nation, 1990, p.17).
To paraphrase, if you know the most frequent 2000 words of English, then you know 81.3% of the words in an average non-technical text; but if you know 5000 words, you only get 89.4%, a small increase in return for the trouble of making your lexicon two and a half times larger.
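The diminishing return can be made explicit with a few lines of arithmetic. The following is a minimal sketch that uses only the figures in Table 1; the function name `gain_per_thousand` is my own label, not part of any cited study.

```python
# Coverage figures copied from Table 1 (vocabulary size -> % of average text).
COVERAGE = {1000: 49.0, 2000: 81.3, 3000: 85.2, 5000: 89.4}

def gain_per_thousand(smaller, larger, table=COVERAGE):
    """Percentage points of text coverage gained per extra 1,000 words learned."""
    extra_words = larger - smaller
    extra_coverage = table[larger] - table[smaller]
    return extra_coverage / (extra_words / 1000)

# Second thousand words versus the three thousand after that.
print(round(gain_per_thousand(1000, 2000), 1))  # 32.3
print(round(gain_per_thousand(2000, 5000), 1))  # 2.7
```

On these figures, each thousand words learned beyond the first 2000 buys less than a tenth of the coverage that the second thousand did.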
The advantages seem obvious of having control over 80% of the words of a text you might be trying to read in a foreign language. One is that if 80% of its words are familiar, you can probably work out the meaning of the rest for yourself.
But is a spelled-out wordlist really necessary to find these 2000 words? Do not experienced native-speaker teachers know intuitively what the core of English is, to the point that they can dish up most or all of it for their students? And if they leave a few gaps, do not these lexical syllabuses plug them? Computational linguists (for example Biber et al, 1994) have produced scores of examples of the limits of intuition in our profession, as revealed when teachers' and especially coursewriters' definitions and grammar rules are checked against large-corpus analysis of actual English in use. As a thought experiment, ask yourself right now: What percentage of the basic 2000 does a course you have taught, such as the New Cambridge, Headway, or COBUILD, expose a student to? An informal survey of colleagues at SQU who had taught one or more of these courses showed they believed the course left their students in possession of the 2000 most frequent words of English 'well enough' or 'pretty well.'
So, granting that the real 2000-list would in principle be useful, is the PET list, in fact, this list? It is almost certainly not, at least not exactly. To begin with, the PET list is 2,387 words, not 2,000. But there is a circumstantial case for thinking that it may be close enough. The basis of the PET list is Hindmarsh's Cambridge English Lexicon (1980), which in turn is based on a collation of the main frequency-based wordlists of the past half-century. These include Thorndike and Lorge's (1944) list, and West's (1953) General Service List, assembled with great labour in the era before corpus and computer; and Kucera and Francis's (1967) Computational Analysis of Present Day American English, based on computer concordances of the million-word Brown University corpus. The idea of Hindmarsh's collation was to extract a common core from several researched lists, each being itself a core sampled from a large corpus of natural text and speech. In other words, the PET list comes directly out of the main research tradition, and is probably as good a pedagogic adaptation of this research as exists, as of 1980. As will be shown, it is basically this wordlist that shows up as the 'defining vocabulary' of Longman's Dictionary of Contemporary English (1978). Since then, of course, Sinclair's (1987) COBUILD team has entered the field with a 20-million-word corpus and enough computational power to boil out its secrets, so their notion of a pedagogical core should be of interest in the analysis of coursebooks below.
Most commercial EFL courses of recent years claim some sort of membership in some version of the lexical movement. The New Cambridge English Course (1990), for example, is very much a product of the ongoing lexical syllabus movement in EFL. The introductions to all three of its volumes (pp. vi-vii) make the following claims regarding vocabulary. First, there is 'an emphasis on systematic vocabulary learning.' Second, 'students must acquire a core vocabulary of the most common and useful words in the language.' Third, vocabulary is at least as important as grammar: 'Obviously grammar is important, especially at the early stages of learning a language, but it can be overvalued at the expense of other areas such as vocabulary growth.' COBUILD and Headway make similar claims: COBUILD's credentials as a lexical syllabus are discussed at length in Willis (1990); Headway's lexical manifesto appears in the introductions to its coursebooks: 'Vocabulary is often the poor relation to structure in the language classroom' and so on, a problem they imply they have rectified.
So, a vocabulary study is feasible because, in the framework of this united front on lexis, a good deal of information should be produced by merely matching course wordlists and PET wordlist via simple computational methods. Indeed, perhaps the study is too feasible to be worth doing. Particularly in the case of the New Cambridge course: if the Cambridge course and the Cambridge PET are both based on the Cambridge lexicon, what is there to investigate? The course obviously teaches the lexicon.
However, it is worth noticing what is not claimed in the manifestoes of these courses. None of the three I have mentioned states which if any specific wordlist it actually proposes as the core of English, why anyone should think that its list is really the core of English, what the rough size of this core might be, or what proportion of it a learner should expect to know, or to what depth or extent, by the end of each course book.
Several instruments for measuring an individual's approximate vocabulary size are currently under development. Two of the most promising are Meara's Eurocentre Vocabulary Size Test (1992) and Nation's Vocabulary Levels Test (1990, p.261). Nation's test has been used in the present study.
The PET list and the three course lists were compared through computerized sampling and list matching. I selected 20 consecutive words at 10 randomly chosen points in the PET list. Then I found out which of these words also appeared on the vocabulary list of the Cambridge course, and in addition on the COBUILD and Headway lists, and the LDOCE defining vocabulary list for comparison. My criterion was extremely liberal as to whether a given course word could be counted as an instance of a PET word. Also, if a word appeared in one coursebook but never again in the series, I still counted it as present in subsequent books.
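The sampling and matching procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the program actually used: the function names are hypothetical, matching here is by exact lower-cased form (much stricter than the liberal criterion described above), the random start points are allowed to produce overlapping samples, and the wordlists are toy data invented for the demonstration.

```python
import random

def sample_blocks(wordlist, n_blocks=10, block_size=20, seed=0):
    """Take block_size consecutive words at n_blocks randomly chosen start points."""
    rng = random.Random(seed)
    last_start = len(wordlist) - block_size
    starts = sorted(rng.sample(range(last_start + 1), n_blocks))
    return [wordlist[s:s + block_size] for s in starts]

def cumulative(book_lists):
    """Carry words forward: a word in Book 1 also counts as present for Book 2, etc."""
    seen = set()
    result = []
    for words in book_lists:
        seen |= {w.lower() for w in words}
        result.append(set(seen))
    return result

def count_hits(blocks, course_words):
    """Return hits per sample and the total across all samples."""
    per_block = [sum(w.lower() in course_words for w in block) for block in blocks]
    return per_block, sum(per_block)

# Toy data only: a 2,387-item "test list" and two invented course lists.
toy_pet = [f"word{i:04d}" for i in range(2387)]
toy_books = [toy_pet[0::4], toy_pet[2::4]]

blocks = sample_blocks(toy_pet)
book1, books1_2 = cumulative(toy_books)
per_block_1, total_1 = count_hits(blocks, book1)
per_block_2, total_2 = count_hits(blocks, books1_2)
print(total_1, total_2)  # 50 100
```

With 10 samples of 20 words, each total is out of 200, so dividing by two gives the percentage of the sampled test list that the course covers.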
Admittedly, a crude indicator like the fact that a word is on a particular back-of-the-book list tells us little about the depth of word knowledge the course aims at, or the number and variety of occurrences it provides, where most of the interesting conditions of lexical growth lie. In the courses to be examined below, computer analysis generally correlated appearance of a word in a unit list with both some specific way of drawing attention to the word as a new item, and then at least three or four occurrences thereafter. Of course, the cleanest information from a course wordlist concerns the words that are not on it: these are the words that beginning students are almost certain not to have any knowledge of whatever.
The students' approximate vocabulary sizes were taken at several points during the PET process, using Nation's test, which focuses on quite rudimentary word knowledge. I am gradually putting together a profile of failing as well as passing students, and I am particularly looking for stable correlates of success and failure that may be amenable to instruction.
The big surprise is that none of the courses, even by their final book, even by the most liberal reckoning, get much over half-way into the PET lexicon. The top-level Headway 3 Intermediate does not offer any specific wordlist and so is not included. Here is a summary of the vocabulary coverage, in terms of the PET list requirements, of three well known courses (as well as the defining vocabulary of the Longman Dictionary of Contemporary English, an EFL learner's dictionary, as a baseline comparison). Full details are available in an appendix at the end of this document.
Table 2. PET-list words (out of 20 per sample) also found on each course wordlist
| | CA1 | CA2 | CA3 | CO1 | CO2 | HE1 | HE2 | LDOCE |
|---|---|---|---|---|---|---|---|---|
| Sample 1 | 6 | 9 | 11 | 6 | 14 | 4 | 10 | 19 |
| Sample 2 | 3 | 6 | 10 | 4 | 9 | 3 | 9 | 13 |
| Sample 3 | 6 | 9 | 11 | 4 | 13 | 4 | 9 | 14 |
| Sample 4 | 2 | 6 | 8 | 4 | 9 | 1 | 5 | 10 |
| Sample 5 | 8 | 8 | 10 | 3 | 8 | 7 | 8 | 9 |
| Sample 6 | 4 | 12 | 14 | 7 | 11 | 6 | 8 | 16 |
| Sample 7 | 6 | 9 | 11 | 6 | 10 | 8 | 11 | 16 |
| Sample 8 | 2 | 5 | 10 | 3 | 10 | 6 | 10 | 11 |
| Sample 9 | 5 | 5 | 10 | 5 | 11 | 5 | 10 | 13 |
| Sample 10 | 8 | 11 | 12 | 7 | 10 | 10 | 10 | 12 |
| TOTAL | 50 | 80 | 107 | 49 | 105 | 54 | 90 | 133 |
| MEAN | 5 | 8 | 10.7 | 4.9 | 10.5 | 5.4 | 9 | 13.3 |
| S.D. | 2.2 | 2.4 | 1.6 | 1.5 | 1.8 | 2.6 | 1.7 | 3.1 |
| % of PET words | 25 | 40 | 53.5 | 24.5 | 52.5 | 27 | 45 | 66.5 |
(CA=Cambridge; CO=COBUILD; HE=Headway; LDOCE=Longman's defining vocabulary).
In other words, students who covered the first Cambridge book and knew in some fashion every word in its lexicon would be familiar with about 25% of the PET list's 2400 words, or about 600 words - in addition of course to any words they had brought with them. With two books covered, the figure would rise to 40% of 2400 = 960 words, and with three books 53.5% of 2400 = 1284 words. So we see roughly the magnitude of the task of learning 2400 words at the rate proposed by the Cambridge coursewriters: each additional book adds roughly 350 new PET words, thus getting our students to 2400 by the end of Book 6, if it existed, at the end of three years.
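The arithmetic behind these estimates is simple enough to check mechanically. The sketch below scales the Cambridge sample totals (50, 80, and 107 hits in 200 sampled words) up to the rounded 2400-word list and extrapolates three further books; the variable and function names are illustrative only, and the linear extrapolation is the same rough assumption made in the paragraph above.

```python
SAMPLE_SIZE = 200      # 10 samples x 20 consecutive PET words
PET_LIST_SIZE = 2400   # rounded figure used in the text

# Hits for the three Cambridge books, from the TOTAL row of the coverage table.
cambridge_hits = {"Book 1": 50, "Book 2": 80, "Book 3": 107}

def estimated_pet_words(hits, sample_size=SAMPLE_SIZE, list_size=PET_LIST_SIZE):
    """Scale a sample hit count up to the whole PET list."""
    return round(hits / sample_size * list_size)

estimates = {book: estimated_pet_words(h) for book, h in cambridge_hits.items()}
print(estimates)  # {'Book 1': 600, 'Book 2': 960, 'Book 3': 1284}

# Average number of new PET words added per book after Book 1.
gains = [estimates["Book 2"] - estimates["Book 1"],
         estimates["Book 3"] - estimates["Book 2"]]
average_gain = sum(gains) / len(gains)
print(gains, average_gain)  # [360, 324] 342.0

# Linear extrapolation to a hypothetical Book 6.
projected_book_6 = estimates["Book 3"] + 3 * average_gain
print(projected_book_6)  # 2310.0
```

The average gain of 342 words per book is in line with the rough figure of 350, and the projection for a sixth book lands in the neighbourhood of the full list.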
But perhaps these courses are teaching lots of words at a more or less basic level but just not the specific PET-list words? To check this, I counted, in the same 10 random samples of 20 consecutive PET words, how many words these courses were teaching in addition to PET words. Here are the figures on words covered in the courses but not on the PET list:
EXTRA wds (not on PET list) | CA1 | CA2 | CA3 | CO1 | CO2 | HE1 | HE2 | LDOCE |
Extras in 200 | 6 | 8 | 28 | 1 | 21 | 13 | 32 | 46 |
Mean /20 wds | 0.6 | 0.8 | 2.8 | 0.1 | 2.1 | 1.3 | 3.2 | 4.6 |
S.D. | <1 | 1.03 | 1.6 | <1 | 2.1 | <1 | 2.3 | |
% Extra Wds | 3 | 4 | 14 | 0.5 | 10.5 | 6.5 | 16.5 | 23 |
These percentages seem rather minor. These courses do not appear to be aiming at some other version of a core lexis; they are based loosely on the PET lexis, but serve up only half of it, plus a few extras.
The appendix itself should be consulted to see the extent of the between-list gaps and overlaps. It will be noticed that the gaps are extensive, to say the least; the notion is clearly mythical that these 'lexically based' courses do anything like comprehensively teach the high-frequency core of English, or that there is much agreement about which words this high-frequency core actually consists of. Of course, as mentioned, none of the courses actually claim that they do this, although it is widely assumed by their customers to be the basic idea of a lexical syllabus. Now is the time to recall your thought-experiment.
At CCE the courses chosen for our PET testees are Cambridge 1 for Band 1, and Headway's 2 and 3 for Bands 2 and 3.
Do the tallies above bear any relation to reality, i.e. to what the students actually know and their actual results on the PET? Here are the vocabulary sizes of passing and failing students, as measured at the 2000-word level of the Levels Test, at different points in the PET sequence. Each group is composed entirely of passes or entirely of fails.
Table 3. Levels Test scores (% at the 2000-word level) of passing and failing groups
| | PASS Band 1 | PASS Band 1 | FAIL Band 2 | PASS Band 2 | PASS Band 2 | PASS Band 3 | PASS Band 3 |
|---|---|---|---|---|---|---|---|
| | FEB94 | FEB95 | FEB94 | FEB95 | FEB95 | FEB95 | FEB95 |
| Group | "1A" | "1B" | "2D" | "2B" | "2C" | "3A" | "3B" |
| | 33 | 39 | 27 | 33 | 44 | 88 | 50 |
| | 33 | 22 | 39 | 33 | 50 | 61 | 94 |
| | 22 | 5 | 33 | 50 | 50 | 66 | 83 |
| | 22 | 33 | 33 | 33 | 66 | 77 | 77 |
| | 22 | 39 | 27 | 39 | 61 | 66 | 83 |
| | 39 | 17 | 27 | 44 | 61 | 66 | 72 |
| | 39 | 39 | 50 | 72 | 33 | 61 | 66 |
| | 16 | 33 | 27 | 44 | 33 | 88 | 83 |
| | 50 | 39 | 33 | 55 | 66 | 72 | 72 |
| | 33 | 28 | 39 | 83 | 44 | 61 | |
| | 39 | 27 | 33 | 61 | 72 | 72 | |
| | 22 | 28 | | 77 | 61 | | |
| | 61 | 61 | | 55 | 44 | | |
| | 44 | 17 | | | | | |
| | 11 | | | | | | |
| MEAN % | 33.65 | 31 | 33.5 | 49.7 | 52.7 | 70.7 | 75.5 |
| S. Dev. | 12.4 | 13.9 | 7.1 | 18.8 | 12.7 | 9.9 | 12.6 |
| # Words | 672 | 620 | 670 | 994 | 1054 | 1414 | 1510 |
So, surprise number two is that our students don't know very many words. They appear to be entering their year of English with some number fewer than 600 words (30% of 2000). Successful students seem to be exiting with 1400 or 1500 words (70-75% of 2000). As mentioned, however, most of the students are not keeping up with this lexical timetable.
On the bright side, deficits do not appear to be random or untreatable. While there is clearly some within-group noise present in the data, three distinct groups nonetheless fall out of a statistical analysis (p <.01). The vocabulary size groups are at roughly 30%, 50%, and 70% of PET lexis, and they correspond to the exit points of Bands 1, 2, and 3. Further, these points appear to have some sharpness of definition about them; for example, the group who had just failed the Band 2 of the PET (column 3) were statistically equal to the group that had just passed Band 1.
Nation's size measure also lines the students up with their courses rather successfully. The Band 1 students had just completed Cambridge 1, so their 33% of 2000 = 660 Nation-test words is remarkably similar to the 25% of 2400 = 600 PET words offered by Cambridge 1 (Table 2). The Band 2 students had just completed Headway 2, so their 50% of 2000 = 1000 Nation-test words is remarkably similar to 45% of 2400 = 1080 PET words. Not amazingly, the students appear to be learning roughly what they are being taught.
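The comparison in the preceding paragraph sets two differently based percentages side by side, so it may help to see both converted to word counts. The following is a small sketch using only the rounded figures quoted above; the helper name `words` and the dictionary labels are mine.

```python
def words(percent, list_size):
    """Convert a percentage of a wordlist into an approximate number of words."""
    return round(percent / 100 * list_size)

# (words known on the Levels Test, PET words offered by the course just completed)
comparisons = {
    "Band 1 / Cambridge 1": (words(33, 2000), words(25, 2400)),
    "Band 2 / Headway 2": (words(50, 2000), words(45, 2400)),
}

for label, (known, offered) in comparisons.items():
    print(label, known, offered, f"{known / offered:.2f}")
# Band 1 / Cambridge 1 660 600 1.10
# Band 2 / Headway 2 1000 1080 0.93
```

In both cases the measured vocabulary is within about 10% of what the course wordlist offers, which is the sense in which the two measures line up.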
So, here are some usable correlates of success and failure to help us demystify a little what it is that students have to know in order to succeed in the PET.
Recall the problem this analysis was meant to address: 65% of students have difficulty getting from Band 3 to 4. The students shown in Table 3 as knowing 1400 to 1500 words in Band 3 are of course in the 35% who do not have difficulty getting to Band 3, a minority. These students may be adept learners, have been exposed to English before beginning their course, etc. What about the more typical student - the student having trouble clearing the PET hurdles on schedule? What resources for lexical growth is this student being offered?
In Table 2 we saw that the three "lexical" language courses in fact took learners, at least in any systematic way, only up to a little beyond 1000 words. Yet the students need more like 1500 words to succeed in the PET. The typical approach in these courses, at just after 1000 words, is to progress from systematic or direct vocabulary teaching to "strategies for independent learning." The course chosen at SQU for these students is Headway 3, Intermediate, which as noted in the discussion of Table 2 has no specific vocabulary offering.
The meaning of this should be made quite clear. It does not just mean that there are no unit wordlists at the back of the book, but that the new words the course does present will be drawn from a range of frequency levels, and will not be 'taught' even in the sense of appearing more than once over the course of the book. I have run a unique-words computer check of Headway Intermediate, and found a very high proportion of medium-frequency words that appear only once or at most twice. Here is the Headway coursewriters' philosophy of vocabulary acquisition (Teacher's Book, p. v):
Teachers can adopt one of two approaches [to teaching vocabulary]:
1 Teach students a lot of new words as often as possible, providing for adequate practice and revision.
2 Show students ways of approaching their own vocabulary learning.
Both are necessary, but obviously the second is more powerful. In Headway [Int], there are many activities that introduce lexical areas. Examples of these are sports, the weather, adjectives of description, television programmes, accidents and illnesses.
The majority of the vocabulary work, however, concentrates on introducing students to the systems of vocabulary and vocabulary learning strategies.
This sounds fine, until you remember that the students, graduates of the preceding Headway 2 course, will know about 1000 words, and that (Table 1) with 1000 words you are clear on only 50% of an average text. In other words, every second word is out of focus for you. In contrast, Nation (1992) argues that inferencing can only usefully begin when a learner has about 3500 words - i.e. when merely 1 word in 20 is unknown. Writers of cloze passages will confirm that a ratio of 1 gap per 20 known words is considered 'tough' by most students. The texts in Headway 3 are effectively, for graduates of Headway 2 who can be assumed to know 1000 words, cloze passages with a gap for every two words.
Consider, for example, the following representative reading passage from Headway Intermediate (Unit 13, p 74) as an opportunity to apply word learning strategies for students who know 1000 words. I have computer checked the text against the PET list of 2387 words, so that every word at a frequency level beyond the PET list is replaced by a gap. The text, then, appears as it does to a student who knows all the words on the PET list:
MACHINE ____ PASSPORTS
A recent New Scientist article reported that within five years most Western countries will be ____ their ____ with a machine ____ passport that will carry with it the ____ of ____ ____ of ____ travellers. Says journalist Steve Connor: The new passport could mean that anyone (crossing a ____) can be stopped and checked until a computer ____ of 'No ____' allows them to go on about their business. The computerised passport allows the list of people who, for various reasons, are ____ as ____, to ____ almost without limit.'
MACHINE READABLE PASSPORTS
A recent New Scientist article reported that within five years most Western countries will be issuing their citizens with a machine readable passport that will carry with it the threat of global surveillance of innocent travellers. Says journalist Steve Connor: The new passport could mean that anyone (crossing a border) can be stopped and checked until a computer statement of 'No trace' allows them to go on about their business. The computerised passport allows the list of people who, for various reasons, are labelled as suspicious, to expand almost without limit.'
Only the vaguest sense of what is happening in the text can be constructed with so many words missing. The ratio of unknown to known here is 13 in 92 - closer to 3 in 20 than 1 in 20. How much more opaque must this text seem to a student knowing only 1000 words? The majority of Band 3 students are in no position to learn new words by struggling through such texts, and it is irresponsible of coursebook writers to suggest that they are.
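The kind of check used to produce the gapped passage above can be sketched in a few lines. This is an illustrative reconstruction, not the program actually used: the function names are hypothetical, the known-word list in the demonstration is a toy set rather than the PET list, and matching is by exact lower-cased form with no allowance for inflected or derived forms.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def gap_text(text, known_words, gap="____"):
    """Replace each word not in known_words with a gap.

    Returns the gapped text, the number of unknown words, and the total words.
    """
    known = {w.lower() for w in known_words}
    counts = {"unknown": 0, "total": 0}

    def replace(match):
        counts["total"] += 1
        word = match.group(0)
        if word.lower() in known:
            return word
        counts["unknown"] += 1
        return gap

    gapped = WORD.sub(replace, text)
    return gapped, counts["unknown"], counts["total"]

def unknown_per_twenty(unknown, total):
    """Express the unknown-word ratio as 'so many words in 20'."""
    return 20 * unknown / total

# Toy demonstration with an invented known-word list.
sentence = "The computerised passport allows the list to expand almost without limit."
toy_known = ["the", "computerised", "passport", "allows", "list",
             "to", "almost", "without", "limit"]
gapped, unknown, total = gap_text(sentence, toy_known)
print(gapped)          # The computerised passport allows the list to ____ almost without limit.
print(unknown, total)  # 1 11
```

Applied to the counts reported above, `unknown_per_twenty(13, 92)` gives about 2.8, i.e. close to 3 unknown words in every 20.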
The "strategies" approach to learning has clearly been begun too early. The point is, without some sort of counting-up such as I have tried to perform here, a teacher would have little way of knowing if such a passage was wildly out of step with the students' learning abilities, or the students merely unwilling to extend themselves.
The commonsense view that the Cambridge Course prepares foreign students for the Cambridge PET examination is therefore shown to be false. Looking just at lexis -- ignoring grammar, culture knowledge, and many other problematic aspects of the test -- the Cambridge courses take the learners little over half way to their objective. In this Cambridge is not unique but rather typical; even COBUILD does not present learners with anything like 2000 words. And Headway was possibly the worst choice of the three.
There is an unannounced chasm between about 1100 and 2000 words which these "lexical" coursebook manifestoes do not mention. It takes an effort to locate this chasm, at least for teachers and course designers, but our students find it easily and fall right into it. In other words, we are punishing students for not having succeeded in learning what we have not given them the resources to learn, and rewarding students who for whatever reason did not really need our courses to do well in the PET.
Why do these 'lexically based' courses fail to carry the student from 1200 to 2000 words? Some possible reasons:
The last two reasons seem most plausible.
If we are unlikely to find a commercial course that deals with our learners' lexical problem, then some sort of supplementary vocabulary work seems to be indicated. To this end I have written a computer program for the Band 3 students called PET·2000 that seems to be having some success in rapid exposure to 2000 words within the constraints of the situation described and using little or no class time. The program is a concordance routine with a custom interface and corpus designed to bring the technology and language within the students' grasp. The description of this project is the subject of my thesis and several papers.
Anderson, R.C., & Freebody, P. (1981). Vocabulary knowledge. In J.T. Guthrie (Ed.), Comprehension and teaching: Research reviews. Newark, DE: International Reading Association.
Biber, D., Conrad, S., & Reppen, R. (1994). Corpus-based approaches to issues in applied linguistics. Applied Linguistics, 15 (2), 169-189.
Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11 (4), 341-358.
Hindmarsh, R. (1980). Cambridge English Lexicon. Cambridge: Cambridge University Press.
Kucera, H., & Francis, W.N. (1967). A computational analysis of present-day American English. Providence, RI: Brown University Press.
Longman Dictionary of Contemporary English. (1978/90). London: Longman.
Meara, P., & Jones, G. (1990). Eurocentres Vocabulary Size Test. 10KA. Zurich: Eurocentres.
Nagy, W.E., Herman, P.A., & Anderson, R.C. (1985). Learning words from context. Reading Research Quarterly, 20 (2), 233-253.
Nation, P. (1982). Beginning to learn foreign vocabulary: A review of the research. RELC Journal, 13 (1), 14-36.
Nation, P. (1990). Teaching and learning vocabulary. New York: Newbury House.
Norman, S. (1982). We mean business: An elementary course in business English. London: Longman.
Redman, S., & Ellis, R. (1991). A Way with words: Vocabulary development activities for learners of English. Cambridge: Cambridge University Press.
Sinclair, J. (Ed.) (1987). Looking up: An account of the COBUILD project in lexical computing. London: Collins ELT.
Soars, J., & Soars, L. (1991). Headway. London: Oxford University Press.
Swan, M., & Walters, C. (1990). The New Cambridge English Course. London: Cambridge University Press.
Thorndike, E.L., & Lorge, I. (1944). The teacher's word book of 30,000 words. Teacher's College, Columbia University, New York.
University of Cambridge. (1990). Preliminary English Test: Vocabulary list. Cambridge Local Examinations Syndicate: International examinations.
West, M. (1953). A general service list of English words. London: Longman, Green & Co.
Willis, D. (1990). The lexical syllabus. London: Collins Cobuild.
Willis, D., & Willis, J. (1988). The Collins Cobuild English course. London: Collins Cobuild.
How many of the 2000 highest frequency words are learners really exposed to in a "lexically oriented course"? Compare typical course offerings to (a) the PET List (left column) and (b) the GSL (General Service List, West, 1953, right column). Also, compare the PET List to the GSL to see that these lists, while similar, are not identical. However, courses do a poor job of covering either.
(CA=Cambridge; CO=COBUILD; HE=Headway; LDOCE=Longman's defining vocabulary)
SAMPLE 1 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | HE3 | LDOCE | GSL | |
able | x | x | x | x | (x) | x | x | |||||
about | x | x | (x) | x | x | (x) | x | x | ||||
above | x | x | x | x | (x) | x | x | |||||
abroad | x | (x) | x | (x) | x | x | ||||||
absent | x | x | ||||||||||
accept | x | (x) | x | x | x | |||||||
accident* | x | x | x | (x) | x | x | x | x | ||||
accommodation* | x | (x) | x | |||||||||
account (bank) | x | x | x | x | x | |||||||
ache | x | x | ||||||||||
across | x | x | x | x | x | (x) | x | x | x | |||
act | x | x | x | x | x | |||||||
actor | x | (x) | x | x | x | (x) | x | |||||
actress | x | (x) | (x) | x | x | x | x | |||||
actual(ly) | x | x | x | x | x | (x) | x | x | ||||
ad(vertisement) | x | x | x | |||||||||
add | x | (x) | x | x | ||||||||
address (n) | x | (x) | (x) | x | (x) | (x) | x | (x) | x | x | ||
admire | x | x | ||||||||||
advanced | x | (x) | x | x | ||||||||
TOTAL 20 | 6 | 9 | 11 | 6 | 14 | 17 | 4 | 10 | 19 | 16 |
(x) = word does not appear in this coursebook but appeared in a previous book in the series.
SAMPLE 2 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
bleed | x | x | x | |||||||||
blind | x | x | x | x | x | |||||||
blinds (n) | ||||||||||||
block (flats) | x | x | x | x | x | |||||||
blond(e) | x | (x) | ||||||||||
blood* | x | x | x | |||||||||
blouse | x | (x) | x | x | (x) | |||||||
blow | x | x | x | x | x | |||||||
blow up | ||||||||||||
blue | x | (x) | x | x | (x) | (x) | x | (x) | x | x | ||
blunt | ||||||||||||
board | x | (x) | x | x | x | |||||||
boat | x | x | x | (x) | x | x | ||||||
body | x | x | x | x | (x) | x | x | |||||
boil | x | x | x | x | ||||||||
bold | x | |||||||||||
bomb | x | |||||||||||
bone | x | x | x | x | ||||||||
book (n) | x | (x) | (x) | x | (x) | (x) | x | (x) | x | x | ||
book (v) | x | (x) | (x) | x | ||||||||
TOTAL 20 | 3 | 6 | 10 | 4 | 9 | 10 | 3 | 9 | 13 | 12 |
SAMPLE 3 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
excellent | x | x | (x) | x | x | x | ||||||
except | x | x | x | x | (x) | x | x | x | x | |||
exchange | x | x | x | x | ||||||||
exchange rate | x | |||||||||||
excited | x | x | x | x | ||||||||
exciting | x | x | x | (x) | x | x | x | |||||
excuse | x | (x) | x | x | (x) | x | x | |||||
exercise | x | x | x | x | x | x | ||||||
exhibition | x | (x) | ||||||||||
expect* | x | x | x | |||||||||
expensive* | x | x | x | x | x | (x) | x | (x) | x | x | ||
experience | x | x | x | (x) | x | x | x | x | ||||
experiment | x | x | ||||||||||
explain* | x | x | x | (x) | x | x | x | |||||
explode | x | x | x | |||||||||
explore* | x | |||||||||||
extra | x | x | (x) | x | ||||||||
extraordinary | x | x | x | |||||||||
extremely* | x | (x) | x | x | x | x | x | |||||
eye | x | x | x | x | (x) | (x) | x | (x) | x | x | ||
TOTAL 20 | 6 | 9 | 11 | 4 | 13 | 17 | 4 | 9 | 14 | 18 |
SAMPLE 4 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
inch | x | x | ||||||||||
include* | x | (x) | x | (x) | x | x | x | |||||
incorrect | x | |||||||||||
increase | x | x | x | x | ||||||||
indeed | x | x | ||||||||||
independent | x | (x) | x | x | x | (x) | x | |||||
index | ||||||||||||
individual | x | (x) | x | |||||||||
industry | x | x | x | (x) | x | x | ||||||
influence | x | (x) | x | x | x | |||||||
inform | x | x | x | |||||||||
information | x | x | x | x | (x) | x | x | |||||
inhabitant | ||||||||||||
initial | x | |||||||||||
injure | x | x | ||||||||||
ink | x | x | ||||||||||
insect | x | x | x | x | x | |||||||
inside | x | x | x | (x) | x | x | ||||||
insist | x | (x) | (x) | |||||||||
instead | x | x | x | (x) | x | x | ||||||
TOTAL 20 | 2 | 6 | 8 | 4 | 9 | 15 | 1 | 5 | 10 | 12 |
SAMPLE 5 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
jacket | x | x | x | x | (x) | x | (x) | |||||
jail* | ||||||||||||
jam | x | x | ||||||||||
January | x | (x) | (x) | x | (x) | (x) | ||||||
jar | ||||||||||||
jazz | x | |||||||||||
jealous* | x | x | ||||||||||
jeans | x | (x) | (x) | x | (x) | |||||||
jet | ||||||||||||
job | x | x | x | x | x | x | x | (x) | x | |||
jockey | ||||||||||||
join* | x | x | x | x | x | x | (x) | x | x | |||
join in | ||||||||||||
joke | x | (x) | x | x | x | x | ||||||
journey | x | x | (x) | x | (x) | x | (x) | x | x | |||
joy | x | (x) | x | x | ||||||||
judge | x | x | x | x | ||||||||
juice | x | x | (x) | x | x | |||||||
July | x | (x) | (x) | x | (x) | (x) | ||||||
jump | x | x | (x) | x | x | |||||||
TOTAL 20 | 8 | 8 | 10 | 3 | 9 | 10 | 7 | 8 | 9 | 8 |
SAMPLE 6 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
memory* | x | x | (x) | x | x | x | ||||||
mend | x | x | x | (x) | x | x | ||||||
menu* | x | (x) | ||||||||||
merry | x | x | ||||||||||
message | x | x | x | (x) | x | x | x | x | ||||
metal* | x | x | x | (x) | x | x | ||||||
method* | x | (x) | x | |||||||||
metre | x | x | ||||||||||
midday | x | |||||||||||
middle | x | x | x | (x) | x | x | ||||||
midnight | x | (x) | x | x | ||||||||
might | x | (x) | x | x | (x) | x | ||||||
mile | x | (x) | x | (x) | (x) | x | x | |||||
milk | x | x | (x) | x | (x) | x | (x) | x | x | |||
million | x | x | (x) | x | x | |||||||
mind (N&V) | x | x | x | x | x | (x) | x | x | ||||
mine (pron) | x | x | x | (x) | (x) | x | x | |||||
minute | x | x | x | x | (x) | (x) | x | x | ||||
mirror | x | x | x | x | (x) | x | ||||||
miserable* | x | x | ||||||||||
TOTAL 20 | 4 | 12 | 14 | 7 | 11 | 14 | 6 | 8 | 16 | 12 |
SAMPLE 7 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
post (n/v) | x | (x) | x | (x) | x | (x) | x | x | ||||
postcard | x | (x) | (x) | x | (x) | |||||||
poster | x | x | ||||||||||
postman | x | |||||||||||
pot | x | x | x | |||||||||
potato | x | x | x | x | x | x | ||||||
pound (lb/£) | x | x | (x) | x | x | (x) | x | x | ||||
pour | x | x | x | x | ||||||||
power | x | x | x | x | ||||||||
powerful* | x | (x) | x | x | ||||||||
practice | x | x | x | x | x | |||||||
practise | x | x | (x) | x | (x) | x | ||||||
pray | x | x | ||||||||||
prayer | x | x | ||||||||||
prefer* | x | x | x | x | (x) | x | ||||||
prepare | x | x | x | x | x | x | x | |||||
present | x | x | x | x | (x) | x | x | |||||
president | x | x | x | x | (x) | x | x | |||||
press (v) | x | (x) | (x) | x | x | (x) | x | x | ||||
pretty | x | x | x | x | (x) | (x) | x | (x) | x | x | ||
TOTAL 20 | 6 | 9 | 11 | 6 | 10 | 14 | 8 | 11 | 16 | 13 |
SAMPLE 8 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
receipt | x | (x) | x | x | ||||||||
receive | x | (x) | (x) | x | x | x | ||||||
recent | x | (x) | (x) | x | x | |||||||
receptionist | x | x | ||||||||||
recognise | x | (x) | x | (x) | x | x | x | |||||
recommend * | x | x | x | |||||||||
record | x | x | x | x | x | x | ||||||
record player | ||||||||||||
recorder | ||||||||||||
recover | x | x | ||||||||||
red | x | x | x | x | (x) | (x) | x | x | x | |||
reduce * | x | (x) | x | x | ||||||||
refrigerator | x | |||||||||||
refuse (v) * | x | (x) | x | x | x | x | x | |||||
regarding | x | x | ||||||||||
regret (v) * | x | x | x | |||||||||
regular | x | (x) | x | x | x | |||||||
relation | x | x | x | x | x | |||||||
relax | x | (x) | x | (x) | x | x | ||||||
religion | x | x | x | (x) | x | x | x | |||||
TOTAL 20 | 2 | 5 | 10 | 3 | 10 | 14 | 6 | 10 | 11 | 13 |
SAMPLE 9 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
same | x | x | x | (x) | (x) | x | (x) | x | x | |||
sand | x | (x) | x | x | ||||||||
satisfactory | x | x | x | |||||||||
Saturday | x | (x) | (x) | x | (x) | (x) | ||||||
saucer | x | |||||||||||
sausage | x | (x) | ||||||||||
save * | x | x | (x) | x | x | x | ||||||
say | x | x | x | x | x | (x) | x | (x) | x | x | ||
scene | x | x | x | x | x | |||||||
scenery * | x | |||||||||||
school | x | x | x | x | (x) | (x) | x | x | ||||
science | x | (x) | x | x | x | |||||||
scientific | x | (x) | x | x | ||||||||
scientist | x | (x) | x | x | ||||||||
scissors | x | x | x | x | ||||||||
score | x | x | ||||||||||
scratch | x | x | ||||||||||
scream | x | x | x | (x) | ||||||||
screen | x | x | ||||||||||
sea | x | x | x | x | (x) | (x) | x | (x) | x | x | ||
TOTAL 20 | 5 | 5 | 10 | 5 | 11 | 15 | 5 | 10 | 13 | 13 |
SAMPLE 10 | ||||||||||||
PET | CA1 | CA2 | CA3 | CO1 | CO2 | CO3 | HE1 | HE2 | LDOCE | GSL | ||
translate | x | (x) | x | (x) | x | x | ||||||
translation | x | |||||||||||
transport | x | (x) | x | (x) | ||||||||
travel | x | x | x | x | x | (x) | x | (x) | x | x | ||
traveller | ||||||||||||
traveller's cheque | ||||||||||||
tree | x | (x) | x | (x) | (x) | x | (x) | x | x | |||
trip | x | x | (x) | (x) | x | (x) | x | x | ||||
trouble * | x | x | x | x | (x) | x | x | |||||
trousers | x | (x) | x | x | x | (x) | x | |||||
truck | ||||||||||||
true | x | x | x | x | (x) | (x) | x | x | ||||
trust (v) * | x | x | x | (x) | x | x | ||||||
truth | x | x | x | x | ||||||||
try | x | x | x | x | x | (x) | x | (x) | x | |||
try on | x | x | x | (x) | ||||||||
T-shirt | x | x | ||||||||||
tube | x | x | ||||||||||
Tuesday | x | (x) | (x) | x | (x) | (x) | ||||||
turn | x | x | x | x | x | (x) | x | (x) | x | x | ||
TOTAL 20 | 8 | 11 | 12 | 7 | 10 | 13 | 10 | 10 | 12 | 10 |
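
The ten samples can be pooled to estimate overall coverage of the PET List. The following is a minimal sketch rather than part of the original analysis. It assumes that the ten figures in each TOTAL row follow the column order CA1, CA2, CA3, CO1, CO2, CO3, HE1, HE2, LDOCE, GSL (the HE3 column listed in the Sample 1 header has no corresponding total), and that each sample contains 20 PET words. The names `COLUMNS`, `SAMPLE_TOTALS` and `coverage` are illustrative only.

```python
# Pool the TOTAL rows of the ten 20-word samples (200 PET words in all).
# ASSUMPTION: each row is ordered as in COLUMNS below; this order is inferred
# from the sample headers, not stated explicitly in the tables.
COLUMNS = ["CA1", "CA2", "CA3", "CO1", "CO2", "CO3", "HE1", "HE2", "LDOCE", "GSL"]
WORDS_PER_SAMPLE = 20

# One row per sample, copied from the TOTAL rows (Samples 1 to 10).
SAMPLE_TOTALS = [
    [6, 9, 11, 6, 14, 17, 4, 10, 19, 16],
    [3, 6, 10, 4, 9, 10, 3, 9, 13, 12],
    [6, 9, 11, 4, 13, 17, 4, 9, 14, 18],
    [2, 6, 8, 4, 9, 15, 1, 5, 10, 12],
    [8, 8, 10, 3, 9, 10, 7, 8, 9, 8],
    [4, 12, 14, 7, 11, 14, 6, 8, 16, 12],
    [6, 9, 11, 6, 10, 14, 8, 11, 16, 13],
    [2, 5, 10, 3, 10, 14, 6, 10, 11, 13],
    [5, 5, 10, 5, 11, 15, 5, 10, 13, 13],
    [8, 11, 12, 7, 10, 13, 10, 10, 12, 10],
]


def coverage(totals, columns=COLUMNS, words_per_sample=WORDS_PER_SAMPLE):
    """Return {column: (words covered, percent of sampled PET words)}."""
    n_words = words_per_sample * len(totals)
    column_sums = [sum(col) for col in zip(*totals)]  # sum down each column
    return {
        name: (count, 100.0 * count / n_words)
        for name, count in zip(columns, column_sums)
    }


if __name__ == "__main__":
    n_words = WORDS_PER_SAMPLE * len(SAMPLE_TOTALS)
    for name, (count, pct) in coverage(SAMPLE_TOTALS).items():
        print(f"{name:<6}{count:>4} / {n_words}  {pct:5.1f}%")
```

Under these assumptions, the pooled figures are 107 of 200 sampled PET words (53.5%) by the end of CA3, 139 (69.5%) by the end of CO3, and 90 (45.0%) by the end of HE2, while the GSL itself contains 127 (63.5%) of the sampled PET words.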