Growing Academic Vocabulary with a Collaborative On-line Database


Marlise Horst, Concordia University, Montreal

Tom Cobb, Université du Québec à Montréal


Paper presented at IT-MELT '01, Polytechnic University of Hong Kong, June 2001.


Most educators would agree that one of the most exciting aspects of using computerized resources for second language acquisition is their potential for meeting the very diverse needs of individual learners efficiently and effectively. We would argue that nowhere is the need for individualized instruction greater than in the area of academic vocabulary for learners intending to do university work in English. This claim rests on the fact that many university-bound ESL learners already know the core vocabulary of English; that is, they have progressed beyond the point where an all-purpose course in the most basic thousand words would be equally beneficial for all. As proficiency and vocabulary size increase, learners vary hugely in what they know and what they need to learn (Nation, 1990). Thus, a Chinese-speaking learner arriving in Canada to study computer science may already know, as a result of study in his home country, many words that will be useful for doing academic work in English. But he is likely to know a very different set of items from the student seated next to him in ESL class, for example a Francophone learner from Quebec intending to study stage design. It is also clear that the words these two students will want to study to prepare for courses in their majors will be quite different. The challenge for the course designer is to build a course that both of these students (and many others) can profit from. Our proposed answer to the challenge is to put technology tools in the hands of learners so that they can construct the courses they need for themselves. In this paper we describe how we tested this idea in an experimental ESL vocabulary course for academic learners at Concordia University in Montreal.



The course-design question we set out to answer can be stated as follows: How can a vocabulary course give academic learners of varying L1 backgrounds, L2 proficiency levels, and academic objectives a significant opportunity to focus on the words they need to know? Can one course address the needs of students preparing for study in content domains as varied as computer science and stage design? In addition to tailoring a course to meet students' vocabulary needs as effectively as possible, we were interested in investigating the role of technology in providing individualized instruction, and we also wished to test a number of research claims about instructed vocabulary learning. The following section details the principles that guided the design of the course and outlines our research agenda.



Designing the course


As designers and researchers, we arrived at the following guidelines:


·        Start with reading as a source of new vocabulary.

·        Provide technology tools for students to create the course they need.

·        Recognize that studying domain-specific vocabulary is important.

·        Recognize that knowing subtechnical academic vocabulary is crucial.

·        Raise awareness of proven word learning strategies.

·        Challenge learners to study hundreds of new words.


Each of these will be discussed in turn.



Start with reading


There are at least four good reasons for focusing an academic vocabulary course on reading. First, although university-bound students need vocabulary knowledge to be able to speak and write in content courses, it is clear that the ability to read and understand course content as presented in textbooks is central. People who know more words understand more of what they read than people who know fewer; indeed reading comprehension is so closely associated with vocabulary knowledge that test designers have had difficulty in distinguishing between the two (Read, 1997). Secondly, receptive knowledge represents the starting point of the word learning process. Generally, L1 and L2 learners alike begin to feel they know a new word when they can recognize its meaning when they read (or hear) it, while more active knowledge such as the ability to produce a fully correct definition of the word or to use it accurately in an original sentence tends to come later (Wesche & Paribakht, 1996).


Thirdly, analyses of large corpora of written and spoken language indicate that written texts are much more likely to contain words that would be unfamiliar to intermediate ESL learners than spoken texts. Spoken discourse makes heavy use of common words - words that most intermediate ESL learners would probably already know - and rarely presents items outside a list of the 2000 most frequent words of English (West & Stanovich, 1991). It is clear that requiring students to read widely is a good way of ensuring they have many opportunities to meet new words beyond this most basic level. Finally, reading passages are natural ready-made learning materials for vocabulary acquisition because they present unfamiliar words in authentic sentence and discourse contexts. These contexts can provide the learner with valuable grammar information, rich associative links to other words, and importantly, useful clues to meaning.


In implementing an individualized course with reading at its centre, we felt it was important to give the students a role in selecting the texts that would serve as the basis for their vocabulary learning. So instead of prescribing a core text, we required students to buy (or read on-line) a quality newspaper and read any two articles of their choice each week. The section we chose for this purpose was the Focus section of the weekend issue of the Toronto Globe and Mail. This weekly supplement presents well-written essays on a variety of topics in a style that can be termed "academic." Each week students were required to prepare brief oral or written summaries of the two articles they had chosen. They were also expected to look up in dictionaries the meanings of new words they encountered.



Provide technology tools for students to create the course they need


The next question we faced as course designers was how to use the valuable information gleaned in individual word quests to build a student-generated vocabulary course that all could access easily. While we recognized that not every student would be interested in all of the other students' dictionary findings, we reasoned that each student would belong to a number of constituencies within the class that had common vocabulary needs. That is, a new word encountered by one Francophone learner in a newspaper text might well be unfamiliar to other Francophone learners in the group. Similarly, if a student interested in biology read a piece about Nova Scotia fisheries and was curious about the meaning of crustacean, then other science majors who opted to read the same article might well be wondering about it too. The larger the pool of word findings, the better the chances that each student would find a useful body of new material to study. Thus the technology design challenge was to offer students a simple format for creating their own course by building up a large collection of vocabulary findings, and to provide them with an easy way to share the collection with each other. Clearly, computer technology in the form of a collaborative on-line database offered a solution.


The collaborative project we opted for was an on-line Word Bank. Figure 1 shows the homepage, which the second author designed for the course. The button for word bank entry appears at the top of the middle column under Focus Activities. Clicking on this button brings up a data entry template that presents the student with spaces for entering a word, an example of the word used in context, word class information, a dictionary definition, and the contributor's name. Each week, each of the 33 students in the group was required to enter in the Focus Word Bank five new words encountered in their newspaper reading. A sample of the 165 entries made in Weeks 1 and 2 of the course is shown in Figure 2.


Figure 1

Homepage for Academic Vocabulary Development, an experimental ESL course



Figure 2

Sample entries to collaborative on-line database, the Focus Word Bank




So far our rationale for a collaborative on-line databank has emphasized advantages such as the efficient dissemination of a large body of material and the potential for individualizing instruction. But another important benefit is the word learning that is likely to happen when a student creates a Word Bank entry. Hulstijn, Greidanus and Hollander (1996) have shown that the act of looking up a word in a dictionary increases the chances that the learner will remember it; it seems likely that the act of typing out a definition and example sentence also contributes to making the new word memorable.


In addition to the on-line Word Bank, an on-line dictionary and concordancer were made available to support learners' vocabulary learning. These computer tools will be described in a later section.



Recognize that studying domain-specific vocabulary is important


In the interest of living up to the name of the course (Academic Vocabulary Development), we felt it should offer participants the opportunity to study words specific to their chosen fields of study. We assumed that the idea of learning new science, economics, or art history vocabulary would appeal to students and it was clear that on-line technology could help address a range of individual interests. That is, the tools used to build a class word bank for general newspaper reading could also be used to build mini-wordbanks in specialist domains.


To implement this idea, we divided the students into special interest groups according to the academic field they intended to study. Five groups were formed around the following domains: arts, business, computers, science, and education. Each student was asked to locate a short reading on a topic in their field to share with other group members. Three times during the 13-week course (that is, about once a month), students read two of these texts, summarized them, and entered five new words into the Specialist Word Bank, just as they did each week with their newspaper reading. The entry form for this task can be accessed from the course home page by clicking on the button at the top of the third column entitled "Specialist Activities" (see Figure 1). The specialist database allowed users to sort word entries alphabetically, by student contributor, or by specific domain. Figure 3 shows a sample of Specialist Word Bank entries grouped by domain: here we see items contributed by members of the Business Group.
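The entry template and the three sorting options described above amount to a simple record type with a few views over it. The Python sketch below is a hypothetical illustration, not the course website's actual code (which this paper does not describe): the class name `WordBankEntry`, the helper functions, the optional `domain` field, and the sample contributors are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WordBankEntry:
    """One Word Bank record with the fields of the entry template (hypothetical model)."""
    word: str
    example: str      # the word used in context, typically from the student's reading
    word_class: str   # e.g. "noun", "verb"
    definition: str   # a dictionary definition chosen by the contributor
    contributor: str
    domain: str = ""  # assumed field: only Specialist Word Bank entries need a domain


def alphabetical(entries):
    """Return entries sorted by headword, ignoring case."""
    return sorted(entries, key=lambda e: e.word.lower())


def group_by(entries, field):
    """Group entries by one field (e.g. "contributor" or "domain"), alphabetical within each group."""
    groups = defaultdict(list)
    for entry in alphabetical(entries):
        groups[getattr(entry, field)].append(entry)
    return dict(groups)


if __name__ == "__main__":
    bank = [  # hypothetical entries for illustration
        WordBankEntry("mourn", "She mourned for her dead son.", "verb",
                      "to have or show great sorrow, usually for a person who has died",
                      "Student A", "arts"),
        WordBankEntry("chutney", "I enjoy chutney with my turkey.", "noun",
                      "pungent condiment made of vinegar and fruits", "Student B", "arts"),
    ]
    for domain, items in group_by(bank, "domain").items():
        print(domain, [e.word for e in items])
```

Grouping always sorts first, so each view (by contributor or by domain) stays alphabetical inside its groups, which mirrors the way a learner would browse a class word list.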


Figure 3

Sample entries to collaborative on-line database, the Specialist Word Bank



Recognize that knowing subtechnical academic vocabulary is crucial


Perhaps one of the most useful research findings to come out of corpus analyses of English texts is the identification of a core set of about 850 word families that occur frequently and consistently in academic texts - across disciplines. (A word family is defined as a root word, e.g. produce, and its derived forms, e.g. product, production, unproductive, etc.) The importance of being able to recognize the meanings of word families on what is known as the University Word List (Xue & Nation, 1984) is made dramatically clear in two versions of an authentic textbook passage about increasing forest productivity shown below (from Nation, 1990). In the first version, all items that are not among the 2000 most common word families of English appear as blanks. Thus, reading this passage simulates the experience of ESL learners with no more than a basic knowledge of English when they are confronted with university texts. It is possible to get some sense of subject and gist in this version but much of the informational content is simply unavailable.


Version 1


....the increasing wood supplies will _______ a larger _______ force, an improved roading network, and _______ _______ and _______ _______ . If the trees are to be _______ , then certain _______ must be made. They will include _______ in: logging machinery and _______ ; logging trucks, and other _______ _______ for the _______ of _______ products; ....


Version 2

....the increasing wood supplies will require a larger labor force, an improved roading network, and expanded transport and processing facilities. If the trees are to be exported, then certain investments must be made. They will include investments in: logging machinery and equipment; logging trucks, and other _______ required for the transport of processed products; ....


In the second version of the same text, the words that are shown are both high frequency words from the 2000 list and UWL words. Reading this version simulates the experience of an ESL learner who comes to the task of academic reading armed with knowledge of all items on both lists. Now only one blank remains: the text suddenly becomes comprehensible and the missing item (vehicle) can be readily guessed from context. Interestingly, the UWL items that make the difference are not technical words specific to the domain of forestry but rather all-purpose or "subtechnical" words like require, labor, process and equipment. It is clear that university-bound learners stand to profit a great deal from studying this key set of items.
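The two versions can be simulated mechanically by replacing every word outside a reader's known-word list with a blank. The sketch below assumes a plain set of word forms; the lists Nation used are organized by word families (e.g. produce, product, production), so a faithful version would first map each form to its family headword. The function name `blank_unknown` and the tiny word lists in the example are illustrative only, standing in for the 2000 list and the UWL.

```python
import re


def blank_unknown(text, known_words, blank="_______"):
    """Replace every word not in known_words with a blank; punctuation is kept."""
    def keep_or_blank(match):
        word = match.group(0)
        return word if word.lower() in known_words else blank
    # Letters-only tokens; this simple sketch treats each word form separately.
    return re.sub(r"[A-Za-z]+", keep_or_blank, text)


passage = "the increasing wood supplies will require a larger labor force"
basic = {"the", "increasing", "wood", "supplies", "will", "a", "larger", "force"}
print(blank_unknown(passage, basic))
# the increasing wood supplies will _______ a larger _______ force
print(blank_unknown(passage, basic | {"require", "labor"}))
# the increasing wood supplies will require a larger labor force
```

Adding the two UWL items to the known set is enough to restore the whole fragment, which is the effect the two versions above illustrate.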


The "covering" power of the UWL is further detailed in Table 1, which is based on analyses by Nation and Waring (1997) and Sutarsyah, Nation and Kennedy (1994). The fourth line suggests that an ESL student who knows both the 2000 most frequent words of English and the items on the UWL will know 90% of the running words in a typical academic text, regardless of its subject matter. Since a receptive vocabulary size of about 3000 words (2000 + UWL = 2800) has also been identified as the watershed between comprehension and noncomprehension in studies of academic reading by Laufer (refs), we felt there were compelling reasons to prioritize the study of the UWL in our experimental course.


Table 1

Frequent English words and coverage of academic texts


No. of word families        Percent coverage        Ratio unknown:known
2000 + UWL
2000 + UWL + Specialist


The fifth line of Table 1 reflects findings by Sutarsyah, Nation and Kennedy (1994). Their work indicates that knowledge of several hundred words that recur frequently in the texts of a particular subject domain can offer additional coverage, such that a reader who knew these would know as many as 95% of the running words. This suggested to us that the plan to devote some attention to domain-specific words was justified. Unfortunately, however, lists of high frequency core vocabulary for specific domains have not (yet) been compiled. So it was not possible to specify exactly which business, arts or science words were important for our learners to study. Nonetheless, we assumed that the scheme outlined above for reading in subject domains and contributing new vocabulary to the Specialist Word Bank would represent a useful step in preparing the learners for study in their chosen fields.
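Coverage figures of this kind are a simple proportion: the share of running words (tokens) in a text that belong to the lists a reader knows. If the table's last column is read as one unknown running word to so many known ones (an interpretation assumed here), 90% coverage means one unknown word in every ten, or 1:9, and 95% coverage means 1:19. The sketch below computes both quantities; the function names are hypothetical, and because it counts word forms rather than word families, its coverage figures are only approximate.

```python
import re


def text_coverage(text, known_words):
    """Percentage of running words (tokens) in text that appear in known_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    known = sum(1 for token in tokens if token in known_words)
    return 100 * known / len(tokens)


def known_per_unknown(coverage_pct):
    """Known running words per unknown one at a given coverage (90 -> 9, i.e. 1:9)."""
    if not 0 <= coverage_pct < 100:
        raise ValueError("coverage must be at least 0 and below 100")
    return coverage_pct / (100 - coverage_pct)


print(known_per_unknown(90))  # 9.0, i.e. an unknown:known ratio of 1:9
print(known_per_unknown(95))  # 19.0, i.e. 1:19
```

The step from 90% to 95% looks small as a percentage, but it halves the density of unknown words, from one in ten to one in twenty.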


To understand where the 35 students registered for the experimental course stood in relation to the word frequency zones identified as important for academic reading success, we administered an updated version of the Vocabulary Levels Test (Nation, 1990; Schmitt, 2000). This instrument uses a multiple-choice cluster format to test receptive knowledge of items sampled from each of five zones: the 2000 most frequent words, words on the 3000, 5000, 10,000 most frequent lists, and the Academic Word List (a recent update of the UWL; Coxhead, 2000). The test requires testees to match items to simply worded definitions. An example of a question cluster from the section that tests the Academic Word List is shown in Table 2.



Table 2

Sample question from the AWL section of the Vocabulary Levels Test: Version 1 (Schmitt, 2000)

1. benefit
2. labor
3. percent
4. principle
5. source
6. survey

____ work
____ part of 100
____ general idea used to guide one's actions



Test results confirmed the expectation that, in terms of vocabulary knowledge, the learners were indeed a very diverse group. Although the group means in Table 3 suggest that students could recognize the meanings of about 80% of the tested words at the 2000, 3000 and Academic levels, the standard deviations reveal that individuals varied considerably. Some of the French-speaking students scored high on the Academic list (which contains many words of Latin origin) but low on the 2000 list (which contains many words of Germanic origin). Many of the Asian students (mostly Chinese speakers) scored high on the test of the 2000 list but fared less well on other lists.


Table 3

Pretest means on the Vocabulary Levels Test by section; maximum score = 30

These outcomes confirmed our intuitions about the need for an individualized course designed to meet highly diverse needs. Furthermore, it was clear that the students had plenty of work to do in the Academic/University Word List zone. Few students had full mastery of words at this level, and given its importance for academic reading success, it seemed likely that all would benefit from intensive study of these words. To this end, we divided the 800 words of the UWL into twelve lists of roughly 65 words each for week-by-week studying and testing. So in addition to reading and summarizing two texts each week and submitting Word Bank entries, students prepared for a UWL quiz. About 30 minutes of each of the twice-weekly 90-minute class periods were spent either on UWL learning activities or on testing. An example of a weekly quiz appears in Appendix 1.



Raise awareness of proven word learning strategies


So far we have discussed the vocabulary learning materials on offer (the Word Bank entries and the UWL), ways of delivering them (class and website activities) and an evaluation component (the UWL quizzes). But as O'Dell (1997) and others point out, training students in how to learn most effectively is a key aspect of any language course. Our examination of the vocabulary learning research identified five main strategies for successful retention of form-meaning associations. These are: keyword mnemonics (Brown & Perry, 199?), word-part analysis (Sokmen, 1997), elaborative sentences (Brown & Perry, 1991; Ellis, 1997), dictionary use (Hulstijn, Greidanus & Hollander, 1996) and concordancing (Cobb, 1997). We set the goal of familiarizing students with each of these in class activities and, where possible, on the website.


In practice, it turned out that some of the strategies were limited in their applicability. For instance, the much acclaimed keyword imaging technique can only be applied if an English word to be learned sounds like an L1 word and represents a concept that can be pictured. Thus, one student was able to draw a picture of a disgusted teacher throwing failed papers into the air to remember the English word flunk. Since flunk sounds like the French word flanquer (throw, fling) the vivid papers-in-the-air image creates a strong link to the new word and its meaning. But it is clear that many words, especially abstract ones, do not lend themselves to this treatment.


Two strategies that could be applied to any new word were consistently supported in class activities throughout the course. These were dictionary use and concordancing. Support activities for dictionary use included exercises in identifying correct definitions of words that have different senses in different contexts, and comparing dictionaries designed for native speakers to those designed for learners. Students were also shown how to access an on-line dictionary via the Lexical Tutor button in the first column of the home page (see Figure 1).


The Lexical Tutor button also offered learners an easy-to-use on-line concordancing tool (developed by Chris Greaves of Hong Kong Polytechnic University). A concordancer searches a large body of text to find every occurrence of a particular word or phrase and displays these in a format that allows the user to see the many different instances of the word in use. A concordance for the word abandon consisting of 13 instances of the word in use is shown in Figure 4. In principle, a concordance should be a powerful resource for learning. Because the learner can examine a number of sentences containing the new word, chances are that he or she will meet at least one that is easy to understand. If the learner engages in solving the puzzle (i.e. guessing the word's meaning), the concordance offers the opportunity to test a solution in other sentences. Research by Cobb (1997; 1999) has confirmed the usefulness of learning by concordancing. He found that learners who examined concordances were more able to transfer new word knowledge to novel contexts than learners who studied definitions. In the experimental course described in this paper, students were shown on the first day how to use the concordancing tool and then again at several later points. In addition, classroom concordancing activities on paper were designed to raise student awareness of this strategy.
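A concordancer of the kind described here produces keyword-in-context (KWIC) lines: each occurrence of the search word is printed in the middle of a fixed-width window of surrounding text, so that the instances line up for comparison. The sketch below is not the tool developed by Chris Greaves, whose internals are not described in this paper; the function name, the window width, and the choice to also match inflected forms (abandon, abandoned, abandonment) by prefix are assumptions.

```python
import re


def concordance(corpus, node, width=30):
    """Return KWIC lines for every word in corpus that begins with node."""
    text = " ".join(corpus.split())  # collapse line breaks and repeated spaces
    # Match the node at a word boundary plus any word characters after it,
    # so "abandon" also finds "abandoned" and "abandonment".
    pattern = re.compile(r"\b" + re.escape(node) + r"\w*", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()].rstrip()
        right = text[match.end():match.end() + width].lstrip()
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} {match.group(0)} {right}")
    return lines


sample = ("They had to abandon the ship before dawn. "
          "The village was abandoned years ago.")
for line in concordance(sample, "abandon", width=20):
    print(line)
```

Aligning the keyword in a fixed column is the design choice that lets a learner scan down the centre of the display and compare the word's neighbours across many sentences at once.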


Figure 4

Concordance lines for the word abandon


Challenge learners to study hundreds of new words


Researchers differ on the number of words language learners need to know to readily comprehend textbooks used in university content courses. Research by Laufer (1989, 1992) points to a minimum receptive vocabulary size of 3000 high frequency word families. Work by Hazenberg and Hulstijn (1996) suggests that the figure may in fact be higher than this, but the main message of these studies is clear: the university-bound learner needs to know thousands of words. For many learners, this means acquiring hundreds if not thousands of words in fairly short order (in a semester or two of ESL study, if possible). Thus there are compelling reasons to make acquiring knowledge of hundreds of new words an explicit course goal, even if it means encouraging learners to resort to studying lists. Rote memorization tasks are out of fashion in language teaching, but Nation (1982) suggests that currently popular methodologies may be neglecting a powerful learning technique. He points to memory experiments where participants have been found able to learn (and retain) as many as 50 new word and translation-equivalent pairs an hour. Admittedly, this kind of memory work cannot be expected to result in full knowledge such that a learner understands all the senses of new words or is able to use them correctly in elegant original sentences. But we reasoned that some initial engagement, however incomplete, with over a hundred words each week was potentially more useful to university-bound learners than intensive study of the dozen or so items that is more typical of ESL courses.


Unlike the simple word/translation pairs used in the memory experiments, the on-line lists available for study in our experimental course were richly informative. In addition to offering the definition of a word and its part of speech, each Word Bank entry provided an example of the word in use. Since these sentence contexts generally came from material students had read, we assumed that many entries would also provide memory links to class discussions of newspaper articles, summary writing, and other activities. The on-line concordancing tool meant that students could also study UWL items in a wide variety of sentence contexts. To motivate students to study large numbers of words, we included two exams in the plan for the course, a midterm and a final. For each of these, students were expected to study 400 UWL items, 200 items from the newspaper Word Bank, and 50 Specialist Word Bank items in their particular subject area. These class tests focused narrowly on receptive word knowledge by presenting students with context sentences taken directly from the Word Bank and requiring them to identify the missing target items.



Research questions


Our investigation of the experimental course focuses on two key issues: the usefulness of the on-line resources, and the amount of new vocabulary knowledge gained by the learners who used them. One way of examining the usefulness of the learning materials is to consider their quality. Indeed, the claim that learning vocabulary with a collaborative on-line database is effective rests on showing that students were generating accurate and useful materials for their own learning. We were also interested to see whether the quality of the entries changed as students gained experience in working with dictionaries and selecting examples during the 13-week course. Thus the first research question we consider is as follows:


1. What was the quality of student-produced on-line course material (the Word Bank entries), and did entries improve over time?


We have argued that an important advantage of a collaborative on-line project is its potential to offer instruction tailored to individual needs. Since the pretesting had shown that needs were indeed highly diverse, we were interested in seeing whether different kinds of learners were using the resource in different ways. Two very distinct constituencies in the group were learners of Asian and Romance language backgrounds. Romance language speakers are able to exploit cognate knowledge for clues to the meanings of the many English words of Latin and Greek origin, a strategy that is not available to Asian language speakers. Thus, as Laufer (1997) points out, words of Latin origin like perspective or anticipate may look opaque to one learner but totally transparent to another. Exploring how these two groups of learners used the on-line database seemed likely to be a useful initial indicator of how well the on-line resources served the needs of different types of learners. This prompted the following research question:


2. Did students of Asian and Romance language background enter different types of words? 


The remaining questions address the amount of new word knowledge gained in the experimental course and the factors that might explain the growth results:


3. Did learners increase their vocabulary knowledge?


4. Which strategies (e.g. dictionary use) were associated with learning gains?





The 33 students who registered for the experimental vocabulary course at Concordia University (Montreal) represented a variety of first language backgrounds. About two thirds of the group were speakers of Asian languages (Chinese and Vietnamese), and about one third had a Romance language background (Quebec French, Spanish or Portuguese). There was also an Arabic speaker and a Farsi speaker in the group. All had been assessed as having minimal or inadequate proficiency for university studies on a placement test administered by the institution. There was a range of abilities in the group, but they can generally be termed intermediate-level learners. Most had been admitted to the university on condition that they take courses to improve their English.





Word Bank entries - the quality question


We used a ratings procedure to investigate the quality of Word Bank entries (research question 1). To evaluate students' example sentences, we began by selecting at random 40 sentences entered during the second week of the course. Forty more sentences entered during the eleventh week were also selected so that early and late entries could be compared. Next, following a method devised by Beck, McKeown and McCaslin (1983), we deleted the target words and asked a native speaker of English to draw on information in the sentences to supply the missing items. The sentences were then evaluated using the following scheme. If the rater's guess matched or nearly matched the word in the original sentence, the sentence was awarded a score of 4. If a guess showed general similarity to the missing word, the context was considered to be supportive and was awarded a 3. An example of such a sentence appears in Table 4, where the rater supplied anticipating instead of the original term yearning in the example sentence "I was ___ this trip". Sentences prompting guesses that bore little resemblance to the target words were considered neutral and were awarded a score of 2. Misleading contexts scored 1 point. See Table 4 for examples.


Table 4

Rating scheme for assessing quality of context entries

Student data base entry | Informant's guess | Rating
The theater has a seating capacity of 800. | The theater has a seating capacity of 800. | exact (4)
Preminger cropped Jean's hair. | Preminger cut Jean's hair | near exact (4)
I was yearning this trip. | I was anticipating this trip. | supportive (3)
He commit himself in writing this book | He excelled himself in writing this book | neutral (2)
Her religions-minded parents had met at a science convention. | Her religions-shunning parents had met at a science convention. | misleading (1)


To rate the quality of definitions entered in the Word Bank, we selected 40 definitions at random from weeks 2 and 11 of the course. Again, we used a 4-point rating scheme. Definitions that were simply and clearly worded and matched the sense intended in example sentences were awarded the full mark of 4. Definitions that were accurate but contained difficult language were awarded a 3. The wording problem is evident in the case of chutney, where the definition "pungent condiment made of vinegar and fruits" is clearly accurate but of doubtful usefulness because of the potential difficulty of the words pungent and condiment. Uninformative definitions such as the circular one for mythical shown in Table 5 were rated 2 points while definitions that did not match the sense of the example were awarded 1 point.


Table 5

Rating scheme for assessing quality of definition entries

Student data base entry (example sentence) | Definition | Rating
She mourned for her dead son. | to have or show great sorrow, usually for a person who has died | easy to understand (4)
I enjoy chutney with my turkey. | pungent condiment made of vinegar and fruits | hard to understand (3)
Arthur and Mordred are mythical persons | of or existing in | circular or too long (2)
You'd think Alberta would be bristling with warnings to Ottawa. | thick strong animal hair used to make brushes. | inappropriate sense (1)




Mean scores indicated that the overall quality of context sentences was fairly high (Table 6). The mean rating for early contexts was 2.7, while the mean for later contexts was 3.0. That is, late entries earned the score assigned to "supportive" sentences, and the early entries closely approached this level. Thus we can conclude that, in general, the student-generated material offered useful context information about new words. These ratings are higher than the mean of 2.5 that other studies using this methodology have found for natural texts (Horst, 2000; Zahar, Cobb & Spada, in press), which suggests that the student entries were more informative than ordinary sentences would be. The higher mean rating for the later data points to an improvement in the quality of entries as the course progressed. Although a t-test for independent samples showed that the gain was not statistically significant, it seems reasonable to expect that the rising trend would continue with more time and practice.
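For readers who want to run the same kind of comparison, the sketch below computes the pooled-variance (Student's) t statistic for two independent samples, here early and late ratings on the 4-point scale. The rating lists in the example are invented for illustration and are not the study's data. The statistic still has to be compared with a t table at n1 + n2 - 2 degrees of freedom; alternatively, SciPy's `scipy.stats.ttest_ind` computes the same test and also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Pooled-variance t for two independent samples; returns (t, degrees of freedom).

    A positive t means sample_b has the higher mean.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two ratings")
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_b) - mean(sample_a)) / standard_error, df


early = [2, 3, 3, 2, 4]  # hypothetical ratings, not data from the study
late = [3, 3, 4, 3, 4]
t, df = independent_t(early, late)
print(round(t, 3), df)  # 1.342 8
```

With samples of 40 ratings each, df would be 78; a difference in means of the size reported above can easily fall short of significance when the ratings are as variable as the standard deviations suggest.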


Table 6

Quality-of-context ratings (n = 40)

Week 1        Week 11


Definitions also appeared to be mainly of high quality (Table 7); in fact, the definition results are similar to the sentence findings. The mean ratings of around 3 (the score awarded to accurate definitions with wording difficulties) at both the beginning and end of the course suggest that the definitions were generally accurate throughout. Again, the data suggest that the quality of definitions improved during the course. Though the difference was not statistically significant, it seems likely that definitions would continue to improve over the longer term.



Table 7

Quality-of-definition ratings, Week 1 and Week 11
In general, we can conclude that both the example sentences and the definitions students supplied for each other in the Word Bank project were of very satisfactory quality. Interestingly, a few students complained about occasional spelling or grammar errors they spotted in the entries, but none complained about the word information on offer. Our analysis indicates that the information provided in the entries tended to be useful and accurate. However, some qualifications are in order: the standard deviations suggest that quality was rather inconsistent, and the high but not perfect average scores suggest that students may benefit from training in how to produce clear definitions and supportive example sentences.



Word Bank entries - the individual differences question


To investigate whether students of different L1 backgrounds were using the on-line resources to meet varying vocabulary needs (research question 2), we took a close look at words entered by students in two distinct groups: Asian and Romance language speakers. To compare the words that learners in the two groups looked up, we prepared two corpora of 300 words each. The Asian corpus consisted of the 300 items entered in the Focus Word Bank during the first three weeks of the course by learners whose first language was Chinese or Vietnamese. The Romance corpus consisted of the 300 items entered by French, Spanish and Portuguese speakers. Each corpus was analyzed using HyperVocabProfile (Cobb, 1998; based on Nation, 1998), a computer program that groups English words into frequency bands. That is, the program allowed us to see the extent to which students in the two groups looked up common and less common words. Of special interest was the number of look-ups in the UWL band (which contains many words of Greco-Latin origin). We hypothesized that the proportion of look-ups in this zone would be larger in the Asian group than in the Romance group.
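The kind of frequency profiling described above can be sketched as a lookup against band lists followed by a percentage count. This is not HyperVocabProfile's actual code: the band lists below are tiny toy sets whose assignments are illustrative only, and the sketch matches exact word forms, whereas real profilers use full frequency lists and may group inflected forms.

```python
from collections import Counter


def profile(words, bands):
    """Percentage of `words` falling into each frequency band.

    `bands` maps a band label to a set of word forms. Bands are checked in
    insertion order and the first match wins; anything unmatched is 'off-list'.
    Exact word-form matching only: no lemmatization or word families.
    """
    counts = Counter()
    for word in words:
        w = word.lower()
        label = next((name for name, members in bands.items() if w in members), "off-list")
        counts[label] += 1
    total = len(words)
    return {name: 100 * counts[name] / total for name in [*bands, "off-list"]}


# Toy band lists for illustration only (hypothetical assignments)
bands = {
    "0-1000": {"flew", "storm"},
    "1000-2000": {"height"},
    "UWL": {"momentum", "notation"},
}
lookups = ["flew", "storm", "height", "momentum", "notation", "bristle"]
for band, pct in profile(lookups, bands).items():
    print(f"{band:>10}: {pct:.1f}%")
```

Running the same function over each 300-word corpus would yield one distribution per group, which is the kind of comparison reported in Figure 5.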


Figure 5

Distributions of 300 looked-up words in two L1-based groups (figures in percentages)



The results shown in Figure 5 are striking. If we consider the high-frequency categories (the 0-1000 and 1000-2000 most frequent bands), we see that 12% (7 + 5) of Asian look-ups were common English words. This category accounts for a far greater proportion of the Romance look-ups; in fact, over a quarter (18 + 9 = 27%) of all the words looked up by Romance speakers were in this zone. A possible explanation is that the 0-1000 and 1000-2000 bands contain a high proportion of words of Anglo-Saxon origin, words which have no cognate equivalents in Romance languages and are therefore more likely to be unfamiliar to Romance speakers than Latin-based English words. The occurrence of common words of Germanic origin like flew, storm and height on the Romance list suggests that this was the case. The notion that learners in the Romance group directed their attention to non-cognates is also supported by the third column of data, where we see that these learners looked up fewer of the Greek- and Latin-based UWL items than the Asian learners did, for whom these words appear to be difficult. In summary, the two groups were clearly looking up different types of words, and there is reason to think that both groups were well served by a course designed to address individual vocabulary needs.
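The high-frequency shares in the paragraph above are sums of the two band percentages. Because each corpus contains 300 look-ups, they can also be read as approximate word counts; that conversion is an illustrative extra step, and the counts are only approximate because the published percentages are themselves rounded.

```python
# Percentages for the two highest-frequency bands, as reported in the text
asian = {"0-1000": 7, "1000-2000": 5}
romance = {"0-1000": 18, "1000-2000": 9}
CORPUS_SIZE = 300  # look-ups per group


def high_frequency_share(dist):
    """Combined percentage of look-ups in the 0-1000 and 1000-2000 bands."""
    return dist["0-1000"] + dist["1000-2000"]


for name, dist in [("Asian", asian), ("Romance", romance)]:
    share = high_frequency_share(dist)
    approx_words = round(share / 100 * CORPUS_SIZE)
    print(f"{name}: {share}% of look-ups, roughly {approx_words} of {CORPUS_SIZE} words")
```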



Vocabulary learning - The growth question


To determine how much students had learned as a result of taking the course and participating in the collaborative Word Bank project (research question 3), we measured students' receptive vocabulary sizes at the beginning and end of the course by administering updated versions of the Vocabulary Levels Test (Schmitt, 2000; Schmitt & Schmitt, forthcoming). As discussed earlier, this instrument is designed to assess receptive knowledge of words sampled from lists of the 2000, 3000, 5000 and 10,000 most common words of English and the Academic Word List (a list similar to the UWL). Vocabulary learning gains were determined by calculating the differences between learners' pre- and posttest scores.
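Gain scores of this kind are per-learner differences between posttest and pretest. The paper reports an ANOVA with a post hoc t-test without further detail; as one common way to test a pre-post difference, the sketch below assumes a paired-samples t-test on the gains. The function names and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def gain_scores(pre, post):
    """Per-learner gains: posttest minus pretest (same learner order in both lists)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same learners")
    return [b - a for a, b in zip(pre, post)]


def paired_t(pre, post):
    """Paired-samples t on the gains: mean gain / (SD of gains / sqrt(n)). Returns (t, df)."""
    gains = gain_scores(pre, post)
    n = len(gains)
    return mean(gains) / (stdev(gains) / sqrt(n)), n - 1


# Hypothetical Academic Word List section scores (max 30), NOT the study's data
pre = [18, 21, 15, 24, 19]
post = [20, 22, 18, 25, 21]
print("gains:", gain_scores(pre, post))
t, df = paired_t(pre, post)
print(f"mean gain = {mean(gain_scores(pre, post)):.2f}, t = {t:.2f}, df = {df}")
```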


Pre-post results and gains are shown in Table 8. The maximum possible score in each section of the test was 30. While the general picture is largely one of growth, some of the changes are evidently very small. Statistical analysis (ANOVA and post hoc t-test) showed that mean pre- and posttest scores on the Academic Word List section differed significantly (t = 2.62; p < .05). Although the gain of about two new words in this category may appear rather minor, extrapolating this result to the entire word list suggests substantial growth. The gain of 1.73 words in 30 represents a growth rate of 5.8%; applied to all 800 words on the UWL, this yields a gain of about 46 new words (1.73/30 × 800 ≈ 46.1).
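The extrapolation above divides the mean gain by the number of items in the test section and scales the resulting rate to the full list. A minimal sketch of that arithmetic, with the assumption it depends on (that the 30 tested items are representative of the list) stated in the docstring; the function name is hypothetical:

```python
def extrapolate_gain(mean_gain, items_tested, list_size):
    """Scale a mean gain on a sampled test section up to a whole word list.

    Assumes the tested items are a representative random sample of the list,
    so the proportion gained on the sample carries over to the list.
    """
    rate = mean_gain / items_tested
    return rate, rate * list_size


rate, words = extrapolate_gain(mean_gain=1.73, items_tested=30, list_size=800)
print(f"growth rate = {rate:.1%}, estimated gain = {words:.1f} words")
# prints: growth rate = 5.8%, estimated gain = 46.1 words
```

Using the unrounded rate (0.0577) rather than 0.058 is what gives 46.1 rather than 46.4.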



Table 8

Pre- and posttest means: Vocabulary Levels Test (n = 28)

Clearly, the learners acquired new receptive vocabulary knowledge as a result of studying in the experimental course. As we have seen, increases in knowledge of items on the Academic/University Word List accounted for most of the growth. Since these subtechnical terms are important for university ESL learners to know, we can conclude that the course achieved an important objective. But increased knowledge of UWL items is hardly surprising given the amount of attention paid to the UWL in class. Every week students participated in activities to support UWL word learning and studied for a weekly quiz; they studied these items again for the midterm and final tests.


This leaves unanswered the question of why evidence of growth was so slight in non-UWL zones, the very zones that the Word Bank activities were designed to address. One probable explanation is that the Vocabulary Levels Test was not sensitive enough to capture the incomplete but real knowledge that a learner might retain from reading an on-line definition of a word and a single illustrative example. Work by Horst (2000) has shown that the learning impact of one or two encounters with a new word can be captured, but that very sensitive measures are required. The sampling technique used to construct the Vocabulary Levels Test is also problematic for the assessment of low-frequency words. For instance, a learner who acquired new knowledge of a word in the 5000-10,000 band through studying the Word Bank is highly unlikely to encounter that word on a test that samples only 30 of the 5000 words in the band. Thus there is no reason to conclude that students did not profit from the on-line collaborative activities; it remains plausible that they did. Rather, the results point to the importance of using sensitive measures to assess vocabulary learning.
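The sampling point can be made concrete. If a section's 30 items were drawn at random from a 5000-word band, any particular learned word would have a 30/5000 (0.6%) chance of being tested. The sketch below treats the draw as simple random sampling without replacement, an idealization since the actual test construction may differ, and computes how likely it is that any of a learner's newly learned words appears on the test. The numbers of learned words are hypothetical.

```python
from math import comb


def expected_tested(learned, band_size=5000, sampled=30):
    """Expected number of newly learned words that land among the sampled test items."""
    return learned * sampled / band_size


def p_at_least_one_tested(learned, band_size=5000, sampled=30):
    """Hypergeometric probability that at least one learned word is sampled.

    Assumes test items are drawn uniformly at random, without replacement,
    from the band, and that the learned words are a fixed subset of that band.
    """
    return 1 - comb(band_size - learned, sampled) / comb(band_size, sampled)


for learned in (1, 20, 100):
    print(f"{learned:>3} words learned: expected on test = {expected_tested(learned):.2f}, "
          f"P(at least one tested) = {p_at_least_one_tested(learned):.3f}")
```

Even when a learner has picked up dozens of words in the band, the test is likely to contain only a fraction of one of them, so a near-zero measured gain is what this model would predict.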


Another explanation for the slight growth outside the UWL is that the vocabularies of specific academic disciplines remain as yet undefined, so it is not possible to target the characteristic terms of economics, stage design, or any other discipline for purposes of either teaching or testing. Our students may well have covered much of the lexical territory of their chosen domains in their Specialist Groups activities, but we have no way of measuring this other than the 10,000-word level of the Levels Test.


Keys to success - The strategies question


Although we familiarized students with a variety of proven strategies for learning vocabulary in the course, we limited our investigation to those that met two criteria: 1) they could be applied to any word (see the discussion of the keyword method above for an instance of a limited strategy), and 2) students were familiar with them from the first day of the course onwards (i.e., there was ample opportunity to use them). Two traditional strategies (using a monolingual or a bilingual dictionary) and two computerized strategies (using an on-line dictionary and using a concordance) met these criteria.


We used a questionnaire attached to the Vocabulary Levels posttest administered in the final week of the course to explore the extent to which students made use of the various strategies. The questionnaire asked students to rate their use of each according to the following five-point scale:


1 = never

2 = once or twice

3 = fairly often     

4 = very often

5 = almost always


Then, to determine which strategy was most closely associated with learning gains, we entered student ratings of the four strategies into a multiple-regression analysis with pre-post Academic Word List gain scores as the dependent variable.
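The regression step can be illustrated with an ordinary least squares fit. The paper does not say which software or options were used; this is a generic pure-Python sketch with hypothetical variable names and ratings, and it returns unstandardized coefficients only (no standard errors or p-values, which a statistics package would normally report).

```python
def ols(X, y):
    """Ordinary least squares fit of y on the columns of X, with an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting. Returns [intercept, b1, ..., bk] (unstandardized).
    Adequate for a handful of predictors; not robust for ill-conditioned data.
    """
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row
        for i in range(k):
            if i != col:
                factor = A[i][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical ratings (1-5): bilingual, monolingual, on-line dictionary, concordance
ratings = [[1, 1, 1, 1], [2, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1],
           [1, 1, 1, 2], [2, 2, 2, 2], [4, 3, 2, 1]]
awl_gains = [2, 1, 0, 2, 3, 1, 2]  # hypothetical gain scores
names = ["intercept", "bilingual", "monolingual", "on-line dict.", "concordance"]
for name, b in zip(names, ols(ratings, awl_gains)):
    print(f"{name:>14}: {b:+.2f}")
```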


Traditional dictionary use was clearly more widespread in the group than use of the computer dictionary and concordancing tools. Figure 6 shows the mean ratings of about 3.5 for using bilingual dictionaries (e.g. English-Chinese) and monolingual dictionaries (English-English); in other words, these resources were both used often. Ratings for the non-traditional tools are lower; students appeared to prefer using the on-line dictionary to concordancing.


Figure 6

Strategy use (n = 22)




The regression analysis revealed that the strongest predictor of vocabulary learning outcomes was monolingual dictionary use. However, the use of this strategy was negatively associated with learning gains. The most reasonable interpretation of this finding is that good learners who already knew many words on the UWL (and therefore had little opportunity to register gains on the test) were consistent users of English-English dictionaries. A more interesting finding appears in the second line of Table 9, where we see that of the remaining predictors, concordance use is most strongly associated with vocabulary learning gains (r = .38), albeit with a probability level that suggests some role for chance effects. This finding suggests that even though concordance use was relatively infrequent in the group as a whole, students who did use this feature tended to experience vocabulary gains.
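The r reported above expresses the strength of association between strategy ratings and gain scores. The text does not specify whether it is a zero-order or a partial correlation; the sketch shows only the ordinary (zero-order) Pearson coefficient, computed on invented data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences.

    Undefined (ZeroDivisionError) if either sequence is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical concordance-use ratings (1-5 scale) and AWL gain scores
concordance_use = [1, 1, 2, 1, 3, 2, 1, 4]
awl_gains = [0, 2, 1, -1, 3, 2, 1, 4]
print(f"r = {pearson_r(concordance_use, awl_gains):.2f}")
```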


Table 9

Summary of regression analysis for variables predicting academic vocabulary gains (n = 22)

We are convinced that the collaborative database is a valuable tool for vocabulary acquisition for learners who have moved beyond the elementary level. The technology is clearly able to absorb the variety of lexical needs that characterize learning at this level. Our students have shown that they are willing to use the tools we have developed, that they use these tools reasonably well, and that they learn some words by using them. How many words, we cannot yet say, mainly because of insensitive or non-existent vocabulary measures. Perhaps the lack of suitable tests for an experiment such as ours is not surprising. Few instructional designs in the past have attempted to teach words in the numbers we have targeted, so it is understandable that there are few suitable ways of measuring our results. The 5,000 and 10,000 levels of the Levels Test are clearly blunt instruments, and yet that is where many of our students' acquisition needs lay. Tests that can measure fine degrees of knowledge simply, and that target the vocabularies of specific domains, are urgently needed.


We are presently feeding the findings of this experiment into a revamped course with an on-line component, which we expect to run again soon. At the same time, we want to provide a fully independent Internet version of the course to cater to a virtual clientele worldwide. In fact, our vocabulary learning tools were inadvertently left on the Web when the course was over, and learners from various corners of the world have already started using them and adapting them to their own purposes.



References

Beck, I. L., McKeown, M. G., & McCaslin, E. (1983). Vocabulary development: All contexts are not created equal. Elementary School Journal, 83, 177-181.


Brown, T. S., & Perry, F. L. (1991). A comparison of three learning strategies for ESL vocabulary acquisition. TESOL Quarterly, 25(4), 655-670.


Cobb, T. (1998). HyperVocabProfile [computer program]. University of Quebec at Montreal. Adapted from Hwang, K., & Nation, P. (1994).


Cobb, T. (1999). Applying constructivism: A test for the learner-as-scientist. Educational Technology Research & Development, 47(3), 15-33.


Cobb, T. Is there any measurable learning from hands-on concordancing? System, 25, 301-315.


Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.


Ellis, N. (1997). Vocabulary acquisition: Word structure, collocation, word-class, and meaning. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 122-139). Cambridge: Cambridge University Press.


Hazenberg, S., & Hulstijn, J. (1996). Defining a minimal receptive second language vocabulary for non-native university students: An empirical investigation. Applied Linguistics, 17, 145-163.


Horst, M. (2000). Text encounters of the frequent kind: Learning L2 vocabulary through reading. Unpublished doctoral dissertation, University of Wales, Swansea.


Hulstijn, J. H., Hollander, M., & Greidanus, M. (1996). Incidental vocabulary learning by advanced foreign language students: The influence of marginal glosses, dictionary use, and reoccurrence of unknown words. The Modern Language Journal, 80, 327-339.


Hwang, K., & Nation, P. (1998). VocabProfile [computer program]. English Language Institute, Victoria University of Wellington, NZ.


Laufer, B. (1989). What percentage of lexis is necessary for comprehension? In C. Lauren & M. Norman (Eds.), From humans to thinking machines (pp. 316-323). Clevedon: Multilingual Matters.


Laufer, B. (1992). How much lexis is necessary for reading comprehension? In P. J. L. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 126-132). London: Macmillan.


Laufer, B. (1997). What's in a word that makes it hard or easy: Some intralexical factors that affect the learning of words. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 140-155). Cambridge: Cambridge University Press.


Nation, I. S. P. (1982). Beginning to learn foreign vocabulary: A review of the research. RELC Journal, 13(1), 14-36.


Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston: Heinle & Heinle.


Nation, P., & Waring, R. (1997). In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 6-19). Cambridge: Cambridge University Press.


O'Dell, F. (1997). Incorporating vocabulary into the syllabus. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 258-278). Cambridge: Cambridge University Press.


Read, J. (1997). Vocabulary and testing. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 303-320). Cambridge: Cambridge University Press.


Schmitt, N. (2000). Vocabulary in language teaching. Cambridge: Cambridge University Press.


Schmitt, N., & Schmitt, D. (in press). The Vocabulary Levels Test: Versions 1 and 2.


Sokmen, A. J. (1997). Current trends in teaching second language vocabulary. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and pedagogy (pp. 237-257). Cambridge: Cambridge University Press.


Sutarsyah, C., Nation, P., & Kennedy, G. (1994). How useful is EAP vocabulary for ESP? A corpus based study. RELC Journal, 25(2), 34-50.


Wesche, M., & Paribakht, T. S. (1996). Assessing vocabulary knowledge: Depth versus breadth. Canadian Modern Language Review, 53, 13-39.


West, R. F., & Stanovich, K. E. (1991). The incidental acquisition of information from reading. Psychological Science, 2(5), 325-329.


Xue, G., & Nation, P. (1984). A university word list. Language Learning and Communication, 3, 215-229.


Zahar, R., Cobb, T., & Spada, N. (in review). Conditions of vocabulary acquisition.



Appendix 1


UWL Quiz 8: linguistics-outcome


Name: ..........................................


 A. Write the number of the word next to its definition (12 points):


1. magic

2. magnetic    __ strange

3. moist          __ wet, damp

4. odd           __ old, out of date, useless

5. obsolete

6. mature


1. margin

2. orbit                        __ reason to do something

3. null              __ circular movement          

4. navy                        __ empty space at the edge

5. motive

6. momentum


1. mobile

2. maternal                 __ choosing neither side 

3. nuclear                   __ showing clear thinking

4. neutral                    __ able to move

5. normal

6. logical  


1. litigation

2. location                  __ movement, travel

3. migration    __ duty, responsibility

4. notation                  __ way of writing      

5. orientation

6. obligation



B. Cloze.  Choose from the words below to complete the passage (8 points):


outcome         occur               magnitude     

monarch         nutrients          occupy           

maintain         luxuries           obvious          


The Man Who Broke the Bank


            Barings Bank used to be one of the oldest and most respected British investment banks.  It had branches all over the world and many famous customers, including the British (1)........................, Queen Elizabeth.  But in February 1996 there was bad news at the Singapore branch.  In fact, it looked like Barings was in serious trouble.  At first the (2)...................................... of the problem was not clear.  Nobody knew for sure how many bad investments had been made or how much money was involved, but it soon became (3)...................... that the losses were over $1.3 billion, so large that the 232-year-old bank was forced to close with great losses to its customers.

            How could such a disaster (4)...................?  How was it possible for such a respected and trusted institution to have made such mistakes?  The top management of Barings promised a thorough investigation and they soon found out who was responsible: a young trader called Nick Leeson. 

            Here is his story:  Leeson had done very well at Barings and had received huge bonuses and rapid promotions for his excellent performance.  He came from a very ordinary working-class English family, and he and his wife enjoyed their new life and the (5)...................................... that came with wealth and success in Singapore.  They ate at the finest restaurants and played tennis at the best club.  Leeson was determined to (6)...................................... his record of success at the bank.

            His method was simple.  He made very large, very risky investments for Barings in the hopes that there would be enormous profits.  If there were losses, he entered them in a secret account, and hoped to pay off the growing debt with profits from the next investment success.  But the debts increased so fast that Leeson lost control.  By the time the fraud was discovered, it was too late to save the bank.

            Barings Bank was forced to close, but what was the (7)..................................... for Nick Leeson?  He was given a prison sentence of just six and a half years for his crime, and he did not seem to be very sorry about what he did.  How did he (8)...................................... his time in prison?  He spent most of his time writing.  Recently, he published a book called Rogue Trader: How I Brought Down Barings Bank and Shook the Financial World.  Buy it and read it if you like, but remember:  Your purchase is helping to pay the legal bills of a thief.

            --from the Toronto Globe & Mail, March 1996



Appendix 2

Tests were placed online for students to check immediately after completion, using an authoring script developed by Chris Greaves.