POSTSCRIPTUM: THE FOLLOWING TEXT WAS WRITTEN IN ABOUT 2008. WHAT WOULD BE DIFFERENT IF IT WERE RE-WRITTEN IN 2018?
VocabProfile (VP) is a computer program that performs lexical text analysis. It takes any text and divides its words into four categories by frequency in the language at large, not necessarily in the text itself: (1) the most frequent 1000 words of English, (2) the second most frequent thousand words of English, i.e. 1001 to 2000, (3) the academic words of English (the AWL, 570 word families that are frequent in academic texts across subjects), and (4) the remainder, which are not found on the other lists. In other words, VP measures the proportions of low and high frequency vocabulary used by a native speaker (NS) or language learner in a written text. A typical NS result is 70-10-10-10, or 70% from the first 1000, 10% from the second thousand, 10% academic, and 10% less frequent words. This relatively simple tool has been useful in understanding the lexical acquisition and performance of second language learners.
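The four-way split described above can be sketched in a few lines of Python. This is only an illustration, not the actual VP code: the word lists K1, K2 and AWL below are tiny hypothetical stand-ins for the real lists, the function names are invented for this example, and the sketch matches exact word forms, whereas the real lists are organized by word family.

```python
import re
from collections import Counter

# Hypothetical miniature word lists (assumptions for illustration only).
# The real first-1000, second-1000 and academic lists are far larger.
K1 = {"the", "a", "of", "and", "to", "in", "is", "this", "that", "for", "we", "use"}
K2 = {"improve", "weather", "borrow"}
AWL = {"analyze", "data", "research", "method"}

BANDS = ("K1", "K2", "AWL", "OFF")


def band_of(word):
    """Return the first band whose list contains the word, else 'OFF'."""
    if word in K1:
        return "K1"
    if word in K2:
        return "K2"
    if word in AWL:
        return "AWL"
    return "OFF"


def profile(text):
    """Return the percentage of tokens in the text that fall in each band."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {band: 0.0 for band in BANDS}
    counts = Counter(band_of(token) for token in tokens)
    return {band: 100.0 * counts[band] / len(tokens) for band in BANDS}


if __name__ == "__main__":
    print(profile("We use this method to improve the data for zebras"))
```

Note that the percentages are computed over tokens (running words), so a word that occurs five times counts five times toward its band.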
The Web version of this program lacks some features of the original off-line program, which was developed by Batia Laufer and Paul Nation and is known in its latest version as Range. For example, WebVP does not allow you to input several texts at the same time and keep track of which texts are contributing to which parts of the profile. Nor does it handle extremely large texts, and even moderately large texts move fairly slowly through the server-side processing. Range can be downloaded from Paul Nation's website.
Some research studies using VP/Range are summarized below.
This study introduces and validates VP as a research instrument. The study first discusses problems associated with other approaches to automatic measurement of lexical richness of texts. One of these, for example, is type-token ratio analysis, which seeks to identify the number of different words appearing in a text, but (1) it tells us nothing about the quality (frequency) of the words, and (2) its results are known to vary with text length [demo]. On the other hand, VP analysis is frequency based and does not vary systematically with text length [demo]. But what useful work can VP do? Several predictions about vocabulary acquisition are tested and affirmed using VP.
- VP score is reliable across two texts by the same learner (provided genre is the same).
- VP score correlates with an independent measure of vocabulary knowledge (Nation's Levels Test, which follows the same categorization).
- VP score predicts a broader language proficiency measure (learners at three proficiency levels have significantly different VP scores).
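The text-length problem with type-token ratio mentioned above can be seen in a deliberately extreme toy case. The sketch below is an illustration only: the function names and the COMMON word set are assumptions made for this example, and repeating one sentence ten times is an artificial way of lengthening a text without changing its vocabulary.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Number of different words divided by number of running words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def band_share(tokens, band):
    """Proportion of running words that belong to a given word list."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in band) / len(tokens)


# Hypothetical stand-in for a frequency band (assumption for illustration).
COMMON = {"the", "on"}

sentence = "the cat sat on the mat"
short_text = tokenize(sentence)
long_text = tokenize(" ".join([sentence] * 10))

if __name__ == "__main__":
    print(type_token_ratio(short_text), type_token_ratio(long_text))
    print(band_share(short_text, COMMON), band_share(long_text, COMMON))
```

The vocabulary of the two texts is identical, yet the type-token ratio of the longer one is ten times lower, while the share of tokens drawn from the word list is unchanged. Real texts are less tidy, but the sketch shows why a proportion-based measure is less sensitive to length.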
Asked to evaluate the lexical component of a popular BBC English course, Meara subjected course contents to a VP analysis and determined that virtually all the words learners would be exposed to came from the 0-1000 band of English, and further that this did not change as the course moved through the levels from beginner to intermediate, etc. By contrast, a translated version of the French-language comic book Tintin would expose learners to a rich and varied vocabulary.
Is the typical "communicative" Quebec ESL classroom a good place to meet lots of new and interesting vocabulary? Analyzing typescripts of classroom interaction with VP, these researchers found that virtually all of the vocabulary offered across several classrooms consisted of basic items from the 0-1000 frequency level, and further that the lexical richness of these communicative classrooms was no different from that of a sampling of 1970s audiolingual classrooms (where vocabulary was typically restricted to focus on structure and pronunciation).
Reflecting previous interest in both word associations (Meara, 1978, Learners' word associations in French, Interlanguage Studies Bulletin 3, 192-211) and tests to measure passive vocabulary knowledge (the Yes/No Test, Meara & Buxton, 1987, An alternative to multiple choice vocabulary tests, Language Testing, 4, 142-154), this study proposes an idea for an active vocabulary test using the VP framework. If learners produce several associations for each of 30 words, rather than the usual one association, then this will force them to the outer bound of their lexicons. Then, if these associations are run through VP, the output should provide an economical measure of active vocabulary.
Asian and Francophone learners contributed word look-ups to a collective database of academic vocabulary. For each group, 300 look-ups were run through VP to look for any consistent look-up differences related to L1. Roughly twice as many Francophone look-ups were from the 0-2000 frequency zone; roughly twice as many Asian look-ups were from the UWL zones.
VP analysis of hundreds of trainees' texts at Concordia, UQAM, and teacher training establishments in Vietnam initially suggests that the vast majority of NNS trainees in all categories are themselves lexically equivalent to lower intermediate learners (with 90% of lexis deriving from the 1-1000 band). Further, broader linguistic competence and likelihood of success in a TESL program seem better predicted by the proportion of lexis drawn from post-1000 frequency zones than by other predictors. It is proposed that VP analysis be considered as part of an admission test for TESL trainees.
Abstract: ESL/EFL teacher training programmes make considerable demands on university resources in terms of administrative time and money. On-campus training segments require relatively low professor/student ratios, while the organization and supervision of off-campus practice teaching placements is extremely labour intensive. Because of the financial loss to a university incurred when TESL student attrition rates are high, the selection of suitable candidates at the outset of any programme of study is extremely important. The purpose of this study is to explore the potential for using vocabulary profiles as predictors of academic and pedagogic success in the case of TESL trainees. A slightly modified version of Nation's profiler, developed by Cobb (1999), was used to establish a vocabulary profile for each of 140 Canadian TESL trainees. The trainees varied widely in age and in linguistic background, but all had native or close to native command of spoken English as determined by an entrance interview. Argumentative essays of approximately 500 words in length - written as part of the student selection process - provided the texts that were entered into the profiler. For each text the percentage of words falling into the first 1000 (K1), the second 1000 (K2), the Academic Word List (AWL) and, finally, the off-list category (OL) was calculated. Significant correlations were found between academic performance in the training programme and K1 (negative correlation), AWL (positive correlation) and AWL + OL (positive correlation). On the basis of these findings, the predictive potential of the profiler and possible applications will be discussed.
One interesting use of VP (I believe the main one for users of online VP) is to evaluate the suitability of reading texts for various levels of learners. Nation's Levels Test and his VP scheme are nicely complementary for text and task selection purposes. For example, if a particular group of learners were strong at the 1001-2000 level, yet weak at all levels beyond that, then they might usefully read texts that present about 5% of their lexical offering at the Academic Word List level. Such texts are relatively simple to find or create with the aid of VP. These would be lexically challenging but manageable (thus a defined lexical i+1). Research has shown that comprehension and learning through inference are facilitated when novel lexis is not greater than 5% of tokens (one novel word in every 20 running words).
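The 5% guideline above amounts to a simple check on a candidate reading text. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the set of words a learner group knows is assumed to be available (for instance, estimated from Levels Test results), and matching is by exact word form rather than by word family.

```python
import re


def novel_token_share(text, known_words):
    """Proportion of running words in the text that are not in known_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    novel = sum(1 for token in tokens if token not in known_words)
    return novel / len(tokens)


def is_manageable(text, known_words, threshold=0.05):
    """True when novel words make up no more than the threshold share of tokens."""
    return novel_token_share(text, known_words) <= threshold


if __name__ == "__main__":
    # Hypothetical known vocabulary for a learner group (assumption).
    known = {"the", "dog", "ran", "to", "park", "and", "then", "it", "home"}
    text = ("The dog ran to the park and then it ran home and then "
            "the dog ran to the park hypothesized")
    print(novel_token_share(text, known), is_manageable(text, known))
```

In the usage example the text has 20 running words, one of which is unknown, so it sits exactly at the 5% limit.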
Another unstudied but important use of VP is to shed light on the relationship between learners' passive and active vocabulary knowledge. In teaching vocabulary, it is important but difficult to distinguish between introducing vs. activating word knowledge. If we knew that learners were scoring high at the top end of the Levels Test (AWL and beyond) yet producing texts with VP ratings of 95-5-0-0 then we would know that the task was one of activation.
And not to leave out the proactive learner, VP can be used by students to check on the range and density of their own vocabulary production. If students paste a text into VP they can see how their lexical profiles stack up against native speaker texts (as found on the Internet, or using the samples that come with VP).