The old vocabulary, the new vocabulary, and the Arabic learner

 

Paper version of vocabulary symposium presentation

TESOL Arabia, Dubai, March 2006

 

Tom Cobb

8 April 2006

Montreal

 


Web Pre-Publication - please do not cite this version without permission

1 Introduction

 

Many of us who taught EFL and ESP in the Gulf in the 1980s felt we had no useful research to guide our efforts or shed light on the problems we faced in our jobs. Most of us had some training in teaching second languages (L2s), and L2 reading in particular, which seemed to be the principal task we had been hired for. But none of this training seemed to make much sense in the various pre-medical or pre-engineering etc. language centres of the new universities in Riyadh, Muscat, or Dubai. We were basically working without a plan.

 

Nor did it seem that instructors, even when hired for their research backgrounds, were particularly encouraged to undertake research in their jobs. For one thing, the teaching day was back-to-back double and triple periods followed by a noonday evacuation for prayer, siesta, sport, or video watching. For another thing, institutions did not appear to encourage investigations that might lead to comparisons between Gulf learners and other learners, presumably to the disadvantage of the former. It was generally believed that the learners were very weak and that any investigation could well contain an element of ridicule.

 

I was therefore surprised to gradually discover that during this time and against the odds a coherent body of Gulf learner research was in fact emerging. Some of it came from local institutions like Sultan Qaboos University, where an all-star cast of applied linguists and Arabists had been brought together to develop its Language Centre. Some of it came from established neighboring institutions, like the American Universities of Beirut and Cairo. Some of it came from British and other universities where former expatriate teachers and then increasingly Gulf nationals investigated some aspect of Gulf language training for their doctoral studies. This research, when brought together, is not only a high quality and coherent body of work but, I will argue, played a key role in the discovery of The New Vocabulary that most of us at this conference are probably adherents of.

 

Key principles of this new vocabulary include these:

 

Lexical knowledge is the strongest predictor of reading ability (and inability)

 

Lexis is not a filler for syntactic slots but rather syntax is an emergent property of lexis

 

Some zones of lexis are more important to know than others for different tasks

 

Different degrees of lexical knowledge are needed for different tasks

 

Lexical knowledge does not come for free in a second language

 

Lexical acquisition requires more exposures than natural input provides

 

Lexical processing and acquisition are not identical across orthographies

 

I am not arguing that all these complex and seminal ideas were invented by ESL teachers in the Gulf! I am arguing however that ESL teachers and former ESL teachers working with Arabic speaking learners played a significant role in the development of many of them. But let me begin my case with a description of The Old Vocabulary, the set of assumptions we brought to the teaching of reading in the early days of Gulf EFL and ESP that often led to less than ideal results.

 

 

 

 

 

2 The Old Vocabulary

 

In the 1980s, we weren't so much working without a plan as working with a wrong plan. When modern versions of applied linguistics emerged in the late 1960s and 1970s, there was a need to provide some sort of rationale for the burgeoning international industry of English language teaching. Much of this rationale was initially provided by borrowing ideas from apparently related disciplines. Principal among these were General Linguistics and L1 reading theory. Interestingly, while neither of these disciplines gave much space to vocabulary learning, both made strong assumptions about it.

 

Linguists like and following Chomsky (e.g., 1959) believed the acquisition of mother-tongue (L1) syntax to be the great human achievement, the dividing line between man and beast. This acquisition was inexplicable in terms of any general learning theory, including the various kinds of associationism, and especially behaviorism. Quite extensive vocabularies, on the other hand, could be learned by chimpanzees or other mammals with spare capacity in their craniums simply by linking their various needs to coloured tokens or other word-like signs.

 

Applied linguists in the 1970s, in an attempt to fit in with these dominant ideas, threw out some useful but limited ideas about L1-L2 transfer that had been conceived within a loosely behaviorist framework (Lado, e.g. 1957; Corder, 1967) and instead toiled in countless papers to show how constructs like Universal Grammar or the Language Acquisition Device could be made to relate to the various phenomena of second language (L2) acquisition (e.g., Dulay & Burt, 1974). Classroom teachers either ignored what they saw as irrelevant theorizing, or else dutifully sought ways to incorporate linguistic thinking into their classrooms, despite Chomsky's own disclaimer that none of it had anything to do with language teaching. Either way, anything more than passing interest in vocabulary teaching gradually came to seem seriously misplaced, and the topic more or less disappeared. The last of the great vocabulary course books was Helen Barnard's Advanced English Vocabulary, re-published for the last time in 1972 (yet probably the most photocopied textbook in the Gulf for decades to follow), with nothing comparable to replace it that I know of until Redman and Ellis' A Way with Words in 1991.

 

Also gone was any real emphasis on areas of language use other than speech, including reading. The linguists' interest was all in childhood acquisition of the syntax of L1 speech, upon which reading would be a subsequent and relatively uninteresting add-on. In the acquisition of an L2, of course, particularly by adults, reading is quite likely to play a rather different role. For one thing, it may be less of an add-on and more of a main objective for an adult learner, as it was in most of the Gulf training programs, where reading professional manuals ranked higher in the learning objectives than chatting with expatriates. A more detailed account of reading was obviously needed in applied linguistics than had seemed important in general linguistics, and the early versions of this were borrowed, fairly uncritically (as proposed by Grabe, 1991), from L1 reading theory.

 

In the 1970s one particular account of L1 reading development had taken precedence in mother-tongue language education, the reading-for-meaning, holistic, top-down model proposed by (among others) Kenneth Goodman (1967) and Frank Smith (1971) under the title of “reading as a psycholinguistic guessing game.” By this account, all the cognitive bits and pieces that go into reading and learning to read—all the strategies and knowledge components, including vocabulary knowledge—would fall into place of their own accord through implicit natural deductions from sustained acts of meaningful reading. In other words, all the vocabulary one would ever need could be pleasurably acquired through inference from context, and there was no need to teach it or even plan for it in any detailed way.

 

This version of reading was clearly an idea that applied mainly to young L1 learners, especially to high SES (socio-economic status) young L1 learners who had been raised in literate households. It presupposed a high volume of reading, plentiful availability of materials, a well modeled motivation, and most of all a sufficient length of time for words to be met and re-met, hypotheses about word meanings and functions tested and revised, and so on. And yet this model of reading was quickly imported, for lack of better, into L2 thinking over the 1970s and 1980s. It was not supported by any very convincing evidence, although it did appear to go rather well with the emerging communicative approach in L2 teaching for which there did seem to be some evidence (e.g., Long & Porter, 1984).

 

Goodman was eventually persuaded to devote some time to developing more explicitly the L2 version of his guessing game. A number of L2 principles were soon derived from the L1 work, notably the theory of linguistic universals (Goodman, 1973; Coady, 1979), according to which reading in a second language was the same as reading in a first, and transfer of reading ability from L1 to L2 would be automatic. Again, no special emphasis on vocabulary was needed.

 

Oddly, this L1 oriented view of reading came quickly under attack in L1 research itself and has existed in a kind of research war with less holistic approaches ever since (including phonics; reviewed in Adams, 1990). A leading L1 reading researcher, Stanovich (1980; 2000), found in study after study that skill in lexical processing of words out of context was by far the best predictor of competent reading, and that the various forms of top-down or expectation-driven processing were in fact the strategies of weak readers not strong. But guessing theory had already become dominant in applied linguistics, and at least in ESL/EFL teacher training courses remains so to this day. To wit, every training program has a pedagogical grammar course, but few have a pedagogical vocabulary course. It is still almost universally held that an adequate vocabulary can be built through natural exposures in meaningful context—an odd notion that can easily be contradicted by any observer of language classrooms, where more than 50% of the time is invariably spent explaining word meanings. The notion was apparently appealing enough to overrule common sense and plain observation, but fortunately it did not overrule the researchers.

 

Applied linguistics reading research has moved very far from the Smith-Goodman, L1-is-L2 model in almost every way. An extensive body of reading research is now highly developed within specifically L2 terms and is quite careful what it borrows from L1 thinking. For just one example, the notion of a straightforward transfer of reading ability from L1 to L2, when investigated empirically, proved to be a bit more complex than that. Alderson's (1984) research discovered that, far from being automatic, the transfer of L1 abilities (such as effective guessing of new words in context) takes place only after a threshold of L2 knowledge has been crossed. What this threshold consists of has occupied many researchers ever since (Bernhardt & Kamil, 1995; Bossers, 1991), but one thing seems clear in studies dating from the 1990s to the present (e.g., Gelderen et al., 2004): the main plank in this knowledge threshold is knowledge of L2 vocabulary. Put simply, this means that L2 learners have to know some (in principle) specifiable amount of L2 vocabulary before any reading skills or strategies they may have in L1 will become accessible in their L2, or before they can either read with acceptable comprehension or learn any significant amount of new vocabulary through reading. This apparently obvious message has now been demonstrated in numerous experiments, and whole books have been written to convince teachers and course designers of the truth of it (e.g., Nation, 2001).

 

Nor was it only reading that apparently had some need for a lexical input. Syntax itself has now been shown to depend on a threshold of lexical knowledge, although as yet this is less clearly specified than it has been for reading (to be discussed below). Indeed the dependence appears to go further than that. Knowledge of syntax it now seems is in fact rooted in the properties of words, as opposed to being a set of free-floating abstractions into which words are slotted. This is an idea being worked out by both L2 and L1 acquisition researchers (e.g., Bates & Goodman, 2001), and even linguists in the minimalist vein (Chomsky, 1995), but it is possibly a reversal of the usual direction of L1-L2 inheritance inasmuch as key figures on both sides rally to our own Michael Lewis' (1993) slogan that “language consists of grammaticalised lexis not lexicalised grammar.”  

 

It now seems quite astounding that an enterprise involving and affecting so many people as L2 reading could have been launched from such a weak footing, or that so many otherwise intelligent people would not have seen its inappropriateness. It was a case of the emperor's new clothes on a grand scale. Maybe the mismatch was not apparent with high SES European or North American learners acquiring cognate languages, but as a Gulf ESL instructor I always found it extremely anomalous to be driving students through grammar exercises couched in words they had never seen, or giving them only loosely graded reading texts with every other word a look-up, or teaching whole lessons in guessing from context where the words in the contexts were no more likely to be known than the word to be guessed. Thankfully during this period researchers like Alderson (1984) were busy finding the exit from this scenario, not by borrowing theories from quasi-related disciplines, but through clear questions and empirical research.

 

Interestingly, Alderson had spent much of his career not only in classrooms, but also in classrooms full of learners with non-cognate L1 backgrounds, including Arabic speaking learners. This particular link may be a coincidence. Nonetheless it inspired me to notice that a lot of the hard spadework tunneling out of linguistics-psycholinguistics was performed by teachers or former teachers working with Arab learners, and further, that this work has now expanded to become general applied linguistics theory. That is because if you look carefully, the special problems of the Arabic learner are just a high visibility case of the usual problems of any language learner.

 

This was particularly true as instructed language learning expanded beyond the domain of spies in training and preparation for foreign vacation to become a high-stakes game for life for Vietnamese boat people, evacuees of the Iranian revolution, and many others who suddenly had to function in English, and of course with the coming on-stream of a large proportion of the Arabian Gulf youth population.

 

 

3 The Gulf learner & the old vocabulary

 

Before we look at what was learned from the Arabic learner, let us examine what the Arabic learner did not learn from us in the days of the old vocabulary. The typical Gulf ESP course of the 1980s consisted of working dutifully through the grammar zones from a grammar-based placement test, but randomly through the vocabulary zones from a totally unknown vocabulary base. Reading passages were, in line with L1 notions outlined above, chosen on the basis of somebody's idea of the typical learner's interests rather than any careful analysis of whether the text could profitably be read at all, or read in what way—as intensive reading or extensive reading, for learning to read or for reading to learn, and so on. For example, Figure 1 shows a text from the course book Headway (Soars & Soars, 1991, p. 74) in use with low intermediate learners in Sultan Qaboos University in Oman.

Figure 1

The Observer newspaper recently showed how easy it is, given a suitable story and a smattering of jargon, to obtain information by bluff from police computers. Computer freaks, whose hobby is breaking into official systems, don't even need to use the phone. They can connect their computers directly with any database in the country. Computers do not alter the fundamental issues. But they do multiply the risks. They allow more data to be collected on more aspects of our lives, and increase both its rapid retrievability and the likelihood of its unauthorized transfer from one agency which might have a legitimate interest in it, to another which does not. Modern computer capabilities also raise the issue of what is known in the jargon as 'total data linkage' the ability, by pressing a few buttons and waiting as little as a minute, to collate all the information about us held on all the major government and business computers into an instant dossier on any aspect of our lives.

 


And Figure 2 shows a Vocabprofile (Laufer & Nation, 1995; www.lextutor.ca/vp/) of the same text, breaking it down into 1000 (blue), 2000 (green), Academic Word List (yellow) and Off-list (red) lexical frequency components.

 

Figure 2

 

 

the observer newspaper recently showed how easy it is given a suitable story and a smattering of jargon to obtain information by bluff from police computers computer freaks whose hobby is breaking into official systems do not even need to use the phone they can connect their computers directly with any database in the country computers do not alter the fundamental issues but they do multiply the risks they allow more data to be collected on more aspects of our lives and increase both its rapid retrievability and the likelihood of its unauthorized transfer from one agency which might have a legitimate interest in it to another which does not modern computer capabilities also raise the issue of what is known in the jargon as total data linkage the ability by pressing a few buttons and waiting as little as a minute to collate all the information about us held on all the major government and business computers into an instant dossier on any aspect of our lives

 

 

To summarize the color coding, 73.81% of the tokens in this text are in the first 1000 words of English, 7.74% are in the second 1000, 11.31% are Academic Word List (AWL) words, and 7.14% are not in any of these lists and hence are quite low frequency. This and related information is summarized in Table 1.

 

Table 1

 

 

 

                           Families   Types   Tokens   Tokens Percent
K1 Words (1 to 1000):      67         73      124      73.81%
K2 Words (1001 to 2000):   12         12      13       7.74%
AWL Words (academic):      11         14      19       11.31%
Off-List Words:            ?          11      12       7.14%
Total                      90+?       110     168      100%
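The token percentages in Table 1 follow directly from the token counts. As a quick check, here is a minimal Python sketch that recomputes them. The variable names are illustrative only, and the code is not part of the Vocabprofile software itself.

```python
# Token counts per frequency band, copied from Table 1.
token_counts = {
    "K1": 124,
    "K2": 13,
    "AWL": 19,
    "Off-List": 12,
}

total_tokens = sum(token_counts.values())  # running words in the passage

# Share of running words (tokens) that falls in each band, in percent.
token_percent = {
    band: round(100 * count / total_tokens, 2)
    for band, count in token_counts.items()
}

for band, pct in token_percent.items():
    print(f"{band:9s}{pct:6.2f}%")
```

The same division, applied to any text whose tokens have been sorted into bands, gives a lexical frequency profile of the kind shown in the table.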

 

 

 

And here (in Table 2) are the vocabulary test results for a typical group of the learners this text is designed to be used with.

 

Table 2

 

Level         2000    3000    5000    Academic   10,000
Student 1     27%     22%     17%     0%         0%
Student 2     39%     22%     11%     27%        22%
Student 3     33%     27%     11%     11%        0%
Student 4     33%     44%     17%     27%        17%
Student 5     27%     17%     5%      22%        5%
Student 6     27%     17%     0%      5%         5%
Student 7     50%     33%     22%     0%         0%
Student 8     27%     11%     22%     5%         0%
Student 9     33%     33%     17%     11%        11%
Student 10    39%     17%     0%      0%         0%
Student 11    33%     17%     11%     17%        0%
MEAN %        33.5    23.6    12.1    11.4       5.5
S. Dev.       7.1     9.7     7.8     10.5       7.9

 

Note: The test is Nation's (1990) original Vocabulary Levels Test, which did not at that time include a 1000 level although this has now been added (Nation 2001; for a functioning version of this test, see http://www.lextutor.ca/tests/levels/recognition/2-10k/).
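The summary rows of Table 2 can be reproduced from the individual scores. The sketch below is only a check on the arithmetic, written in Python with the standard statistics module. It assumes the standard deviations are sample standard deviations (n - 1 in the denominator), an assumption that happens to reproduce the printed figures.

```python
from statistics import mean, stdev

# Percentage scores at each level for Students 1 to 11, copied from Table 2.
scores = {
    "2000":     [27, 39, 33, 33, 27, 27, 50, 27, 33, 39, 33],
    "3000":     [22, 22, 27, 44, 17, 17, 33, 11, 33, 17, 17],
    "5000":     [17, 11, 11, 17, 5, 0, 22, 22, 17, 0, 11],
    "Academic": [0, 27, 11, 27, 22, 5, 0, 5, 11, 0, 17],
    "10,000":   [0, 22, 0, 17, 5, 5, 0, 0, 11, 0, 0],
}

# (mean, sample standard deviation) per level, rounded to one decimal place.
summary = {
    level: (round(mean(values), 1), round(stdev(values), 1))
    for level, values in scores.items()
}

for level, (avg, sd) in summary.items():
    print(f"{level:9s} mean {avg:5.1f}   s.d. {sd:5.1f}")
```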

 

Comparing text to test at just the 2000 and Academic Word List (AWL; Coxhead, 2000) levels, the text contains 7.74% and 11.31% of lexis at these levels, respectively, while the learners know 33.5% and 11.4% of all the words at these levels, respectively. Here (in Figure 3) is this same text as it looks to a student who knows 35% of its one thousand 2000-zone word families (all 13 words removed except these five: information (x 2), phone, police and government) and about 10% of the 580 AWL families (everything removed except for members of the computer family).

 

Figure 3

 

The Observer newspaper recently showed how easy it is, given a _______ story and a _______ of _______, to _______ information by _______ from police computers. Computer _______, whose _______ is breaking into official systems, don't even need to use the phone. They can _______ their computers directly with any _______ in the country. Computers do not _______ the _______ _______. But they do _______ the _______. They allow more _______ to be _______ on more _______ of our lives, and increase both its rapid _______ and the likelihood of its _______ _______ from one agency which might have a _______ interest in it, to another which does not. Modern computer _______ also raise the _______ of what is known in the _______ as 'total _______ _______' the ability, by pressing a few _______ and waiting as little as a minute, to _______ all the information about us held on all the _______ government and business computers into an _______ _______ on any _______ of our lives.

 

Note: Level-gapped text made at www.lextutor.ca/cloze/vp_cloze/

 

As the tables and figures above combine to show, these learners are already weak at the 2000 vocabulary level, but their reading assignment draws about 20% of its lexis from well beyond that level, or in other words has well over one unknown word in five. For these learners, it seems intuitively clear (and research confirms it below) that trying to read this text is not only a difficult and discouraging task, but also one that can be predicted to yield little learning, other than the random pick-up of a few odd words that will rarely be met again. Indeed that is exactly the vocabulary that these learners have. As the Levels Test results in Table 2 show, these learners have a smattering of words at all levels, but have on average only about (2000 x 33.5% =) 670 words at the 2000 level itself. In fact, they had more words beyond the 2000 level than within it, the result of the random word pick-up that this type of exercise invites.
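The two figures in this comparison can be recomputed from Tables 1 and 2. The following Python lines are a back-of-envelope sketch only; the variable names are mine, and the 2000-word base is simply the one used in the text.

```python
# Figures copied from Table 1 (text profile) and Table 2 (test results).
awl_percent = 11.31       # tokens from the Academic Word List
off_list_percent = 7.14   # tokens on none of the lists
mean_score_2000 = 33.5    # mean score (%) at the 2000 level

# Share of running words drawn from beyond the first 2000 word families.
beyond_2000 = awl_percent + off_list_percent
print(f"Beyond the 2000 level: {beyond_2000:.2f}% of tokens")

# Estimated number of words known at the 2000 level.
known_2000 = 2000 * mean_score_2000 / 100
print(f"Words known at the 2000 level: {known_2000:.0f}")
```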

 

It is hardly any wonder, then, that our learners spent half or more of their reading time writing Arabic translations between the lines of texts such as these (as shown in Figure 4). Our response to this as reading teachers was not to teach them words in any systematic way, but rather to insist they try to guess the meanings of words from context. Of course we had no idea what conditions would make such guessing possible, for example how many words would have to be known in the context before taking a guess would begin to be feasible.

 

Figure 4

 

 

 

 

The coursebook this text comes from, like most of those in use at the time, did claim to include a vocabulary emphasis, but in fact there was almost no consistent presentation of the words of any frequency level in any of them, nor enough recycling of the few words there were for any consistent learning to take place. The following table, which was prepared for a previous TESOL Arabia more than a decade ago (Al-Ain, 1995), shows the extent of vocabulary coverage in a series of typical course books in use in the late eighties at several Gulf university language centers and units.

 

Ten randomly chosen 20-word samples from the 2400-word Cambridge word frequency list (Hindmarsh, 1980), on which one of the students' tests was based (the Cambridge Preliminary English Test, or PET, discussed below), were compared against the back-of-book vocabulary lists of three major course books (from Cambridge itself as well as Cobuild and Headway), and as a control also against the Longman LDOCE dictionary's 2000-word defining vocabulary. As Table 3 shows, by the end of the second book in each series, none of these course books exposes the students to much more than half the words on this basic list. And even that says nothing about the number of exposures provided for each word (10 seem necessary for learning) or the learning conditions provisioned with each exposure (in terms of words known in the environment). Both topics will be treated in the breakout session.

Table 3

 

                       CA1    CA2    CA3    CO1    CO2    HE1    HE2    LDOCE
Sample 1               6      9      11     6      14     4      10     19
Sample 2               3      6      10     4      9      3      9      13
Sample 3               6      9      11     4      13     4      9      14
Sample 4               2      6      8      4      9      1      5      10
Sample 5               8      8      10     3      8      7      8      9
Sample 6               4      12     14     7      11     6      8      16
Sample 7               6      9      11     6      10     8      11     16
Sample 8               2      5      10     3      10     6      10     11
Sample 9               5      5      10     5      11     5      10     13
Sample 10              8      11     12     7      10     10     10     12

TOTAL                  50     80     107    49     105    54     90     133
MEAN                   5      8      10.7   4.9    10.5   5.4    9      13.3
S.D.                   2.2    2.4    1.6    1.5    1.8    2.6    1.7    3.1
% of Hindmarsh Words   25     40     53.5   24.5   52.5   27     45     66.5

(CA=Cambridge; CO=COBUILD; HE=Headway; LDOCE=Longman defining vocabulary)
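The bottom row of Table 3 is the total number of sampled words found in each list, taken as a percentage of the 200 words sampled (ten samples of 20 words). A minimal Python sketch of that calculation follows; the names hits, totals and coverage are illustrative only.

```python
SAMPLE_SIZE = 20  # words per sample drawn from the Hindmarsh list

# Sampled words found in each list (Samples 1 to 10), copied from Table 3.
hits = {
    "CA1":   [6, 3, 6, 2, 8, 4, 6, 2, 5, 8],
    "CA2":   [9, 6, 9, 6, 8, 12, 9, 5, 5, 11],
    "CA3":   [11, 10, 11, 8, 10, 14, 11, 10, 10, 12],
    "CO1":   [6, 4, 4, 4, 3, 7, 6, 3, 5, 7],
    "CO2":   [14, 9, 13, 9, 8, 11, 10, 10, 11, 10],
    "HE1":   [4, 3, 4, 1, 7, 6, 8, 6, 5, 10],
    "HE2":   [10, 9, 9, 5, 8, 8, 11, 10, 10, 10],
    "LDOCE": [19, 13, 14, 10, 9, 16, 16, 11, 13, 12],
}

# Total words found per list across all ten samples.
totals = {book: sum(found) for book, found in hits.items()}

# Coverage: words found as a percentage of all words sampled.
coverage = {
    book: 100 * totals[book] / (SAMPLE_SIZE * len(found))
    for book, found in hits.items()
}

for book in hits:
    print(f"{book:6s} total {totals[book]:4d}   coverage {coverage[book]:5.1f}%")
```

On these figures the best course-book result, CA3 at 53.5%, still falls short of the 66.5% reached by the LDOCE defining vocabulary.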

 

So if the “vocabulary emphasis” of these course books was not words per se, then what was it? It was, of course, strategies, principally the strategy of guessing new word meanings from word parts and from semantic context. In threshold terms, this and related strategies were almost certain to be already functioning in the L1 (and hence not in need of training), just unavailable in L2 for want of a threshold knowledge base. It would seem one needs to teach words rather than strategies.

 

Much of the we-have-no-words problem might have gone unnoticed were it not for the introduction in several Gulf institutions of standardized English reading tests. This story played out in different ways in different places, but at Sultan Qaboos University (SQU) in Oman the arrival of the Cambridge Preliminary English Test (PET) rather quickly disclosed the scale of our failures. The PET was the most rudimentary test available from the University of Cambridge’s Language Examinations Service (UCLES). It was based on the Cambridge basic lexicon of 2387 words (Hindmarsh, 1980) and couched in a content that was too simple for our students’ level of maturity. They nonetheless failed the test in droves, took more courses and failed it again, and too often were eventually expelled from colleges to retire to their families in some degree of disgrace.

 

An advantage that came from the advent of standardized testing, at least at SQU, was that teachers began investigating the problem through action research projects with their learners and came up with some interesting though admittedly piecemeal findings. I began investigating my own students' vocabulary sizes with Nation's Vocabulary Levels Test (1990; VLT), with typical results as already shown above (Table 2). Another instructor, Arden-Close, observed his learners' interactions with a content instructor, later publishing a paper (1993) with a heart-rending account of the lecturer's attempts to come to grips with his students' incomprehension.

 

Arden-Close describes a chemistry lecturer in a classroom discussion backing up further and further in a search for common lexical ground. Trying to convey the idea of a "carbon fluoride bond" the lecturer tries a succession of progressively more common analogies: teflon pans, a tug of war, an assembly line, all to no avail. In the light of the vocabulary size-testing undertaken subsequently, it is no wonder; pan, war, line and other words from the 2000 list were no doubt themselves unknown, let alone any imaginative compounds derived therefrom. In another classroom postmortem, a biology lecturer describes searching for a common analogy to convey "hybridization" and in the process indicates the real level of the problem:

The first time I gave a hybridization analogy, I talked about dogs, and then I switched to goats; and then it even dawned on me that some of them aren't going to be in touch with the fact that if you mix two different kinds of goats they come out looking in between, and I didn't know all the specific terms there, what their two different breeds of goats are called. You can talk about [mixing] colors, but a lot of them don't know their colors yet (p. 258, emphasis added).

Countless similar unrecorded interchanges took place over the years.


So was the students' lack of words for the colors a sign that they were some kind of poor learners, or just a sign that no one had taken the trouble to make sure they had covered the basic words of the language? In fact, the students were quite adept at word learning. My VLT work with these learners showed they had often picked up quite a bit of knowledge at off frequency zones throughout the lexicon, just not necessarily at the higher frequency zones (where basic terms like the names of the colors are found). As Table 2 above shows, most learners knew more words beyond the 2000 level than within it: words they had clearly invested in learning, but words that in most cases would be met rarely if ever again.

 

And most interesting, as instructors we started paying attention to what our learners were trying to tell us—indirectly in their little lexical annotations, but also more directly. Figure 5 is a journal entry from a student in 1993, writing to an imaginary friend who would soon be entering the University and facing the PET reading test.

 

 

 

 

 


Figure 5

 

Dear N.,

I heard that you are going to join the College of Commerce and Economics after you finish your high school. I have a lot to tell you about this college. The first and important thing is the PET test. You must pass this test so you can continue your studies in the College. The PET test is not easy as it seems. It is so difficult and we have to do a lot to pass it.... The English that we learned at school is too easy and it's nothing compared with the English in the University. Let me tell you about myself as an example.

I thought that I knew English and really in the school I was from the three best students in the class in English. But here my English is nothing, then I thought I learned nine years English in the school but I don't have any knowledge and I don't know anything about real English. I really don't know the fault from who. ...

Your friend, F.

 

 

The fault from who? In retrospect the answer seems clear enough.

 

As instructors we clearly did not have the tools needed for the size of our undertaking. We sometimes had the technological tools, but lacked the appropriate conceptual tools to make much use of them. What we had, in fact, was a hand-me-down conceptual toolkit that had been devised for other purposes by linguists and L1 researchers that served our learners ill. Fortunately, our piecemeal efforts were not the only investigations under way into the reading and vocabulary problems of the Arab EFL/ESP learner. Others were looking at these same issues in programs of more extended and theoretically motivated research and were already fashioning a more useful toolkit.

 

 

 

 

4 The real psycholinguistics of the Arab learner's lexical processing

 

This section will re-examine some of the planks in The Old Vocabulary platform in the light of subsequent research involving learners from Arabic and other typologically different orthographies (Chinese, Japanese, etc.).

 

4.1 Guessing the meanings of new words in context is easy

 

Laufer and Sim (1985), rather than accepting the efficacy of guessing new word meanings on faith, actually took the trouble to investigate whether guessing word meanings is a reliable way to build a vocabulary in a second language or not. In a series of experiments with Arabic and Hebrew speaking ESL learners, they determined that guessing is actually quite a messy business with unreliable results. This finding was subsequently replicated many times with L2 learners (e.g., several of the studies in Huckin, Haynes & Coady, 1993) and eventually even with L1 learners (Schatz & Baldwin, 1986).

 

To investigate this same phenomenon in a more natural and extended instructional setting, Horst, Cobb & Meara (1998) undertook an extensive reading study with Omani academic learners. Learners similar to those already described (n=24) were tested for the number of new words they had learned from reading a whole abridged novel of over 20,000 words. The study took pains to tighten the methodology relative to a number of similar studies in order to register the maximum learning possible. The learning from this experience turned out to average about five words per learner. As the authors note, at this rate the journey from a minimal to a functioning lexicon (of 5000 words) would involve an investment of more than ten years.

 

Laufer (1989) went on to seek the conditions of reasonably reliable contextual inference. Arabic and Hebrew speaking learners read texts with various proportions of unknown vocabulary (15%, 10%, etc.) and then had their comprehension of the same texts measured. Successful comprehension was found to be reliable only when 95% of words in the text were to some extent known. And, predictably, new word inference becomes reliable at the same point. This finding seemed to give some specification to Alderson's (1984) notion of a threshold (also discussed in this context by Cobb & Horst, 2001). Needless to say, the search for a threshold has not been concluded (see new work by Nation, in press), with the proportion of known to unknown words needed for competent reading tending to rise rather than fall in more recent studies with tighter methodologies.
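The coverage idea can be made concrete with a minimal sketch. It treats word knowledge as binary and matches raw word forms, both of which are simplifying assumptions (Laufer speaks of words known "to some extent", and coverage research usually counts word families); the function names and the sample sentence are hypothetical.

```python
import re

def tokenize(text):
    """Split a text into lower-case word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def known_word_coverage(text, known_words):
    """Proportion of running words (tokens) in the text that the learner knows."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_words) / len(tokens)

def meets_threshold(text, known_words, threshold=0.95):
    """True if coverage reaches the threshold (95% in Laufer, 1989)."""
    return known_word_coverage(text, known_words) >= threshold

# Hypothetical learner vocabulary and sentence, for illustration only.
sample = "The cat sat on the mat near the old door"
known = {"the", "cat", "sat", "on", "near", "old", "door"}
print(known_word_coverage(sample, known))  # 0.9
print(meets_threshold(sample, known))      # False
```

One unknown word in ten running words already leaves the reader at 90%, below the 95% mark.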

 

The Levels Test mentioned in the previous section would suggest that Middle Eastern learners are well below any functioning lexical threshold however defined. This hint was later confirmed in a number of area PhD studies testing both learners and instruments. Al-Hazemi (1993) worked extensively with Meara’s Yes-No vocabulary test in a Saudi Arabian setting, finding the effective vocabulary size of graduates in a military academy to be well under 1,000 words, putting their lexical familiarity with an English text of average difficulty at somewhere below 70%, almost grotesquely short of Laufer’s 95%. The Levels Test used in the work mentioned above did not have a 1000 level at that time, an omission since rectified (discussed in Nation, 1993; delivered in Nation, 2001; online at www.lextutor.ca/levels).

 

 

4.2 Reading processes are universal across languages

 

Koda (1988; 2005) investigated the basic cognitive processes underpinning reading in different orthographies, namely Arabic, Chinese, Spanish, and English, and found that far from being universal these were highly language specific. For example, lexical access, word recognition, and the juggling of top-down and bottom-up information sources are quite different for different orthographies. Further, cognitive level strategies developed for processing one orthography will almost inevitably be used when reading in a different orthography even if wildly inappropriate or counter-productive, in a phenomenon called cognitive process transfer.

 

Abu-Rabia and Siegel (1995) gave some detail to Koda’s picture in studies showing that reading in Arabic always involves a greater and different attention to ambient context than reading in English. Arabic orthography lacks short vowels, so written words are inherently more ambiguous, and the reader needs a heightened attention to context to disambiguate each word in an extra step on the way to lexical access.

 

Other researchers deepened the locus of difference from the cognitive down to the even more basic perceptual plane. In a series of studies Randall and Meara (e.g., 1988) investigated possible sources of Arabic EFL learners' notoriously inaccurate spelling. They hypothesized that an inheritance from L1 might be working against these learners, such that words are actually perceived in different ways in the two languages. This is again related to the missing vowels, which the researchers reasoned could cause a different locus of information in the printed word. The spelling errors they were thinking of almost always involved a problem with vowel placement, such as writing cereals for curls, a phenomenon to which they gave the name “vowel blindness.”

 

Randall and Meara asked English and Arabic speakers to look at randomly presented strings of o's and decide as quickly as possible whether or not there was also an x in the string. The strings were meant to serve as abstract, semantics-free words. They might include the strings shown in Figure 6.

 

Figure 6

 

ooooo  oxooo  oooox  ooxoo oooox

 

Average reaction times were recorded for both groups across different positions of the x. A fast reaction time for, say, an x at the left of the string and a slower time for one at the right would suggest that the subject normally paid more attention to the left side of written words, presumably because that was the side normally bearing the information needed to make sense of a text (as is the case in English). The null hypothesis would be that any differences were individual or random. In the event, however, Randall and Meara found that each language group had its own consistent and distinct profile. English speakers appeared to sample the string from left to right, with three points of emphasis (a strong one at the left of the word, a slightly weaker one at the end, and a still weaker one in the middle), in an “M” shaped curve. Arabic speakers, on the other hand, sampled the strings from the centre first, with much less attention to either end, in a “U” shaped curve. Both are shown in Figure 7. The M shape presumably indicates a sequential processing with most attention given to the beginnings and ends of words, while the U shape indicates a whole-word processing in which certain details might predictably get lost, like the number and position of vowels in a word like curls. A practical application of these ideas to language teaching and testing is discussed in Ryan and Meara (1995).
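For readers who want to see the mechanics of such a task, here is a minimal sketch of how the stimulus strings could be built and how reaction times could be averaged by target position to give a response-time profile. The five-character string length follows Figure 6; the function names and data layout are assumptions for illustration, and no timing values from the study are reproduced here.

```python
from collections import defaultdict
from statistics import mean

def make_string(length=5, x_pos=None):
    """Build a string of o's, optionally with an x at index x_pos (0-based)."""
    chars = ["o"] * length
    if x_pos is not None:
        chars[x_pos] = "x"
    return "".join(chars)

def mean_rt_by_position(trials):
    """Average reaction time per target position.

    trials: iterable of (x_pos, reaction_time_ms) pairs for target-present trials.
    Returns {x_pos: mean reaction time}, i.e. one group's response-time profile.
    """
    by_pos = defaultdict(list)
    for x_pos, rt in trials:
        by_pos[x_pos].append(rt)
    return {pos: mean(rts) for pos, rts in sorted(by_pos.items())}

# Rebuild four of the strings of the kind shown in Figure 6.
print([make_string(5, p) for p in (None, 1, 4, 2)])
# ['ooooo', 'oxooo', 'oooox', 'ooxoo']
```

Plotting the output of `mean_rt_by_position` for each language group against position would give curves of the kind the M and U labels describe.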

 

Figure 7: Roman M- and Arabic U-shaped response time profiles

 

 

 

 

4.3 All cost and no benefit?

 

It would be quite remarkable if all these differences in cognitive processes provided nothing but disadvantages to the Arabic learner when transferred to another language. For example, might not Abu-Rabia and Siegel's (1995) finding of the greater attention to context in the processing of Arabic play a positive role at some point in lexical processing of English, possibly at the point of lexical acquisition through inference? In fact, in a classic lexical inferencing study by Haynes (1983), learners from four language groups including an Arabic group inferred word meanings from text, where transparent clues to meaning had been placed in either the local context or the global context. Global context refers to a context that extends over several sentences or even paragraphs of text, while local context refers to the immediate sentence or even phrase. Global context is of course more demanding of memory, requiring retention of information for delayed integration, but is the more typical locus of information in a natural inferencing task. In the experiment, local context was found to be moderately useful to learners from all language groups, but global context was useful to the Arabic group only.

 

But has it not already been stated that the Arabic EFL learner exposed the fallacy that inferring new word meanings from context is easy (e.g., Laufer & Sim, 1985)? That is true, but it has also been suggested that a reason inferring is difficult is that more than the critical 5% of words in a typical inferring context are themselves also unknown, as would often be the case with learners having incomplete knowledge of even the basic 1,000 and 2,000 frequency levels (as was shown above to be the case by Cobb, 1997, with Omani learners; Al-Hazemi, 1993, with Saudi Arabian learners; and several others in different Gulf locations throughout the 1990s). Analysis with Vocabprofile shows that even instructional academic texts typically draw only 70% to 75% of their lexis from the 2000 level; in other words, these learners are facing far greater proportions of unknown lexis than just 5%, and this, rather than lack of skill with inferencing, would be the first explanation for the poor inferencing.

But the global-local consideration puts an extra twist on this one. The 95% rule will affect global contexts, a potential strength of the Arabic learner, more than local contexts. In a local context, involving only a few words to the right and left of a target word, lucky matches between the words in the context and the words a learner happens to know are likely to be frequent. In a global context, involving several sentences, the chances of multiple unknowns are far greater.
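A rough way to see why context size matters is to treat each context word as independently known with a probability equal to the learner's coverage. Independence is a simplifying assumption (unknown words tend to cluster), and the 80% coverage figure and the window sizes of 6 and 60 words are illustrative choices, not values from any of the studies cited.

```python
def p_all_known(coverage, n_words):
    """Probability that all n context words are known, assuming independence."""
    return coverage ** n_words

def expected_unknown(coverage, n_words):
    """Expected number of unknown words in a context of n words."""
    return (1 - coverage) * n_words

# Illustrative values: 80% coverage, a 6-word local window, a 60-word global one.
for label, n in [("local", 6), ("global", 60)]:
    print(label, n, p_all_known(0.80, n), expected_unknown(0.80, n))
```

Under these assumptions a local window is free of unknown words roughly a quarter of the time (0.8 to the sixth power is about 0.26), while a 60-word global context is essentially never free of them and contains about a dozen unknowns on average.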

Here is a mini-experiment in text analysis to make this point. The text about computer security mentioned above was broken into sentences and these were fed through Vocabprofile individually. The text as a whole has a 1k, 2k, AWL, and Off-list profile of 74%, 8%, 11%, and 7%, respectively, which makes it a difficult text for learners who are lexicon-building in the 2000 zone. But the profiles of individual sentences from the same text may not be the same. Table 4 shows the sentences profiled individually. If word meanings are distributed in the global context, then this is a difficult text. But if they are contained within single sentences, then a new word lodged in Sentence 5 (with 81% 1k words) is almost certainly easier to interpret than one lodged in Sentence 4 (with only 43% 1k words).

Table 4

Percent tokens             S1   S2   S3   S4   S5   S6
K1 Words (1 to 1000):      69   76   75   43   81   74
K2 Words (1001 to 2000):   12    6    8    0    5    7
AWL Words (academic):       8    6    8   57    7   14
Off-List Words:            11   12    8    0    7    5
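The sentence-by-sentence profiling behind Table 4 could be sketched along the following lines. This is not the Vocabprofile program itself, which works with full frequency lists and word families; the tiny word lists, the sample text, and the function names below are hypothetical and serve only to show the procedure.

```python
import re
from collections import Counter

BANDS = ("K1", "K2", "AWL", "Off-List")

def profile(sentence, band_lists):
    """Percent of tokens in each band; band_lists maps band name to a set of words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter()
    for tok in tokens:
        for band in ("K1", "K2", "AWL"):
            if tok in band_lists[band]:
                counts[band] += 1
                break
        else:
            # Token found in none of the lists
            counts["Off-List"] += 1
    total = len(tokens) or 1
    return {band: round(100 * counts[band] / total) for band in BANDS}

def profile_by_sentence(text, band_lists):
    """Split a text at sentence-final punctuation and profile each sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [profile(s, band_lists) for s in sentences]

# Toy word lists and text, for illustration only.
toy_lists = {
    "K1": {"the", "is", "a", "of", "to", "users", "must", "keep", "their"},
    "K2": {"secret"},
    "AWL": {"security", "access", "data"},
}
text = "Users must keep their password secret. Access to the data is a security issue."
for i, row in enumerate(profile_by_sentence(text, toy_lists), 1):
    print(f"S{i}", row)
```

Even in this toy case the two sentences come out with quite different profiles, which is the point of the mini-experiment: a whole-text profile can hide easier and harder local contexts.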

 

 

This example could be somewhat difficult to implement, but in principle there must be usable instructional strategies that would compensate for the negative transfers from Arabic to English and build on the positive. In my break-out session I hope to provide examples of both.

To summarize, many of our ideas about teaching EFL and ESP reading in the Gulf in the 1980s were inadequate in specifiable ways. Teaching Arabic speaking learners to read English was a far more complex enterprise than we imagined, or than Chomskyan linguistics or an L1-oriented reading pedagogy could offer much help with. In the meantime, L2-oriented research has developed remarkably quickly, and has told us where to look for a lot of the problems we were having. As yet the pedagogical implications of this research remain under development. Nonetheless, teachers and researchers working with Arabic learners have not only discovered problems but also solutions and again these have proven useful far beyond the original Arabic-learner context. 

 

 

 

 

 

5 From problems to solutions

 

Once you have determined that your particular group of learners has unique learning challenges and opportunities, you enter the world of homemade instructional design. Commercial providers of instructional materials have an interest in promoting universalist models of SLA not necessarily because they believe in them but because the cost of doing otherwise would be enormous. But in any case, as mentioned, the pedagogical implications of some of this uniqueness are far from obvious.

 

The Middle East region has seen the development of many, many instructional design projects in EFL and ESP. Many of these were used a few times and quickly disappeared, or remain to this day housed in metal cabinets in resource centers, sometimes because they were not based on a careful needs analysis, or sometimes because they were committed to paper or other hard media without sufficient piloting. Some however went on to become key components in the ESL toolkit which no one any longer associates with their origin in the Arab world.

 

One implication from the research cited above that seems unmistakable is that the vocabulary needs of Arabic learners must be organized and planned for because they will not be met by magic. Instructors at the American University of Beirut began working with this idea in the 1970s, and they were early explorers of the notion of coverage. More than a decade before Laufer's (1989) 95% coverage finding, Praninskas (1972) and her colleagues in Beirut worked with corpora and frequency lists as aids in selecting materials, writing materials, checking the lexis of examinations, and designing vocabulary courses. While such courses were apparently successful, these researchers were nonetheless surprised to find that even with the most frequent and recurring 2000 words known, learners continued to face difficulties in reading academic texts. There was presumably a further lexical challenge somewhere between the basic 2000 words and the specialized lexicon of a particular domain. In the light of subsequent Vocabprofile research, of course, 2000 words provide only 80% coverage of average texts and usually less in academic texts, as opposed to anything like 95%, so this is no surprise.

 

Praninskas and colleagues developed a sophisticated computer analysis of their learners' academic texts and identified a further high frequency zone within that genre, providing the seeds of a methodology later taken up by Xue and Nation (1984) to produce the University Word List, and in turn by Coxhead (2000) to produce the Academic Word List. The idea of these lists is to increase coverage substantially beyond 80% without (a) entering into the lexicon of specific domains or (b) imposing an impossible learning burden. The current state of this longstanding project is a streamlined list of 570 word families that reliably raises coverage of academic texts by a further 10% over the 80% already provided by the first 2000 word families (or sometimes more: the text shown in Figure 8 is formed of more than 13% yellow AWL words).
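The cumulative arithmetic can be laid out in a few lines. The figures are the round numbers given in the paragraph above (about 80% from the first 2000 families, about 10 points more from the 570 AWL families); the comparison with 95% uses Laufer's (1989) threshold, and the function name is hypothetical.

```python
# Round figures as given in the text, in percentage points of text coverage.
layers = [("first 2000 families", 2000, 80), ("AWL", 570, 10)]

def cumulative(layers):
    """Running totals of word families learned and coverage reached."""
    rows, families, coverage = [], 0, 0
    for name, n_families, pct in layers:
        families += n_families
        coverage += pct
        rows.append((name, families, coverage))
    return rows

for name, families, coverage in cumulative(layers):
    print(f"{name}: {families} families, about {coverage}% coverage, "
          f"{95 - coverage} points short of 95%")
```

On these figures, 570 additional families close two thirds of the gap between 80% and 95%, which is what makes the list such an efficient learning target.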

 

Figure 8

 

 

relativistic heavy ion physics is of international and interdisciplinary interest to nuclear physics particle physics astrophysics condensed matter physics and cosmology the primary goal of this field of research is to re create in the laboratory a novel state of matter the quark gluon plasma qgp which is predicted by the standard model of particle physics quantum chromodynamics to have existed ten millionths of a second after the big bang origin of the universe and may exist in the cores of very dense stars star searches for signatures of quark gluon plasma formation and investigates the behavior of strongly interacting matter at high energy density by focusing on measurements of hadron production over a large solid angle it utilizes a large volume time projection chambers tpc for tracking and particle identification in a high track density environment star will measure many observables simultaneously on an event by event basis to study signatures of a possible qgp phase transition and the space time evolution of the collision process at their respective energy the goal is to obtain a fundamental understanding of the microscopic structure of hadronic interactions at the level of quarks and gluons at high energy densities star is one of two large scale experiments under construction at the relativistic heavy ion collider rhic at the national laboratory bnl on for operation in number it has been designed to focus primarily on hadronic observables and features a large acceptance for high precision tracking and momentum analysis at center of mass c m rapidity specific to rhic will be significantly increased particle production thousands of particles produced hard parton parton scattering in heavy ion collisions

 

 

 

The AWL can help teachers find suitable texts, design vocabulary courses—or it can be given to learners to work with by themselves. In my breakout session I will be suggesting some ways of working with the AWL in a networked computing environment.

 

The AWL was again an idea that developed first in response to the needs of Arabic learners but then found a ready market in the larger ESL world beyond. One can now buy excellent books that contextualize and deliver the AWL in an effective manner for anyone who needs it, such as Schmitt and Schmitt's (2005) Focus on Vocabulary: Mastering the Academic Word List, which is now used throughout the Gulf area. So in the end the commercial publishers do get around to meeting local needs, as long as these turn out to be general needs after all.

 

A similar circuit has been taken by my own approaches to using the computer as a vocabulary tutor. Ideas that I developed for my Omani learners in the mid 1990s were later aired online via The Compleat Lexical Tutor Internet website, which has gradually built up a daily clientele of more than 1,000 unique users worldwide, and was then gradually rediscovered in the Middle East, which now accounts for about 30% of its user base. The principle is the same: the Arabic learner highlighted the point that vocabulary needs to be taught, but it was a point that needed to be made on behalf of all learners.

 

In my breakout and regular conference sessions I will show and tell some computational ideas that I designed to meet some of my Arabic learners' needs, and that I believe capitalize on some of their strengths.

 

 

 

 

References

Abu-Rabia, S. & Siegel, L. (1995). Different orthographies different context effects: The effects of Arabic sentence context in skilled and poor readers. Reading Psychology 16 (1), 1-19.

Adams, M. (1990). Beginning to read: Thinking & learning about print. Cambridge MA: MIT.

Alderson, J.C. (1984). Reading in a foreign language: A reading problem or a language problem? In J.C. Alderson & A.H. Urquhart (Eds.), Reading in a Foreign Language (pp. 1-27). London: Longman.

Al-Hazemi, A. (1993). Low-level EFL vocabulary tests for Arabic speakers. PhD dissertation, Centre for Applied Language Studies, University of Wales, Swansea.

Arden-Close, C. (1993). Language problems in science lectures to non-native speakers. English for Specific Purposes 12, 251-261.

Barnard, H. (1972). Advanced English Vocabulary: Workbooks. Rowley, MA: Newbury House.

Bates, E., & Goodman, J. (2001). On the inseparability of grammar and the lexicon: Evidence from acquisition. In M. Tomasello & E. Bates (Eds.), Language development: The essential readings (pp. 134-162). Malden, MA: Blackwell Publishers.

 

Bernhardt, E. & Kamil, M. (1995). Interpreting relationships between L1 & L2 reading: Consolidating the linguistic threshold and the linguistic interdependence hypotheses. Applied Linguistics 16, 15-34.

 

Bossers, B. (1991). On thresholds, ceilings and short-circuits: The relation between L1 reading, L2 reading, and L2 knowledge. AILA Review 8, 45–60.

Cambridge University. (1990). Preliminary English Test. Cambridge Local Examinations Syndicate: International examinations.

 

Coady, J. (1979). A psycholinguistic model of the L2 reader. In R. Mackay, R. Barkman & J. Jordan (Eds.), Reading in a second language (pp. 5-12). Rowley MA: Newbury House.

 

Chomsky, N. (1959). Review of B.F. Skinner, Verbal behavior. Language 35, 26-58.

Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.

Cobb, T., & Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In J. Flowerdew & M. Peacock (Eds.) The English for Academic Purposes Curriculum (pp. 315-329). Cambridge: Cambridge University Press.

Cobb, T.M. (1995). Imported tests: Analysing the task. Paper presented at TESOL (Arabia). Al-Ain, United Arab Emirates, March.

Corder, P. (1967). The significance of learner errors. International Review of Applied Linguistics (IRAL) 5 (2/3), 161-170.

Coxhead, A. (2000). A new academic word list. TESOL Quarterly 34, 213-238.

Dulay, H. & Burt, M. (1974). Natural sequences in child second language acquisition. Language Learning 24, 37-53.

Gelderen, A. van, Schoonen, R., Glopper, K. de, Hulstijn, J., Simis, A., Snellings, P. & Stevenson, M. (2004). Linguistic knowledge, processing speed and metacognitive knowledge in first and second language reading comprehension: A componential analysis. Journal of Educational Psychology 96 (1), 19-30.

Ghadessy, M. (1979). Frequency counts, word lists, and materials preparation: A new approach. Forum 17 (1), 24-27.

Goodman, K.S. (1967). Reading: A psycholinguistic guessing game. Journal of the Reading Specialist 6, 126-135.

Goodman, K.S. (1973). Psycholinguistic universals in the reading process. In F. Smith (Ed.), Psycholinguistics and reading (pp. 21-29). New York: Holt, Rinehart, & Winston.

Horst, M., Cobb, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring Second Language Vocabulary through Reading. Reading in a Foreign Language 11 (2), 207-223.

Huckin, T., Haynes, M. & Coady, J. (Eds.) (1993). Second Language Reading and Vocabulary. Norwood, NJ: Ablex.

 

Koda, K. (1988). Cognitive processes in second-language reading: Transfer of L1 reading skills and strategies. Second Language Research 4 (2), 133-156.

Koda, K. (2005). Insights into Second Language Reading: A Cross-Linguistic Approach. New York: Cambridge University Press.

 

Lado, R. (1957). Linguistics across cultures. Ann Arbor: University of Michigan Press.

Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? In C. Lauren & M. Nordman (Eds.), Special language: From humans thinking to thinking machines (pp. 316-323). Clevedon, UK: Multilingual Matters.


Laufer, B. & Sim, D. (1985). Taking the easy way out: Non-use and misuse of contextual clues in EFL reading comprehension. English Teaching Forum 23 (2), 7-10, 20.

Long, M. H., & Porter, P. (1985). Group work, interlanguage talk, and second language acquisition. TESOL Quarterly 19 (2), 207-227.

Nation, P. (1993). Measuring readiness for simplified reading: A test of the first 1000 words of English. RELC 31, 193-203.


Nation, P. (in press). How many words do you need to be able to read? Second Vocabulary Special Volume, Canadian Modern Language Review, to appear December 2006.

Nation, P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.

Praninskas, J. (1972). American University Word List. London: Longman.

Randall, M. & Meara, P. (1988). How Arabs Read Roman Letters. Reading in a Foreign Language, 4, 133-145.

Redman, S., & Ellis, R. (1991). A way with words: Vocabulary development activities for learners of English, Book 3. Cambridge: Cambridge University Press.

Ryan, A. & Meara, P. (1995). A diagnostic test for 'vowel blindness' in Arabic speaking learners of English (discussion paper). University of Wales. [Available http://www.swan.ac.uk/cals/calsres/vlibrary/arpm96c.htm on April 8, 2006.]

Schatz, E. K., & Baldwin, R. S. (1986). Context clues are unreliable predictors of word meanings. Reading Research Quarterly, 21, 439-453.

Schmitt, N., & Schmitt, D. (2005). Focus on vocabulary: Mastering the Academic Word List. White Plains, NY: Longman.

Smith, F. (1971). Understanding reading: A psycholinguistic analysis of reading and learning to read. New York: Holt, Rinehart, & Winston.

Soars, J., & Soars, L. (1991). Headway (Vols. 1, 2 and 3). London: Oxford University Press.


Stanovich, K. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly 16, 32-71.

Stanovich, K. (2000). Progress in understanding reading: Scientific Foundations & New Frontiers. New York: Guilford Press.

Swan, M., & Walter, C. (1990). The new Cambridge English course (Vols. 1, 2, and 3). Cambridge: Cambridge University Press.

Xue, G. & Nation, P. (1984). A university word list. Language Learning and Communication, 32, 215-219.