The old vocabulary, the new vocabulary, and the Arabic learner
Paper version of vocabulary symposium presentation
TESOL Arabia, Dubai, March 2006
Tom Cobb
8 April 2006
Montreal
1 Introduction
Many of us who taught EFL and
ESP in the Gulf in the 1980s felt we had no useful research to guide our
efforts or shed light on the problems we faced in our jobs. Most of us had some
training in teaching second languages (L2s), and L2 reading in particular,
which seemed to be the principal task we had been hired for. But none of this
training seemed to make much sense in the various pre-medical or
pre-engineering etc. language centres of the
new universities in Riyadh, Muscat, or Dubai. We were basically working without
a plan.
Nor did it seem that
instructors, even when hired for their research backgrounds, were particularly
encouraged to undertake research in their jobs. For one thing, the teaching day
was back-to-back double and triple periods followed by a noonday evacuation for
prayer, siesta, sport, or video watching. For another thing, institutions did
not appear to encourage investigations that might lead to comparisons between
Gulf learners and other learners, presumably to the disadvantage of the former.
It was generally believed that the learners were very weak and that any
investigation could well contain an element of ridicule.
I was therefore surprised to
gradually discover that during this time and against the odds a coherent body
of Gulf learner research was in fact emerging. Some of it came from local
institutions like Sultan Qaboos University, where an all-star cast of applied
linguists and Arabists had been brought together to develop its Language Centre.
Some of it came from established neighboring institutions, like the American
Universities of Beirut and Cairo. Some of it came from British and other
universities where former expatriate teachers and then increasingly Gulf
nationals investigated some aspect of Gulf language training for their doctoral
studies. This research, when brought together, not only forms a high-quality and
coherent body of work but also, I will argue, played a key role in the discovery of
The New Vocabulary, of which most of us at this conference are probably adherents.
Key principles of this new vocabulary include these:
Lexical knowledge is the strongest predictor of reading ability (and inability)
Lexis is not a filler for syntactic slots but rather syntax is an emergent property of lexis
Some zones of lexis are more important to know than others for different tasks
Different degrees of lexical knowledge are needed for different tasks
Lexical knowledge does not come for free in a second language
Lexical acquisition requires more exposures than natural input provides
Lexical processing and acquisition are not identical across orthographies
I am not arguing that all
these complex and seminal ideas were invented by ESL teachers in the Gulf! I am
arguing however that ESL teachers and former ESL teachers working with Arabic
speaking learners played a significant role in the development of many of them.
But let me begin my case with a description of The Old Vocabulary, the set of
assumptions we brought to the teaching of reading in the early days of Gulf EFL
and ESP that often led to less than ideal results.
2 The Old Vocabulary
In the 1980s, we weren't so
much working without a plan as working with a wrong plan. When modern versions
of applied linguistics emerged in the late 1960s and 1970s, there was a need to
provide some sort of rationale for the burgeoning international industry of
English language teaching. Much of this rationale was initially provided by
borrowing ideas from apparently related disciplines. Principal among these were
General Linguistics and L1 reading theory. Interestingly, while neither of
these disciplines gave much space to vocabulary learning, both made strong
assumptions about it.
Chomsky (e.g., 1959) and the linguists
who followed him believed the acquisition of mother-tongue (L1) syntax to
be the great human achievement, the dividing line between man and beast. This
acquisition was inexplicable in terms of any general learning theory, including
the various kinds of associationism, and especially behaviorism. Quite
extensive vocabularies, on the other hand, could be learned by chimpanzees or
other mammals with spare capacity in their craniums simply by linking their
various needs to coloured tokens or other word-like
signs.
Applied linguists in the
1970s, in an attempt to fit in with these dominant ideas, threw out some useful
but limited ideas about L1-L2 transfer that had been conceived within a loosely
behaviorist framework (e.g., Lado, 1957; Corder, 1967) and instead toiled in
countless papers to show how constructs like Universal Grammar or the Language
Acquisition Device could be made to relate to the various phenomena of second
language (L2) acquisition (e.g., Dulay & Burt, 1974). Classroom teachers
either ignored what they saw as irrelevant theorizing, or else dutifully sought
ways to incorporate linguistics thinking into their classrooms—despite Chomsky's
own disclaimer that none of it had anything to do with language teaching.
Either way, anything more than passing interest in vocabulary teaching
gradually came to seem seriously misplaced, and the topic more or less
disappeared. The last of the great vocabulary course books was Helen Barnard's Advanced
English Vocabulary re-published for the last time in 1972 (yet probably the
most photocopied textbook in the Gulf for decades to follow) with nothing
comparable to replace it that I know of until Redman and Ellis' A Way with
Words in 1991.
Also gone was any real
emphasis on areas of language use other than speech, including reading. The
linguists' interest was all in childhood acquisition of the syntax of L1
speech, upon which reading would be a subsequent and relatively uninteresting
add-on. In the acquisition of an L2, of course, particularly by adults, reading
is quite likely to play a rather different role. For one thing, it may be less
of an add-on and more of a main objective for an adult learner, as it was in
most of the Gulf training programs, where reading professional manuals ranked
higher in the learning objectives than chatting with expatriates. A more
detailed account of reading was obviously needed in applied linguistics than
had seemed important in general linguistics, and the early versions of this
were borrowed, fairly uncritically (as proposed by Grabe, 1991), from L1
reading theory.
In the 1970s one particular
account of L1 reading development had taken precedence in mother-tongue language
education, the reading-for-meaning, holistic, top-down model proposed by (among
others) Kenneth Goodman (1967) and Frank Smith (1971) under the title of
“reading as a psycholinguistic guessing game.” By this account, all the
cognitive bits and pieces that go into reading and learning to read—all the
strategies and knowledge components, including vocabulary knowledge—would fall
into place of their own accord through implicit natural deductions from
sustained acts of meaningful reading. In other words, all the vocabulary one
would ever need could be pleasurably acquired through inference from context,
and there was no need to teach it or even plan for it in any detailed way.
This version of reading was
clearly an idea that applied mainly to young L1 learners, especially to high
SES (socio-economic status) young L1 learners who had been raised in literate
households. It presupposed a high volume of reading, plentiful availability of
materials, a well modeled motivation, and most of all a sufficient length of
time for words to be met and re-met, hypotheses about word meanings and
functions tested and revised, and so on. And yet this model of reading was
quickly imported, for lack of anything better, into L2 thinking over the 1970s and
1980s. It was not supported by any very convincing evidence, although it did
appear to go rather well with the emerging communicative approach in L2
teaching for which there did seem to be some evidence (e.g., Long & Porter,
1984).
Goodman
was eventually persuaded to devote some time to developing more explicitly the
L2 version of his guessing game. A number of L2 principles were soon derived
from the L1 work, notably the theory of linguistic universals (Goodman, 1973;
Coady, 1979), according to which reading in a second language was the same as
reading in a first, and transfer of reading ability from L1 to L2 would be
automatic. Again, no special emphasis on vocabulary was needed.
Oddly, this L1 oriented view
of reading came quickly under attack in L1 research itself and has existed in a
kind of research war with less holistic approaches ever since (including
phonics; reviewed in Adams, 1990). A leading L1 reading researcher, Stanovich
(1980; 2000), found in study after study that skill in lexical processing of
words out of context was by far the best predictor of competent reading, and
that the various forms of top-down or expectation-driven processing were in
fact the strategies of weak readers, not strong ones. But guessing theory had already
become dominant in applied linguistics, and at least in ESL/EFL teacher
training courses remains so to this day. To wit, every training program has a
pedagogical grammar course, but few have a pedagogical vocabulary course. It is
still almost universally held that an adequate vocabulary can be built through
natural exposures in meaningful context—an odd notion that can easily be
contradicted by any observer of language classrooms, where more than 50% of the
time is invariably spent explaining word meanings. The notion was apparently
appealing enough to overrule common sense and plain observation, but
fortunately it did not overrule the researchers.
Applied linguistics reading
research has moved very far from the Smith-Goodman, L1-is-L2 model in almost
every way. An extensive body of reading research is now highly developed within
specifically L2 terms and is quite careful what it borrows from L1 thinking.
For just one example, the notion of a straightforward transfer of reading
ability from L1 to L2, when investigated empirically, proved to be a bit more
complex than that. Alderson's (1984) research discovered that, far from being
automatic, the transfer of L1 abilities (such as effective guessing of new
words in context) takes place only after a threshold of L2 knowledge has been
crossed. What this threshold consists of has occupied many researchers ever
since (Bernhardt & Kamil, 1995; Bossers, 1991), but one thing seems clear in
studies dating from the 1990s to the present (e.g., Gelderen et al., 2004), that
the main plank in this knowledge threshold is knowledge of L2 vocabulary. Put
simply, this means that L2 learners have to know some (in principle)
specifiable amount of L2 vocabulary before any reading skills or strategies
they may have in L1 will become accessible in their L2, or before they can
either read with acceptable comprehension or learn any significant amount of
new vocabulary through reading. This apparently obvious message has now been
demonstrated in numerous experiments, and whole books have been written to
convince teachers and course designers of the truth of it (e.g., Nation, 2001).
Nor was it only reading that
apparently had some need for a lexical input. Syntax itself has now been shown
to depend on a threshold of lexical knowledge, although as yet this is less
clearly specified than it has been for reading (to be discussed below). Indeed
the dependence appears to go further than that. Knowledge of syntax, it now
seems, is in fact rooted in the properties of words, as opposed to being a set
of free-floating abstractions into which words are slotted. This is an idea
being worked out by both L2 and L1 acquisition researchers (e.g., Bates &
Goodman, 2001), and even linguists in the minimalist vein (Chomsky, 1995), but
it is possibly a reversal of the usual direction of L1-L2 inheritance inasmuch
as key figures on both sides rally to our own Michael Lewis' (1993) slogan that
“language consists of grammaticalised lexis not lexicalised grammar.”
It now seems quite astounding
that an enterprise involving and affecting so many people as L2 reading could
have been launched from such a weak footing, or that so many otherwise
intelligent people could have failed to see its inappropriateness. It was a case of
the emperor's new clothes on a grand scale. Maybe the mismatch was not apparent
with high SES European or North American learners acquiring cognate languages,
but as a Gulf ESL instructor I always found it extremely anomalous to be
driving students through grammar exercises couched in words they had never seen,
or giving them only loosely graded reading texts with every other word a
look-up, or teaching whole lessons in guessing from context where the words in
the contexts were no more likely to be known than the word to be guessed.
Thankfully during this period researchers like Alderson (1984) were busy
finding the exit from this scenario—not by borrowing theories from
quasi-related disciplines, but through clear questions and empirical research.
Interestingly, Alderson had
spent much of his career not only in classrooms, but also in classrooms full of
learners with non-cognate L1 backgrounds, including Arabic speaking learners.
This particular link may be a coincidence. Nonetheless it inspired me to notice
that a lot of the hard spadework tunneling out of linguistics-psycholinguistics
was performed by teachers or former teachers working with Arab learners, and
further, that this work has now expanded to become general applied linguistics
theory. That is because if you look carefully, the special problems of the
Arabic learner are just a high visibility case of the usual problems of any
language learner.
This was particularly true as
instructed language learning expanded beyond the domain of spies in training
and preparation for foreign vacation to become a high-risk game for life for
Vietnamese boat people, evacuees of the Iranian revolution, and many others who
suddenly had to function in English—and of course with the coming on-stream of
a large proportion of the Arabian Gulf youth population.
3 The Gulf learner & the old vocabulary
Before we look at what
was learned from the Arabic learner, let us examine what the Arabic learner did
not learn from us in the days of the old vocabulary. The typical Gulf ESP
course of the 1980s consisted of working dutifully through the grammar zones
from a grammar-based placement test, but randomly through the vocabulary zones
from a totally unknown vocabulary base. Reading passages were, in line with L1
notions outlined above, chosen on the basis of somebody's idea of the typical
learner's interests rather than any careful analysis of whether the text could
profitably be read at all, or read in what way—as intensive reading or
extensive reading, for learning to read or for reading to learn, and so on. For
example, Figure 1 shows a text from the course book Headway (Soars &
Soars, 1991, p. 74) in use with low intermediate learners in Sultan Qaboos
University in Oman.
Figure 1
The Observer
newspaper recently showed how easy it is, given a suitable story and a
smattering of jargon, to obtain information by bluff from police computers.
Computer freaks, whose hobby is breaking into official systems, don't even need
to use the phone. They can connect their computers directly with any database
in the country. Computers do not alter the fundamental issues. But they do multiply
the risks. They allow more data to be collected on more aspects of our lives,
and increase both its rapid retrievability and the likelihood of its
unauthorized transfer from one agency which might have a legitimate interest in
it, to another which does not. Modern computer capabilities also raise the
issue of what is known in the jargon as 'total data linkage' the ability, by
pressing a few buttons and waiting as little as a minute, to collate all the
information about us held on all the major government and business computers
into an instant dossier on any aspect of our lives.
And Figure 2 shows a Vocabprofile (Laufer &
Nation, 1995; www.lextutor.ca/vp/) of the same text, breaking it down into 1000 (blue),
2000 (green), Academic Word List (yellow) and Off-list (red) lexical frequency
components.
Figure 2
the observer newspaper recently showed how easy it is given a suitable story
and a smattering of jargon to obtain information by bluff from police computers
computer freaks whose hobby is breaking into official systems do not even need
to use the phone they can connect their computers directly with any database in
the country computers do not alter the fundamental issues but they do multiply
the risks they allow more data to be collected on more aspects of our lives and
increase both its rapid retrievability and the likelihood of its unauthorized
transfer from one agency which might have a legitimate interest in it to another
which does not modern computer capabilities also raise the issue of what is
known in the jargon as total data linkage the ability by pressing a few buttons
and waiting as little as a minute to collate all the information about us held
on all the major government and business computers into an instant dossier on
any aspect of our lives
To summarize the color
coding, 73.81% of the tokens in this text are in the first 1000 words of
English, 7.74% are in the second 1000, 11.31% are Academic Word List (AWL)
words, and 7.14% are not in any of these lists and hence are quite low
frequency. This and related information is summarized in Table 1.
Table 1
| | Families | Types | Tokens | Tokens Percent |
| K1 Words (1 to 1000): | 67 | 73 | 124 | 73.81% |
| K2 Words (1001 to 2000): | 12 | 12 | 13 | 7.74% |
| AWL Words (academic): | 11 | 14 | 19 | 11.31% |
| Off-List Words: | ? | 11 | 12 | 7.14% |
| Total | 90+? | 110 | 168 | 100% |
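The arithmetic behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the Vocabprofile program itself: the function name frequency_profile and any word sets passed to it are hypothetical, and matching is on exact word forms, whereas Vocabprofile works with word families. Only the token counts are taken from Table 1.

```python
import re
from collections import Counter


def frequency_profile(text, bands):
    """Return the percentage of tokens falling in each frequency band.

    `bands` maps a band name to a set of word forms (hypothetical lists).
    Tokens found in no band are counted as "off-list".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in bands.items():
            if token in words:
                counts[name] += 1
                break
        else:  # no band contained the token
            counts["off-list"] += 1
    return {name: round(100 * n / len(tokens), 2) for name, n in counts.items()}


# Token counts per band as reported in Table 1.
table1_tokens = {"K1": 124, "K2": 13, "AWL": 19, "off-list": 12}
total_tokens = sum(table1_tokens.values())
table1_percent = {
    band: round(100 * n / total_tokens, 2) for band, n in table1_tokens.items()
}
print(total_tokens, table1_percent)
```

The second half simply divides each band's token count by the 168 tokens in the text, which is how the Tokens Percent column is obtained.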
And here (in Table 2) are the
vocabulary test results for a typical group of the learners this text is
designed to be used with.
Table 2
| Level | 2000 | 3000 | 5000 | Academic | 10,000 |
| Student 1 | 27% | 22% | 17% | 0% | 0% |
| Student 2 | 39 | 22 | 11 | 27 | 22 |
| Student 3 | 33 | 27 | 11 | 11 | 0 |
| Student 4 | 33 | 44 | 17 | 27 | 17 |
| Student 5 | 27 | 17 | 5 | 22 | 5 |
| Student 6 | 27 | 17 | 0 | 5 | 5 |
| Student 7 | 50 | 33 | 22 | 0 | 0 |
| Student 8 | 27 | 11 | 22 | 5 | 0 |
| Student 9 | 33 | 33 | 17 | 11 | 11 |
| Student 10 | 39 | 17 | 0 | 0 | 0 |
| Student 11 | 33 | 17 | 11 | 17 | 0 |
| MEAN % | 33.5 | 23.6 | 12.1 | 11.4 | 5.5 |
| S. Dev. | 7.1 | 9.7 | 7.8 | 10.5 | 7.9 |
Note: The test is Nation's (1990) original Vocabulary Levels Test, which did not at
that time include a 1000 level, although this has now been added (Nation, 2001;
for a functioning version of this test, see http://www.lextutor.ca/tests/levels/recognition/2-10k/).
Comparing text to test at
just the 2000 and Academic Word List (AWL; Coxhead, 2000) levels, the text
contains 7.74% and 11.31% of lexis at these levels, respectively, while the
learners know 33.5% and 11.4% of all the words at these levels, respectively.
Here (in Figure 3) is this same text as it looks to a student who knows about 35% of
the one thousand word families in the 2000 zone (all 13 of the text's 2000-zone
tokens removed except these five: information (x 2), phone, police and government) and
about 10% of the 570 AWL families (everything removed except for members of the
computer family).
Figure 3
The Observer newspaper recently
showed how easy it is, given a _______ story and a _______ of _______, to
_______ information by _______ from police computers. Computer _______, whose
_______ is breaking into official systems, don't even need to use the phone.
They can _______ their computers directly with any _______ in the country.
Computers do not _______ the _______ _______. But they do _______ the _______.
They allow more _______ to be _______ on more _______ of our lives, and
increase both its rapid _______ and the likelihood of its _______ _______ from
one agency which might have a _______ interest in it, to another which does
not. Modern computer _______ also raise the _______ of what is known in the
_______ as 'total _______ _______'
the ability, by pressing a few _______ and waiting as little as a minute, to
_______ all the information about us held on all the _______ government and
business computers into an _______
_______ on any _______ of our lives.
Note: Level-gapped text made at www.lextutor.ca/cloze/vp_cloze/
As the tables and figures
above combine to show, these learners are already weak at the 2000 vocabulary
level, but their reading assignment comprises about 20% of its lexis from well
beyond that level, or in other words has well over one unknown word in five.
For these learners, it seems intuitively clear (and research confirms it below)
that trying to read this text is not only a difficult and discouraging task,
but also one that can be predicted to yield little learning, other than the random
pick-up of a few odd words that will rarely be met again. Indeed that is
exactly the vocabulary that these learners have. As the Levels Test results in
Table 2 show, these learners have a smattering of words at all levels, but have
on average only about (2000 x 33.5% =) 670 words at the 2000 level itself. In
fact, they have more words beyond the 2000 level than within it, the result of
the random word pick-up that this type of exercise invites.
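The estimate of 670 words, and the comparison with knowledge beyond the 2000 level, can be reproduced as rough arithmetic. The sketch below uses the group means from Table 2. The band sizes are assumptions made for illustration (each level score is extrapolated to the stretch of the frequency list since the previous level), and the Academic level is left out because it is a separate list rather than a frequency band.

```python
# Group mean scores (%) from Table 2.
mean_scores = {"2000": 33.5, "3000": 23.6, "5000": 12.1, "10,000": 5.5}

# ASSUMPTION: number of words each level score is taken to stand for.
band_sizes = {"2000": 2000, "3000": 1000, "5000": 2000, "10,000": 5000}

estimated_words = {
    level: round(band_sizes[level] * pct / 100)
    for level, pct in mean_scores.items()
}
within_2000 = estimated_words["2000"]
beyond_2000 = sum(n for level, n in estimated_words.items() if level != "2000")
print(estimated_words, within_2000, beyond_2000)
```

Under these assumptions the group knows roughly 670 words within the 2000 level and roughly 750 beyond it. Different band-size assumptions would change the second figure, so it should be read as an order of magnitude only.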
It is hardly any wonder,
then, that our learners spent half or more of their reading time writing Arabic
translations between the lines of texts such as these (as shown in Figure 4).
Our response to this as reading teachers was not to teach them words in any
systematic way, but rather to insist they try to guess the meanings of
words from context. Of course we had no idea what conditions would make such
guessing possible, for example how many words would have to be known in the
context before taking a guess would begin to be feasible.
Figure 4

In fact, the coursebook this
text comes from, like most of those in use at the time, did claim to
include a vocabulary emphasis, but there was almost no consistent
presentation of the words of any frequency level in any of them, nor a
sufficient amount of recycling of the few words there were for any consistent
learning to take place. The following table, which was prepared for a previous
TESOL Arabia more than a decade ago (Al-Ain, 1995), shows the extent of
coverage and recycling of a series of typical course books in use in the late
eighties at several Gulf university language centers and units.
Ten randomly chosen
20-word samples of the 2400-word Cambridge word frequency lists (Hindmarsh,
1980), on which one of the students' tests was based (the Cambridge Preliminary
English Test, or PET, discussed below), were compared against the
vocabulary lists at the back of three major course books (from Cambridge itself
as well as Cobuild and Headway), and as a control also against the Longman
LDOCE dictionary's 2000-word defining vocabulary. As Table 3 shows, by the end
of the second book in each series, none of these course books exposes the
students to much more than half the words on this basic list. And even that
says nothing about the number of exposures provided for each word (10 seem
necessary for learning) or the learning conditions provisioned with each
exposure (in terms of words known in the environment). Both topics will be
treated in the breakout session.
Table 3
| | CA1 | CA2 | CA3 | CO1 | CO2 | HE1 | HE2 | LDOCE |
| Sample 1 | 6 | 9 | 11 | 6 | 14 | 4 | 10 | 19 |
| Sample 2 | 3 | 6 | 10 | 4 | 9 | 3 | 9 | 13 |
| Sample 3 | 6 | 9 | 11 | 4 | 13 | 4 | 9 | 14 |
| Sample 4 | 2 | 6 | 8 | 4 | 9 | 1 | 5 | 10 |
| Sample 5 | 8 | 8 | 10 | 3 | 8 | 7 | 8 | 9 |
| Sample 6 | 4 | 12 | 14 | 7 | 11 | 6 | 8 | 16 |
| Sample 7 | 6 | 9 | 11 | 6 | 10 | 8 | 11 | 16 |
| Sample 8 | 2 | 5 | 10 | 3 | 10 | 6 | 10 | 11 |
| Sample 9 | 5 | 5 | 10 | 5 | 11 | 5 | 10 | 13 |
| Sample 10 | 8 | 11 | 12 | 7 | 10 | 10 | 10 | 12 |
| TOTAL | 50 | 80 | 107 | 49 | 105 | 54 | 90 | 133 |
| MEAN | 5 | 8 | 10.7 | 4.9 | 10.5 | 5.4 | 9 | 13.3 |
| S.D. | 2.2 | 2.4 | 1.6 | 1.5 | 1.8 | 2.6 | 1.7 | 3.1 |
| % of Hindmarsh Words | 25 | 40 | 53.5 | 24.5 | 52.5 | 27 | 45 | 66.5 |
(CA=Cambridge; CO=COBUILD; HE=Headway; LDOCE=Longman defining vocabulary)
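The bottom row of Table 3 is simple proportion arithmetic: the number of sampled Hindmarsh words found in a course book's word list, divided by the 200 words sampled (ten samples of 20). A minimal sketch in Python, using three columns of the table as data; the function name coverage_percent is illustrative only.

```python
def coverage_percent(hits_per_sample, sample_size=20):
    """Percentage of sampled list words found in a course book's word list."""
    words_sampled = sample_size * len(hits_per_sample)
    return 100 * sum(hits_per_sample) / words_sampled


# Hits per 20-word sample, copied from three columns of Table 3.
table3_hits = {
    "CA1": [6, 3, 6, 2, 8, 4, 6, 2, 5, 8],
    "HE2": [10, 9, 9, 5, 8, 8, 11, 10, 10, 10],
    "LDOCE": [19, 13, 14, 10, 9, 16, 16, 11, 13, 12],
}
coverage = {book: coverage_percent(hits) for book, hits in table3_hits.items()}
print(coverage)
```

Because each sample is only 20 words, a single hit moves a sample's score by five percentage points, which is one reason to read the per-sample figures less closely than the totals.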
So if the “vocabulary
emphasis” of these course books was not words per se, then what was it?
It was, of course, strategies, principally the strategy of guessing new word
meanings from word parts and from semantic context. In threshold terms, this
and related strategies were almost certain to be already functioning in the L1
(and hence not in need of training), just unavailable in L2 for want of a
threshold knowledge base. It would seem one needs to teach words rather than
strategies.
Much of the we-have-no-words
problem might have gone unnoticed were it not for the introduction in several
Gulf institutions of standardized English reading tests. This story played out
in different ways in different places, but at Sultan Qaboos University (SQU) in
Oman the arrival of the Cambridge Preliminary English Test (PET) rather quickly
disclosed the scale of our failures. The PET was the most rudimentary test
available from the University of Cambridge Local Examinations Syndicate
(UCLES). It was based on the Cambridge basic lexicon of 2387 words (Hindmarsh,
1980) and couched in a content that was too simple for our students’ level of
maturity. They nonetheless failed the test in droves, took more courses and
failed it again, and too often were eventually expelled from colleges to retire
to their families in some degree of disgrace.
An advantage that came from
the advent of standardized testing, at least at SQU, was that teachers began
investigating the problem through action research projects with their learners
and came up with some interesting though admittedly piecemeal findings. I began
investigating my own students' vocabulary sizes with Nation's Vocabulary
Levels Test (1990; VLT), with typical results as already shown above
(Table 2). Another instructor, Arden-Close, observed his learners' interactions
with a content instructor, later publishing a paper (1993) with a heart-rending
account of the lecturer's attempts to come to grips with his students'
incomprehension.
Arden-Close describes a
chemistry lecturer in a classroom discussion backing up further and further in
a search for common lexical ground. Trying to convey the idea of a "carbon
fluoride bond" the lecturer tries a succession of progressively more
common analogies: Teflon pans, a tug of war, an assembly line, all to no avail.
In the light of the vocabulary size-testing undertaken subsequently, it is no
wonder; pan, war, line and other words from the 2000 list were no doubt
themselves unknown, let alone any imaginative compounds derived therefrom. In
another classroom postmortem, a biology lecturer describes searching for a
common analogy to convey "hybridization" and in the process indicates
the real level of the problem:
The first time I gave a
hybridization analogy, I talked about dogs, and then I switched to goats; and
then it even dawned on me that some of them aren't going to be in touch with
the fact that if you mix two different kinds of goats they come out looking in
between, and I didn't know all the specific terms there, what their two
different breeds of goats are called. You can talk about [mixing] colors, but a lot of them don't know their colors yet (p.
258, emphasis added).
Countless similar
unrecorded interchanges took place over the years.
So was the students' lack of words for the colours a sign that they were some kind of poor learners, or
just a sign that no one had taken the trouble to make sure they had covered the
basic words of the language? In fact, the students were quite adept at word
learning. My VLT work with these learners showed they had often picked up quite
a bit of knowledge at off frequency zones throughout the lexicon, just not
necessarily at the higher frequency zones (where basic terms like the names of
the colors are found). As Table 2 above shows, most
learners knew more words beyond the 2000 level than within it—words they had
clearly invested in learning, but words that in most cases would be met rarely
if ever again.
And most interesting, as
instructors we started paying attention to what our learners were trying to
tell us—indirectly in their little lexical annotations, but also more directly.
Figure 5 is a journal entry from a student in 1993, writing to an imaginary
friend who would soon be entering the University and facing the PET reading
test.
Figure 5
Dear N.,
I heard that you
are going to join the College of Commerce and Economics after you finish your
high school. I have a lot to tell you about this college. The first and
important thing is the PET test. You must pass this test so you can continue
your studies in the College. The PET test is not easy as it seems. It is so
difficult and we have to do a lot to pass it.... The English that we learned at
school is too easy and it's nothing compared with the English in the University.
Let me tell you about myself as an example.
I thought that I
knew English and really in the school I was from the three best students in the
class in English. But here my English is nothing, then I thought I learned nine
years English in the school but I don't have any knowledge and I don't know
anything about real English. I really don't know the fault from who. ...
Your friend, F.
The fault from who? In
retrospect the answer seems clear enough.
As instructors we clearly did
not have the tools needed for the size of our undertaking. We sometimes had the
technological tools, but lacked the appropriate conceptual tools to make much
use of them. What we had, in fact, was a hand-me-down conceptual toolkit that
had been devised for other purposes by linguists and L1 researchers that served
our learners ill. Fortunately, our piecemeal efforts were not the only
investigations under way into the reading and vocabulary problems of the Arab
EFL/ESP learner. Others were looking at these same issues in programs of more
extended and theoretically motivated research and were already fashioning a
more useful toolkit.
4 The real psycholinguistics of the Arab learner's lexical processing
This section will re-examine
some of the planks in The Old Vocabulary platform in the light of subsequent
research involving learners from Arabic and other typologically different
orthographies (Chinese, Japanese, etc.).
4.1 Guessing the meanings of new words in context is easy
Laufer and Sim (1985), rather
than accepting the efficacy of guessing new word meanings on faith, actually
took the trouble to investigate whether guessing word meanings is a reliable
way to build a vocabulary in a second language. In a series of
experiments with Arabic and Hebrew speaking ESL learners, they determined that
guessing is actually quite a messy business with unreliable results. This
finding was subsequently replicated many times with L2 learners (e.g., several
of the studies in Huckin, Haynes & Coady, 1993) and eventually even with L1
learners (Schatz & Baldwin, 1986).
To investigate this same
phenomenon in a more natural and extended instructional setting, Horst, Cobb
& Meara (1998) undertook an extensive reading study with Omani academic
learners. Learners similar to those already described (n=24) were tested for
the number of new words they had learned from reading a whole abridged novel of
over 20,000 words. The study took pains to tighten the methodology relative to
a number of similar studies in order to register the maximum learning possible.
The average amount of learning from this experience for these learners turned
out to be about five words. As the authors note, at this rate the
journey from a minimal to a functioning lexicon (of 5000 words) would involve
an investment of more than ten years.
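The arithmetic behind an estimate of this kind can be sketched as follows. The rate of about five words per novel and the 5000-word target come from the study as reported above; the starting lexicon of 1000 words and the pace of one novel per week are hypothetical values chosen only for illustration, and other assumptions would give different totals.

```python
import math

def novels_needed(start_size, target_size, gain_per_novel):
    """Whole novels required to grow a lexicon from start_size to target_size."""
    gap = max(0, target_size - start_size)
    return math.ceil(gap / gain_per_novel)

def years_needed(novels, novels_per_year):
    """Years of reading at a steady pace of novels_per_year."""
    return novels / novels_per_year

# From the text: about 5 words gained per novel, target lexicon of 5000 words.
# Hypothetical (not from the study): a 1000-word starting lexicon and
# one novel read per week.
novels = novels_needed(start_size=1000, target_size=5000, gain_per_novel=5)
years = years_needed(novels, novels_per_year=52)
print(novels, round(years, 1))
```

Under these assumed inputs the learner would need 800 novels, or roughly 15 years of weekly reading, which is in line with the authors' "more than ten years."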
Laufer (1989) went on to seek
the conditions of reasonably reliable contextual inference. Arabic and Hebrew
speaking learners read texts with various proportions of unknown vocabulary
(15%, 10%, etc.) and then had their comprehension of the same texts measured.
Successful comprehension was found to be reliable only when 95% of words in the
text were to some extent known. And, predictably, new word inference becomes
reliable at the same point. This finding seemed to give some specification to
Alderson's (1984) notion of a threshold (also discussed in this context by Cobb
& Horst, 2001). Needless to say the search for a threshold has not been
concluded (see new work by Nation, in press), with the proportion of known to unknown
words needed for competent reading tending to rise rather than fall in more
recent studies with tighter methodologies.
The Levels Test mentioned in
the previous section would suggest that Middle Eastern learners are well below
any functioning lexical threshold however defined. This hint was later
confirmed in a number of area PhD studies testing both learners and
instruments. Al-Hazemi (1993) worked extensively with Meara’s Yes-No vocabulary
test in a Saudi Arabian setting, finding the effective vocabulary size of
graduates in a military academy to be well under 1,000 words, putting their
lexical familiarity with an English text of average difficulty at somewhere
below 70%, almost grotesquely short of Laufer’s 95%. In the Levels work
mentioned above the test did not have a 1000 level at that time, an omission
since rectified (discussed in Nation, 1993; delivered in Nation, 2001; online at www.lextutor.ca/levels).
4.2 Reading processes are
universal across languages
Koda (1988; 2005)
investigated the basic cognitive processes underpinning reading in different
orthographies, namely Arabic, Chinese, Spanish, and English, and found that far
from being universal these were highly language specific. For example, lexical
access, word recognition, and the juggling of top-down and bottom-up
information sources are quite different for different orthographies. Further,
cognitive level strategies developed for processing one orthography will almost
inevitably be used when reading in a different orthography even if wildly
inappropriate or counter-productive, in a phenomenon called cognitive process
transfer.
Abu-Rabia and Siegel (1995)
gave some detail to Koda’s picture in studies showing that reading in Arabic
always involves a greater and different attention to ambient context than
reading in English. Arabic orthography lacks short vowels, so written words
are inherently more ambiguous, and the reader needs heightened attention to
context to disambiguate each word in an extra step on the way to lexical
access.
Other researchers deepened
the locus of difference from the cognitive down to the even more basic
perceptual plane. In a series of studies Randall and Meara (e.g., 1988)
investigated possible sources of Arabic EFL learners' notoriously inaccurate
spelling. They hypothesized that an inheritance from L1 might be working
against these learners, such that words are actually perceived in different
ways in the two languages. This is again related to the missing vowels, which
the researchers reasoned could cause a different locus of information in the
printed word. The spelling errors they were thinking of almost always involved
a problem with vowel placement, such as writing cereals for curls,
which they gave the name “vowel blindness.”
Randall and Meara asked
English and Arabic speakers to look at randomly presented strings of o's
and decide as quickly as possible whether or not there was also an x in
the string. The strings were meant to serve as abstract, semantics-free words.
They might include the strings shown in Figure 6.
Figure 6
ooooo oxooo
oooox ooxoo oooox
Average reaction times were
recorded for both groups across different positions of the x. A fast
reaction time for, say, an x at the left of the string and a slower time
for one at the right would suggest that the subject normally paid more
attention to the left side of written words, presumably because that was the
side normally bearing the information needed to make sense of a text (as is the
case in English). The null hypothesis would be that any differences were
individual or random. In the event, however, Randall and Meara found that each
language group had its own consistent and distinct profile. English speakers appeared
to sample the string from left to right, with three points of emphasis—a strong
one at the left of the word, a slightly weaker one at the end, and a still
weaker one in the middle, in an “M” shaped curve. Arabic speakers, on the other
hand, sampled the strings from the centre
first, with much less attention to either end, in a “U” shaped curve. Both are
shown in Figure 7. The M shape presumably indicates a sequential processing
with most attention given to the ends and beginnings of words, while the U
shape indicates a whole-word processing in which certain details might
predictably get lost—like the number and position of vowels in a word like curls. A practical application of these ideas to
language teaching and testing is discussed in Ryan and Meara (1995).
Figure 7: Roman M- and Arabic
U-shaped response time profiles


4.3 All cost and no benefit?
It would be quite
remarkable if all these differences in cognitive processes provided nothing but
disadvantages to the Arabic learner when transferred to another language. For
example, might not Abu-Rabia and Siegel's (1995) finding of the greater
attention to context in the processing of Arabic play a positive role at some
point in lexical processing of English, possibly at the point of lexical acquisition
through inference? In fact, in a classic lexical inferencing study by Haynes
(1983), learners from four language groups including an Arabic group inferred
word meanings from text, where transparent clues to meaning had been placed in
either the local context or the global context. Global context refers to a
context that extends over several sentences or even paragraphs of text, while
local context refers to the immediate sentence or even phrase. Global context
is of course more demanding of memory, requiring retention of information for
delayed integration, but is the more typical locus of information in a natural
inferencing task. In the experiment, local context was found to be moderately
useful to learners from all language groups, but global context was useful to
the Arabic group only.
But has it not already been
stated that the Arabic EFL learner exposed the fallacy that inferring new word
meanings from context is easy (e.g., Laufer & Sim, 1985)? That is true, but
it has also been suggested that a reason inferring is difficult is that more
than the critical 5% of words in a typical inferring context are themselves
also unknown, as would often be the case with learners having incomplete
knowledge of even the basic 1,000 and 2,000 frequency levels (as was shown
above to be the case by Cobb, 1997, with Omani learners; Al-Hazemi, 1993, with
Saudi Arabian learners; and several others in different Gulf locations
throughout the 1990s). Analysis with Vocabprofile shows that even instructional
academic texts typically draw only 70% to 75% of their lexis from the 2000
level; in other words, these learners are facing far greater proportions of
unknown lexis than just 5%, and this, rather than lack of skill with
inferencing, would be the first explanation for the poor inferencing results.
But the global-local
consideration puts an extra twist on this one. The 95% rule will affect global
contexts, a potential strength of the Arabic learner, more than local contexts.
In a local context, involving only a few words to the right and left of a
target word, lucky matches between the words in the context and the words a
learner happens to know are likely to be frequent. In a global context,
involving several sentences, the chances of multiple unknowns are far greater.
Here is a mini-experiment
in text analysis to make this point. The text about computer security mentioned
above was broken into sentences and these were fed through Vocabprofile
individually. The text as a whole has a 1k, 2k, AWL, and Off-list profile of
74%, 8%, 11%, and 7%, respectively, which makes it a difficult text for
learners who are lexicon-building in the 2000 zone. But the profiles of
individual sentences from the same text may not be the same. Table 4 shows the
sentences profiled individually. If word meanings are distributed in the global
context, then this is a difficult text. But if they are contained within single
sentences, then a new word lodged in Sentence 5 (with 81% 1k words) is almost
certainly easier to interpret than one lodged in Sentence 4 (with only 43% 1k
words).
Table 4: Percent tokens per sentence
| Sentences | S1 | S2 | S3 | S4 | S5 | S6 |
|---|---|---|---|---|---|---|
| K1 Words (1 to 1000): | 69 | 76 | 75 | 43 | 81 | 74 |
| K2 Words (1001 to 2000): | 12 | 6 | 8 | 0 | 5 | 7 |
| AWL Words (academic): | 8 | 6 | 8 | 57 | 7 | 14 |
| Off-List Words: | 11 | 12 | 8 | 0 | 7 | 5 |
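A rough sketch of this kind of sentence-by-sentence profiling is given below. It is not the Vocabprofile program itself: the function names, the tiny placeholder word sets, and the sample text are all hypothetical, and the sketch matches exact word forms rather than the word families on which the real frequency lists are built.

```python
import re

BANDS = ("K1", "K2", "AWL", "Off")

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def profile(text, k1, k2, awl):
    """Percent of tokens falling in each band (K1, K2, AWL, Off-list)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = dict.fromkeys(BANDS, 0)
    for token in tokens:
        if token in k1:
            counts["K1"] += 1
        elif token in k2:
            counts["K2"] += 1
        elif token in awl:
            counts["AWL"] += 1
        else:
            counts["Off"] += 1
    total = len(tokens)
    return {band: (100 * n / total if total else 0.0)
            for band, n in counts.items()}

def known_coverage(band_percents, known_bands):
    """Sum the percentages of the bands a learner is assumed to know."""
    return sum(band_percents[band] for band in known_bands)

# Placeholder word sets for illustration only; these are NOT the real lists.
K1_TOY = {"the", "is", "we"}
K2_TOY = {"safe"}
AWL_TOY = {"data", "analyze", "protocol"}

sample = "The data is safe. We analyze the encryption protocol."
profiles = [profile(s, K1_TOY, K2_TOY, AWL_TOY) for s in split_sentences(sample)]
for number, percents in enumerate(profiles, start=1):
    covered = known_coverage(percents, ("K1", "K2"))
    status = "meets 95%" if covered >= 95 else "below 95%"
    print(f"S{number}", percents, status)
```

Comparing each sentence's known-word coverage against the 95% figure separately, rather than once for the whole text, is what makes the difference between local and global context visible.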
A sentence-level analysis of this kind could be
somewhat difficult to implement, but in principle there must be usable
instructional strategies that would compensate for the negative transfers from
Arabic to English and build on the positive. In my breakout session I hope to
provide examples of both.
To summarize, many of our
ideas about teaching EFL and ESP reading in the Gulf in the 1980s were
inadequate in specifiable ways. Teaching Arabic speaking learners to read
English was a far more complex enterprise than we imagined, or than Chomskyan
linguistics or an L1-oriented reading pedagogy could offer much help with. In
the meantime, L2-oriented research has developed remarkably quickly, and has
told us where to look for a lot of the problems we were having. As yet the
pedagogical implications of this research remain under development.
Nonetheless, teachers and researchers working with Arabic learners have not
only discovered problems but also solutions and again these have proven useful
far beyond the original Arabic-learner context.
5 From problems to solutions
Once you have determined that
your particular group of learners has unique learning challenges and
opportunities, you enter the world of homemade instructional design. Commercial
providers of instructional materials have an interest in promoting universalist
models of SLA not necessarily because they believe in them but because the cost
of doing otherwise would be enormous. But in any case, as mentioned, the
pedagogical implications of some of this uniqueness are far from obvious.
The Middle East region has
seen the development of many, many instructional design projects in EFL and
ESP. Many of these were used a few times and quickly disappeared, or remain to
this day housed in metal cabinets in resource centers, sometimes because they
were not based on a careful needs analysis, or sometimes because they were
committed to paper or other hard media without sufficient piloting. Some
however went on to become key components in the ESL toolkit which no one any
longer associates with their origin in the Arab world.
One implication from the
research cited above that seems unmistakable is that the vocabulary needs of
Arabic learners must be organized and planned for because they will not be met
by magic. Instructors at the American University of Beirut began working with
this idea in the 1970s, and they were early explorers of the notion of
coverage. More than a decade before Laufer's (1989) 95% coverage finding,
Praninskas (1972) and her colleagues in Beirut worked with corpora and frequency
lists as aids in selecting materials, writing materials, checking the lexis of
examinations, and designing vocabulary courses. While such courses were
apparently successful, these researchers were nonetheless surprised to find
that even with the most frequent and recurring 2000 words known, learners
continued to face difficulties in reading academic texts. There was presumably
a further lexical challenge somewhere between the basic 2000 words and the specialized
lexicon of a particular domain. In the light of subsequent Vocabprofile
research, of course, 2000 words provide only 80% coverage of average texts and
usually less in academic texts, as opposed to anything like 95%, so this is no
surprise.
Praninskas and colleagues
developed a sophisticated computer analysis of their learners' academic texts
and identified a further high frequency zone within that genre—providing the
seeds of a methodology later taken up by Xue and Nation (1984) to produce the
University Word List, and in turn by Coxhead (2000) to produce the Academic
Word List. The idea of these lists is to increase coverage substantially beyond
80% without (a) entering into the lexicon of specific domains, or (b)
imposing an impossible learning burden. The current state of this longstanding
project is a streamlined list of 570 word families that reliably raises
coverage of academic texts by a further 10% over the 80% already provided by
the first 2000 word families (or sometimes more—the text shown in Figure 8 is
formed of more than 13% yellow AWL words).
Figure 8
relativistic heavy ion physics is of international and interdisciplinary interest to nuclear physics particle physics astrophysics condensed matter physics and cosmology the primary goal of this field of research is to re create in the laboratory a novel state of matter the quark gluon
plasma qgp which is predicted by the standard model of particle physics quantum chromodynamics to have existed ten millionths of a second after the big bang origin of the universe and may exist in the cores of very dense stars star searches for signatures of quark gluon plasma formation
and investigates the behavior of strongly interacting matter at high energy density by focusing on measurements of hadron production over a large
solid angle it utilizes a large volume time projection chambers tpc for tracking and particle identification in a high track density environment star will measure many observables
simultaneously on an event by event
basis to study signatures of a possible qgp phase transition and the
space time evolution of the collision process at their respective energy the goal is to obtain a fundamental understanding of the microscopic structure of hadronic interactions at the level of quarks and gluons at high energy densities star is one of two large scale experiments under construction
at the relativistic heavy ion collider rhic at the national laboratory
bnl on for operation in number it has been designed to focus primarily on hadronic observables and features a large acceptance for
high precision tracking and momentum analysis at center of mass c m rapidity specific to rhic will be significantly increased particle production thousands of particles produced
hard parton parton scattering in heavy ion collisions
The AWL can help teachers
find suitable texts, design vocabulary courses—or it can be given to learners
to work with by themselves. In my breakout session I will be suggesting some
ways of working with the AWL in a networked computing environment.
The AWL was again an idea
that developed first in response to the needs of Arabic learners but then found
a ready market in the larger ESL world beyond. One can now buy excellent books
that contextualize and deliver the AWL in an effective manner for anyone who
needs it, such as Schmitt and Schmitt's (2005) Focus on Vocabulary:
Mastering the Academic Word List, which is now used throughout the Gulf area. So in the
end the commercial publishers do get around to meeting local needs, as long as
these turn out to be general needs after all.
A similar circuit has been
taken by my own approaches to using the computer as a vocabulary tutor. Ideas
that I developed for my Omani learners in the mid 1990s were later aired online
via The Compleat Lexical Tutor Internet website, which has gradually
built up a daily clientele of more than 1,000 unique users worldwide, and was
then gradually rediscovered in the Middle East, which now accounts for about
30% of its user base. The principle is the same: the Arabic learner highlighted
the point that vocabulary needs to be taught, but it was a point that needed to
be made on behalf of all learners.
In my breakout and regular
conference sessions I will show and tell some computational ideas that I
designed to meet some of my Arabic learners' needs, and that I believe
capitalize on some of their strengths.
References
Abu-Rabia, S. & Siegel, L. (1995). Different orthographies different context
effects: The effects of Arabic sentence context in skilled and poor readers. Reading
Psychology 16 (1), 1-19.
Adams,
M. (1990). Beginning to read: Thinking & learning about print. Cambridge
MA: MIT.
Alderson,
J.C. (1984). Reading in a foreign language: A reading problem or a language problem? In J.C.
Alderson & A.H. Urquhart (Eds.), Reading in a Foreign Language (pp.
1-27). London: Longman.
Al-Hazemi, A. (1993). Low-level EFL vocabulary
tests for Arabic speakers. PhD dissertation, Centre for Applied Language
Studies, University of Wales, Swansea.
Arden-Close, C. (1993). Language problems in science
lectures to non-native speakers. English for Specific Purposes 12,
251-261.
Barnard, Helen. (1972). Advanced English Vocabulary: Workbooks.
Rowley, Mass. Newbury House.
Bates, E., & Goodman, J. (2001). On the inseparability of
grammar and the lexicon: Evidence from acquisition. In M. Tomasello & E.
Bates (Eds.), Language development: The essential readings (pp.
134-162). Malden, MA: Blackwell Publishers.
Bernhardt, E. & Kamil, M. (1995).
Interpreting relationships between L1 & L2 reading: Consolidating the
linguistic threshold and the linguistic interdependence hypotheses. Applied
Linguistics 16, 15-34.
Bossers, B. (1991). On thresholds, ceilings and short-circuits:
The relation between L1 reading, L2 reading, and L2 knowledge. AILA Review
8, 45–60.
Cambridge
University. (1990). Preliminary English Test. Cambridge Local
Examinations Syndicate: International examinations.
Coady,
J. (1979). A psycholinguistic model of the L2 reader. In R. Mackay, R. Barkman
& J. Jordan (Eds.), Reading in a second language (pp. 5-12). Rowley
MA: Newbury House.
Chomsky, N. (1959). Review
of B.F. Skinner, Verbal behavior. Language 35, 26-58.
Chomsky, N. (1995). The
Minimalist Program. Cambridge, MA: MIT Press.
Cobb,
T., & Horst, M. (2001). Reading academic English: Carrying learners across
the lexical threshold. In J. Flowerdew & M. Peacock (Eds.) The English for Academic Purposes Curriculum
(pp. 315-329). Cambridge: Cambridge University Press.
Cobb, T.M. (1995). Imported
tests: Analysing the task. Paper presented at TESOL (Arabia). Al-Ain, United
Arab Emirates, March.
Corder, P. (1967). The significance of learner errors. International Review of Applied Linguistics (IRAL) 5 (2/3), 161-170.
Coxhead, A. (2000). A new
academic word list. TESOL Quarterly 34,
213-238.
Dulay, H. & Burt, M. (1974). Natural sequences in child second language
acquisition. Language Learning 24,
37-53.
Gelderen, A. van, Schoonen, R., Glopper, K. de, Hulstijn, J., Simis, A.
Snellings, P. & Stevenson, M. (2004). Linguistic knowledge, processing
speed and metacognitive knowledge in first and second language reading
comprehension; a componential analysis. Journal of Educational Psychology,
96 (1), 19-30.
Ghadessy, M. (1979). Frequency counts, word lists, and materials
preparation: A new approach. Forum 17 (1), 24-27.
Goodman,
K.S. (1967). Reading: A psycholinguistic guessing game. Journal of the
Reading Specialist 6, 126-135.
Goodman, K.S. (1973).
Psycholinguistic universals in the reading process. In F. Smith (Ed.), Psycholinguistics
and reading (pp. 21-29). New York: Holt, Rinehart, & Winston.
Horst, M., Cobb, T., & Meara, P. (1998).
Beyond A Clockwork Orange: Acquiring Second Language Vocabulary through
Reading. Reading in a Foreign Language 11
(2), 207-223.
Huckin,
T., Haynes, M. and Coady, J. (Eds.) (1993). Second Language Reading and
Vocabulary. Norwood, NJ.: Ablex.
Koda, K. (1988). Cognitive
processes in second-language reading: Transfer of L1 reading skills and
strategies. Second Language Research 4 (2), 133-156.
Koda,
K. (2005). Insights into Second Language Reading: A Cross-Linguistic
Approach. New York: Cambridge University Press.
Lado, R. (1957). Linguistics across cultures. Ann Arbor:
University of Michigan Press.
Laufer,
B. (1989). What percentage of text-lexis is essential for comprehension? In C.
Lauren & M. Nordman (Eds.), Special language: From humans thinking to
thinking machines (pp. 316-323). Clevedon, UK: Multilingual Matters.
Laufer, B. and Sim, D. (1985). Taking the easy way out: non-use and misuse of
contextual clues in EFL reading comprehension. English Teaching Forum
23 (2): 7-10, 20.
Long, M. H., & Porter,
P. (1985). Group work, interlanguage talk, and second language acquisition. TESOL Quarterly 19, 2, 207-27.
Nation, P. (1993). Measuring
readiness for simplified reading: A test of the first 1000 words of English. RELC
31, 193-203.
Nation, P. (In press.) How many words do you need to be able to read? Second
Vocabulary Special Volume, Canadian Modern Language Review, to
appear December 2006.
Nation, P. (2001). Learning
Vocabulary in Another Language. Cambridge: Cambridge University Press.
Praninskas, J. (1972). American University Word List. London:
Longman.
Randall, M. & Meara, P. (1988). How Arabs Read
Roman Letters. Reading in a Foreign Language, 4, 133-145.
Redman, S., & Ellis, R. (1991). A way with words: Vocabulary
development activities for learners of English, Book 3. Cambridge: Cambridge
University Press.
Schatz, E. K., & Baldwin, R. S. (1986). Context
clues are unreliable predictors of word meanings. Reading Research Quarterly, 21, 439-453.
Schmitt, N., & Schmitt, D. (2005). Focus on vocabulary: Mastering the Academic
Word List. White Plains NY: Longman.
Smith, F. (1971). Understanding reading: A
psycholinguistic analysis of reading and learning to read. New York: Holt,
Rinehart, & Winston.
Soars, J., & Soars, L.
(1991). Headway (Vols. 1, 2 and 3). London: Oxford University Press.
Stanovich, K. (1980). Toward an interactive-compensatory model of individual
differences in the development of reading fluency. Reading Research
Quarterly 16, 32-71.
Stanovich, K. (2000). Progress in understanding
reading: Scientific Foundations & New Frontiers. New York: Guilford
Press.
Swan, M., & Walter, C. (1990). The new Cambridge
English course (Vols. 1, 2, and 3). Cambridge: Cambridge University Press.
Xue, G. & Nation, P. (1984). A university word
list. Language Learning and Communication, 32, 215-219.