Toward teachability: Finding the high frequency zones across a language

October 2006
Notes toward a draft

Languages have traditionally been taught on the assumption that the learner of a second language (L2) wants to become a native speaker (NS). There are two problems with this assumption. Becoming an NS may not be the learner's goal, and in any case it may not be possible under normal circumstances (Cook, 1999). Yet in the absence of a better model, the industry soldiers on with a defective one, wishing it could teach the whole language but in fact teaching random pieces of it.

In lexis, for example, even 'research-based' courses start from a sketchy coverage of the most frequent 1000 or so words of the language, and then jump to random encounters with extraneous low-frequency items that learners have no reason to learn and that may or may not leverage further learning (Cobb, 1995; Nation, 2001).

What is the alternative to the NS model? One idea would be to determine the 'core' of a language and then teach as much of that as the learner has time or capacity for. So far the search for a learnable core has succeeded mainly in one dimension of language competence, lexis. Zipf long ago demonstrated the skewed nature of lexical distribution in English, a language with many words but surprisingly few in frequent use, and the pedagogical value of this finding has gradually been unpacked. Nation and Waring (1997) have shown with corpus analysis that the 2000-word General Service List (GSL) plus the 570-word Academic Word List account for about 90% of the running words in average texts, and that 90% is not far from a threshold for independent reading.
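A coverage figure of the kind cited above is simply the proportion of running words (tokens) in a text that appear on a given word list. The following is a minimal sketch of that calculation, not the procedure used in the studies cited. The function names, the toy word list and the sample text are all hypothetical, and the sketch matches surface word forms, whereas lists such as the GSL are usually applied by word family.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(text, word_list):
    """Return the share of running words (tokens) in text found on word_list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in word_list)
    return known / len(tokens)

# Hypothetical toy list standing in for a real frequency list
toy_list = {"the", "cat", "sat", "on", "a", "mat"}
sample = "The cat sat on the mat. A cat sat on a perambulator."

# 11 of the 12 tokens are on the list
print(f"{coverage(sample, toy_list):.0%}")  # prints 92%
```

Run over a large corpus with a real frequency list, the same ratio would give the kind of percentage reported in the coverage literature, though the result depends heavily on how tokens and word families are defined.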

The point of the present review is to show that the search for a teachable core of language is no longer confined to the dimension of vocabulary items. It will bring together related work in several other dimensions of L2 competence:

But is there any hope that such a multi-dimensional core, once fully characterized, would be learnable? In lexis, we know that 2500 words are learnable (check ESL298b: 1000 words in four months; Milton and Meara: 500 per year; and with partial learning, as we are attempting to document with the matrix model, probably x% more). On that basis, it seems there would probably be no more to learn than in the usual three-volume language course.

Once learned, would such a core prove "generative" for the rest of the language, should the learner wish to make his or her way toward NS competence? Again, what we know about lexis suggests it could. For lexis, knowing 95% of the tokens in a text is the basis for further inferential (hence independent) learning (Nation, 2001). Is there a parallel case for other dimensions of language competence? This is an empirical question.

This research is symptomatic of the maturation of applied linguistics as a field separate from linguistics (Widdowson, 2000).


Tom Cobb, Montreal, 9 Oct 06