Beyond BNC-14k Lists

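The words in these lists occur less than once per million words in the BNC corpus and were selected according to two constraints: range (presence in at least 6 BNC sub-corpora, as shown in the groups below) and absence from Nation's 14 lists.

The latter was determined by subtraction from Nation's lists: the output was run through BNC Vocabprofile and the off-list items were selected.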
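The selection just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code behind Post-14 cruncher or Vocabprofile: the function name, the (word, frequency, range) tuple layout, the toy frequencies, and the use of a plain set difference in place of a Vocabprofile run are all hypothetical.

```python
from collections import defaultdict


def select_beyond_14k(entries, nation_words, corpus_size, min_range=6, max_range=40):
    """Group rare, off-list words by range (hypothetical helper).

    entries      -- iterable of (word, raw_frequency, range) tuples, where range
                    is the number of sub-corpora in which the word occurs
    nation_words -- set of word forms already present in Nation's 14 lists
    corpus_size  -- total number of running words in the corpus
    """
    by_range = defaultdict(list)
    for word, freq, rng in entries:
        per_million = freq * 1_000_000 / corpus_size
        if per_million >= 1:
            continue  # not rare enough: one or more occurrences per million
        if not (min_range <= rng <= max_range):
            continue  # outside the ranges covered by these lists
        if word in nation_words:
            continue  # already on one of Nation's lists: subtract it
        by_range[rng].append(word)
    return {rng: sorted(words) for rng, words in sorted(by_range.items())}


# Toy data with invented frequencies; corpus_size is set to roughly the size of the BNC.
entries = [
    ("abseil", 40, 7),
    ("house", 50_000, 40),
    ("zygote", 30, 3),
    ("brusque", 80, 12),
    ("quaff", 60, 12),
    ("listed", 90, 20),
]
nation_words = {"house", "listed"}

result = select_beyond_14k(entries, nation_words, corpus_size=100_000_000)
print(result)
```

In this toy run only the words that are rare, sufficiently widespread, and off-list survive, keyed by their range value.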

GROUP 1
Words <1 per million and present in 6 to 9 sub-corpora
Tending to rarity and/or specialization.
Large numbers of words at each range, mainly because they are too rare and offer too little coverage to appear in Nation's 14 lists.
range=06 27k (± 2,700 words)
range=07 20k (± 2,000 words)
range=08 16k (± 1,600 words)
range=09 13k (± 1,300 words)
GROUP 2
Words <1 per million, present in 10 to 16 sub-corpora.
Very colourful and useful words just outside Nation's selection: words known to any educated English speaker, though possibly not used productively.
range=10 10k (± 1,000 words)
range=11 09k (± 900 words)
range=12 08k (± 800 words)
range=13 07k (± 700 words)
range=14 06k (± 600 words)
range=15 06k (± 500 words)
range=16 05k (± 450 words)
GROUP 3
Words <1 per million, present in 17 to 40 sub-corpora
These words are tending toward medium, not low, frequency.
Small lists - Nation has already cleaned out this area.
range=17-18 08k (± 750 words)
range=19-20 06k (± 600 words)
range=21-23 07k (± 650 words)
range=24-26 05k (± 450 words)
range=27-30 04k (± 400 words)
range=31-35 02k (± 150 words)
range=36-40 01k (< 100 words)
Totals: GROUP 1 ± 7,600 words; GROUP 2 ± 5,000 words; GROUP 3 ± 3,000 words

These 18 word lists, comprising ± 15,600 items, are (arguably) an empirically based definition of the outer fringe of the general or non-specialist lexicon of English (although no doubt some specialist items will be found in the GROUP 1 lists above).
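As a quick arithmetic check, the approximate word counts given for each list above can be summed. The figures below are copied from the list entries; treating the "< 100 words" list as 100 is an assumption made only so that it can be added, so the result is an upper-end estimate.

```python
# Approximate word counts per list, as given in the entries above.
group_counts = {
    "GROUP 1": [2700, 2000, 1600, 1300],
    "GROUP 2": [1000, 900, 800, 700, 600, 500, 450],
    "GROUP 3": [750, 600, 650, 450, 400, 150, 100],  # last value assumed: "< 100"
}

n_lists = sum(len(counts) for counts in group_counts.values())
subtotals = {group: sum(counts) for group, counts in group_counts.items()}
total = sum(subtotals.values())

print(n_lists)    # number of lists across the three groups
print(subtotals)  # approximate words per group
print(total)      # approximate words overall
```

The sum comes to 18 lists and roughly 15,650 words, which agrees with the stated figure of ± 15,600 once the per-list rounding is taken into account.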

Note: these lists are not lemmatized or organized into family units; different members of the same family may appear in several lists, or in Nation's 14 lists.

This work was performed using Post-14 cruncher (see sidebar menu) and Vocabprofile BNC (see Related Links).