In 2012, Paul Nation and colleagues extracted fourteen family lists from the British National Corpus on the basis of (1) frequency and (2) range across the 100 subdivisions of the corpus. The lists accompany Nation's Range software and serve as an alternative to the GSL+AWL (Classic) framework. In 2014, Tom Cobb developed six further lists for use in VP on Lextutor, in response to user requests to extend the reach to 20k families. Nation later also extended the lists to 20k using a different methodology.
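The two selection criteria can be illustrated with a minimal sketch. It assumes the usual definitions (frequency is the total number of occurrences, range is the number of subdivisions in which an item occurs at least once) and counts plain word forms rather than families. The function name `frequency_and_range` and the toy data are hypothetical, and the way the two measures were actually combined to rank the lists is not described here.

```python
from collections import Counter

def frequency_and_range(subcorpora):
    """Return {word: (total_frequency, range)} for tokenized sub-corpora.

    Range is taken to be the number of sub-corpora in which the word
    occurs at least once (an assumption, not the documented procedure).
    """
    freq = Counter()
    rng = Counter()
    for tokens in subcorpora:
        freq.update(tokens)        # every occurrence counts toward frequency
        rng.update(set(tokens))    # each sub-corpus counts once per word
    return {word: (freq[word], rng[word]) for word in freq}

# Toy data: three tiny "subdivisions" standing in for the corpus's 100.
sections = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "sat"],
    ["the", "quark"],
]
stats = frequency_and_range(sections)
```

In this toy data "sat" and "the" have the same frequency, but "the" has the wider range, which is the kind of difference the second criterion is meant to capture.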
The flemma version of the BNC lists was developed by Laurence Anthony in a procedure described here and integrated into VP and Familizer in summer 2019. (A flemma is a 'form-based lemma', i.e. one that is not split by part of speech where such a split would otherwise apply.) These lists were substantially 'cleaned' in Dec 2020 (proper nouns removed, etc.) on the occasion of building the 100-word lists. Proper nouns were buried in the larger k-lists but could make up as much as 10% of a 100-word c-list.
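The difference between a lemma and a flemma can be shown with a small sketch. The tagged entries, the part-of-speech labels, and the function names `group_lemmas` and `group_flemmas` are hypothetical illustrations; this is not the procedure used to build the actual lists.

```python
from collections import defaultdict

# Hypothetical toy data: (word form, part of speech, base form).
TAGGED = [
    ("walk", "NOUN", "walk"),
    ("walks", "NOUN", "walk"),
    ("walk", "VERB", "walk"),
    ("walked", "VERB", "walk"),
    ("walking", "VERB", "walk"),
]

def group_lemmas(entries):
    """Group forms by (base form, part of speech): one group per lemma."""
    groups = defaultdict(set)
    for form, pos, base in entries:
        groups[(base, pos)].add(form)
    return dict(groups)

def group_flemmas(entries):
    """Group forms by base form only, ignoring part of speech."""
    groups = defaultdict(set)
    for form, _pos, base in entries:
        groups[base].add(form)
    return dict(groups)

lemmas = group_lemmas(TAGGED)
flemmas = group_flemmas(TAGGED)
```

Here the noun and verb uses of "walk" form two lemmas but a single flemma, so a flemma list needs no part-of-speech tagging of the input text.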
The Lextutor Facebook announcement of this work in Nov 2020 provides a good description.