Coverage Calculator v.3 (checked and updated 3 Dec 2025)
The percentage of list words in a corpus
This program calculates what percentage of the running words (tokens) in a corpus are words from a given list. A list of the 2,000 most common word families is often said to 'cover' up to 80% of the tokens in a general corpus of English, i.e., up to 80% of the word tokens in the corpus will be words from that list.

Treatment of proper nouns is a checkbox option. Headword lists can be expanded into family/lemma lists with a separate routine, and list coverage in individual texts can be calculated separately as well (Demo 7). Known maximum for this routine (2024): a list of 13,000 words against a corpus of about 1 million words. Larger test corpora or texts will be reduced by the program if needed.
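The coverage figure described above can be illustrated with a minimal sketch. This is not the calculator's own code: the tokenizer, the function names `tokenize` and `coverage`, and the `max_tokens` parameter are assumptions made for illustration, and the word list here is a flat list of word forms rather than expanded families.

```python
import re

def tokenize(text):
    # Assumed, simplistic tokenizer: lowercase alphabetic runs,
    # allowing internal apostrophes (e.g. "don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def coverage(list_words, corpus_text, max_tokens=1_000_000):
    # Truncate the corpus to max_tokens, mirroring the idea that
    # oversized corpora are reduced before counting.
    tokens = tokenize(corpus_text)[:max_tokens]
    if not tokens:
        return 0.0
    listed = {w.lower() for w in list_words}
    # Count every token (not every distinct word) that is on the list.
    hits = sum(1 for t in tokens if t in listed)
    return 100.0 * hits / len(tokens)

word_list = ["the", "cat", "sat", "on", "mat"]
text = "The cat sat on the mat. The dog barked."
print(round(coverage(word_list, text), 1))  # 7 of 9 tokens -> 77.8
```

Coverage is counted over tokens, so a frequent list word such as "the" contributes once for each occurrence.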

RESEARCH: 1. Nation (2006); 2. Laufer & Ravenhorst (2010); 3. Schmitt, Jiang & Grabe (2011); 4. Schmitt, Cobb et al. (2015); 5. Laufer (2020); 6. Cobb & Laufer (2021)


DEMO LISTS

BNC-COCA Families
1k | 2k | 3k | 1-3k

Nuclear English (1-3k)
BNC/COCA Based
    @ nfl-0 nfl-1 nfl-2 nfl-7
NFL-0 Based
    Coming soon; available now at Nuclear Builder for copy and paste

Nuclear French (1-3k)
French nuclear frequency lists (Listes de fréquence nucléaire françaises, LFNF)
LFNF-x (>x% of family)

fr_lfnf-0
fr_lfnf-1
fr_lfnf-5
Updates may be available at the Nuclear List Builder.
(1) Select or paste a LIST and name it

(2) Choose a corpus or text (Eng > Fr; truncated to 1 million words)

(4) Click

(5) See the result (proper nouns and leftovers at the bottom)
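The proper-noun and leftover report mentioned in the last step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the calculator's method: the function name `leftovers` is hypothetical, and the proper-noun test is a crude heuristic (an off-list word counts as a possible proper noun if it starts with a capital and never appears in lowercase in the text).

```python
import re
from collections import Counter

def leftovers(list_words, corpus_text):
    listed = {w.lower() for w in list_words}
    # Keep original casing so capitalization can be inspected.
    raw = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", corpus_text)
    off_list = [t for t in raw if t.lower() not in listed]
    # Off-list words that occur at least once fully in lowercase.
    lowercase_seen = {t for t in off_list if t.islower()}
    propers, others = Counter(), Counter()
    for t in off_list:
        key = t.lower()
        # Heuristic (assumption): capitalized and never seen in lowercase.
        if t[0].isupper() and key not in lowercase_seen:
            propers[key] += 1
        else:
            others[key] += 1
    return propers, others

sample_list = ["the", "saw", "in"]
sample_text = "Anna saw the dog. Anna fed the dog in Paris."
propers, others = leftovers(sample_list, sample_text)
print(propers.most_common())
print(others.most_common())
```

A heuristic like this will misclassify an off-list common word that occurs only at the start of a sentence, which is one reason proper-noun handling is best left as a user option.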