Coverage Calculator v.1
Find the percentage of individual words in a corpus that are covered by a given word list
This program calculates how often the words on a list appear in a corpus, expressed as a percentage of the corpus's running words. For example, a list of the 2000 most common word families is often said to 'cover' up to 80% of the words in a general corpus of English. The coverage figure refers to individual words (also called 'running words' or 'tokens') appearing throughout a corpus; the list words can be tokens, lemmas, or families, as pertinent. Related: Headword lists can be familized or lemmatized here. List coverage in texts can be calculated here (Demo 7).
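The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine this page actually runs: the function names `tokenize` and `coverage` are hypothetical, the tokenizer is a simple assumption (lowercased alphabetic strings, with internal apostrophes kept), and the list is matched as plain tokens. Matching lemmas or families would need an additional mapping from each corpus token to its headword.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (assumed tokenization rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def coverage(corpus_text, word_list):
    """Return the percentage of running words in the corpus that are on the list."""
    tokens = tokenize(corpus_text)
    if not tokens:
        return 0.0  # an empty corpus has nothing to cover
    listed = {word.lower() for word in word_list}
    counts = Counter(tokens)
    # Sum the frequencies of every corpus word type that appears on the list.
    covered = sum(n for word, n in counts.items() if word in listed)
    return 100.0 * covered / len(tokens)


if __name__ == "__main__":
    corpus = "The cat sat on the mat and the dog sat too"
    print(round(coverage(corpus, ["the", "sat", "on", "and"]), 1))
```

In the toy corpus, 7 of the 11 running words are on the list, so the coverage is 7/11 of the tokens. Counting word types once with `Counter` and then summing their frequencies keeps the list lookup proportional to the number of distinct words rather than to the corpus length.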
Known maximum for this routine as of the end of 2017: a 13,000-word list run against a 2.3-million-word corpus
COVERAGE RESEARCH: 1. Adolphs & Schmitt (2003) 2. Nation (2006) 3. Schmitt, Jiang & Grabe (2011) 4. Schmitt, Cobb et al. (2015)