UPDATE 2025-01-28
Nuclear List Builder v.4
  Reduce a family list to frequent members
  + NEW - DERIVATIONS COUNT || FRENCH FAMILIES
Jan '25
The BNC/Coca family lists are based on very large corpora, with families made as complete as possible so that every word of any text can be classified (in, e.g., VP). But even K-1 to K-3 families may contain members that learners will never meet, or that appear mainly in specific text types (medicine, engineering). Hence the case for reducing these lists to their essentials for initial or specialist learning.
Nuclear List Builder "crosses" family lists against word frequencies in a smaller resident corpus (1-4 million words) or a user's specialist corpus (up to about 800,000 words) to obtain a list of just the family members that are frequent in that corpus. Read a paper or summary about applying this idea to English (French en route).
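The "crossing" idea can be illustrated with a minimal sketch. This is not the tool's own code: the function name `cross_families`, the dictionary layout of the family list, and the simple letter-run tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter

def cross_families(families, corpus_text):
    """Cross a family list against a corpus (hypothetical helper).

    families: {headword: [member, ...]} (assumed layout, not the tool's format)
    Returns {headword: {member: count}} for members that occur in the corpus.
    """
    # Assumed tokenizer: lowercase runs of Unicode letters (so French accents survive).
    tokens = re.findall(r"[^\W\d_]+", corpus_text.lower())
    counts = Counter(tokens)
    crossed = {}
    for head, members in families.items():
        # Keep only the members that actually appear in the cross-corpus.
        found = {m: counts[m] for m in members if counts[m] > 0}
        if found:
            crossed[head] = found
    return crossed

# Toy data, invented for the example.
families = {
    "nation": ["nation", "nations", "national", "nationalise", "denationalisation"],
}
corpus = "The nation held a national vote. Other nations watched the national result."
result = cross_families(families, corpus)
print(result)
```

In this toy run the members that never occur in the corpus ("nationalise", "denationalisation") drop out, leaving only the attested part of the family.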


(1) Choose a family list: full BNC/Coca, or familized Lonsdale Fr (FNFL-0)

(2) Choose a Cross-Corpus: User upload (850k wds; format .txt; Enc UTF-8) OR Stored corpus

(3) Click 'Get List' to view the complete list
    FIRST explore cut-offs
    OPTION: Fam Freqs

(4) THEN choose cut-offs

(5) Cut-offs: Exclude words <= [ ] of Fam AND Count < [ ] in Cross-Corpus
    OPTION: Mark derived words "z_"

(6) Click
    OPTION: Review % (on/off)

(7) Get Result
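The two cut-offs in step (5) can be sketched as a filter over one family's member counts. The reading used here is an assumption: a member is excluded only when its count is at or below a given percentage of the family's total frequency AND its raw count in the Cross-Corpus is below a given minimum. The function name `apply_cutoffs`, its parameters, and the sample counts are hypothetical.

```python
def apply_cutoffs(member_counts, max_share_pct, min_count):
    """Drop rare family members (hypothetical helper, assumed cut-off logic).

    member_counts: {member: count in the cross-corpus} for ONE family.
    A member is excluded only if BOTH conditions hold:
      - its share of the family total is <= max_share_pct (percent), and
      - its raw count is < min_count.
    """
    family_total = sum(member_counts.values())
    kept = {}
    for member, count in member_counts.items():
        share = 100.0 * count / family_total if family_total else 0.0
        too_rare = share <= max_share_pct and count < min_count
        if not too_rare:
            kept[member] = count
    return kept

# Invented counts for illustration; the family total is 500.
counts = {"nation": 120, "national": 300, "nations": 75,
          "nationhood": 3, "nationalistic": 2}
kept = apply_cutoffs(counts, max_share_pct=1.0, min_count=5)
print(kept)
```

Because the two conditions are joined with AND, a member with a small share of a very frequent family still survives as long as its raw count reaches the minimum.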