Nuclear List Builder v.3
  Reduce families to the members with more than x corpus hits
  NEW IN '23: French families.   AUG '23: z_ words clean-up (French + English)
+: headword override
+: all/no-inflects override
+: display derived option
+: French families
NEW: Fam-Freq list (frequencies for complete families)
NEW: Percent derived per family (Apr '23)
Most word-family lists (e.g. BNC/COCA) are built from large corpora and are made as complete as possible, so that they can classify every word of any text (e.g. in VP, VocabProfile). But there is no reason for learners to know every form of every word: they may never meet many of them, and different text genres (engineering, medical) each use signature subsets of family members. This program "crosses" family lists against word frequencies in a smaller general or specialist corpus (1-4 million words) or a user corpus (up to ≅ 800,000 words) to obtain just the members that corpus contains. Read a research paper or summary about this.
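The "crossing" step described above can be sketched as a simple filter: for each family, keep only the members that occur in the cross-corpus frequency list. This is a minimal illustration, not the program's actual code; the function name `cross_families`, the dictionary layout of the family list, and the sample words and counts are all hypothetical.

```python
from collections import Counter


def cross_families(families, corpus_freqs):
    """Keep only the family members that occur in the cross-corpus.

    families: headword -> list of member forms (hypothetical layout)
    corpus_freqs: word form -> hit count in the cross-corpus
    """
    reduced = {}
    for head, members in families.items():
        # Retain a member only if the smaller corpus actually contains it
        hits = {m: corpus_freqs[m] for m in members if corpus_freqs.get(m, 0) > 0}
        if hits:  # families with no hits at all drop out entirely
            reduced[head] = hits
    return reduced


# Hypothetical family list and a tiny stand-in "corpus"
families = {
    "access": ["access", "accessed", "accesses", "accessing", "accessible", "inaccessible"],
    "abandon": ["abandon", "abandoned", "abandoning", "abandonment", "abandons"],
}
corpus = Counter("access accessed accessible access abandon abandoned".split())
print(cross_families(families, corpus))
```

A full family list stays the master reference; the corpus only decides which of its members survive, which is why a genre-specific corpus yields a genre-specific subset.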


(1) Choose the full BNC/COCA, AWL, or French families list

(2) Upload Own Cross-Corpus/Text
(850k words max; must be a .txt file)
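Before crossing, an uploaded text has to be turned into a word-frequency list. The sketch below assumes a deliberately simple tokenizer (runs of letters, including accented ones, lowercased) and enforces the 850k-word limit stated in step (2); the real program's tokenization rules are not described on this page, and `corpus_frequencies` and `MAX_WORDS` are hypothetical names.

```python
import re
from collections import Counter

MAX_WORDS = 850_000  # upload limit stated in step (2)


def corpus_frequencies(text, max_words=MAX_WORDS):
    """Build a word-frequency list from plain text (assumed simple tokenizer)."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented French words
    # survive while digits and punctuation are dropped
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if len(tokens) > max_words:
        raise ValueError(f"{len(tokens)} words exceeds the {max_words}-word limit")
    return Counter(tokens)


freqs = corpus_frequencies("Élève, élèves! The pupil accessed 2 files; access denied.")
print(freqs.most_common(3))
```

To use a real upload, read the file with `open(path, encoding="utf-8").read()` and pass the string in.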

 or 

Choose Cross-Corpus

 
(3) View output as...

FIRST
Complete frequencies and percentages
(to explore cut-offs or get the Fam-Freq list)
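The "complete frequencies and percentages" view can be thought of as two numbers per family: each member's share of the family's corpus hits, and the family total. The sketch below computes both and, as an assumption about what a Fam-Freq list contains, ranks families by their summed member frequency. The function name `family_report` and the sample counts are hypothetical.

```python
def family_report(reduced):
    """Per-member % of family hits, plus families ranked by total frequency.

    reduced: headword -> {member: corpus hits} (hypothetical layout)
    """
    report = {}
    for head, hits in reduced.items():
        total = sum(hits.values())
        report[head] = {
            "total": total,
            # Each member's share of its family's hits, as a percentage
            "percent": {m: round(100 * n / total, 1) for m, n in hits.items()},
        }
    # Assumed Fam-Freq ordering: families sorted by combined member frequency
    fam_freq = sorted(report, key=lambda h: report[h]["total"], reverse=True)
    return report, fam_freq


reduced = {
    "access": {"access": 6, "accessed": 2, "accessible": 2},
    "abandon": {"abandon": 1, "abandoned": 3},
}
report, fam_freq = family_report(reduced)
print(fam_freq, report["access"]["percent"])
```

Looking at these percentages first is what makes the cut-off values in the next step meaningful: a member holding 2% of its family's hits is a natural candidate for exclusion.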

THEN
Cut-offs and options

  (4) Choose cut-offs ↓
Exclude words <= ___ of Fam
AND < ___ in COCA (Eng.)
< ___ in Cross-Corp (Fr.)
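One reading of the cut-off form is a two-part test: a member is excluded only when it is rare within its family *and* rare in the reference corpus (COCA for English lists, the cross-corpus for French). The sketch below assumes the "of Fam" value is a percentage of the family's hits; that interpretation, the function name `apply_cutoffs`, the default thresholds, and the COCA counts are all hypothetical.

```python
def apply_cutoffs(reduced, ref_freqs, max_share=5.0, min_ref=100):
    """Drop members that are rare in BOTH senses (assumed reading of step 4).

    A member is excluded when its share of the family's cross-corpus hits
    is <= max_share percent AND its reference frequency (COCA for English,
    the cross-corpus for French) is < min_ref.
    """
    kept = {}
    for head, hits in reduced.items():
        total = sum(hits.values())
        kept[head] = {
            m: n
            for m, n in hits.items()
            if not (100 * n / total <= max_share and ref_freqs.get(m, 0) < min_ref)
        }
    return kept


reduced = {"access": {"access": 88, "accessed": 6, "accesses": 3, "accessibly": 3}}
coca = {"access": 50000, "accessed": 9000, "accesses": 2000, "accessibly": 40}  # made-up counts
print(apply_cutoffs(reduced, coca))
```

Because the two conditions are joined with AND, "accesses" survives despite its small share of the family, since it is common in the reference corpus; only "accessibly", rare on both counts, is removed.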

Show derived forms (marked "z_")
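Since derived forms carry a "z_" marker, showing or hiding them and computing a family's percent derived are both simple prefix checks. In this sketch the percentage is computed over hits (tokens) rather than over distinct members; that choice, the function name `derived_view`, and the sample counts are assumptions.

```python
DERIVED_MARK = "z_"  # marker this page uses for derived forms


def derived_view(hits, show_derived=True):
    """Return the members to display and the family's percent derived.

    hits: {member: corpus hits} for one family (hypothetical layout)
    """
    derived = {m: n for m, n in hits.items() if m.startswith(DERIVED_MARK)}
    total = sum(hits.values())
    # Assumed definition: derived hits as a share of all family hits
    pct_derived = round(100 * sum(derived.values()) / total, 1) if total else 0.0
    shown = hits if show_derived else {m: n for m, n in hits.items() if m not in derived}
    return shown, pct_derived


hits = {"access": 6, "accessed": 2, "z_accessible": 2}
print(derived_view(hits, show_derived=False))
```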

Retain heads
English only
TEMPORARY
All inflects > 0
No inflects
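The overrides in this block act after the cut-offs. A plausible reading, sketched below, is: "Retain heads" puts the headword back even if the cut-offs removed it; "All inflects > 0" restores every inflection with at least one hit; "No inflects" drops inflections. The sketch assumes the family key is the headword and that every non-head member without the "z_" marker is an inflection; these semantics, the function name `apply_overrides`, and the sample data are hypothetical.

```python
def apply_overrides(head, hits, kept, retain_head=False,
                    all_inflects=False, no_inflects=False):
    """Adjust one family's post-cut-off members (hypothetical semantics).

    hits: all cross-corpus hits for the family
    kept: members that survived the cut-offs
    """
    result = dict(kept)
    # Assumption: non-head members without the "z_" marker are inflections
    inflections = {m for m in hits if m != head and not m.startswith("z_")}
    if all_inflects:
        # Restore every inflection with at least one hit, whatever the cut-offs said
        result.update({m: hits[m] for m in inflections if hits[m] > 0})
    if no_inflects:
        # Applied after all_inflects, so it wins if both are set
        result = {m: n for m, n in result.items() if m not in inflections}
    if retain_head:
        # Keep the headword even if it fell below the cut-offs
        result[head] = hits.get(head, 0)
    return result


hits = {"access": 1, "accessed": 4, "accesses": 1, "z_accessible": 5}
kept = {"accessed": 4, "z_accessible": 5}
print(apply_overrides("access", hits, kept, retain_head=True))
```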
(5) Click
 

(6) See/Get Result