NUCLEAR INPUT

UPDATE 2025-12-12

Nuclear List Builder v.4.3
Reduce a family list to its frequent members
+ NEW - DERIVATIONS COUNT || FRENCH FAMILIES

+ Mobile (Jan '25)

The BNC/COCA family lists are based on large corpora, with families made as complete as possible so that every word of any text can be classified (e.g., in VocabProfiles). But even K-1 to K-3 families may contain members that learners will never meet, or that appear mainly in specific text types (medicine, engineering). There is thus a case for reducing these lists to their essentials for both initial and specialist learning.
Nuclear List Builder "crosses" family lists against word frequencies in a smaller corpus (1-4 million words) to obtain a list of just the family members that are frequent in that corpus. Why is this interesting? Read a paper about this, or its summary. (*Parallel French study ~~en route~~ arrived 14 January 2026*)
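The "crossing" idea can be illustrated with a minimal Python sketch. This is not the tool's actual code: the family list, the tokenizer, and the names `FAMILIES` and `cross_families` are all hypothetical, chosen only to show how each family member might be counted in a cross-corpus and summed per family.

```python
import re
from collections import Counter

# Hypothetical toy family list: headword -> all listed family members.
FAMILIES = {
    "act": ["act", "acts", "acted", "acting", "action", "actionable"],
    "cure": ["cure", "cures", "cured", "curative"],
}

def cross_families(families, corpus_text):
    """Count every family member in the corpus and sum the counts per family."""
    # Simple lowercase tokenizer (an assumption; the real tool may differ).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", corpus_text.lower())
    freq = Counter(tokens)
    crossed = {}
    for head, members in families.items():
        counts = {m: freq[m] for m in members}  # absent words count as 0
        crossed[head] = {"counts": counts, "fam_sum": sum(counts.values())}
    return crossed

# Made-up miniature "cross-corpus" for illustration only.
corpus = ("She acted quickly. The action worked, and acting fast cured him. "
          "Acts matter; the cure is action.")
result = cross_families(FAMILIES, corpus)
```

Members such as "actionable" or "curative" that never occur in the cross-corpus receive a count of zero, which is what makes them candidates for removal from the reduced list.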

(1) Choose Full Start List

(2) Choose Cross-Corpus: User upload (850k wds; format ~.txt; Enc UTF-8) or Stored corpus

(3) Click 'Make List' to... FIRST explore cut-offs, or just Fam sum

(4) THEN choose cut-offs

(5) Cut-offs↓: Include only words > [cut-off] of Fam in Cross-Corpus?
WITH OPTIONS: Mark derived words "z_"? Show %? Fam sums?

(6)

(7) Get Result (8) Copy list + [?]
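
The cut-off step and its options might be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it assumes the cut-off is a percentage of the family sum, that derived words are supplied as a separate set, and that the "z_" marker is a prefix. The function `reduce_family` and the counts are hypothetical.

```python
def reduce_family(counts, cutoff_pct, derived=frozenset(),
                  mark_derived=False, show_pct=False):
    """Keep members whose share of the family sum exceeds cutoff_pct."""
    fam_sum = sum(counts.values())
    kept = []
    for word, n in counts.items():
        pct = 100.0 * n / fam_sum if fam_sum else 0.0
        if pct <= cutoff_pct:
            continue  # at or below the cut-off: dropped from the list
        # Assumed marking convention: prefix derived words with "z_".
        label = f"z_{word}" if mark_derived and word in derived else word
        if show_pct:
            label += f" ({pct:.0f}%)"
        kept.append(label)
    return kept

# Made-up cross-corpus counts for one family (they sum to 100).
counts = {"act": 40, "acts": 10, "acted": 5,
          "acting": 15, "action": 29, "actionable": 1}
derived = {"action", "actionable"}

plain = reduce_family(counts, 10, derived, mark_derived=True)
with_pct = reduce_family(counts, 10, derived, mark_derived=True, show_pct=True)
print(plain)     # ['act', 'acting', 'z_action']
print(with_pct)  # ['act (40%)', 'acting (15%)', 'z_action (29%)']
```

Running the function at several cut-off values and comparing the resulting list sizes is one way to mimic the "explore cut-offs" pass before settling on a value.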
