Most family lists (e.g. BNC/COCA) are built from large corpora and made as complete as possible so that every word of any text can be classified (e.g. in VP). But learners have no reason to know every form of every word: they may never meet some forms, and different text genres (engineering, medical) employ their own signature subsets of family members. This program "crosses" family lists against word frequencies in a smaller general or specialist corpus (1-4 million words) or a user corpus (up to about 800,000 words) to obtain just the family members that corpus contains. Read a research paper or summary about this.
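The crossing step described above can be pictured roughly as follows. This is only a minimal sketch, not the program's actual code: the function name `cross_families`, the `min_freq` parameter, the crude tokenizer, and the two toy families are all assumptions made for illustration.

```python
import re
from collections import Counter


def cross_families(families, corpus_text, min_freq=1):
    """Keep only the family members that actually occur in corpus_text.

    families: dict mapping a headword to the list of its family members.
    Returns a dict mapping each headword to {member: corpus frequency},
    dropping members (and whole families) that the corpus never uses.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters,
    # optionally with one internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", corpus_text.lower())
    freq = Counter(tokens)

    crossed = {}
    for headword, members in families.items():
        # A Counter returns 0 for unseen words, so absent members fail the test.
        attested = {m: freq[m] for m in members if freq[m] >= min_freq}
        if attested:
            crossed[headword] = attested
    return crossed


# Toy data standing in for a full family list and a specialist corpus.
families = {
    "analyse": ["analyse", "analysed", "analyses", "analysing",
                "analysis", "analyst", "analytic"],
    "stress": ["stress", "stressed", "stresses", "stressful", "stressing"],
}
corpus = "Stress analysis of the beam: the stresses were analysed under load."

result = cross_families(families, corpus)
for headword, members in result.items():
    print(headword, members)
```

With this toy engineering sentence, only four of the twelve listed forms survive the crossing. Raising `min_freq` would trim the output further, to members the corpus uses repeatedly.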

