Miscellaneous Tools for Text Processing

    Testing and staging ground for useful pieces of future Lextutor routines; pieces of existing routines with independent uses

Forwarding addresses:     FreqList Builders have moved out to ../freq;     Randomizers to ../rand

1. Tag Stripper

Removes HTML tags and, since Jan '16, square brackets [bla bla] and curly braces {bla bla}.
2. Corpus Builder
Join up to 50 files - to >500,000 wds. NEW v.3 - ZIP upload, no known limit (APR 2020)
3. Random Wiki Entries by Subject
Build your own balanced corpus with modest labour
4. Sentence Extractor / T-Unit Calculator (+ Std. Dev.)
Splits a file into sentences.
5. Proper Stripper BACK!
Eliminate proper nouns from the middles of sentences
6. The Compleat Stripper (some elements under review)
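
For readers who want to try something like the Tag Stripper offline, here is a minimal Python sketch. It is not the Lextutor code. The function name strip_tags and the three patterns are hypothetical, and it assumes that the bracketed and braced material is removed along with its delimiters, which the description above does not spell out.

```python
import re

# Hypothetical patterns, not taken from Lextutor:
TAG_RE = re.compile(r"<[^>]+>")        # anything between angle brackets
SQUARE_RE = re.compile(r"\[[^\]]*\]")  # [bla bla], brackets and contents
CURLY_RE = re.compile(r"\{[^}]*\}")    # {bla bla}, braces and contents


def strip_tags(text):
    """Replace tags and bracketed spans with a space, then tidy spacing."""
    for pattern in (TAG_RE, SQUARE_RE, CURLY_RE):
        text = pattern.sub(" ", text)
    # Collapse the runs of spaces and tabs left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


cleaned = strip_tags("<b>Hello</b> [note] world {aside} again")
print(cleaned)
```

Simple patterns like these do not handle nested brackets or a ">" inside an attribute value; a real HTML parser would be safer for messy pages.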




  • Some of these routines require TEXT files as their input. A text file is a simple file that contains no codes for emphasis, font sizes, etc. To transform a Word file into a text file, simply SAVE it AS text. You will not thereby lose the original file, but create an additional text file (identifiable by the .txt extension on the name).

  • Most of these routines take their inputs from a menu that accesses files on YOUR computer; they have not been adapted for copy-paste text entry.

  • Complex jobs can involve combining routines (e.g., first strip HTML tags, save as a text file, then combine files, build a frequency list, extract sentences, and so on).
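
As a rough illustration of chaining routines, the Python sketch below strips tags, combines texts, builds a frequency list, and extracts sentences in turn. All four function names are hypothetical stand-ins, not the Lextutor routines, and the sentence splitter is a naive assumption that breaks on abbreviations such as "e.g.".

```python
import re
from collections import Counter


def strip_tags(text):
    """Replace anything between angle brackets with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def combine(texts):
    """Join several texts into one corpus, one text per line."""
    return "\n".join(t.strip() for t in texts)


def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def extract_sentences(text):
    """Naive split after ., ! or ? followed by whitespace."""
    flat = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


# Made-up sample input standing in for files on your computer.
pages = ["<p>The cat sat. The cat slept!</p>", "<div>A dog barked.</div>"]
corpus = combine(strip_tags(p) for p in pages)
sentences = extract_sentences(corpus)
freqs = frequency_list(corpus)
```

Each step takes plain text and returns plain text or a list, so the output of one can be saved and fed to the next, much as you would save a text file between routines.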

    Tom Cobb - UQAM - and correspondents, users, code-bloggers