Miscellaneous Utilities for Text Processing

A testing and staging ground for useful pieces of future Lextutor routines, and for pieces of existing routines that have independent uses.

Forwarding addresses:

    FreqList Builders now have their own area at ../freq

    Randomizers now have their own area at ../rand

1. Tag Stripper

Removes HTML tags. And, *NEW! Jan '16, square brackets [bla bla] and curly braces {bla bla}.

2. Corpus Builder

Joins up to 25 files, up to a total of about half a million words.

3. Sentence Extractor / T-Unit Calculator (+ Std. Dev.) *NEW!

Converts a file to sentences.

4. Proper Stripper

Under repair.

5. Two useful off-site DBs (Collocations and Associations)
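
To give a rough idea of what the Tag Stripper in item 1 does, here is a minimal Python sketch. It is not the Lextutor code: the function name strip_markup and the regular expressions are assumptions made for illustration, and it assumes that the text inside square brackets and curly braces is removed along with the brackets themselves. Nested brackets are not handled.

```python
import re

# Hypothetical patterns (not Lextutor's own); each matches one non-nested span.
TAG = re.compile(r"<[^>]*>")        # HTML tags such as <p> or </b>
SQUARE = re.compile(r"\[[^\]]*\]")  # [bla bla]
CURLY = re.compile(r"\{[^}]*\}")    # {bla bla}


def strip_markup(text: str) -> str:
    """Remove tags and bracketed spans, then tidy up the whitespace."""
    for pattern in (TAG, SQUARE, CURLY):
        # Replace with a space so words on either side of a tag stay separate.
        text = pattern.sub(" ", text)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(strip_markup("<b>Hello</b> [bla bla] world {bla bla}"))
```

Each removed span is replaced by a space rather than by nothing, so that input like one<br>two does not come out as a single run-together word.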




  • Some of these routines require TEXT files as their input. A text file is a simple file that contains no codes for emphasis, font sizes, etc. To transform a Word file into a text file, simply SAVE it AS text. This does not replace the original file; it creates an additional text file (identifiable by the .txt extension).

  • Most of these routines take their file inputs from a menu that accesses the hard drive of YOUR computer; they have not been adapted for copy-paste text entry. They have not been tested for French.

  • For complex jobs, combine routines (e.g., first strip the tags from an HTML file, save the result as a text file, then build a list or extract sentences).
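
The idea of combining routines can be sketched in Python as a small pipeline: strip the tags, split the result into sentences, then report the mean sentence length and its standard deviation. This is an illustration only, not the Lextutor routines. The function names are hypothetical, the sentence splitter is a naive one that breaks after ., ! or ? followed by whitespace, and it counts words per sentence rather than performing a true T-unit analysis.

```python
import re
import statistics


def strip_tags(text):
    """Replace each HTML tag with a space (hypothetical helper)."""
    return re.sub(r"<[^>]*>", " ", text)


def split_sentences(text):
    """Naive split: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_length_stats(sentences):
    """Return (mean, sample standard deviation) of words per sentence."""
    lengths = [len(s.split()) for s in sentences]
    mean = statistics.mean(lengths)
    # The sample standard deviation needs at least two sentences.
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return mean, sd


if __name__ == "__main__":
    html = "<p>The cat sat. It purred loudly on the mat!</p> <p>Why?</p>"
    sentences = split_sentences(strip_tags(html))
    mean, sd = sentence_length_stats(sentences)
    for s in sentences:
        print(s)
    print(f"mean words per sentence: {mean:.2f}, std. dev.: {sd:.2f}")
```

A splitter this simple will break on abbreviations such as "e.g." or "Dr.", so real text needs a more careful rule.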

    Tom Cobb - UQAM - and correspondents, users, code-bloggers