Miscellaneous Utilities for Text Processing

Testing and staging ground for useful pieces of future Lextutor routines, and for pieces of existing routines that have independent uses.


Forwarding addresses:

    FreqList Builders have moved to ../freq; Randomizers to ../rand.

1. Tag Stripper

Removes HTML tags.
Since Jan 2016, also strips square brackets [bla bla] and curly braces {bla bla}.
2. Corpus Builder
Join up to 25 files, totalling up to about half a million words.
3. Random Wiki Entries by Subject
Build your own balanced corpus with modest labour
4. Sentence Extractor / T-Unit Calculator (+ Std. Dev.)
Split a file into sentences.
5. Proper Stripper BACK!
Eliminate proper nouns from the middles of sentences
6. The Compleat Stripper (some elements under review Sept 2016)
Brought back by user demand in Sept 2016, with the problematic experiments removed
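
For readers who want to reproduce the Tag Stripper's effect in their own scripts, here is a minimal Python sketch using regular expressions. It is an assumption, not Lextutor's code: the name `strip_tags` is hypothetical, and it deletes bracketed spans together with their contents, which may not match the tool's behaviour exactly.

```python
import re

# Hypothetical patterns; not taken from Lextutor's source.
TAG = re.compile(r"<[^>]*>")         # HTML tags such as <p>, </b>, <a href="...">
SQUARE = re.compile(r"\[[^\]]*\]")   # [bla bla], assumed removed with its contents
CURLY = re.compile(r"\{[^}]*\}")     # {bla bla}, assumed removed with its contents


def strip_tags(text: str) -> str:
    """Remove HTML tags and bracketed spans, then tidy the leftover spacing."""
    for pattern in (TAG, SQUARE, CURLY):
        # Replace with a space so words on either side of a tag stay apart.
        text = pattern.sub(" ", text)
    text = re.sub(r"[ \t]+", " ", text)          # collapse runs of spaces/tabs
    text = re.sub(r" +([.,;:!?])", r"\1", text)  # no space before punctuation
    return text.strip()


if __name__ == "__main__":
    print(strip_tags("<p>The cat [citation needed] sat {note} on the <b>mat</b>.</p>"))
    # prints: The cat sat on the mat.
```

Because the patterns are not recursive, nested brackets such as `[a [b] c]` are only partly removed, and the text inside `<script>` or `<style>` elements survives, since only the tags themselves are matched.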
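
The Corpus Builder's job, concatenating several text files under a size limit, can be sketched as follows. The 25-file and half-million-word limits come from the tool description; everything else is an assumption: `build_corpus` is a hypothetical name, the word cap is treated as a hard cut-off, whitespace inside each file is normalized, and files are separated by a blank line.

```python
from pathlib import Path

MAX_FILES = 25        # file limit given for the Corpus Builder
MAX_WORDS = 500_000   # "about half a million words", treated here as a hard cap


def build_corpus(paths, max_files=MAX_FILES, max_words=MAX_WORDS):
    """Join plain-text files into one corpus string (hypothetical helper).

    Whitespace inside each file is normalized to single spaces, files are
    separated by a blank line, and words beyond max_words are dropped.
    """
    if len(paths) > max_files:
        raise ValueError(f"at most {max_files} files can be joined, got {len(paths)}")
    chunks, total = [], 0
    for path in paths:
        if total >= max_words:
            break  # cap reached; ignore the remaining files
        words = Path(path).read_text(encoding="utf-8", errors="replace").split()
        words = words[: max_words - total]
        total += len(words)
        if words:
            chunks.append(" ".join(words))
    return "\n\n".join(chunks)
```

For example, `build_corpus(["week1.txt", "week2.txt"])` (hypothetical file names) returns one string that can then be saved as a new .txt file.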
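
The Sentence Extractor / T-Unit Calculator splits a file into sentences and, per its title, reports length statistics with a standard deviation. The Python sketch below shows one way to do this under stated assumptions: boundaries are found with a naive punctuation rule, and because true T-units (a main clause plus its dependent clauses) need syntactic analysis, it reports words per sentence as a stand-in. `split_sentences` and `length_stats` are hypothetical names, and using the sample standard deviation is an assumed choice; the tool may compute it differently.

```python
import re
import statistics

# Naive boundary rule (assumption): a sentence ends at ., ! or ? followed by
# whitespace and then an uppercase letter or a quotation mark.
# Abbreviations such as "Dr. Smith" will be split wrongly.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[\"'A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split running text into sentences, one string per sentence."""
    flat = " ".join(text.split())  # join wrapped lines, collapse extra spaces
    return [s for s in SENTENCE_END.split(flat) if s]


def length_stats(sentences: list[str]) -> tuple[int, float, float]:
    """Return (number of sentences, mean words per sentence, sample std. dev.)."""
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return 0, 0.0, 0.0
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return len(lengths), float(statistics.mean(lengths)), sd


if __name__ == "__main__":
    sents = split_sentences("One two three. Four five six\nseven. Eight nine.")
    for s in sents:
        print(s)
    print(length_stats(sents))  # (3, 3.0, 1.0)
```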
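
Stripping proper nouns from the middles of sentences, as the Proper Stripper does, can be approximated with a capitalization heuristic. In this sketch (an assumption, not the tool's documented method) any capitalized word other than the pronoun "I" that does not open a sentence is treated as a proper noun and dropped; `strip_proper_nouns` is a hypothetical name.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Heuristic (assumption): any ASCII-capitalized word except the pronoun "I".
CAPITALIZED = re.compile(r"\b(?!I\b)[A-Z][\w'-]*")


def strip_proper_nouns(text: str) -> str:
    """Drop capitalized words that do not open a sentence (hypothetical helper)."""
    cleaned = []
    for sentence in SENTENCE_END.split(" ".join(text.split())):
        first, _, rest = sentence.partition(" ")    # sentence-initial word is kept
        rest = CAPITALIZED.sub("", rest)
        rest = re.sub(r"\s+", " ", rest)            # close gaps left by deletions
        rest = re.sub(r" ([.,;:!?])", r"\1", rest)  # no space before punctuation
        cleaned.append(f"{first} {rest.strip()}".strip())
    return " ".join(s for s in cleaned if s)


if __name__ == "__main__":
    print(strip_proper_nouns("She met Maria in London. I was there too."))
    # prints: She met in. I was there too.
```

The heuristic misses names that open a sentence, and it also drops capitalized words that are not names in the strict sense, such as the adjective English in mid-sentence.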

Notes

  • Some of these routines require TEXT files as their input. A text file is a simple file that contains no codes for emphasis, font sizes, etc. To transform a Word file into a text file, simply SAVE it AS text. You will not thereby lose the original file, but create an additional text file (identifiable by the .txt extension).

  • Most of these routines take their file inputs from a menu that accesses the hard drive of YOUR computer; they have not been adapted for copy-paste text entry.

  • For complex jobs, combine routines (e.g., first strip the tags from an HTML file and save the result as a text file, then build a frequency list or extract sentences).

    Tom Cobb - UQAM - and correspondents, users, code-bloggers