Miscellaneous Utilities for Text Processing
FreqList Builders now have their own area at ../freq.

1. Tag Stripper
Removes HTML tags.
2. Corpus Builder
Join up to 25 files - to about half a million words.
3. Sentence Extractor / T-Unit Calculator (+ Std. Dev.) *NEW!
File to sentences.
4. Proper Stripper
Under repair summer 2010
5. The Compleat Stripper
NEW JUNE 2010: TEN kinds of text clean-up for input to other routines - including Javascript Regexes and Regex Checker.
6. Three useful off-site DBs (Collocations and Associations)
Notes
Some of these routines require TEXT files as their input. A text file is a plain file that contains no codes for emphasis, font sizes, etc. To transform a Word file into a text file, simply SAVE it AS text. This does not overwrite the original file; it creates an additional text file (identifiable by the .txt extension). Most of these routines take their file inputs from a menu that accesses the hard drive; they have not been adapted for copy-paste text entry. They have not been tested for French. For complex jobs, combine routines (e.g., first strip the tags from an HTML file, save the result as a text file, then build a list or extract sentences).
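For readers who want to try the combined job offline, here is a rough Python sketch of the same chain: strip tags, save as text, then extract sentences. It is not the code behind the routines on this page. The function names, the file paths, and the sentence rule (split after ., ! or ? followed by a space) are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TagStripper(HTMLParser):
    """Keeps the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs left behind by the removed markup
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


def extract_sentences(text):
    """Naive split: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def html_file_to_sentences(html_path, txt_path):
    """Strip an HTML file, save the result as a .txt file, return sentences."""
    html = Path(html_path).read_text(encoding="utf-8")
    text = strip_tags(html)
    Path(txt_path).write_text(text, encoding="utf-8")  # the "save as text" step
    return extract_sentences(text)
```

A call such as html_file_to_sentences("page.html", "page.txt") (hypothetical file names) would leave the HTML file untouched and create the text file alongside it. Two known limits of this sketch: the contents of script and style elements are kept as text, and abbreviations such as "e.g." will be split as if they ended a sentence.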