Corpus Software

Text analysis

COSMAS, a corpus analysis toolbox accessible online since 1995; see COSMAS. It offers 778 million words of online text, virtual corpus composition, a complex query language, concordancing, collocation analysis, and more.
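COSMAS's own collocation statistics are not described here. As an illustration of what window-based collocation analysis involves, the sketch below counts the words occurring within a fixed span of a node word and ranks them by pointwise mutual information. The function name `collocates` and the expected-frequency formula are assumptions chosen for illustration; real tools offer several measures (MI, t-score, log-likelihood).

```python
import math
from collections import Counter


def collocates(tokens, node, span=2):
    """Rank words co-occurring within +/- `span` tokens of `node` by pointwise MI.

    Assumption (one common formulation): MI = log2(O / E), where O is the observed
    co-occurrence count and E = f(node) * f(word) * window / N is the count
    expected if the two words were independent.
    """
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token inside the window, skipping the node occurrence itself.
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                cooc[tokens[j]] += 1
    window = 2 * span
    scores = {}
    for word, observed in cooc.items():
        expected = freq[node] * freq[word] * window / n
        scores[word] = math.log2(observed / expected)
    # Highest MI first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "the cat sat on the mat the cat ran".split()
for word, mi in collocates(tokens, "cat", span=1):
    print(f"{word:<6}{mi:6.2f}")
```

MI favours rare collocates (here `sat` and `ran` outrank the frequent `the`), which is one reason corpus tools usually report more than one association measure.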

MonoConc Pro. Commercial Windows concordance program (produced by me). See the Athelstan site.

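MonoConc, a Mac/Windows concordance program that sorts concordance lines by context position (2R, 1R, 2L, 1L: the second or first word to the right or left of the search word) and provides simple frequency information. For information on availability, see MonoConc.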
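MonoConc's internals are not documented here. The sketch below only illustrates the idea behind positional sorts: it builds keyword-in-context (KWIC) lines from whitespace-tokenized text and sorts them by keys in the same notation (`1R` = first word to the right, `2L` = second word to the left). The function names `kwic`, `context_word`, `sort_lines`, and `show` are hypothetical.

```python
def kwic(tokens, node, width=4):
    """Return concordance lines as (left_context, node, right_context) tuples."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines


def context_word(line, key):
    """Return the word at a position such as '1R' or '2L' ('' if outside the context)."""
    left, _, right = line
    offset, side = int(key[:-1]), key[-1].upper()
    if side == "R":
        return right[offset - 1].lower() if offset <= len(right) else ""
    # Left positions count outwards from the node: '1L' is the nearest word.
    return left[-offset].lower() if offset <= len(left) else ""


def sort_lines(lines, keys):
    """Sort by one or more positional keys; ['1R', '2R'] sorts by 1R, ties by 2R."""
    return sorted(lines, key=lambda line: tuple(context_word(line, k) for k in keys))


def show(line, width=30):
    """Format a line with the left context right-aligned so the node words line up."""
    left, node, right = line
    return f"{' '.join(left):>{width}}  {node}  {' '.join(right)}"


tokens = "the dog barked at the cat and the cat ran off".split()
for line in sort_lines(kwic(tokens, "the"), ["1R", "2R"]):
    print(show(line))
```

Sorting on a tuple of keys gives multi-level sorts for free: lines that share the 1R word fall back to the 2R word.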

ParaConc, a Mac/Windows concordance program for parallel texts. A version is available free of charge for research purposes (under license). For other uses, the single-user price is $49.95. See ParaConc.
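A parallel concordance pairs each hit in one language with the corresponding segment of its translation. The minimal sketch below assumes the two texts are already aligned segment by segment (alignment itself is the hard part and is not shown); the function name `parallel_concordance` and the sample sentences are hypothetical, not ParaConc's data format.

```python
import re


def parallel_concordance(pairs, query):
    """Return aligned (source, target) pairs whose source segment contains `query`.

    Assumption: `pairs` is a list of already-aligned segments (e.g. sentence pairs).
    Matching is case-insensitive and on whole words only.
    """
    q = query.lower()
    hits = []
    for src, tgt in pairs:
        words = re.findall(r"\w+", src.lower())
        if q in words:
            hits.append((src, tgt))
    return hits


# Hypothetical English-French sample, aligned sentence by sentence.
PAIRS = [
    ("The cat sleeps.", "Le chat dort."),
    ("A dog barks.", "Un chien aboie."),
    ("The cat eats.", "Le chat mange."),
]

for src, tgt in parallel_concordance(PAIRS, "cat"):
    print(f"{src:<20} | {tgt}")
```

Seeing every target segment for a search word side by side is what lets a user spot candidate translations (here, `chat`).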

Conc, a Mac concordance program, is available via FTP from SIL. Also available by anonymous FTP from (/
Indiana University LETRS Conc QuickGuide.

Free Text, a Mac concordance program, should be available from the U. of Michigan site. Also available from

HUM, developed by William Tuthill, is available by anonymous FTP from (/

TextAnalyst, commercial software that builds a semantic network from input text. The company, Megaputer, also produces a data mining tool, PolyAnalyst.

Lexical Freenet, a web-based thesaurus.

ShoeBox, a fieldwork-oriented program. Information is available from SIL.

VisualText, a suite of commercial text analysis tools.

Word Cruncher. Information is available from WPT.

WordSmith, see Mike Scott's WordSmith page.

Tagging

AUTASYS by Alex Chengyu Fang at UCL.

TreeTagger, a language-independent HMM tagger. Parameter files are available for English, French, and German.
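The decoding step shared by HMM-style taggers can be sketched with the Viterbi algorithm, which finds the most probable tag sequence for a sentence. This is a generic bigram version, not TreeTagger's implementation: TreeTagger estimates its contextual probabilities with decision trees and reads them from its own parameter files. The function name, tag set, and all probabilities below are hypothetical toy values.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    start_p[t]:    P(first tag = t)
    trans_p[s][t]: P(t | previous tag s)
    emit_p[t][w]:  P(w | t); a word unseen for a tag gets the floor `unk` (an assumption)
    Scores are summed in log space to avoid underflow on long sentences.
    """
    if not words:
        return []

    def lp(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], unk)) for t in tags}]
    back = [{}]  # back[i][t]: the previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + lp(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + lp(trans_p[prev].get(t, 0.0))
                          + lp(emit_p[t].get(words[i], unk)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


# Hypothetical toy model with invented probabilities, for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
    "VERB": {"barks": 0.5, "runs": 0.5},
}

print(viterbi("the dog barks".split(), TAGS, START, TRANS, EMIT))  # → ['DET', 'NOUN', 'VERB']
```

The ambiguous word `barks` (noun or verb in this toy model) is resolved by context: after a noun, the verb transition is more probable.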

Tagger overview by Linda Van Guilder

The (LOB) CLAWS1 tag set

CoreLex -- a tagset and database for semantic tagging based on WordNet