Text Corpora


English Corpora

American National Corpus  Second release available from LDC

British National Corpus. A large (100-million-word) corpus of modern English (1990s). Includes an online component. Also available online with an excellent interface here. See also BNC Indexer.

International Corpus of English  A major project directed by Gerald Nelson. Some corpora can be downloaded (under licence) -- Hong Kong, East Africa, India, Philippines, Singapore. Others are available on CDROM -- Great Britain and New Zealand.

Oxford Text Archive Web site  A good starting point for British novels: Dickens, Trollope, etc.

Susanne Corpus by Geoffrey Sampson

OED Online   Subscription ($30 a month is one option)

Free eBooks - Project Gutenberg  17,000 books

Project Gutenberg of Australia

Corpus of Spoken, Professional American-English  The corpus is available commercially from Athelstan. There is a 50,000-word sample available online. This two-million-word corpus can be used for lexical and grammatical analyses, but not for close discourse analysis, as it does not mark backchannels, overlap, pause length, etc.

Renascence Editions  Works printed in English between 1477 and 1799

COBUILD Concordance/Collocation sampler (online) 

Wellington Corpus of Written/Spoken New Zealand English. 

Penn-Helsinki Corpus of Middle English

The Lampeter Corpus of Early Modern English Tracts 

ICAME, Bergen. An important site. Contains info on the ICAME CDROM and archives of the corpora list. 

The Bergen Corpus of London Teenage Language

Corpus of Written British Creole  Contact Mark Sebba to obtain the corpus

The TRAINS Spoken Dialogue Corpus

CCAT Archive Gopher site at U. Penn. 

Voice of America News (Gopher)

CBC Canadian broadcasting archives. Includes sound files.

Marx & Engels Online Library

World Religious Texts

English-Miscellaneous

O.J. Simpson Trial Transcripts
Presidential Inaugural Addresses  All the presidents' addresses.