Corpus of Spoken Professional American-English -- Untagged Version
Description of corpus
The corpus, which has been constructed from a selection of existing
transcripts of interactions in professional settings, contains two
main sub-corpora of a million words each. One sub-corpus consists
mainly of academic discussions such as faculty council meetings and
committee meetings related to testing. The second sub-corpus contains
transcripts of White House press conferences, which are almost
exclusively question-and-answer sessions.
The transcripts making up the spoken American corpus were selected
because they are relatively unedited. However, since they were not
produced by linguists, the transcripts lack some of the features one
might wish for. For further information on the corpus, see the more
detailed description, examine or download a sample of the corpus
(below), or contact Michael Barlow.
Price: $49 (individual user); $179 (site licence)
Sample of corpus
You can examine or download a sample of the corpus. The sample
differs from the full corpus in that some sections have been deleted
so that several text types could be included. In addition, your web
browser will probably not display the <SP> and </SP> tags that mark
speaker names, so the names will simply appear untagged.
Each section begins with the marker (Sample n), where n is the section number. If you download the file, you can use these markers to separate the different text types.
The file is about 300 KB.
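If you download the sample, the (Sample n) markers can be used to split it into one file per text type. The Python code here is a minimal sketch, not a tool supplied with the corpus. It assumes each marker appears literally as "(Sample n)" with an integer n. The function names split_samples and split_file and the output names sample_1.txt, sample_2.txt, and so on are hypothetical. The file's character encoding is not stated, so the sketch reads it as UTF-8 and replaces any bytes it cannot decode.

```python
import os
import re

# Assumed marker format: "(Sample n)" where n is an integer, e.g. "(Sample 3)".
MARKER = re.compile(r"\(Sample\s+(\d+)\)")


def split_samples(text: str) -> dict[int, str]:
    """Split the sample text into sections keyed by sample number.

    Any text before the first marker is discarded.
    """
    # Because the pattern has one capturing group, re.split returns
    # [preamble, n1, body1, n2, body2, ...].
    parts = MARKER.split(text)
    sections: dict[int, str] = {}
    for number, body in zip(parts[1::2], parts[2::2]):
        sections[int(number)] = body.strip()
    return sections


def split_file(path: str, out_dir: str = ".") -> list[str]:
    """Write each section of the downloaded file to its own file.

    Returns the list of paths written (hypothetical naming scheme).
    """
    # Encoding is an assumption; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as f:
        sections = split_samples(f.read())
    written = []
    for n, body in sections.items():
        out_path = os.path.join(out_dir, f"sample_{n}.txt")
        with open(out_path, "w", encoding="utf-8") as g:
            g.write(body + "\n")
        written.append(out_path)
    return written
```

For example, split_file("sample.txt") would write one file per (Sample n) section into the current directory, where sample.txt stands for whatever name you saved the download under.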