Corpus of Spoken, Professional American-English
Description of corpus
The corpus, which has been constructed from a selection of existing
transcripts of interactions in professional settings, contains two
main sub-corpora of a million words each. One sub-corpus consists
mainly of academic discussions such as faculty council meetings and
committee meetings related to testing. The second sub-corpus contains
transcripts of White House press conferences, which are almost
exclusively question-and-answer sessions.
The transcripts making up the spoken American corpus have been
selected because they appear to be relatively unedited. However, they
have not been produced by linguists and so do not have all the
features one might wish for. For further info on the corpus, you can
look at the more detailed description, examine or
download a sample of the corpus (below), or contact Michael Barlow.
You might also want to look at the list of 400+ speakers in the corpus.
Price: $49 (individual user); $179 (site licence)
Sample of corpus
You can examine or download a sample of the corpus. The
sample differs from the actual corpus in that some sections have been
deleted in order to fit in several text types. In addition, the tags
that mark the names of speakers, <SP> and </SP>, will probably not be
displayed by your web browser, which means that speaker names will
simply appear untagged.
Each section starts with (Sample n). You can use this marking if you want to download the file and separate the different text types.
The file is around 300K.
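If you do download the sample, the separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each marker appears literally as "(Sample 1)", "(Sample 2)", and so on, and that speaker names are wrapped in <SP> and </SP> as noted above. The function names split_samples and speakers, and the speaker names in the demo text, are made up for the example and are not part of the corpus.

```python
import re

# Assumption: section markers look exactly like "(Sample 1)", "(Sample 2)", ...
SAMPLE_MARKER = re.compile(r"\(Sample\s+(\d+)\)")
# Speaker names are coded between <SP> and </SP>.
SPEAKER_TAG = re.compile(r"<SP>(.*?)</SP>", re.DOTALL)


def split_samples(text):
    """Return {sample number: section text} for each "(Sample n)" section."""
    # With one capture group, split() yields:
    # [text before first marker, n1, body1, n2, body2, ...]
    parts = SAMPLE_MARKER.split(text)
    return {int(n): body.strip() for n, body in zip(parts[1::2], parts[2::2])}


def speakers(section):
    """Return the distinct speaker names in a section, in order of first use."""
    seen = []
    for name in SPEAKER_TAG.findall(section):
        name = name.strip()
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Invented demo text; the real sample file will differ.
    demo = (
        "(Sample 1)\n"
        "<SP>SMITH:</SP> Good morning.\n"
        "<SP>JONES:</SP> Hello.\n"
        "(Sample 2)\n"
        "<SP>SMITH:</SP> Next question.\n"
    )
    for number, body in split_samples(demo).items():
        print(number, speakers(body))
```

To run this on the downloaded sample, read the file into a string first and pass it to split_samples. Any text before the first marker is ignored.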
List of speakers
The following list of speakers will be updated as more info is
unearthed, either from the corpus or from other sources. If you are
using CSPA (the Corpus of Spoken, Professional American-English), I
recommend that you download this information from time to time.