Corpus of Spoken, Professional American-English

Description of corpus

The corpus, which has been constructed from a selection of existing transcripts of interactions in professional settings, contains two main sub-corpora of a million words each. One sub-corpus consists mainly of academic discussions such as faculty council meetings and committee meetings related to testing. The second sub-corpus contains transcripts of White House press conferences, which are almost exclusively question-and-answer sessions.

The transcripts making up the spoken American corpus have been selected because they appear to be relatively unedited. However, they were not produced by linguists and so do not have all the features one might wish for. For further information on the corpus, you can look at the more detailed description, examine or download a sample of the corpus (below), or contact Michael Barlow.

You might also want to look at the list of 400+ speakers in the corpus (below).

Price: $49 (individual user); $179 (site licence)


Sample of corpus

You can examine or download a sample of the corpus. The sample differs from the actual corpus in that some sections have been deleted in order to fit in several text types. In addition, the tags that mark speaker names, <SP> and </SP>, will probably not be displayed by your web browser, so speaker names will simply appear untagged.
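
If you work with the downloaded file rather than the browser view, the speaker tags can be used to pull out speaker names. The following Python sketch is only an illustration: it assumes that each name is wrapped as <SP>NAME</SP>, as described above, and the example text is invented rather than taken from the corpus.

```python
import re
from collections import Counter

# Assumption: speaker names are wrapped as <SP>NAME</SP>, as described above.
SPEAKER_TAG = re.compile(r"<SP>(.*?)</SP>", re.DOTALL)


def count_speakers(text):
    """Count how many times each tagged speaker name occurs in the text."""
    # Collapse internal whitespace so a name split across lines counts once.
    names = (" ".join(match.split()) for match in SPEAKER_TAG.findall(text))
    return Counter(name for name in names if name)


# Invented example text, not taken from the corpus.
example = (
    "<SP>SMITH:</SP> Shall we begin? "
    "<SP>JONES:</SP> Yes. "
    "<SP>SMITH:</SP> Good."
)
speaker_counts = count_speakers(example)
print(speaker_counts)
```

The exact form of the names inside the tags (capitalisation, trailing colon, titles) may differ in the real files, so the pattern may need adjusting.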

Each section starts with the marker (Sample n), where n is the number of the section. You can use this marker to separate the different text types after downloading the file. The file is around 300K.
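
As a rough illustration of how the (Sample n) marker could be used to separate the sections, here is a minimal Python sketch. It assumes that the marker appears literally as "(Sample 1)", "(Sample 2)" and so on. The function name and the example text are hypothetical.

```python
import re

# Assumption: each section begins with a literal marker such as "(Sample 1)".
SAMPLE_MARKER = re.compile(r"\(Sample\s+(\d+)\)")


def split_samples(text):
    """Return a dict mapping each sample number to the text of that section."""
    # With one capturing group, split() yields:
    # [text before first marker, n1, section1, n2, section2, ...]
    parts = SAMPLE_MARKER.split(text)
    samples = {}
    for i in range(1, len(parts) - 1, 2):
        samples[int(parts[i])] = parts[i + 1].strip()
    return samples


# Invented example text, not taken from the corpus.
example = "(Sample 1)\nFirst section.\n(Sample 2)\nSecond section.\n"
sections = split_samples(example)
for number, body in sections.items():
    print(number, len(body))
```

For the real file, you would read the downloaded text into a string first and pass it to split_samples. Any text before the first marker is ignored.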

List of speakers

The following list of speakers will be updated as more information is unearthed, either from the corpus or from other sources. If you are using CSPA, I recommend that you download this list again from time to time.