Language corpora

A corpus is a collection of written or spoken texts. Computers make it possible to compile large amounts of authentic written and spoken language, and the resulting collection of electronic text can then be analysed in various ways to establish patterns of grammar and vocabulary usage. The usual software tool for this analysis is a concordancer, which lists every occurrence of a chosen word or phrase together with its surrounding context. Each of the corpora (or corpuses) listed below can be investigated with an online concordancer.
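To show what a concordancer does, here is a minimal key-word-in-context (KWIC) sketch. It assumes the text is already loaded as a single string. The function name `kwic` and its `width` parameter are illustrative choices and do not come from any of the tools listed here. Each match is printed with a fixed amount of left and right context, so the keyword lines up in a column and recurring patterns around it are easy to spot.

```python
import re


def kwic(text, keyword, width=30):
    """Return key-word-in-context lines for every occurrence of keyword.

    Hypothetical helper: matching is case-insensitive and on whole words only,
    so "corpus" matches "Corpus" but not "corpora".
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start, end = match.span()
        # Take up to `width` characters either side, flattening line breaks.
        left = text[max(0, start - width):start].replace("\n", " ")
        right = text[end:end + width].replace("\n", " ")
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


sample = "The corpus is large. A corpus can be spoken or written. Corpora help teachers."
for line in kwic(sample, "corpus", width=15):
    print(line)
```

Real concordancers add features such as sorting lines by the word to the left or right of the keyword, but the aligned layout above is the core idea.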
The British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written. (Summary information copied from the BNC homepage.)
Michigan Corpus of Academic Spoken English, containing more than 160 transcripts with over 2 million words of text.
This corpus contains more than 56 million words of text. The link to the left opens a demonstration concordancer for direct access to the corpus database.
This website allows you to quickly and easily search more than 100 million words of American English text from 1923 to the present, as found in TIME magazine. (Summary information copied from the corpus website.)
For release in late 2007
This is a very simple concordancer (on this website) that allows you to analyse texts that you have compiled yourself.
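Simple tools for self-compiled texts often provide a word-frequency list alongside the concordance. The sketch below is a rough illustration of that idea and does not describe the concordancer on this website. It assumes your texts have already been read into a list of strings. The function name `word_frequencies` is hypothetical, and the tokenizer is deliberately crude: it treats a word as a run of letters, optionally with one internal apostrophe.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count word forms across a list of texts, ignoring case.

    Hypothetical helper with a simple tokenizer: letters, optionally joined
    by one apostrophe (so "didn't" counts as a single word form).
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
        counts.update(token.lower() for token in tokens)
    return counts


my_texts = ["The cat sat on the mat.", "The dog didn't sit."]
freqs = word_frequencies(my_texts)
print(freqs["the"])  # 3
```

Sorting this list with `freqs.most_common()` is a quick way to find frequent words worth looking up in the concordancer.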
These two links contain useful information about corpora and how to use them, with many practical examples.