Corpora in an historical perspective


Date: 2015-10-07


Corpora

At its most general, a corpus (plural corpora) may be defined as a body or collection of linguistic data for use in scholarship and research. Since the early 1960s, interest has increasingly focused on computer corpora, or machine-readable corpora, which are the main subject of this entry. In the first three sections, however, I shall begin by considering the place of corpora in general in linguistic research, whether machine-readable or not. In the remaining sections I shall consider why computer corpora have been compiled or collected, what their functions and limitations are, and what their applications are, in particular their use in natural language processing (NLP). This entry will illustrate the field of computer corpora only by reference to corpora of Modern English (cf. - ).

In traditional linguistic scholarship, particularly on dead languages (languages which are no longer used as an everyday means of communication in a speech community), the corpus of available textual data, however limited or fragmentary, was the foundation on which scholarship was built. Later, particularly in the first half of the twentieth century, corpora assumed importance in the transcription and analysis of extant, but previously unwritten or unstudied, languages, such as the Amerindian languages studied by Franz Boas (1911) and the generation of American linguists who succeeded him.

