Dominic Smith

MPhil Research: a corpus-driven study of semantic change

This page is archived. It was last modified in July 2007 and it will probably not be updated.

My MPhil thesis investigated how techniques from the field of corpus linguistics can be used to analyse how word meanings have changed over the course of time. Since copyright in the work is owned by the University of Birmingham, I cannot make it available here, but the abstract and bibliography are below, and academic users should be able to get hold of it via online thesis archives, or from Birmingham University library.


What we know: Most linguists accept that the word is problematic as a unit of language, and where scribes chose to place spaces is increasingly seen as an accident of history. This can be seen from the fact that some languages have one word for what other languages need a whole phrase to say (in one famous example, German has a single word for a 'high-heeled-shoe-friendly' floor covering), whilst other words have many meanings (cf. 'to give something up for Lent' / 'to give something up to the Police').

It is for this reason that clusters of words with their collocates (words which are statistically likely to occur within a given proximity of other words) are seen as a primary way to decode meaning. ('to give up' used in combination with 'for' leaves the meaning 'a reason for abstinence', whilst 'to give up' + 'to' = 'to surrender'.)
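
The idea of a collocate can be illustrated with a minimal sketch in Python. This is not the tooling used in the research (the seminar data was prepared with WordSmith); the function name `collocates`, the window size and the example sentence are all assumptions made purely for illustration. It only counts raw co-occurrences within a fixed window, whereas a real study would go on to apply a statistical measure to decide which co-occurrences are significant.

```python
from collections import Counter


def collocates(tokens, node, window=4):
    """Count words found within `window` tokens either side of `node`.

    Hypothetical helper for illustration only: it returns raw counts,
    not a statistical measure of association.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented example sentence, not taken from any corpus.
text = ("he will give up chocolate for lent and "
        "she will give up the thief to the police")
counts = collocates(text.split(), "up", window=3)
print(counts.most_common())
```

In this toy sentence both 'for' and 'to' appear within three tokens of 'up', and it is the choice between them that separates the two senses of 'give up'.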

My hypothesis: Some words related to history or society will fall out of use or gain other senses as the world around them changes. My MPhil research investigated whether it is possible to measure this change empirically by looking at how the likelihood of groups of words co-occurring alters over time.

How I did it: I built a large corpus of fictional English-language texts, sampled at thirty-year intervals (roughly one generation) between 1790 and 1910, and then looked at how the collocates of the word 'work' change over this time.
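
A diachronic comparison of this kind might be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the thesis: the helper `collocate_rate`, the window size, the choice of 'factory' as a collocate and the two sample sentences are all invented for the example. Only the thirty-year sampling points between 1790 and 1910 and the node word 'work' come from the description above.

```python
def collocate_rate(tokens, node, collocate, window=4):
    """Share of occurrences of `node` with `collocate` within `window` tokens.

    Hypothetical helper for illustration; a real study would use a
    proper association measure over a much larger corpus.
    """
    hits = total = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        total += 1
        span = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if collocate in span:
            hits += 1
    return hits / total if total else 0.0


# Thirty-year sampling points between 1790 and 1910.
periods = list(range(1790, 1911, 30))

# Invented placeholder text for two of the periods, NOT corpus data.
samples = {
    1790: "the work of the field was hard work for the men".split(),
    1910: "she found work in the factory and factory work paid".split(),
}

rates = {year: collocate_rate(toks, "work", "factory")
         for year, toks in samples.items()}
for year in sorted(rates):
    print(year, rates[year])
```

Comparing such rates period by period is one simple way to see whether a collocate becomes more or less strongly attached to a word over time.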

My supervisor was Professor Wolfgang Teubert. My MPhil was awarded on 14th July 2006.

Staff-student seminars

Updated 7 Oct 05: I presented the results of my MPhil to the Staff and Postgraduate student seminars in the School of Arts and Humanities at Birmingham University on 7th October 2005. The presentation is available in PowerPoint format.

I gave a presentation of my work so far, including some initial results, to the Staff and Postgraduate student seminars in the School of Arts and Humanities at Birmingham University on 18th February 2005. The presentation and resources may be downloaded:

  • Presentation Slides (232kB)
    The presentation slides I used in PDF format (requires Adobe Acrobat Reader)
  • Original Data (960kB)
    A ZIP archive containing Excel spreadsheets and WordSmith Concordance files used for the case study of Lady and Woman in the Staff and Student seminars.

Pilot study

On 3rd December 2004 I gave a talk at Birmingham University about my attempts to use Corpus Linguistics to investigate the usage of the word Revolution. The PowerPoint slideshow I used is available for download below (503kB).

Download the Revolution PowerPoint

The Thesis

My thesis cannot be downloaded from this website, since copyright is now owned by the University of Birmingham. I can nevertheless provide occasional copies to interested parties, or alternatively it is available from the University of Birmingham.


The particular intention of this study is to design a methodology for a detailed statistical analysis of any semantic change in the field of employment which might have been caused by changes in society. This is done by drawing on theories from a range of fields, including hermeneutics and traditional linguistics, philosophical descriptions of meaning, as well as recent hypotheses from corpus linguistics.

To provide results, the study uses techniques from corpus linguistics, which thus far have been based very much on the study of language in a synchronic context. This project is innovative in its adoption of these methodologies in a diachronic dimension for this specific purpose.

A preliminary set of results was obtained by applying this methodology to a six-million-word corpus of nineteenth-century English prose fiction, specially compiled for the study. These results are criticised in detail, and suggestions are made for further study on the basis of apparent trends revealed in them.


The bibliography, including the sources for the data in the corpus, may be downloaded in PDF format.
