The Wellington Corpus of Spoken New Zealand English

Project director

Emeritus Professor

School of Linguistics and Applied Language Studies

Corpus manager

Adjunct Research Fellow

School of Linguistics and Applied Language Studies

This information is taken from Holmes, J., Vine, B. & Johnson, G. (1998). The Wellington Corpus of Spoken New Zealand English: A Users' Guide. Wellington: School of Linguistics and Applied Language Studies, Victoria University of Wellington.

Further information can be obtained from:

Corpus Manager, Archive of New Zealand English, School of Linguistics and Applied Language Studies, Victoria University of Wellington, PO Box 600, Wellington 6140, NEW ZEALAND.
Email: bernadette.vine@vuw.ac.nz
Tel: (+64 4) 463 5639
Fax: (+64 4) 463 5604

Composition of the WSC

The collection dates for the WSC were 1 January 1988 to 31 December 1994. Ninety-nine percent of the data was collected between 1990 and 1994.
The proportions of speech styles are:

The extracts in the corpus are divided into 15 categories and these categories cover a range of contexts in which each style of speech is found. In the table below, the categories are grouped in terms of whether they are monologues or dialogues, public or private, scripted, or unscripted. The codes assigned to the categories are also provided, along with the word targets for each category.

The formal speech section of the WSC comprises all the monologue categories and parliamentary debate (DGU). The semi-formal section comprises the interview categories, both public and private: oral history (DPH), social dialect (DPP) and broadcast interviews (DGI). The remaining dialogue categories make up the informal speech section, with private face-to-face conversations (DPC) accounting for 50% of the overall corpus.


Whose speech is included?

The WSC is a corpus of spoken New Zealand English, so we needed to establish criteria for selecting people to be included. We rejected the notion of selecting people who sounded as if they were New Zealanders, since this would have pre-judged the very issue the corpus data was intended to illuminate, namely what constitutes New Zealand English. Similarly, non-linguistic criteria such as citizenship or residency are fraught with problems, since those who hold such qualifications may be very recent arrivals from elsewhere. Even longer-term residents cannot be expected to have acquired the features which distinguish New Zealand speech from other varieties if they arrived in the country after puberty. Consequently, we adopted a criterion which has been regarded by others as very stringent, but which we felt confident would ensure the integrity of the New Zealand samples included in the corpus.

A speaker of New Zealand English is defined as someone who has lived in New Zealand since before the age of 10 years.

A certain amount of overseas experience was regarded as normal within New Zealand, but again for reasons relating to the need to establish the distinctive features of a New Zealand variety of English, people who had spent extensive periods of time overseas were excluded. More than ten years or over half their lifetime (whichever was the greater) was considered an extensive period, and this rendered people ineligible for inclusion in the spoken corpus. Also excluded were people who had returned from an overseas trip within the last year.
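The eligibility rules described above can be sketched as a small check. This is only an illustrative sketch, not a procedure taken from the Users' Guide: the function and parameter names are hypothetical, the caller is assumed to supply the total time spent overseas, and the boundary cases (exactly ten years overseas, or a return exactly one year ago) are treated as eligible by assumption.

```python
from typing import Optional


def overseas_limit_years(age_years: float) -> float:
    """Ten years or half the speaker's lifetime, whichever is greater."""
    return max(10.0, age_years / 2.0)


def is_eligible(
    age_years: float,
    arrival_age_years: float,
    years_overseas: float,
    years_since_last_return: Optional[float] = None,
) -> bool:
    """Apply the three inclusion rules (hypothetical helper).

    arrival_age_years is 0 for someone born in New Zealand.
    years_since_last_return is None if the speaker has never been overseas.
    """
    # Rule 1: must have lived in New Zealand since before the age of 10.
    if arrival_age_years >= 10:
        return False
    # Rule 2: must not have spent an extensive period overseas.
    if years_overseas > overseas_limit_years(age_years):
        return False
    # Rule 3: must not have returned from an overseas trip within the last year
    # (assumption: a return exactly one year ago is acceptable).
    if years_since_last_return is not None and years_since_last_return < 1:
        return False
    return True
```

Under these assumptions, a 30-year-old who was born in New Zealand and has spent 12 years overseas would still qualify, because half their lifetime (15 years) is the greater of the two limits.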

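To summarise:

Included

People who have lived in New Zealand since before the age of 10 years.

Excluded

People who have spent more than ten years or over half their lifetime (whichever is the greater) overseas, and people who have returned from an overseas trip within the last year.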

Ethnic and gender representation

People of any ethnicity (e.g. Dutch, Samoan, Greek, and Tongan) were considered eligible for inclusion in the spoken corpus provided they satisfied the criterion for eligibility as a New Zealander. No attempt was made to include representative samples from particular ethnic groups other than Māori. It was considered important to include an appropriate proportion of the speech of the indigenous Māori people, and while this was not possible within each sub-category, it was recognised as a reasonable aim for the corpus as a whole. Māori contribute 18% of the total words in our transcribed corpus and Pākehā 76%.

Some degree of gender balance was also considered desirable, with an ideal overall goal of 50% female speech and 50% male speech within the 1,000,000-word sample. Women contribute 52% and men 48% of the final transcribed words, reflecting the New Zealand population balance.

Other social factors

Recognising that it was unrealistic to attempt to collect a representative sample which took account of additional social variables such as social class, regional origin, level of education, occupation and age, no attempt was made to pre-determine the number of contributors in such categories. However, every speech sample collected is described as fully as possible in these respects for each speaker contributing to the corpus. No attempt was made at iwi representation, and information on iwi affiliation was not collected.