Don’t ask the admiral …

A project to enumerate minimal pairs in English RP
John Higgins, Stirling University

Paper for Scottish IATEFL conference, Spring 1997

I notice we have a few people present who already know plenty about phonetics. I hope they will bear with me for the first ten minutes while I establish a starting position.

We are quite used to the idea that foreigners talk funny, but we may not be so used to the idea that foreigners listen funny, or that we can “listen in English” or “listen in Japanese”. If we listen in English when someone is talking Japanese, or listen in Japanese when someone is talking English, we will not hear what we would hear if we listened to English in English or to Japanese in Japanese. Nor do I mean that we hear more; on the contrary, we may hear less, since listening in one’s first language lets us filter out irrelevant information as well as supply mentally what is missing from the stream of speech through elision and assimilation. We use our understanding of meaning to “write out in our minds” the complete utterance, something which probably proves much harder “in foreign”.

Phonemes

The framework that we use in listening to and speaking a language we know well is its phonological system, the agreement among the speech community as to which sounds or suprasegmental features (stress and pitch) will affect word-level meaning. Although the range of different sounds our speech organs can make is almost infinitely variable, all languages seem to operate with a finite set of contrasts which are used to create the word-store. These are the phonemes. How any phoneme is realised on any occasion is subject to all kinds of variability. There is speaker variation, the timbre and pitch of the voice, its volume or pace. There is free variability. Most English speakers will pronounce the words WHICH or WHY with a voiced initial consonant /w/ some of the time, and a voiceless one resembling /hw/ at other times, but without a clear principle governing the choice. There is distributional variability; in RP we have clear /l/ initially and dark /l/ finally. Welsh speakers tend to use clear /l/ throughout and some Americans have dark /l/ throughout. But meaning doesn’t change. For Polish speakers, I understand, the contrast is phonemic. Two words differing only in the pronunciation of the /l/ can have different meanings. For us the difference between /r/ and /l/ is phonemic, and words like LICE and RICE are different, importantly different. It is quite difficult for us to imagine ourselves into the situation of a speaker for whom the /r/ versus /l/ difference is no more significant than the /w/ versus /hw/ difference.

Minimal pairs

To establish the set of phonemes, what linguists have traditionally done is scour the language for minimal pairs, pairs of words whose pronunciations differ by only one segment but which have distinct meanings. Thus PAT and BAT show that /p/ and /b/ are distinct, PAT and FAT distinguish /p/ and /f/ and so on. This way we can establish the inventory of phonemes, which in the case of English RP amounts to 20 vowel elements (vowels and diphthongs) and 24 consonants. There are some well-known problems. /h/ never occurs in final position in the syllable, and /ŋ/ (the velar nasal) never occurs initially in a stressed syllable, so there are no minimal pairs to separate these sounds. It would be possible to analyse /h/ as a distributional variant of /ŋ/. In this case, however, the sounds are so unrelated that this would not be sensible and conventionally they are assigned to separate phonemes.

Self-repair

The use of minimal pairs has extended from their theoretical role in establishing the phoneme inventory into the practical function of training foreign learners to recognise and, one hopes, come to make the native speaker’s phonemic contrasts. The argument for such activities might run as follows. Any language can tolerate a certain number of homonyms and homophones, words which look or sound the same but which have different meanings. However, there comes a point where there is potential for confusion. I have established that English has over 2,400 homophones from a lexicon of 70,000 forms (not lemmas) recorded in the Advanced Learner’s Dictionary, mostly pairs of words but with a few larger sets running up to the sixfold set of AIR, HEIR, ERE, E'ER, AYR and AIRE. There is a widely held view among linguists that language is a self-repairing mechanism; when things are too complex, we simplify, and when things are too simple or similar we differentiate. If the number of homophones in general use ever got beyond some critical point, the language community would spontaneously introduce some new terms or vary the pronunciation to fix the problem. There is a nice example of this in the words ORAL, of the mouth, and AURAL, of the ear. For the population at large these are homophones, but this scarcely matters as they use ORAL only when talking about oral exams and oral hygiene and AURAL hardly at all. Among applied linguists and language teachers the confusion has become dangerous, since they have plenty of occasions to talk about matters concerning the ear and the mouth. So, within the last twenty years, a new pronunciation of the word AURAL has emerged, /ɑʊrəl/, to avoid the confusion, but this pronunciation is in use in only one very small group of professionals. I must admit, however, that I have my doubts about the efficiency of the self-repair mechanism. If it is in place, why has it not widened the gap between 13 and 30 and all the other -teen and -ty words? Somehow we continue to put up with missed appointments and missed trains because X said 10:15 and Y heard 10:50.

Ear training

Foreign learners, the argument goes, who confuse and conflate contrasts which matter in English are going to enlarge greatly the stock of homophones, both aurally, words which sound the same to them, and orally, words which English listeners will mishear. This takes them past a threshold of confusion and imposes on the listener (whether foreigner or native) a large extra penalty of decoding and guesswork, making conversation much more tiring and misunderstanding much more likely. That is the argument, and it leads to quantities of WHICH WORD DID YOU HEAR single-word exercises in classrooms all over the world. In most cases there are problems which are well known locally. Many nationalities confuse FEET and FIT. Arabic speakers replace /p/ with /b/, buying things with ONE BOUND or going to Westminster to see the PIG PEN CLOCK. German speakers traditionally use /v/ in place of initial /w/ (“Ve haf vays of making you talk”), but sometimes hypercorrect by putting /w/ into words with initial /v/: “I live in a willage”. Japanese airline staff have been known to wish passengers a present fright. For all these cases and more, there are drills to deal with the problem. Unfortunately the treatment does not seem to lead infallibly to a cure, and there are teachers who would gladly abandon remedial pronunciation work altogether.

Why isn’t minimal pair ear-training working? One reason may be that the important confusions are not always minimal. You may, like me, have seen a recent poster at this university advertising a rugby fixture, STIRLING VERSES LIVERPOOL. Homophones are begetters of spelling mistakes and spelling mistakes are indicators of homophones; probably the commonest spelling error in English is THEIR for THERE, closely followed by PRINCIPAL for PRINCIPLE. Minimal pairs frequently beget spelling errors in overseas students’ writing. But VERSUS and VERSES are not even a minimal pair, though the two points of difference are both in an unstressed syllable, which tends to neutralise the distinctions. Perhaps we should not be trying to fix the problem at the level of the distinctive segment, the phoneme, but at the level of the distinctive feature or setting, which may extend over several segments. Thus the words WINS and WINCE seem to be a minimal pair /wɪnz/ versus /wɪns/ distinguished by the /z/ versus /s/ contrast. Looking at the words with a spectrogram will show the final consonants as effectively identical. The difference is between a long fully voiced /ɪn/ and a shorter partially devoiced /ɪn/.

Need for lists

Meanwhile if we assume that what we are doing with our ear-training and repetition drills is increasing intelligibility by reducing the number of homophones in learners’ speech and hearing, then it is worth while to count the cost, to measure just how many extra homophones are created when a learner makes a particular mistake. As far as I know this has never been done. It is not easy to do manually, and to do it computationally depends on the availability of machine-readable pronouncing dictionaries. Once one has such a resource, however, it becomes possible to ask the question, “How many extra homophones will a FEET/FIT or LOCK/ROCK confusion put into a learner’s English?” One can start with a raw count, but later must ask what the real potential for confusion is, in other words which of the pairs are the same part of speech and so capable of occurring in the same structural context, and how many of them might occur within the same topic of discourse. It is fairly unusual, I think, to find a pair as close as this:
    You should have heard them jeering/cheering at the end of the game.
in which a crucial distinction of meaning is carried solely by a voicing feature.

The first object of the project I have started is to round up every minimal pair in English. That is not as easy as it sounds. With 20 vowels and 24 consonants there are 190 potential contrasts of vowels and 276 of consonants (including the empty /h/ v. /ŋ/ set). Then a question arises as to whether one should also include the contrast of a sound versus the absence of that sound, i.e. pairs such as BACK and BANK or BACK and BACKER. In some cases this will have relevance to learners; the contrast EAT/HEAT is a known difficulty for the French. Another crux was contrasting a consonant with a vowel. It would seem that you can’t do that without affecting the syllable structure; SCREEN and SERENE differ only by one sound, but they do not feel like a minimal pair. This, however, would not apply to a syllabic /l/. BARROW and BARREL or CABER and CABLE seem much closer to being minimal.

Sources of the lists

The data set I am using is the 1974 computer-usable version of the Oxford Advanced Learner’s Dictionary (CUVOALD), which has been updated (1995) and deposited in the Oxford Text Archive by Roger Mitton. The dictionary contains pronunciations, grammatical codes, frequency codes, and the syllable count for each entry, but no definitions, and was intended to provide a raw list for anyone experimenting with spell-checkers and grammar aids. Mitton has expanded each lemma to include all inflections, and has added common place names and personal names, turning the 30,000 original headwords into a list of 70,646 words. Pronunciation for each entry is given in the version of machine-readable phonetics for English RP proposed by John Wells for the Alvey Committee. Words with two pronunciations (homographs like WIND or RECORD) are given two entries, though other homonyms are not.

The first step was to identify all the homophones. This entailed re-indexing the dictionary according to pronunciation and identifying any adjacent identical pairs. The next step was to take the symbols for two distinct sounds and to replace them with the same dummy character, then re-index the dictionary and identify the extra false homophone pairs created. These, with the original symbols reinstated, would make up the minimal pair list, though a good deal of checking and editing turned out to be necessary.
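
The two steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program actually used for the project: the toy lexicon and its one-character-per-phoneme codes are invented for the example, grouping by a dictionary key stands in for re-indexing and comparing adjacent entries, and the final one-segment check is an added filter standing in for part of the manual checking.

```python
from collections import defaultdict
from itertools import combinations

# Toy lexicon of (spelling, pronunciation) pairs. The one-character-per-phoneme
# codes are illustrative assumptions, not entries copied from the dictionary:
# i = vowel of FEET, I = vowel of FIT, & = vowel of PAT.
LEXICON = [
    ("feet", "fit"), ("feat", "fit"), ("fit", "fIt"),
    ("seat", "sit"), ("sit", "sIt"),
    ("pat", "p&t"), ("bat", "b&t"),
]


def homophone_sets(lexicon):
    """Group spellings by pronunciation and keep groups of two or more."""
    by_pron = defaultdict(set)
    for spelling, pron in lexicon:
        by_pron[pron].add(spelling)
    return {p: sorted(ws) for p, ws in by_pron.items() if len(ws) > 1}


def minimal_pairs(lexicon, sound_a, sound_b, dummy="#"):
    """Merge two single-character sounds into a dummy symbol and list the
    word pairs that collide only because of the merger."""
    merged = defaultdict(set)
    for spelling, pron in lexicon:
        key = pron.replace(sound_a, dummy).replace(sound_b, dummy)
        merged[key].add((spelling, pron))
    pairs = []
    for entries in merged.values():
        for (w1, p1), (w2, p2) in combinations(sorted(entries), 2):
            # Keep only pairs whose original pronunciations differ in exactly
            # one segment; this drops true homophones and words that differ
            # in two places.
            diffs = [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]
            if len(diffs) == 1:
                # Put the word containing sound_a first.
                pairs.append((w1, w2) if p1[diffs[0]] == sound_a else (w2, w1))
    return sorted(pairs)


if __name__ == "__main__":
    print(homophone_sets(LEXICON))
    print(minimal_pairs(LEXICON, "i", "I"))
    print(minimal_pairs(LEXICON, "p", "b"))
```

The length of the list returned by minimal_pairs for a given pair of sounds corresponds to the raw count of pairs for that contrast. A real run would also need to handle phoneme symbols longer than one character, which this sketch does not attempt.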

In displaying the lists on the Stirling University web site (since 2005 on my marlodge.net website) I included the raw number of pairs found, ranging from 0 (found only in pairings with consonants /h/, /ʒ/ or /ŋ/) to 1,009 (PAT versus CAT). I also attempted to supply a rating for the importance of the contrast, the similarity of the sounds and therefore the probability of confusion, from 1, very different and therefore unlikely to cause problems, to 5, very similar and therefore a likely cause of difficulty. If the self-repairing hypothesis is true, one would expect low similarity ratings to go with high pair counts and high ratings with low pair counts. A quick glance at the tables shows no such correspondence. Does this mean that self-repair does not occur? To be sure of that one would need to look at the prevalence of the sounds involved and the meaning contrasts created. With this in mind I worked out a “semantic loading” feature for each contrast, but I have not yet done the mathematics that would indicate whether any link between difficulty and frequency exists.

Postscript

The lists as they stand provide a source for any teacher wanting to create some practice exercises for their classes, and for this I get thanked from time to time. Whether I have supplied data on language change I cannot say, and by now (2025) I have run out of energy and insight to go further.

Revised Chiang Mai
May 2025