(from English Phonetics, the journal of the EPSJ,
Vol 21, January 2017, a Festschrift for Professor Jack Windsor Lewis on the
occasion of his 90th birthday.)
I had the good
fortune to get to know Jack Windsor Lewis in Oslo in 1966 and we have remained friends
ever since. We collaborated on a book of exercise material for my adult
education learners (Higgins and Lewis, 1968) while my wife worked with him
preparing his English pronouncing dictionary (Lewis, 1972). His excellent Guide to English Pronunciation (1969)
was what aroused my interest in homographs, especially Chapter Five on word
rhythm. This book has, by the way, perhaps the most striking cover design of
any book in the field of phonetics.
*****
The anomalous
English spelling system has two troublesome by-products. The first is the
existence of nearly three thousand homophones, words spelled differently but
pronounced alike, such as fair/fare.
These lead to many errors in writing, among native speakers as well as foreign
learners, some of the commonest being the confusion of “their” with “there”, or “principle” with “principal”. Homophones are also a substantial
problem in automatic speech recognition, and many hilarious errors have been
reported when such software has been used to generate subtitles during live
broadcasts: “The air to the thrown prints
Charles”, for example. Not that English is unique in this respect. French,
too, is notorious for its homophone problems.
A second by-product,
less problematical and more local to English, is the
occurrence of homographs, words spelled alike but pronounced differently, such
as wound (past tense of “to wind”
versus “injury”) or moped (“was
miserable” versus “small motorcycle”). These are sometimes known as heteronyms.
There are around six hundred and fifty of these, listed on my website http://www.minpairs.talktalk.net/graph.html. Although they may occasionally be
mispronounced, they rarely cause much misunderstanding. However, they must
cause problems for anyone who aims to create an accurate text-to-speech
algorithm for a computer. Not only must two pronunciations be stored for the
one word, but there must also be a means of selecting the relevant one. This
entails either picking up contextual markers or using real-time parsing of the
whole utterance.
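As a rough illustration of the first of these two approaches, the sketch below stores two pronunciations under one spelling and picks between them by looking for contextual markers in the sentence. It is a minimal sketch under stated assumptions: the `Reading` class, the `select_reading` function and above all the cue words are hypothetical and are not taken from any of the products discussed here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    pronunciation: str      # transcription in an ASCII-style notation
    gloss: str              # short description of the meaning
    cues: frozenset[str]    # hypothetical context words favouring this reading


# Illustrative lexicon: one spelling, two stored pronunciations.
# The first reading in each tuple is treated as the default (an assumption).
HOMOGRAPHS = {
    "wound": (
        Reading("/wu:nd/", "injury", frozenset({"bandage", "bleeding", "healed"})),
        Reading("/waund/", "past tense of 'to wind'", frozenset({"clock", "rope", "watch"})),
    ),
    "bass": (
        Reading("/beis/", "low voice", frozenset({"sings", "choir", "guitar"})),
        Reading("/bæs/", "fish", frozenset({"sea", "catches", "fishing"})),
    ),
}


def select_reading(word: str, sentence: str) -> Reading:
    """Pick the stored reading whose cue words occur most often in the sentence."""
    readings = HOMOGRAPHS[word.lower()]
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = [len(reading.cues & context) for reading in readings]
    best = max(scores)
    # No cue found, or a tie: fall back to the default reading.
    if best == 0 or scores.count(best) > 1:
        return readings[0]
    return readings[scores.index(best)]


print(select_reading("wound", "She wound the clock before bed.").pronunciation)
print(select_reading("bass", "He often catches sea bass.").pronunciation)
```

A cue list of this kind can only be as good as its coverage, which is one reason the alternative, parsing the whole utterance, is attractive despite its cost.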
To see how well this
is done at present, I selected four devices with inbuilt algorithms, namely a
third-generation Kindle e-reader from 2010 and an eighth-generation model from
2016, the Narrator software in Windows 10, and the voice on my Apple iPad running
iOS 9. Each of these offers a single female voice, though I believe
alternative male voices are available, certainly for Windows.
I also chose four
on-line companies out of many offering text-to-speech services: Acapela, Ivona, Oddcast and ReadSpeaker, all of
which offer a demonstration mode on their web pages. These companies offered a
selection of voices, male, female and child, with a range of regional accents.
In some cases thirty or more different voices were
selectable. What I soon learned was that the voices did not all perform
consistently on my test sentences. Clearly they were not drawing on the same
dictionary resource. I did not have the time to run every sentence through
every voice, so I normally took a sample of three or four voices, recording
diversity when I observed it.
I have anonymised
the results to some extent rather than identifying individual suppliers, since
I do not wish to promote one product over another; anyone who wants to assess
an individual company’s product can replicate my tests if they wish. In the
tables below, the first two columns are the results of the two generations of
Kindle e-reader, then Windows 10, followed by Apple, while columns E to H are the four on-line companies though not in the order listed
above.
I began with perhaps
the severest test of all, the choice of /ri:d/
versus /red/ as pronunciations of read.
Interestingly this is something that native speakers hardly ever get wrong in
spontaneous speech and rarely in reading aloud. When they do make an error in
reading, it is usually corrected spontaneously except in highly ambiguous
contexts. This is an interesting fragment of evidence to suggest that our
mental lexicon is stored as an inventory of sounds, with little dependence on
spelling. If software is to match this, it must use a full
parse that can distinguish a noun from a verb, present from past, and a past
participle from an infinitive, rather than simply using proximity testing for “to” or “will” or “have/had”.
I started with four sentences in which the verb was
unequivocally /i:/: unmarked present, infinitive
after a modal verb, infinitive with “to”,
and imperative.
When I read his books, I am never disappointed.
I will read it when I have time.
I don’t want to read it now.
Here’s her new book. Read it and then pass it on.
As expected, these gave no problems. I followed
this with four which were equally certainly /e/ though one of them was
deliberately tricky, including a “to”
in front of a “read” past participle to
smoke out algorithms which rely on “to”
to identify an infinitive.
I have read it already.
Have you ever read his first book?
When he read it, he was bowled over.
The contents of this shelf are restricted to previously read books.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
I have read | e | e | e | e | e | e | e | e
Have you ever read | i: | e | i: | i: | e | i: | i: | e
When he read | i: | i: | i: | i: | e | e | e | e
previously read | i: | i: | i: | i: | i: | i: | i: | i:
These results were very disappointing. All the
engines picked up “have read”, but only
three of the eight recognised the question form in “Have you ever read”. The on-line engines picked up /e/ in “he read”, but the built-in algorithms all
failed this one. Rather as I expected, none of them escaped the trap I set in
the last sentence.
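The trap in the last sentence can be reproduced with a few lines of Python. The sketch below is a guess at what simple proximity testing might look like, not a description of how any of the engines actually works: the cue lists, the two-word window and the function name `guess_read` are all assumptions made for illustration.

```python
import re

# Assumed cue words (hypothetical lists, for illustration only).
INFINITIVE_CUES = {"to", "will"}        # taken to signal /ri:d/
PERFECT_CUES = {"have", "has", "had"}   # taken to signal /red/


def guess_read(sentence, window=2):
    """Guess a pronunciation for each 'read' from the few words before it."""
    text = sentence.lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+", text)
    guesses = []
    for i, token in enumerate(tokens):
        if token != "read":
            continue
        guess = "/ri:d/"  # default when no cue is found in the window
        # Examine the preceding words, nearest first.
        for previous in reversed(tokens[max(0, i - window):i]):
            if previous in PERFECT_CUES:
                guess = "/red/"
                break
            if previous in INFINITIVE_CUES:
                guess = "/ri:d/"
                break
        guesses.append(guess)
    return guesses


print(guess_read("I have read it already."))
print(guess_read("I don’t want to read it now."))
print(guess_read("The contents of this shelf are restricted to previously read books."))
```

Because the preposition “to” falls inside the window, the last call returns /ri:d/ where a native speaker would say /red/. Only something closer to a parse, recognising “previously read” as a participle modifying “books”, would avoid the error.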
I then tried four sentences which were less
clear cut and might cause even a native speaker to hesitate.
When I go on holiday I read books like that.
When I used to go on holiday I read books like that.
I read almost everything she wrote.
I read almost everything she writes.
Here, again, results were disappointing with all
the algorithms playing safe with /i:/ apart from one
internet engine which inexplicably played safe in the opposite direction,
giving /e/ for all.
Finally, for completeness, I included two cases
of nouns (always /i:/).
It’s a good read.
We need some good reads around Christmas.
As expected, these caused no problems, with /i:/ being used in all the algorithms. However, the
conclusion seems to be that very little real-time parsing is being applied.
For the most part, the engines seem to rely on a default pronunciation
plus a few stored exceptions, perhaps storing the phrases “he/she read” and “have/has read” with the /e/ pronunciation and everything else with /i:/.
The 650 English
homographs can be broken down into several different categories, not all of
equal importance. I have made a more or less random selection of items from
these groups and assembled sentences to display the contrast. In the tables
below I have used the following symbols:
the symbol + shows clearly distinguished outputs matching what one expects from a native speaker, eg ‘Please /ri`ko:d/ my objection. You did not make a /`reko:d/ last time.’
the symbol ~ shows the reverse of this, both words wrong, eg ‘Please /`reko:d/ my objection. You did not make a /ri`ko:d/ last time.’
the symbol < shows the first pronunciation used for both words, eg ‘Please /ri`ko:d/ my objection. You did not make a /ri`ko:d/ last time.’
the symbol > shows the second pronunciation used for both words, eg ‘Please /`reko:d/ my objection. You did not make a /`reko:d/ last time.’
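The four symbols amount to a small classification scheme, which can be sketched in Python as follows. The function names are hypothetical. The treatment of cells containing more than one symbol (for example “+ <”) as a record of different voices from the same supplier giving different results is an interpretation, based on the remark above about recording diversity between voices.

```python
def outcome_symbol(expected, heard):
    """Classify one engine's rendering of a two-word test sentence.

    expected: (first, second) pronunciations a native speaker would use
    heard:    (first, second) pronunciations the engine produced
    """
    first, second = expected
    outcomes = {
        (first, second): "+",   # both as expected
        (second, first): "~",   # distinction made, but reversed
        (first, first): "<",    # first pronunciation used for both
        (second, second): ">",  # second pronunciation used for both
    }
    return outcomes.get(tuple(heard), "?")  # "?" for anything else


def cell_for_voices(expected, heard_by_voice):
    """Combine the symbols for several voices into one table cell."""
    symbols = [outcome_symbol(expected, heard) for heard in heard_by_voice]
    # dict.fromkeys removes repeats while keeping the order of first appearance.
    return " ".join(dict.fromkeys(symbols))


EXPECTED = ("ri`ko:d", "`reko:d")
print(outcome_symbol(EXPECTED, ("`reko:d", "ri`ko:d")))
print(cell_for_voices(EXPECTED, [("ri`ko:d", "`reko:d"), ("ri`ko:d", "ri`ko:d")]))
```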
One minor category
is the set of double-stress words (or variable stress as they are called in
JWL’s Guide, p. 53). These are words
which vary according to their position, front-stressed before a noun or end-stressed
when final in the phrase. These include individual words such as afternoon, upstairs, outside, routine, as well as numbers from 13 to
99 apart from multiples of 10, nationality adjectives ending in -ese, and many compound adjectives such
as easy-going or home-made. The difference is usually fairly clear in such examples
as “an outside toilet” opposed to “Take it outside, please.” However, there may
be a tendency towards greater front-stressing. Words like downhill, princess or sardine, which are often assigned to
this category, no longer sound wrong when front-stressed in sentences like
“They’re going downhill”, “She’s a real princess” or “I can’t stand sardines.”
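The positional rule described here, front-stressed before a noun and end-stressed when final in the phrase, can be approximated very crudely without any parsing. In the sketch below, which is only an illustration, a word from a small hypothetical list is end-stressed when followed by punctuation or by nothing and front-stressed when another word follows. The syllable divisions are orthographic conveniences, and the stress mark ` is placed before the stressed syllable.

```python
import re

# Hypothetical mini-lexicon of double-stress words with a rough syllable split.
DOUBLE_STRESS = {
    "outside": ("out", "side"),
    "inside": ("in", "side"),
    "upstairs": ("up", "stairs"),
    "chinese": ("chi", "nese"),
}


def mark_stress(text):
    """Return the double-stress words in the text with a stress mark added."""
    tokens = re.findall(r"[A-Za-z]+|[.,;:?!]", text)
    marked = []
    for i, token in enumerate(tokens):
        parts = DOUBLE_STRESS.get(token.lower())
        if parts is None:
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        # Assumption: punctuation or end of text means the word is phrase-final.
        phrase_final = following is None or not following.isalpha()
        if phrase_final:
            marked.append(parts[0] + "`" + parts[1])   # end-stressed
        else:
            marked.append("`" + parts[0] + parts[1])   # front-stressed
    return marked


print(mark_stress("an outside toilet"))
print(mark_stress("Take it outside, please."))
```

The weakness is obvious: any following word is treated as if it were a noun, so “Take it outside now” would wrongly come out front-stressed. A more careful version would need at least a part-of-speech tag for the following word.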
The following
sentences were used to test this feature in the eight algorithms:
Give me the inside story. You were on the inside, weren’t you?
Can you speak Chinese? No, but I can play Chinese whispers.
He was born in nineteen-twenty-seven. When he came back from the war he was just nineteen.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
inside | < | + | + | + | + | + | > | +
Chinese | < | + | + | + | + | + | + | <
nineteen | < | + | + | + | + | < | + | +
What this shows
immediately is the advance made by the Kindle, where the early model had all
these words invariably front-stressed. The occasional anomalies I noted in F, G
and H are the product of even
stressing of the two syllables rather than completely wrong stressing. Generally the results were much better than expected.
The second category,
much the most numerous, is variable stress where one form, usually a noun or adjective,
is front-stressed while a matching verb is end-stressed. In some cases the
stress difference is the only distinction, e.g. import, digest or torment. In other cases
there is a reduction of the unstressed vowel: affix, conduct, or produce. The 1989 edition of the Advanced Learner’s Dictionary listed
nearly 300 words of this type, though it is probably true that some of them are
losing the distinction and becoming front-stressed in all contexts, such as decrease, increase, replay, and
possibly the -port words, import, export and transport, all
of which are often heard front-stressed as verbs. In some cases
one of the uses is obscure and might be got wrong by a native speaker, for instance “collect” as a noun meaning a prayer, or “second” as a verb meaning to send away on
temporary duty, so I would not expect the algorithms to make the
distinction. However, I included them, along with other
more clear-cut cases.
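To make the noun/verb alternation concrete, here is a minimal Python sketch that chooses front or end stress from the word immediately to the left. It is a stand-in for real parsing, not an account of what any engine does: the cue lists, the function name and the orthographic syllable splits are all assumptions, and vowel reduction is ignored.

```python
import re

# Hypothetical lexicon of noun/verb stress pairs with a rough syllable split.
STRESS_PAIRS = {
    "conduct": ("con", "duct"),
    "object": ("ob", "ject"),
    "record": ("re", "cord"),
    "import": ("im", "port"),
}

# Assumed left-neighbour cues (illustrative, far from complete).
NOUN_CUES = {"a", "an", "the", "her", "his", "that", "this", "such"}
VERB_CUES = {"to", "i", "we", "you", "they", "will", "cannot", "please"}


def stress_by_left_neighbour(text):
    """Mark stress on each listed word, or give None when no cue is found."""
    tokens = re.findall(r"[a-z]+", text.lower())
    results = []
    for i, token in enumerate(tokens):
        if token not in STRESS_PAIRS:
            continue
        first, second = STRESS_PAIRS[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in NOUN_CUES:
            results.append("`" + first + second)   # noun: front-stressed
        elif previous in VERB_CUES:
            results.append(first + "`" + second)   # verb: end-stressed
        else:
            results.append(None)                   # undecided
    return results


print(stress_by_left_neighbour(
    "Her conduct was deplorable. We cannot allow such a person to conduct the choir."
))
```

The sketch ignores sentence boundaries and returns None whenever the left neighbour is not in either list, which is exactly the situation in which an engine would have to fall back on a default.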
Her conduct was deplorable. We cannot allow such a person to conduct the choir.
He got his just deserts. They sent him to the deserts of North Africa.
He would frequent the bars around Soho, and we had frequent visits in the autumn.
Take that object away. I object to its presence.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
conduct | > | + | + | > | + | + | + | +
deserts | > | > | > | > | > | > | + > | + <
frequent | > | > | < | > | + | + | + < > | +
object | < | + | + | < | + | + | + > | +
G performed
very inconsistently, some of its thirty different voices making the
distinctions correctly while others did not. There was an interesting
inconsistency in H where the American
voices dealt correctly with deserts
while the British voices failed.
At present there is no need to present him with a reward.
Please record my disagreement. You did not make a record last time.
I refuse to stay in this room. There is far too much refuse everywhere.
Then follows the collect for the second Sunday in Advent after which we collect your freewill offerings.
We will need to second two officers to the security section. Yes, I second that.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
present | + | + | + | + | + | + | + | +
record | + | + | + | + | + | + | + > | +
refuse | < | + | + | < | + | < | + < | +
collect | + | > | > | + | > | + | + > | >
second | > | > | > | > | > | > | > < | >
Again G performed very inconsistently. As
expected, “second” was wrong throughout,
but “collect” fared better, though the perceived
correct versions may have been due to even stressing of the two syllables
rather than front stressing.
She advocates a cautious approach, but she is a poor advocate for her cause.
He alternates happiness and misery on alternate days.
There is no need to duplicate your effort. I made a duplicate of the document yesterday.
She will go into the graduate programme, provided that she graduates next year.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
advocate | + | + | + | + | + | + | + | +
alternate | < | < | + | < | + | + | + < | + <
duplicate | < | + | + | > | + | + | + < | +
graduate | + | + | + | + | + | + | + | +
I was surprised how
well this group was handled since it requires some parsing to distinguish nouns
from verbs.
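For readers who want to compare engines numerically, the results for this group can be tallied with a short script. The sketch below transcribes the table for the advocate, alternate, duplicate and graduate sentences and counts, for each engine, the items on which every sampled voice was right. Treating only a bare “+” as a success is a scoring assumption, not one made in the study itself.

```python
ENGINES = ["Kindle 2010", "Kindle 2016", "Windows 10", "Apple", "E", "F", "G", "H"]

# Results transcribed from the table above, one list per test item,
# in the same column order as ENGINES.
RESULTS = {
    "advocate":  ["+", "+", "+", "+", "+", "+", "+", "+"],
    "alternate": ["<", "<", "+", "<", "+", "+", "+ <", "+ <"],
    "duplicate": ["<", "+", "+", ">", "+", "+", "+ <", "+"],
    "graduate":  ["+", "+", "+", "+", "+", "+", "+", "+"],
}


def clean_scores(results):
    """Count, per engine, the items where the cell is a bare '+'."""
    totals = dict.fromkeys(ENGINES, 0)
    for cells in results.values():
        for engine, cell in zip(ENGINES, cells):
            if cell == "+":   # mixed cells such as '+ <' are not counted
                totals[engine] += 1
    return totals


for engine, score in clean_scores(RESULTS).items():
    print(f"{engine}: {score} of {len(RESULTS)}")
```

On this scoring Windows 10, E and F get all four items right, while the older Kindle and the iPad manage two.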
I will not put up with this abuse any longer. Nor will I let you abuse your friends.
That was a close call. Close the door, please.
What’s the use of waiting? We will use up all the sugar.
I used to enjoy James Bond books. They used up my leisure time well.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
abuse | > | + | + | + | > | + | + < > | + >
close | + | + | > | < | + | + | + < > | +
use | + | + | + | + | + | + | + | +
used | + | + | + | + | + | > | + | +
Again the level of success was gratifying since
some parsing must be taking place. Only F
used a fully voiced /z/ in “I used to enjoy …”.
In G and H there were unexpected differences between the available voices.
You will have to re-mark this paper. I don’t like the remark you made at the top.
You will have to remark this paper. I don’t like the remark you made at the top.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
re-mark / remark | > | + | + | + | + | + | + | +
remark / remark | > | > | > | > | > | > | > | >
There were no
problems with the hyphenated spelling other than with the older Kindle, and I
assume the same would be true with the other words in this set, re-sign, re-form, re-join, etc.
With the hyphen omitted the words are not treated as homographs.
It’s a blessed nuisance. He’s not blessed with much intelligence.
I never learned how to read books by learned professors.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
blessed | > | > | > | ~ | > | < | + > | + >
learned | < | < | > | < | < | < | + < | + <
I was surprised that this was so poorly handled by most of the
algorithms, and at how the iPad got blessed
completely the wrong way round.
We’ll make a rendezvous, but he has missed the last five rendezvous, and I am not hopeful.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
rendezvous | < | < | < | < | < | < | < > | <
I was not surprised that there were no /z/ plurals audible here. Many
native speakers might hesitate when reading aloud, although in spontaneous speech
they would probably put in a /z/ here or even with proper names: “The
restaurant stocks a good range of Beaujolais and Chablis.”
This leaves a large group of homographs arising from
a variety of causes. I have selected ten of these more or less at random.
How do you calculate the arithmetic mean? It doesn’t need much arithmetic.
The stamp attaches here. It was issued by the attachés at the embassy.
He often catches sea bass. Then he goes to church and sings bass in the choir.
She won the bow and arrow contest, then took a bow in front of the audience.
What does it mean when the does assemble in Richmond Park?
You entrance me. From your first entrance I was overwhelmed.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
arithmetic | < | + | < | ~ | + | > | > < ~ | + >
attaches | + | + | + | + | > | + | < | + <
bass | > | + | > | > | + | + | > < | +
bow | < | < | < | > | > | > | + < | < >
does | < | + | < | < | + | + | + < | +
entrance | + | + | + | + | + | + | + | +
Arithmetic was better dealt
with than expected, but I was again surprised that the iPad made the
distinction but in reverse. A big disappointment was the word bow. Only one of the voices in G got it right. To balance that, I was
surprised to find all the algorithms succeeding with entrance.
To finish the discussion he made a fine distinction. Well a finish distinction I would say.
How can you write French when there is no grave accent on your keyboard? Voltaire would turn in his grave.
This permit is invalid. This parking space is reserved for an invalid.
I had quite a job persuading people to read the Book of Job.

Test item | Kindle 2010 | Kindle 2016 | Windows 10 | Apple | E | F | G | H
finish | < | < | < | < | < | < | < | <
grave | > | > | > | > | > | > | > | >
invalid | > | + | + | < | + | + | + > | + >
job | < | < | < | < | < | + < | + < | + <
The only success with the job/Job
distinction was with the American voice in F
and some but not all the American voices in G and H. Does this
reflect the strength of religious observance in the USA? I was rather
mischievous to include the two pronunciations of finish, which defeated all the algorithms as expected. I was
disappointed that grave also defeated
them.
Overall the results are mixed. Text-to-speech has made huge advances on
the robotic voice we have associated over the last twenty years with Professor
Stephen Hawking. While none of the algorithms is yet capable of rendering an
extended passage of prose fiction as pleasantly as a trained actor, all of them
deliver fully comprehensible speech, often with pleasant natural timbres and an
acceptable range of intonation. However, it is clear that the handling of
homographs remains problematical. Mistakes rarely cause complete
misunderstanding, but they do interfere with smooth comprehension and
pleasurable listening. The better results obtained from the 2016 Kindle as
against the earlier model show that the field is developing, and by the time
this paper appears it may be that many of the specific shortcomings in current
software will have been fixed. It is worth remembering that this work does owe
something to the pioneering work of phoneticians in the last century among whom
Jack Windsor Lewis was a significant figure.
References
Higgins, J.J. and Lewis, J. Windsor. Pronunciation and Listening Practice, a language laboratory course. Illustrated by David Handforth. Oslo. Studentersamfundets Fri Undervisnings Forlag, 1968.
Lewis, J. Windsor. A Guide to English Pronunciation for users of English as a Foreign Language. Oslo, Bergen, Tromso. Universitetsforlaget, 1969.
Lewis, J. Windsor. A Concise Pronouncing Dictionary of British and American English. London. Oxford University Press, 1972.