A Deep Dive Into Pronunciation Changes

What makes some sounds easier to pronounce?

Feb 01, 2023

Unanswered Questions

In an earlier article, I summarized Guy Deutscher’s The Unfolding of Language. That book presented three ways that languages change: economy, emphasis, and analogy. Looking back, the “economy” piece left me unsatisfied for a couple of reasons. First, I wasn’t quite clear on why we find some sound combinations easier than others. I get that we say “hafta” instead of “have to.” But why is “ft” easier than “vt?” I can say the supposedly harder one over and over: “have to have to have to have to have to have to,” and I’m not lying on the floor gasping for air or anything. For another example, one can hear North Americans dropping the “t” in “plenty.” Why the “t,” though? Why don’t we hear “p’enty” or “ple’ty?” Secondly, Deutscher often highlighted cases in which speakers reduced the length of a word. Yet, languages haven’t evolved into a series of tiny phonemes. The author argued that the other two tendencies (emphasis and analogy) can extend words, but these effects don’t seem strong enough to override the constant shortening of words.

This article will try to shed a bit of light on both questions. I can’t offer a “theory of everything” with respect to language change, but I will try to provide examples that illuminate general patterns. This article will focus on English accents in North America because that’s how I talk. If you’re not a native English speaker, or if you speak with a different accent, some of the items might not apply. Yes, by the way, I am aware that there are a variety of accents in the US and Canada. No, that doesn’t invalidate anything I’m going to write. Finally, I must also acknowledge my debt to George Yule’s The Study of Language (Fifth Edition), which provides the basis for my explanation of consonants.

Consonants

The How

Read the following bullet points out loud for me, without adding any vowel sounds:

mmmmmmmmmmmmmmmmm
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
ffffffffffffffffffffffffffffffffffffffffff
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
ttttttttttttttttttttttttttttttttttttttttt

You probably managed the first four with ease. The “t” one, however, probably tripped you up. You might have added a vowel in between and said something like “ta-ta-ta-ta-ta-ta.” I specifically said not to do that, so you’ve failed the assignment. Other readers probably transitioned from “t” to “s.” If so, maybe you can guess why “out” is “uit” in Dutch and “ut” in Swedish, but “aus” in German. Others might have just spaced out the “t” sounds and spoken that line slower than the ones above.

There’s a reason the “t” doesn’t quite flow the same way. It represents a type of consonant called a stop or a plosive. I’ll opt for the word “stop” in this article, at least until I come across a big red “PLOSE” sign. These consonants consist of a brief stoppage (or, presumably, plossage) of airflow followed by a release. English contains six stop consonants: “t,” “d,” “p,” “b,” “g,” and “k.”

I’m simplifying here, as I tend to do. There’s a bit more variety in our pronunciation than we can express via the standard Roman alphabet. To address this issue, linguists have created the International Phonetic Alphabet (IPA), which contains a unique symbol for each sound. I’ll try to avoid IPA when I can. Still, it’s worth considering the difference between the “t” sounds in “tap” and “eighth.” Linguists represent the former, an aspirated consonant, as “tʰ.” If you put your hand in front of your mouth, you can feel the outward breath (aka, an “aspiration”). Meanwhile, one hears a “t̪” in “eighth,” as we alter the sound in anticipation of the “th.” Some speakers seem to drop the “t” in words like “button.” We call this a glottal stop, and it boasts one of the coolest IPA symbols: ʔ. You hear this sound between the vowels in “uh-oh,” and it’s more common amongst the Briʔish. There’s one more common “t” sound, but it’s not a stop, and it’s interesting enough to deserve its own section.

Enough about stops, let’s get to the other consonants. Many English consonants fall into the fricative category. With these, we narrow our vocal passages to create a bit of friction as the sound leaves our mouths. They are:

s
z
f
v
th (in thing); IPA: θ
th (in this); IPA: ð
sh; IPA: ʃ
s (measure); IPA: ʒ
- This will be the one sound that I will always refer to via IPA since I can’t think of any combination of standard characters that represents it. I sometimes see people write it as an “sz,” but there are not enough “sz” words for everyone to see that as the “s” in measure.
h

Other consonants, called nasals, allow air to escape through the nose. English contains three such consonants: m, n, and the “ng” sound in a word like “sing.” That latter sound gets represented with a “ŋ” in IPA. When you hear someone say “fightin’,” there’s no “g” getting dropped. Instead, the speaker is opting for an “n” over an “ŋ.” You can also feel the difference between nasal and other types of consonants. Put your hand on your nose and say “sssss,” “rrrr,” or “ta-da” and you won’t notice much. However, you will feel an expansion of your olfactory organ when uttering “mmmm,” “nnnnnn,” or the back part of “ring.”

The last group contains the vowel-esque approximants. This group includes “glides” which involve a, you know, gliding of the tongue. English contains two glides: “w” and “y.” The latter is written as “j” in IPA, which will make sense if you’re familiar with German or Dutch. Finally, we have the liquids: “r” and “l.” The English “r” requires us to curl our tongues back, while our “l” sounds block the middle and force air out the sides of the tongue.

The Where

Consonants, unlike vowels, require some restriction of airflow. This restriction occurs at the point of articulation. This section will explain why consonants with the same manner of articulation (detailed in the previous section) sound different.

Let’s start with the lips. These are the point of articulation for the “m,” “b,” “p,” and “w” sounds. You’ll notice that both lips touch with these sounds, earning them the name bilabial. Other times, you only need the bottom lip and the top row of teeth, and linguists refer to these sounds (“v” and “f”) as labio-dental. Next is the bane of every English learner’s existence: the dentals. We represent both dental sounds with a “th,” and they occur between our teeth. If you’re learning English and struggle to create these sounds, don’t worry, it’s just in our most common word. This paragraph is getting repetitive so I’m gonna head to bullet points for the rest:

Alveolar (“t,” “d,” “s,” “z,” “n,” “l”). The alveolar ridge is the slope between the teeth and the top of the mouth.
Post-alveolar (“sh,” “ʒ,” and “r”). These occur just behind the ones above.
Palatal (the “y” in “yellow”)
Velar (“k,” “g,” and “ng”). We articulate these consonants at the back of the mouth
Glottal (“h” and “ʔ” in English, but this would also include the “g” in Dutch). These get articulated deep down in the glottis.
Other languages contain “uvular” consonants, such as the French “r.”

The Voice

The previous two sections often listed pairs of consonants together: “t” and “d,” “g” and “k,” “sh” and “ʒ,” and a few others. In each case, we articulate these sounds in the same place and in the same manner. They share the “how” and “where.” These sounds only differ due to the presence or lack of vibration from the voice box. We call the vibrating ones “voiced” and the others “unvoiced.” The voiced ones tend to have a bit more of a buzz to them. Compare the buzzy “ʒ” to the less buzzy “sh.” You can also contrast the two “th” sounds in the phrase “the thing.” The first “th” buzzes while the latter feels like more of a spit. Since English prefers its consonants voiced, I’ll only list the unvoiced ones:

h
t
s
sh
p
k
f
th (as in “thing”)
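
The three features from these sections (how, where, and voice) can be written down as a small lookup table. What follows is only a rough sketch in Python, not anything from Yule's book: the `Consonant` class and the `voicing_partner` helper are names made up for illustration, the table covers just a handful of sounds, and spellings like "sh" stand in for proper IPA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Consonant:
    symbol: str   # informal spelling used in this article, not strict IPA
    manner: str   # the "how": stop, fricative, nasal, ...
    place: str    # the "where": bilabial, alveolar, ...
    voiced: bool  # the "voice": does the voice box vibrate?


# A deliberately incomplete sample of English consonants (illustrative only).
CONSONANTS = [
    Consonant("p", "stop", "bilabial", False),
    Consonant("b", "stop", "bilabial", True),
    Consonant("t", "stop", "alveolar", False),
    Consonant("d", "stop", "alveolar", True),
    Consonant("k", "stop", "velar", False),
    Consonant("g", "stop", "velar", True),
    Consonant("f", "fricative", "labio-dental", False),
    Consonant("v", "fricative", "labio-dental", True),
    Consonant("s", "fricative", "alveolar", False),
    Consonant("z", "fricative", "alveolar", True),
    Consonant("sh", "fricative", "post-alveolar", False),
    Consonant("ʒ", "fricative", "post-alveolar", True),
    Consonant("m", "nasal", "bilabial", True),
]


def voicing_partner(symbol: str) -> Optional[str]:
    """Return the sound with the same manner and place but the opposite voice."""
    by_symbol = {c.symbol: c for c in CONSONANTS}
    target = by_symbol[symbol]
    for c in CONSONANTS:
        same_how_and_where = (c.manner, c.place) == (target.manner, target.place)
        if same_how_and_where and c.voiced != target.voiced:
            return c.symbol
    return None  # e.g. "m" has no unvoiced partner in this table


print(voicing_partner("t"))  # d
print(voicing_partner("ʒ"))  # sh
print(voicing_partner("m"))  # None
```

Looking up a partner by "same how, same where, opposite voice" is exactly the relationship between the pairs listed above.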

Vowels

There’s less to discuss for vowels. Vowels involve a free flow of air, so we don’t have to think about the manner of articulation. English speakers almost always voice vowels, so we can also ignore that complexity. Thus, vowels differ mostly on the location of articulation. I don’t need to write any paragraphs explaining the concept of “front” and “back” or the notion of the “high” and “low,” and I think Wikipedia’s chart will suffice:

This represents a side view of the mouth, with the left sitting closer to the teeth and the top resting closer to the palate. Don’t worry about all the IPA symbols for now. We’ll run through a few examples and highlight our point of articulation. This exercise won’t travel as well as the ones involving consonants, so apologies to those without a North American accent.

Top front: feed, key

Mid/top front: fit, kit

Bottom front: fad, cat

Central: alive, elephant

Low central: fund, cut

Top back: food, coop

Bottom back: pot, cot

In some accents, you might hear a separate vowel (as in caught) that sits a bit above the last set. For me, however, “cot” and “caught” sound the same.

Finally, we have diphthongs, which require a movement of the tongue. In the word “sky,” I move my tongue from a bottom central position to a high front one. In “show,” I move it from the middle of the back to the top of the back. Other diphthongs include the vowel sounds in “great,” “toy,” and “cow,” where you can probably trace the movements in your mouth.

One last extra piece of info that applies to both vowels and consonants: making a circle with your lips. If you do so when making a sound, the vowel becomes “rounded” while the consonant is “labialized.”

The Boring Stuff

I’ll begin with a couple of boring examples. I consider these boring because I referenced them in the previous piece. One common process is called “nasalization,” where nasal consonants change their preceding vowels. The “a” in “man,” for instance, doesn’t sound like the one in “map.” Another common change occurs when speakers alter an earlier vowel based on the location of a later one. This creates some of our irregular plurals, like “men.” In an older version of the language, the plural of “man” sounded something like “man-eez.” Speakers reduced the distance between vowel sounds by switching to “men-eez.” Later, the final syllable fell off, leaving us with a vowel change but without the reason for the vowel change.

Which Sounds Do We Lose?

Apologies for all the background info, but we’re now ready to get to the point. So, why “eas’ coast” and not “ea’t coast?” Why “frien’ly” and not “frie’dly?” It seems that we tend to drop the stop, likely because these consonants cut off airflow. It explains why you see “and” written informally as “n,” while there’s nothing informal about the Dutch cognate “en.” It’s why you won’t be surprised to learn that the African American Vernacular English word “bae” came from “babe” rather than, say, “base” or “bane.” Did you ever wonder why “knife” starts with a “k” or “gnaw” starts with a “g?” English speakers once pronounced these sounds, but, when choosing between the nasal and the stop, the latter had to go.

The Part We Hafta Get To

Returning to the first paragraph, why is “hafta” easier than “have to?” After all, this preference doesn’t seem unique to English. Dutch verbs usually add a “t” to their stem in the third-person singular conjugation. One might expect a similar conjugation for “hebben,” but Dutch speakers will say “he has” as “hij heeft.” There seems to be some cross-cultural agreement on putting an “f” before the “t.” I can’t pretend to know exactly why this occurs, but I have a guess.

Sorry to jump around so much, but we hafta take one more detour before solving the mystery of “hafta.” Imagine a non-native speaker asked you how you determine the proper plural of a word. He’s not talking about irregulars like “mice” or “teeth.” He’s just asking about the normal ones, the ones that end in “s.” How do we know which sound that makes? Consider the following:

dogs
cats
apps
abs
frogs
blocks
plans
arms
holes
keys
toes

We don’t acknowledge this distinction in the written word, but some of these end with an “s” sound while others end in a “z.” I’m not sure if the average native English speaker could specify the rule. Remember from the first section that consonants contain three main features: manner of articulation, place of articulation, and voice. The first two won’t help us here, but we can rely on the voiced-unvoiced distinction. Consider “cats” versus “dogs.” In the former, the singular word ends with an unvoiced consonant: “t.” It also receives an unvoiced plural ending. The latter ends with a voiced one, “g,” and gets the buzzy “z” ending. In other words, we prefer to keep a string of consonants on one side of the voiced/unvoiced divide. This voice-matching preference can explain all the examples above, alongside almost every other regular English plural that I’m aware of. It can even explain some irregular plurals like “knives” and “leaves.” Personally, both “leafs” and “leaves” sound right to me, but I would never say “leafz” or “leavss.”
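
For the programmers in the audience, the voice-matching rule for plurals fits in a few lines. This is a minimal sketch, assuming we already know the final sound of the singular (spelling won't tell us). The function name `plural_ending` is made up, the unvoiced set is copied from the list in “The Voice,” and the extra case for hissing sounds like the end of “bus” goes beyond the examples above, so treat it as my own addition.

```python
# Unvoiced consonants, using the informal spellings from "The Voice".
UNVOICED = {"h", "t", "s", "sh", "p", "k", "f", "th"}

# Hissing sounds that take an extra vowel ("buses", "dishes").
# This case is an addition that the word list above does not cover.
SIBILANTS = {"s", "z", "sh", "ʒ", "ch", "j"}


def plural_ending(final_sound: str) -> str:
    """Pick the sound of the regular plural from the singular's final sound."""
    if final_sound in SIBILANTS:
        return "iz"
    if final_sound in UNVOICED:
        return "s"  # unvoiced ending after an unvoiced sound
    return "z"      # voiced consonants and vowels get the buzzy ending


# (word, final sound of the singular)
examples = [
    ("cat", "t"), ("dog", "g"), ("app", "p"), ("ab", "b"),
    ("block", "k"), ("plan", "n"), ("key", "ee"), ("bus", "s"),
]
for word, final in examples:
    print(f"{word}: {plural_ending(final)}")
```

The sibilant check has to come first, since “s” and “sh” also sit in the unvoiced set.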

That explains the mystery of “hafta.” There, the two consonants possess a matching voice, while the old school “have to” forces us to shut off our voice box immediately after finishing a voiced consonant. The same applies to the Dutch example. “Hebt” (which they use in the second person singular, for what it’s worth) requires a switch between a voiced stop and an unvoiced stop. It’s easier to keep the voice the same, and it’s probably even easier to swap out that first stop for a fricative.
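
Here is the “hafta” guess written as a toy rule: when a voiced consonant sits directly before an unvoiced one, swap it for its unvoiced partner. This is a hedged sketch rather than a real phonological model. The name `match_voice` and the tiny lookup tables are invented for illustration, and the sounds are rough letter stand-ins instead of IPA.

```python
# Voiced consonant -> unvoiced partner (same place and manner).
DEVOICE = {"v": "f", "z": "s", "b": "p", "d": "t", "g": "k"}
UNVOICED = {"f", "s", "p", "t", "k"}


def match_voice(sounds: list[str]) -> list[str]:
    """Devoice any consonant that sits directly before an unvoiced one."""
    result = list(sounds)
    for i in range(len(result) - 1):
        if result[i] in DEVOICE and result[i + 1] in UNVOICED:
            result[i] = DEVOICE[result[i]]
    return result


# "have to" run together, one rough symbol per sound
print(match_voice(["h", "a", "v", "t", "a"]))  # ['h', 'a', 'f', 't', 'a']
```

The rule only looks one sound ahead, which is all “have to” needs.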

Our writing obscures the importance of voice matching in English. Consider words that contain either “j” or “ch.” Both cases represent an affricate, or a stop followed by a fricative. Using IPA, we can describe the first as /dʒ/ and the second as /tʃ/. Think of all these words:

chain
jump
cheese
joy
chimp
jaw

Meanwhile, I can’t think of a single word with /tʒ/ or /dʃ/. I struggle to even make these combinations.

Linguists have found a general trend in language, where consonants become voiced in between vowels. Since we usually voice our vowels, this process keeps all the sounds on the same page. That’s why, if you read this article to anyone, he or she has probably learned about “nazal” and “frigative” consonants. Consonants tend to de-voice at the end of words, which is why the German “Hand” sounds less like “hand” and more like “haunt.”
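
Both tendencies can be mimicked with two toy functions. This is a loose sketch under heavy assumptions: it works on spelling rather than real sounds, the `VOICE` table only covers a few consonants, and the function names are invented. It exists to show the shape of the rules, not to model any actual language.

```python
VOWELS = set("aeiou")

# Unvoiced consonant -> voiced partner, plus the reverse lookup.
VOICE = {"s": "z", "k": "g", "t": "d", "p": "b", "f": "v"}
DEVOICE = {voiced: unvoiced for unvoiced, voiced in VOICE.items()}


def voice_between_vowels(letters: str) -> str:
    """Voice an unvoiced consonant when it has a vowel on both sides."""
    out = list(letters)
    for i in range(1, len(letters) - 1):
        between_vowels = letters[i - 1] in VOWELS and letters[i + 1] in VOWELS
        if letters[i] in VOICE and between_vowels:
            out[i] = VOICE[letters[i]]
    return "".join(out)


def devoice_final(letters: str) -> str:
    """Devoice a voiced consonant at the very end of a word."""
    if letters and letters[-1] in DEVOICE:
        return letters[:-1] + DEVOICE[letters[-1]]
    return letters


print(voice_between_vowels("nasal"))  # nazal
print(devoice_final("hand"))          # hant
```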

Adding Sounds

Quick exercise. Say “bed,” pause for a bit, and then say “room.” Keep doing that, but lessen the length of the pause until you reach the compound word “bedroom.” Some of you may pronounce the word bedroom the same way you would pronounce “bed” and “room” on your own. In that case, I just wasted your time. Others might notice that an extra sound snuck in there. You’ll hear something like bedʒroom.

What’s going on here? If our only examples of “economy” involve dropping letters, this may seem a bit odd. It will make a bit more sense though when we run through the mechanics of pronouncing this word. Let’s skip the “be” and “oom” part and focus on the middle. Without the added sound, we’d head straight from an alveolar stop (“d”) to the post-alveolar approximant (“r”). In other words, we’d move our tongue back a little bit, and switch from a consonant that cuts off the airflow to one that allows for fairly free airflow. The “ʒ” allows for an easier transition. We articulate it at the same place as the “r,” meaning our tongue’s already in the right spot. It’s also a fricative, so it splits the difference between a stop and an approximant. Thus, the sound is “on the way” between the two consonants, so it makes things run more smoothly if we add it in.

English spelling may suck and impede people from learning the language, but our outdated spellings allow us to see the history of the language. Consider the word “drink.” The Dutch one, not the English. Native speakers pronounce that word more or less as it’s spelled. Now, consider the seemingly identical English one. As with “bedroom,” most of us need to sneak a “ʒ” in there to get the word across the line. That word isn’t unique, either. Say “drink,” “drop,” “draw,” or “drill.” None of those words transition straight from the stop to the approximant. The same happens with the unvoiced sibling. If you say “train,” “try,” or “trap,” you can hear yourself saying “tshrain,” “tshry,” and “tshrap.” English speakers once pronounced these words as written, but our linguistic ancestors performed some subtʃraction by addition.

There are other instances where speakers add sounds to words. In my less-politically correct high school days, students would insult each other with the word “estupid.” This referenced the common pronunciation of “stupid” among native Spanish speakers. Linguists call this process “prothesis,” or adding a sound to the beginning of a word. Prothesis has played a large role in the history of Spanish, leading the Latin “schola” to the modern “escuela.” That extra vowel sound doesn’t help much on its own, but in a sentence, it can help link together disparate consonants. English has a bit of a history of adding sounds in the middle, such as the “b” in “timber.” It’s easy to see why speakers added a “b:” it’s a voiced consonant that we articulate at the same spot as the “m.” In the UK, the dropped “r”s have reasserted themselves to link vowels together. An Englishman might say “the water is” and “the wat-uh was.” This consonant has even gone on the offensive, appearing in words like “idea” where there was never an “r” in the first place.

The “r” seems to wreak a lot of havoc in English. It colors the preceding vowels, adds fricatives, and murders the “w” in words like “wreak.” It also distinguishes English from other nearby languages since English doesn’t contain anything similar to the rolled r in Spanish or Dutch.

… or does it?

The Metal Section (or is it the Medal One?)

One last word to analyze: “metal.” The first two sounds are straightforward. They amount to normal pronunciations of the “m” and “e.” Going in reverse, the “l” seems normal. Yet, there’s a bit more complexity there, too. English deploys a separate “light” and “dark” version of “l” in words like, well, “like” and “well.” If you pronounce it at all, the “a” sounds more like “uh.” That’s a schwa, and it’s everywhere. Considering the vowel chart, we can see why the schwa fills so many vowel slots: it sits in the middle of the mouth.

Finally, let’s delve into that “t.” If I have any UK readers, they probably pronounce the “t” in the standard way. They’re probably thinking, “that’s a normal ‘t,’ innit mate?” Things get more interesting in North America. If you grew up in Canada or the US, you may have struggled to differentiate the spelling of “metal” from that of “medal.” You might even transcribe the word “metal” as “medal” when you hear it spoken. It’s not quite a “d” sound either, though. If you say the word “dad,” your tongue movement won’t match the one in “metal.” Instead, you’re flapping that middle consonant. I didn’t even mention this manner of articulation in the consonant section, since it doesn’t occur anywhere else in North American English (as far as I know, at least). This phenomenon combines two of the ones mentioned above: the tendency to avoid stops and to add a new consonant that’s “on the way” to an approximant.

One thing that fascinates me about this change is the IPA symbol for the sound. What do you think it looks like? A “t” with some squiggly lines? An upside-down “d?” Maybe some weird alien symbol like the glottal stop? Nope, it looks like this: /ɾ/. Yes, that’s an “r.” Other languages recognize this sound as an “r,” and it’s the closest English gets to the rolled or trilled sounds heard in Spanish, Dutch, or Italian.

I remember reading a Dutch pronunciation guide, which explained the Dutch “r” with the English word “butter.” I thought, wait, isn’t that a normal “r?” The article then clarified that the Dutch “r” doesn’t sound like the “r” in “butter,” but like the “tt.” That’s why I love phonology. The “t” sound birthed a second “r” sound that’s articulated in a different manner than all of our language’s other consonants. That’s the sort of simplicity I can get behind.

Erin E.

Feb 1, 2023

In Chicago, we had a good friend who kept talking about her "french room" and I was finally like what the hell is that. Turns out, it was the "front room."

Mari, the Happy Wanderer

Feb 3, 2023

Such a fun and fascinating essay! My daughter is taking a linguistic class in college, and I am going to (gonna?) send this to her, because she will enjoy it!

Your bit about the different pronunciations in British and American English made me laugh, because our choir, made up of Swiss women plus me, just had our concert, which featured Motown songs. I had to teach the Swiss ladies to sing the American way--converting t to d and dropping the g from ing, as you note.

Another friend is in a different choir, and there are no Americans. They sang Leonard Cohen’s “Hallelujah,” and it was pretty funny to hear them articulating the words so precisely (“do you” instead of the American “do ya”) that the rhymes didn’t work.
