Cantonese is usually regarded as one of the dialects of Chinese. It is spoken by millions of people in many parts of the world, including the United States, Australia, Europe and South America. It originated in southern China, in and around the city of Guangzhou in Guangdong province, which is known in the English-speaking world as Canton. The name of the language comes from the place where it originated.

With the migration of Chinese people, Cantonese has spread to many different countries. It is one of the main languages of Hong Kong, where government officials use it for official communication. The language is also used in everyday life in Hong Kong, and many schools use it as the main language of instruction.

In parts of southern mainland China, it is used as a lingua franca, which means that it serves as a common means of communication, particularly in trade, among people who do not share a native language.

Even though Cantonese may superficially appear very similar to Mandarin, speakers of the two languages usually have trouble understanding each other. Most of the similarities lie in the lexicon, whereas the differences lie in syntax and phonology. However, the sound changes that Cantonese has undergone have not affected the orthography, so written texts in Mandarin and Cantonese are almost identical.

Since one of the locations where this language is used the most is Hong Kong, the language is under tremendous influence of Western languages, predominantly English. Hong Kong is now a huge melting pot of various cultures, the most prominent of which are those based on English and Cantonese. Therefore, English has influenced Cantonese to a great extent, which is particularly apparent in the lexicon.

Although Cantonese has a logographic orthography, it has been Romanized, so there are systems for writing it in the Roman alphabet. The most commonly used Romanization system is the so-called Yale system. It was developed for multiple purposes, such as writing books and dictionaries. However, its most important purpose is the instruction of foreign language learners.

With this purpose in mind, Parker Po-fei Huang and Gerard P. Kok developed this orthographic system. It is not flawless, and it has some features that are quite problematic for learners. The famous example is the authors’ decision to write unaspirated voiceless plosives with the letters that mark voiced plosives in languages that distinguish between voiced and voiceless sounds.

Therefore, the unaspirated voiceless plosive [p] is written as b, while its aspirated counterpart [pʰ] is written as p. Despite these difficulties, the experience of foreign language instructors and learners suggests that this writing system makes the process of learning Cantonese immensely easier.
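The Yale treatment of plosives and affricates can be sketched as a small lookup table. This is only an illustration in Python: the names IPA_TO_YALE, yale_onset and is_aspirated are hypothetical, the IPA keys assume a superscript "ʰ" for aspiration and "ʷ" for labialization, and the table covers only the aspiration pairs, not the whole Romanization system.

```python
# Yale spellings for the Cantonese plosives and affricates.
# Illustrative table only: it is not a complete romanizer.
IPA_TO_YALE = {
    "p": "b", "pʰ": "p",
    "t": "d", "tʰ": "t",
    "k": "g", "kʰ": "k",
    "kʷ": "gw", "kʷʰ": "kw",
    "ts": "j", "tsʰ": "ch",
}


def yale_onset(ipa_onset):
    """Return the Yale spelling of a plosive or affricate onset.

    Raises KeyError for onsets outside the illustrative table.
    """
    return IPA_TO_YALE[ipa_onset]


def is_aspirated(ipa_onset):
    """True if the IPA onset carries the aspiration mark."""
    return ipa_onset.endswith("ʰ")


if __name__ == "__main__":
    for ipa, yale in IPA_TO_YALE.items():
        label = "aspirated" if is_aspirated(ipa) else "unaspirated"
        print(f"[{ipa}] ({label}) is written {yale}")
```

The table makes the learner's difficulty visible: every letter that suggests a voiced sound in English (b, d, g, j) stands for an unaspirated voiceless sound.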

Tone Language

Similarly to Mandarin, Cantonese is a tone language. This means that the pitch with which a syllable is pronounced distinguishes the meanings of lexemes. Tone is a suprasegmental feature, which means that it cannot be pinned down to a single segment or phoneme.

A difference in tone can distinguish between two syllables that have exactly the same phonemic content. With tonal distinctions counted, linguists working on Cantonese have found that the language contains somewhere around 1760 different syllables.

It is an accepted claim that Cantonese has six different tones. The first one occurs in two forms, high level and high falling. These two forms do not distinguish lexemes, and they are used almost interchangeably. In Hong Kong, this distinction is becoming obsolete.

The remaining five tones, all of which distinguish lexemes, are medium rising, medium level, low falling (also realized as very low level), low rising and low level. In addition to these, some linguists identify three additional types of syllables, which carry high level, medium level and low level tones but contain a plosive in the coda.

IPA uses diacritics above vowels to mark these various tones, and symbols are as follows: High level – í; High falling – î; Medium rising – ǐ; Medium level – ī; low falling – i̖; very low level – ı̏; low rising – i̗; low level – ì.

This way of marking tone makes it very easy to remember, as the shape and position of the diacritic indicate the way in which the vowel is pronounced. Romanized spelling can mark tones simply by writing a number from one to nine after each syllable; the last three numbers cover the group of syllables ending in plosives.
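The numeric tone notation lends itself to a short sketch in Python. The assignment of numbers to tones below follows the conventional ordering, which is an assumption because the text does not spell it out. The names TONE_NAMES, CHECKED_TO_BASIC, split_tone and basic_tone are hypothetical helpers, and the input is assumed to be a lowercase Romanized syllable followed by a single digit.

```python
import re

# Conventional numbering (an assumption, not stated in the text).
TONE_NAMES = {
    1: "high level",
    2: "medium rising",
    3: "medium level",
    4: "low falling",
    5: "low rising",
    6: "low level",
    7: "high level (checked)",
    8: "medium level (checked)",
    9: "low level (checked)",
}

# The three checked tones share their pitch with three of the six basic tones.
CHECKED_TO_BASIC = {7: 1, 8: 3, 9: 6}


def split_tone(syllable):
    """Split a numbered syllable such as 'si1' into ('si', 1)."""
    match = re.fullmatch(r"([a-z]+)([1-9])", syllable)
    if match is None:
        raise ValueError(f"not a numbered syllable: {syllable!r}")
    return match.group(1), int(match.group(2))


def basic_tone(number):
    """Map a nine-tone number onto the six-tone system."""
    return CHECKED_TO_BASIC.get(number, number)


if __name__ == "__main__":
    base, tone = split_tone("si1")
    print(base, tone, TONE_NAMES[tone])
```

Folding tones seven to nine into the six-tone system reflects the observation that the checked syllables add no new pitch patterns, only a plosive coda.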

The Syllable Structure

In Cantonese, there are various types of syllable structure. Most commonly, syllables begin with an initial consonant which is called the onset, and there are 19 consonants that can start a syllable. An initial consonant is not obligatory in Cantonese as one may find syllables without the initial consonant, in which case it is said that there is a null initial. Consonant clustering in the onset of a syllable is not a characteristic of Cantonese.

The part of the syllable complementary to the onset is the rime. The rime can in turn be divided into the nucleus and the coda. As the name suggests, the nucleus is the main part of the syllable, and it is the only obligatory part. In general, a coda can consist of one or more consonants.

In Cantonese, we can have syllables without the coda, which are called open, but we can also have closed syllables which contain a consonant in the coda. However, it appears to be the case that we cannot have consonants clustering in the coda. Also, the choice of consonants in the coda is limited to nasals and stops only.
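The constraints on syllable structure described so far can be summarized in a minimal Python sketch. It rests on several assumptions: the onset set is the consonant inventory without the glottal stop, which gives the 19 onsets mentioned above; the coda set is taken to be the three nasals and the three plain stops; and the names ONSETS, CODAS, is_valid_syllable and is_open are hypothetical. The nucleus is only checked for being present.

```python
# Assumed onset inventory: 19 consonants, glottal stop left out.
ONSETS = {
    "p", "pʰ", "t", "tʰ", "k", "kʰ", "kʷ", "kʷʰ",
    "f", "s", "h", "ts", "tsʰ",
    "m", "n", "ŋ", "l", "w", "j",
}

# Assumed coda inventory: nasals and plain stops only.
CODAS = {"m", "n", "ŋ", "p", "t", "k"}


def is_valid_syllable(onset, nucleus, coda):
    """Check a syllable given as (onset, nucleus, coda).

    onset and coda may be None (null initial, open syllable);
    the nucleus is the only obligatory part.
    """
    if not nucleus:
        return False
    if onset is not None and onset not in ONSETS:
        return False  # unknown consonant or a cluster such as "st"
    if coda is not None and coda not in CODAS:
        return False  # only single nasals and stops may close a syllable
    return True


def is_open(coda):
    """A syllable without a coda is open."""
    return coda is None


if __name__ == "__main__":
    print(is_valid_syllable("s", "i", None))   # open syllable
    print(is_valid_syllable("k", "a", "m"))    # closed by a nasal
    print(is_valid_syllable("st", "a", None))  # onset cluster, rejected
```

Because each slot holds at most one consonant, clusters are excluded simply by not being members of either set.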

In addition, there are also diphthong-like elements in Cantonese. A diphthong is defined as essentially one vowel which changes its quality during the process of pronunciation. Nevertheless, speakers perceive that there are two vocalic elements at two ends of the diphthong.

Cantonese speakers use diphthongs, but the diphthongs end in semi-vowels, not vowels. These semi-vowels are [w] and [j]. Therefore, the possible diphthongs in Cantonese are: [ej], [aːj], [aj], [uːj], [ɔːj], [iːw], [ɛːw], [aw], [aːw], [ɔːw], and [ɵːy].


Vowels are sounds which are produced without an obstacle in the oral tract, and these sounds usually carry syllables.

Cantonese Chinese has 14 different vowels. These are divided into high, mid and low according to the height of the tongue when the vowel is pronounced, and front, central and back vowels, according to horizontal positioning of the tongue.

High front vowels are [i], [y] and [ej]; mid front vowels are [ɛ], [ɛː], [e] and [œ]; central low vowels are [aː] and [a]; the central mid vowel is [ɵ]; high back vowels are [uː] and [ow]; the mid back vowel is [o]; and the low back vowel is [ɔː]. Graphically represented, the situation in Cantonese looks like this:

        front                  central      back
high    [i] [y] [ej]                        [uː] [ow]
mid     [ɛ] [ɛː] [e] [œ]       [ɵ]          [o]
low                            [aː] [a]     [ɔː]


As we can see, Cantonese also distinguishes between vocalic phonemes by means of vowel length. The vocalic phonemes [a], [ɛ] and [o] are paired with the phonemes [aː], [ɛː] and [ɔː], from which they differ only in vocalic quantity, or length. In addition to vowels, syllables in this language can be carried by the nasals [m] and [ŋ].

If we compare the data from Cantonese vowels to the situation in English, we find some similarities, but there are also very many differences. The following vowels are not present in English: [y], [ɵ]. English also lacks the two diphthongs ending in semi-vowels: [ow], [ej]. On the other hand, English has some vowels that Cantonese does not. Schwa, or phonetically [ə], is an example.

Finally, English does not have front rounded vowels such as the Cantonese vowel [yː]. This feature is present in many languages, and can be found in German, for example. English did have such vowels earlier in its history, but they were later replaced by diphthongs.


Consonants are sounds which are created by forming some kind of an obstacle for the air current in the vocal tract. According to the type of the obstacle, there are different types of consonants. Linguists usually differentiate between consonants on the basis of their place and manner of articulation, but there are other features that are relevant as well.

Manner of articulation represents the way in which the obstacle is formed and removed. Division of consonants according to manner of articulation usually goes as follows: plosives (stops), affricates, fricatives, nasals and approximants (which contain the set of semi-vowels).

Plosives are sounds which are produced by forming a complete obstacle in the vocal tract, so that the air current builds up pressure behind the obstacle. Then the obstacle is suddenly released, and the sound is emitted in a manner similar to an explosion.

Plosives or stops in Cantonese are: [p], [pʰ], [t], [tʰ], [k], [kʰ], [kʷ], [kʷʰ] and [ʔ]. We see here that aspiration is a feature that distinguishes between phonemes in this language, and we mark it with a superscript “h”. Labialization, which is marked with a superscript “w”, is relevant as well.

Fricatives are sounds which are produced without a complete obstacle: the articulatory organs form a narrow channel in which the air current creates friction that is perceived as a fricative sound. The set of Cantonese fricatives is as follows: [f], [s] and [h].

Affricates are sounds which are in between fricatives and plosives. They have very complex pronunciation since the first part of the process resembles that of plosives and the second part is similar to that of fricatives.

Namely, in the first part, a complete obstacle is formed, but instead of releasing it suddenly, the speaker transforms the obstacle into a channel in which friction is created. Cantonese has only one affricate, which comes in an unaspirated and an aspirated version: [ts] and [tsʰ].

Nasals are sounds which are produced by forming a complete obstacle in the vocal tract, but the air current is released through the nose. The following sounds are Cantonese nasals: [m], [n], [ŋ].

Approximants are produced in such a way that the organs within the vocal tract are only approximated so the air current does not encounter a real obstacle. Because of this, these sounds can take on various properties of vowels. For example, in many languages they can carry syllables.

Approximants are often further subdivided, but for current purposes, it is sufficient to classify them as one group. The only real Cantonese approximant is [l], but there are also two semi-vowels [w] and [j], which are considered to be parts of diphthongs.

Another important feature of consonants is their place of articulation. The place of articulation is the exact point in the vocal tract where the obstacle is created. Linguists usually use the following classification of consonants according to the place of articulation: bilabial, labiodental, dental, alveolar, palato-alveolar, palatal, velar, uvular and glottal.

Bilabial consonants are sounds which are pronounced by forming the obstacle with the lips. Cantonese has two bilabial plosives, [p] and [pʰ], differentiated only by aspiration.

Labiodental sounds are produced by creating the obstacle using the upper teeth and the lower lip. There is only one such sound in this language, and that is [f].

Dental sounds are those sounds whose obstacle is formed on the teeth. Cantonese has two sounds of this type: [t] and [tʰ]. Linguists, however, debate whether these are really dental or whether they belong to a different category, that of alveolar consonants.

Alveolar consonants are pronounced with the tip of the tongue touching the alveolar ridge above the upper teeth. Cantonese [s] and [l] belong to this category, in addition to [t] and [tʰ], if we accept the abovementioned claim.

In Cantonese, there is one palatal sound, [j], which is pronounced with the tongue raised toward the hard palate.

The category of velar sounds is quite rich in Cantonese. Velars are produced when the back of the tongue touches the soft part of the roof of the mouth, called the velum. Cantonese velars are [ŋ] and [k]. However, Cantonese [k] is special in that when aspiration is added we get a new velar consonant, [kʰ].

Furthermore, with respect to this sound, Cantonese utilizes one more tool for discriminating among phonemes, namely labialization, so we have a labialized velar plosive [kʷ] and its aspirated counterpart, the labialized aspirated velar plosive [kʷʰ].

Finally, there is one more category of consonants according to the place of articulation: glottal consonants. With glottal sounds, the obstacle is created in the opening between the vocal cords, called the glottis. There is a glottal stop [ʔ] and a glottal fricative [h] in Cantonese.

Graphic representation of Cantonese consonants with respect to the place of articulation is as follows:

                    bilabial  labiodental  dental  alveolar    palatal  velar      velar         glottal
                                                                        (regular)  (labialized)
stops (regular)     [p]                    [t]                          [k]        [kʷ]          [ʔ]
stops (aspirated)   [pʰ]                   [tʰ]                         [kʰ]       [kʷʰ]
fricatives                    [f]                  [s]                                           [h]
affricates                                         [ts] [tsʰ]
nasals              [m]                            [n]                  [ŋ]
approximants        [w]                            [l]         [j]

General remarks about Cantonese consonants and comparison with the English ones

As a first remark, we can say that Cantonese has fewer consonants than English. Some of the English consonant phonemes that are lacking in Cantonese are the bilabial plosive [b], the labiodental fricative [v] and the dental fricatives [θ] and [ð]. Cantonese also lacks the English palato-alveolar affricates [tʃ] and [dʒ], the palato-alveolar fricatives [ʃ] and [ʒ], the approximant [r] and the velar stop [g].

In addition, the place of articulation of the Cantonese plosive [t] is dental, whereas its English counterpart is alveolar. Furthermore, Cantonese makes use of aspiration as a distinctive feature, thereby differentiating the phonemes [t] and [tʰ]; [p] and [pʰ]; [ts] and [tsʰ]; [k] and [kʰ]; and [kʷ] and [kʷʰ].

In English, aspiration is just one of the redundant features of voiceless plosives such as [pʰ], and it does not make a phonemic difference. On the other hand, there are mechanisms that English utilizes heavily and Cantonese does not use at all. One such mechanism is consonant voicing.

In English, there are many pairs of sounds which are differentiated only on the basis of this feature. Examples of such pairs of sounds are [b] and [p], [θ] and [ð], [g] and [k]. In fact, it seems that Cantonese only has voiceless obstruents (stops, fricatives or affricates).
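The comparison of the two consonant inventories can be sketched with set operations in Python. The Cantonese set follows the consonant table in this text; the English set is a commonly cited inventory of 24 consonant phonemes and is an assumption here, as are the names ENGLISH, CANTONESE and VOICED_OBSTRUENTS.

```python
# Assumed English consonant phonemes (a commonly cited set of 24).
ENGLISH = {
    "p", "b", "t", "d", "k", "g", "tʃ", "dʒ",
    "f", "v", "θ", "ð", "s", "z", "ʃ", "ʒ", "h",
    "m", "n", "ŋ", "l", "r", "w", "j",
}

# Cantonese consonants as listed in the consonant table of this text.
CANTONESE = {
    "p", "pʰ", "t", "tʰ", "k", "kʰ", "kʷ", "kʷʰ", "ʔ",
    "f", "s", "h", "ts", "tsʰ",
    "m", "n", "ŋ", "l", "w", "j",
}

# Voiced stops, fricatives and affricates of English.
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "ð", "z", "ʒ", "dʒ"}

# English phonemes with no Cantonese counterpart, and the reverse.
only_english = ENGLISH - CANTONESE
only_cantonese = CANTONESE - ENGLISH

# True when Cantonese has no voiced obstruent at all.
cantonese_lacks_voicing = CANTONESE.isdisjoint(VOICED_OBSTRUENTS)

if __name__ == "__main__":
    print("Only in English:", sorted(only_english))
    print("Only in Cantonese:", sorted(only_cantonese))
    print("Cantonese has no voiced obstruents:", cantonese_lacks_voicing)
```

Under these assumptions, every voiced obstruent of English falls into the "only in English" set, while the "only in Cantonese" set consists of the aspirated, labialized and affricate segments plus the glottal stop.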

Finally, the glottal stop [ʔ] is a phonemic sound in Cantonese, whereas in English it is an allophone heard only in some dialects, where it is usually a realization of the stop [t]. An example of this sound in English is found in the word [dəpɑːʔmɪnt] (department).

Also, Cantonese does not have a rhotic sound with phonemic status. In English, a rhotic sound is realized as an approximant and has phonemic status, which means it differentiates between words, but in Cantonese it does not. Lateral [l] is a phoneme in Cantonese which can in some contexts be realized as [r].

This is why many Cantonese native speakers often confuse these two sounds in English, sometimes producing comic effects, which is very often utilized in popular cinema.

Another difference with respect to English is that Cantonese uses labialization to distinguish between phonemes, which is apparent in examples like [kʷ] and [k], while in English labialized [k] is only an allophone, a contextually conditioned realization, of [k], as in [kʷɔtə] (quarter).


In this short review of Cantonese phonology, I have mentioned some of the most prominent phenomena in the sound system of this language. I have also briefly outlined a contrastive analysis of Cantonese and English. The most striking difference is that Cantonese is a tone language, which means that the meanings of morphemes and lexemes can be differentiated by tone alone.

There are six types of these tones, plus three special types which are reserved for syllables ending in a plosive. These syllables are called “checked syllables”. The vocalic system of Cantonese is also quite different from that of English. Cantonese does not have as many diphthongs, and those diphthongs that do appear contain a semi-vowel. Cantonese also has a front rounded vowel which cannot be found in English.

Furthermore, consonant systems of the two languages are quite different. Firstly, English has more consonants than Cantonese. Secondly, Cantonese uses the mechanisms of labialization and aspiration to differentiate between consonant phonemes, while English only has those features as redundant features used to distinguish between different allophones of the same phoneme.

Thirdly, voicing is not a relevant discriminatory quality in Cantonese since all plosives, fricatives and affricates are voiceless. As far as English is concerned, voicing plays a crucial role in differentiation between many phonemes, as it has been illustrated in the paper. Finally, Cantonese does not make use of rhotic sounds as phonemes, and has them only as allophones of the approximant phoneme [l].


Transcription of the words in the file

Old – [wekjo]

Person – [peɾsona]

Rain – [pjoʔja]

River – [fju:me]

