Southeast Asian Linguistics Research Unit
Research Spotlight

A new look at Pattani Malay initial geminates: a statistical and machine learning approach

Among the world's languages that have geminates, Pattani Malay is famous for having them in word-initial position, whereas most other languages have only medial ones. A preliminary report from a collaborative project by Dr. Pittayawat Pittayaporn and Ph.D. students Pimtip Kochaiyaphum (ChulaSEAL), Francesco Burroni (Cornell) and Sireemas Maspong (Cornell), presented at the 34th Pacific Asia Conference on Language, Information and Computation (PACLIC34), shows that the language may be losing its signature feature.

Geminates are consonants that are held longer than their single counterparts, known as singletons. In writing systems that use the Latin alphabet, geminates are typically spelled with doubled letters. In many languages, such as Italian, Hungarian and Japanese, these long consonants (transcribed as [Cː] in the International Phonetic Alphabet) are significant because they can distinguish two words. For instance, Hungarian megy [mɛɟ] means ‘to go’ but meggy [mɛɟː] means ‘sour cherry’. Similarly, Japanese kita means ‘came, arrived’ but kitta means ‘cut, sliced’.

Pattani Malay, also known as Patani Malay, is a Malay dialect spoken in the three southernmost provinces of Thailand: Pattani, Yala, and Narathiwat. An extremely similar variety is also found across the border in Kelantan, Malaysia. Not only are its initial geminates a typological rarity, but claims have been made that they are turning into something like pitch accent, i.e. the use of different pitch contours to distinguish otherwise phonetically similar words, such as the difference between the rising pitch and falling pitch in the Japanese words for ‘flower’ and ‘nose’, respectively.

In this study, the researchers revisit the acoustic correlates of initial geminates in Pattani Malay, asking what acoustic properties distinguish them from their corresponding singletons. To acquire relatively naturalistic speech data, 14 participants were prompted with Thai sentences and asked to say the Pattani Malay equivalents. This procedure yielded 26 sentences containing 13 minimal pairs, such as [labɔ] ‘profit’ vs. [lːabɔ] ‘spider’ and [ɡaɟi] ‘wage’ vs. [ɡːaɟi] ‘saw’. In addition to duration, the f0 (fundamental frequency) and intensity of the following vowels were also measured.

To the surprise of the researchers, the acoustic results showed that the contrast between geminates and singletons in Pattani Malay may not be as robust as previously hypothesized. Specifically, they found that only duration differs significantly between the two types of initial consonants, whereas previous studies also found significant differences in f0 and intensity. Moreover, the analysis estimates the durational difference at only about 17 ms, while in Abramson’s work the initial geminates averaged three times longer than their corresponding singletons. This discrepancy, the researchers believe, is due to differences in experimental design: while this study uses relatively naturalistic speech data, previous reports were based on highly controlled lab speech.

Comparison of mean durations of initial geminates (IG) vs. singletons (no IG). The violin plot shows that the initial geminates are slightly longer. The density plot shows that the two categories overlap considerably.

A bigger surprise comes from the computational analysis of the robustness of the contrast. Linear Discriminant Analysis (LDA) was used to quantify the extent to which acoustic properties distinguish the geminates from the singletons. LDA is a technique that uses linear combinations of features to maximize the separation between two or more categories. In this study, 80% of the data was assigned to a training set and the remaining 20% to a test set. The best model includes two features: the duration of the first segment and the ratio of the first segment's duration to that of the entire word. With this model, the mean accuracy in classifying tokens as geminates or singletons, taken over 10,000 iterations, is above chance (above 50% accuracy) but quite poor (about 62%). This shows that even though there are statistically significant differences between geminates and singletons, the contrast is not very robust. The low performance indicates that initial geminates and singletons are not easily distinguishable in the dimensions examined.
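The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the tokens are synthetic, the class means (90 ms and 107 ms), the spreads and the count of 182 tokens per class (14 speakers times 13 pairs, assuming one repetition each) are hypothetical, and only the roughly 17 ms gap, the two features, and the 80/20 split come from the article. The function names are likewise made up for the example, and the iteration count is reduced from 10,000 to keep it fast.

```python
import random


def fit_lda(train):
    """Fit a two-class LDA on ((x1, x2), label) pairs; return (weights, threshold)."""
    groups = {0: [], 1: []}
    for features, label in train:
        groups[label].append(features)
    if not groups[0] or not groups[1]:
        raise ValueError("training data must contain both classes")

    means = {}
    sxx = sxy = syy = 0.0  # pooled within-class scatter matrix entries
    for label, rows in groups.items():
        m1 = sum(r[0] for r in rows) / len(rows)
        m2 = sum(r[1] for r in rows) / len(rows)
        means[label] = (m1, m2)
        for x1, x2 in rows:
            sxx += (x1 - m1) ** 2
            sxy += (x1 - m1) * (x2 - m2)
            syy += (x2 - m2) ** 2

    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("within-class scatter matrix is singular")

    d1 = means[1][0] - means[0][0]
    d2 = means[1][1] - means[0][1]
    # w = S^-1 (mu1 - mu0), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * d1 - sxy * d2) / det, (sxx * d2 - sxy * d1) / det)
    # Equal priors assumed: the boundary passes through the midpoint of the means.
    mid = ((means[0][0] + means[1][0]) / 2, (means[0][1] + means[1][1]) / 2)
    return w, w[0] * mid[0] + w[1] * mid[1]


def predict(model, features):
    """Return 1 (geminate) or 0 (singleton) for one feature pair."""
    w, threshold = model
    return 1 if w[0] * features[0] + w[1] * features[1] > threshold else 0


def mean_accuracy(tokens, iterations=200, train_share=0.8, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    data = list(tokens)
    cut = int(len(data) * train_share)
    total = 0.0
    for _ in range(iterations):
        rng.shuffle(data)
        train, test = data[:cut], data[cut:]
        model = fit_lda(train)
        total += sum(predict(model, x) == y for x, y in test) / len(test)
    return total / iterations


def make_synthetic_tokens(n_per_class, rng):
    """Hypothetical data: features are (consonant ms, consonant / word duration)."""
    tokens = []
    for label, mean_c in ((0, 90.0), (1, 107.0)):  # assumed means, 17 ms apart
        for _ in range(n_per_class):
            c_dur = max(20.0, rng.gauss(mean_c, 40.0))
            rest = max(100.0, rng.gauss(350.0, 60.0))  # remainder of the word
            tokens.append(((c_dur, c_dur / (c_dur + rest)), label))
    return tokens


if __name__ == "__main__":
    tokens = make_synthetic_tokens(182, random.Random(1))
    print(f"mean test accuracy: {mean_accuracy(tokens):.3f}")
```

The sketch uses only the standard library so that the arithmetic behind LDA stays visible; in practice a ready-made implementation such as scikit-learn's LinearDiscriminantAnalysis would be the more usual choice. With heavily overlapping classes like these, accuracy only modestly above 50% is what one would expect.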

Output of LDA showing large overlap between categories

The researchers interpret this result as pointing to an ongoing merger between geminates and singletons due to the low functional load of the contrast. That is, the geminates are disappearing from the language because they serve to distinguish only a small number of minimal pairs. The team also links this possible change to the well-attested loss of morphological processes involving gemination.

In conclusion, the study paints a very different picture of Pattani Malay initial geminates from previous well-known studies. It shows that the typological rarity that brought the language to fame may be disappearing. However, it seems that the loss will not push this fascinating language into obscurity but will put the spotlight on it as it leads to new questions of current theoretical relevance.

Phuris Chirapornchai


Abramson, Arthur S. 1987. Word-initial consonant length in Pattani Malay. International Congress of Phonetic Sciences (ICPhS) 6, 68–70.

Abramson, Arthur S. 2003. Acoustic cues to word-initial stop length in Pattani Malay. International Congress of Phonetic Sciences (ICPhS) 15, 387–390.

Abramson, Arthur S. 2004. Toward prosodic contrast: Suai and Pattani Malay. In International Symposium on Tonal Aspects of Languages: Emphasis on Tone Languages, 1–4.

Blevins, Juliette. 2004. Evolutionary Phonology: the emergence of sound patterns. Cambridge; New York: Cambridge University Press.

Phuengnoi, Nattaphon. 2010. An acoustic study of stressed and unstressed syllables in Pattani Malay and Urak Lawoi’. Bangkok: Chulalongkorn University thesis.

Uthai, Ruslan. 1993. A comparison of word formation in Standard Malay and Pattani Malay. Bangkok: Chulalongkorn University thesis.

