Southeast Asian Linguistics Research Unit
Research Spotlight

Probabilistic Measure for Diffusion of Linguistic Innovation: As Seen in the Usage of Verbal “Nok” in Thai Twitter

Linguists are increasingly using Twitter as a resource for their studies. In a recent research article, Nozomi Yamada and Pittayawat Pittayaporn explore the use of Twitter as a corpus and propose that it plays a dynamic role in driving language change. They specifically investigate how the Thai word “Nok,” meaning “bird,” has also come to be used as a verb expressing the failure to meet expectations. Unlike semantic changes in which the older meaning is lost, “Nok” still retains its original bird-related meaning, so the word now has multiple meanings (polysemy).

Traditionally, researchers have measured the spread of new words or meanings through word frequency analysis. However, when it comes to the innovative use of “Nok” as a verb, relying solely on token frequency isn’t enough to capture the change because it involves polysemy rather than the introduction of a completely new word. To tackle this challenge, Yamada and Pittayaporn employ three probabilistic measures:

  1. Conditional probability of bigrams: This measure captures how often specific words combine with nouns and verbs in Thai, which makes it possible to detect the syntactic structures in which “Nok” occurs.
  2. Tweet-level PMI (pointwise mutual information): This measure quantifies the statistical association between two words by comparing how often they appear together in the same tweet with how often they would be expected to co-occur if they were independent of each other.
  3. Cosine similarity of word embeddings: By comparing the word embedding of “Nok” with the embeddings of other words across different time periods, this measure shows how semantically similar “Nok” is to those words and how that similarity shifts over time (as in other studies of language change, such as Baidong, Ying, and Feicheng, 2018).
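
To make the three measures concrete, here is a minimal Python sketch of textbook versions of each one. It is not the authors' implementation: the romanized toy "tweets", the function names, and the choice to count only tokens that have a following word are all assumptions made for illustration, and real Thai text would first need word segmentation.

```python
import math

# Hypothetical, pre-tokenized toy "tweets" (romanized); not data from the article.
tweets = [
    ["nok", "laew"],
    ["nok", "laew", "na"],
    ["mai", "nok"],
    ["nok", "bin"],
    ["hen", "nok"],
    ["kin", "khao", "laew"],
]

def following_word_probability(tweets, target, following):
    """Estimate P(next word == following | current word == target) from bigrams.

    Assumption: only occurrences of `target` that have a next word are counted,
    so tweet-final tokens do not enter the denominator.
    """
    target_count = 0
    bigram_count = 0
    for tokens in tweets:
        for current, nxt in zip(tokens, tokens[1:]):
            if current == target:
                target_count += 1
                if nxt == following:
                    bigram_count += 1
    return bigram_count / target_count if target_count else 0.0

def tweet_level_pmi(tweets, word_a, word_b):
    """PMI where the co-occurrence window is the whole tweet (base-2 logarithm)."""
    n = len(tweets)
    has_a = sum(1 for t in tweets if word_a in t)
    has_b = sum(1 for t in tweets if word_b in t)
    has_both = sum(1 for t in tweets if word_a in t and word_b in t)
    if has_both == 0:
        return float("-inf")  # the two words never share a tweet
    return math.log2((has_both / n) / ((has_a / n) * (has_b / n)))

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(following_word_probability(tweets, "nok", "laew"))
print(tweet_level_pmi(tweets, "nok", "laew"))
print(cosine_similarity([0.2, 0.9, 0.1], [0.3, 0.8, 0.0]))  # made-up vectors
```

In a diachronic study, each function would be applied separately to the tweets (or to embeddings trained on the tweets) of each time period, so that the resulting values can be compared across periods.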

The findings reveal that the use of “Nok” as a verb peaked in 2016 and has since declined in popularity according to word frequency. However, when considering the measures of conditional probability, PMI, and cosine similarity of word embeddings, it becomes clear that the innovative usage of “Nok” has remained relatively stable. The proportion of verbal “Nok” to the overall use of “Nok” hasn’t changed significantly, despite the decrease in word frequency.
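
The difference between raw frequency and the share of verbal uses can be illustrated with a small Python sketch. The counts below are invented purely for illustration and are not figures from the article; the period labels and function names are likewise hypothetical.

```python
# Invented counts for illustration only; NOT results from the article.
# period -> (all tokens in the sample, all "Nok" tokens, verbal "Nok" tokens)
counts = {
    "period_1": (1_000_000, 400, 300),
    "period_2": (1_000_000, 200, 150),
}

def per_million(word_tokens, corpus_tokens):
    """Normalized token frequency: occurrences per million corpus tokens."""
    return word_tokens / corpus_tokens * 1_000_000

def verbal_share(verbal_tokens, nok_tokens):
    """Proportion of all "Nok" tokens that are verbal uses."""
    return verbal_tokens / nok_tokens

for period, (corpus, nok, verbal) in counts.items():
    # Frequency of verbal "Nok" halves, yet its share of all "Nok" stays the same.
    print(period, per_million(verbal, corpus), verbal_share(verbal, nok))
```

In this toy setup the frequency of the verbal use drops by half between the two periods while its share of all “Nok” tokens is unchanged, which is the kind of pattern a frequency-only analysis would misread as a decline of the innovation itself.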

The two figures (Figures 3 and 4 in the article) show an increase in each co-occurrence after 2015. The conditional probability forms an S-curve: a gradual initial uptake of the innovation, followed by a rapid increase in usage, and finally a stable plateau of widespread acceptance. At first only a few people adopt the feature, so growth is slow; as more people adopt it, the rate of adoption accelerates until the feature is widely accepted within the community. In particular, both graphs show that the conditional probabilities of the preceding and following words “laew” and “mai” hardly decay after reaching their peak. One figure shows the conditional probability for the preceding word.
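
The S-curve shape described here is commonly modeled with a logistic function. The following Python sketch shows that generic shape only; the abstract time steps and the parameter names and values (`ceiling`, `rate`, `midpoint`) are assumptions chosen for illustration, not values fitted to the article's data.

```python
import math

def logistic(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Logistic S-curve: slow start, rapid rise around `midpoint`, plateau at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

# Abstract time steps 0..10 with hypothetical parameters.
steps = [logistic(t, ceiling=1.0, rate=1.0, midpoint=5.0) for t in range(11)]

# Growth between consecutive steps: small at first, largest near the midpoint,
# then small again as the curve approaches its ceiling.
increments = [later - earlier for earlier, later in zip(steps, steps[1:])]

for t, value in enumerate(steps):
    print(t, round(value, 3))
```

A curve that stays near its ceiling after the rapid-growth phase, rather than falling back, is what distinguishes an established innovation from a passing fad.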

This suggests that the lexical innovation of “Nok” as a verb has become an established part of the Thai linguistic system. Furthermore, these three probabilistic measures offer a more accurate way to analyze and quantify the spread of linguistic innovations, even in cases involving multiple meanings like “Nok.” These methods are also applicable to studying languages other than Thai. This study also explores the possibility of utilizing these measures to study ongoing semantic changes using data from social media, contributing to a deeper understanding of language dynamics in the digital age.
