
Quantifying Diachronic Language Change via Word Embeddings: Analysis of Social Events using 11 Years News Articles in Japanese and English

Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (2023). Quantifying Diachronic Language Change via Word Embeddings: Analysis of Social Events using 11 Years News Articles in Japanese and English. 9th International Conference on Computational Social Science.
https://upura.github.io/pdf/ic2s2_2023_semantic_shift.pdf
https://www.ic2s2.org/

Shotaro Ishihara

July 11, 2023

Transcript

  1. Quantifying Diachronic Language Change via
    Word Embeddings: Analysis of Social Events using
    11 Years News Articles in Japanese and English
    Research Overview
    ● We quantitatively analyzed semantic
    shifts caused by social events across
    multiple corpora and years (using
    news articles published in Japanese
    and English between 2011-2021).
    ● Studies analyzing social events
    have often focused on a single
    event, so it is important to explore
    more comprehensive methods.
    ● RQ1: Is the semantic shift caused by
    COVID-19 larger than in other years?
    ● RQ2: Are the trends of change in
    Japanese and English similar?
    ● A1&2: Yes (at least in our approach)
    Shotaro Ishihara (Nikkei Inc.), Hiromu Takahashi, Hono Shirai
    [1] How COVID-19 is changing our language: Detecting
    semantic shift in twitter word embeddings.
    [2] Semantic Shift Stability: Efficient Way to Detect
    Performance Degradation of Word Embeddings and
    Pre-trained Language Models. (AACL2022)
    Acknowledgments
    We are grateful to Kunihiro Miyazaki for
    the useful research discussions.
    Experimental Results
    ● A1: The semantic shift stability for
    2019-2020 was the lowest for both
    Nikkei (ja) and NOW (en); that is, the
    degree of change was the greatest.
    ● A2: The correlation coefficient between
    Nikkei and NOW was calculated to be
    0.66, indicating a similar trend.
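
The comparison in A2 can be sketched as a correlation between two per-year stability series. This is a minimal illustration, assuming a Pearson correlation (the slide does not name the coefficient). The function name `pearson` and the two input lists are hypothetical placeholders, not the measured values.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center each series
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Placeholder series, one value per consecutive-year pair.
# These numbers are made up for illustration only.
corpus_a = [0.90, 0.92, 0.88, 0.91, 0.80]
corpus_b = [0.85, 0.86, 0.84, 0.87, 0.78]

r = pearson(corpus_a, corpus_b)
print(f"correlation: {r:.2f}")
```

A value close to 1 would indicate that the two corpora rise and fall together from year to year.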
    Approach
    1. Corpora are divided by year, and a
    word2vec model is trained for each year.
    2. We take two trained word2vec
    models as input and derive rotation
    matrices (R) to align their
    coordinate axes.
    3. The stability (stab) of each word is
    calculated from the similarities in
    the two mapping directions. [1]
    4. We refer to the average stab over
    words as the semantic shift
    stability and adopt it as a
    representative value. [2]
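
Steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rotation is fitted by orthogonal Procrustes, the "similarities in two directions" are taken to be cosine similarities after mapping year A onto year B and year B onto year A, and stab is their mean. Plain NumPy arrays stand in for the word2vec vectors of a shared vocabulary, and the names `fit_rotation`, `stability`, and `semantic_shift_stability` are hypothetical.

```python
import numpy as np


def fit_rotation(src, dst):
    """Orthogonal R minimizing ||src @ R - dst||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(src.T @ dst)
    return u @ vt


def cosine_rows(a, b):
    """Row-wise cosine similarity between two (n_words, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den


def stability(emb_a, emb_b):
    """Per-word stab: mean of the similarities in both mapping directions.

    Row i of emb_a and emb_b must hold the same word in two different years.
    """
    r_ab = fit_rotation(emb_a, emb_b)  # year A -> year B
    r_ba = fit_rotation(emb_b, emb_a)  # year B -> year A
    sim_ab = cosine_rows(emb_a @ r_ab, emb_b)
    sim_ba = cosine_rows(emb_b @ r_ba, emb_a)
    return (sim_ab + sim_ba) / 2.0


def semantic_shift_stability(emb_a, emb_b):
    """Average stab over the shared vocabulary (one value per year pair)."""
    return float(np.mean(stability(emb_a, emb_b)))


# Synthetic demo: year B is a rotated copy of year A, except that
# word 3 is flipped to simulate a strong semantic shift.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 8))
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
emb_b = emb_a @ q
emb_b[3] = -emb_b[3]

stab = stability(emb_a, emb_b)
print("least stable word index:", int(np.argmin(stab)))
print("semantic shift stability:", semantic_shift_stability(emb_a, emb_b))
```

In this sketch the flipped word should receive the lowest stab while the others stay near 1. With real models, the rows would first have to be restricted to the vocabulary shared by both years.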
    Inferring Reason of Semantic Shifts
    Our approach has the advantage of
    identifying the words that exhibited the
    most significant semantic shift (the words
    with the lowest stab) in 2019-2020.
    ● Nikkei: infection, spread, corona,
    vaccine, virus, mask, infected, North
    Korea, vaccination, and epidemic.
    ● NOW: king, Scott, de, virus, masks,
    wear, mask, pi, q, and wearing.
    => Words related to COVID-19 appeared at
    the top of the lists. Note that the analysis
    for 2015-2016 implied the impact of the
    U.S. presidential election.
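
Listing the words with the lowest stab amounts to sorting the per-word scores in ascending order. A minimal sketch, assuming the stab values are already computed; the function name `lowest_stab_words`, the vocabulary, and the scores are hypothetical placeholders rather than values from the study.

```python
import numpy as np


def lowest_stab_words(vocab, stab, k=10):
    """Return the k (word, stab) pairs with the lowest stab, ascending."""
    order = np.argsort(np.asarray(stab, dtype=float), kind="stable")
    return [(vocab[i], float(stab[i])) for i in order[:k]]


# Placeholder vocabulary and scores, for illustration only.
vocab = ["w0", "w1", "w2", "w3"]
stab = [0.95, 0.20, 0.80, 0.35]

top2 = lowest_stab_words(vocab, stab, k=2)
for word, score in top2:
    print(word, score)
```

Reading such a list for each year pair is what allows an event to be inferred from the words that moved the most.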
