Topic Modeling of Short Texts: A Pseudo-Document View

Kento Nozawa

September 02, 2016

Transcript

  1. Title: Topic Modeling of Short Texts: A Pseudo-Document View
     Authors: Yuan Zuo, Junjie Wu, Hui Zhang, Hao Lin, Fei Wang, Ke Xu, Hui Xiong
     Presented at the KDD 2016 study group by NOZAWA Kento (University of Tsukuba M1 / AIST RA)
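  2. The problem with LDA: topic learning fails when documents are short
     • Cause: short documents do not provide enough co-occurrence information
     • Remedy: find a way to increase the co-occurrence information
     • Train on pseudo-documents formed by clustering the documents
     [Figure: short texts about machine learning (LDA, NLP, Bayesian inference, optimization) and about sleep are grouped by clustering into pseudo-documents]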
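
To illustrate why pooling short texts increases co-occurrence information, the sketch below counts the distinct word pairs that co-occur within individual short texts and within one pseudo-document built by concatenating them. The toy texts and the helper name `cooccurrence_pairs` are made up for illustration, and the grouping is fixed by hand here rather than learned.

```python
from itertools import combinations

def cooccurrence_pairs(tokens):
    """Return the set of unordered pairs of distinct words that co-occur in one text."""
    return set(combinations(sorted(set(tokens)), 2))

# Hypothetical, already tokenized short texts (not data from the paper).
short_texts = [
    ["machine", "learning", "lda"],
    ["bayesian", "inference", "lda"],
    ["topic", "model", "optimization"],
]

# Pairs observable when every short text is its own document.
pairs_separate = set()
for tokens in short_texts:
    pairs_separate |= cooccurrence_pairs(tokens)

# Pairs observable when the texts are pooled into one pseudo-document.
pseudo_document = [word for tokens in short_texts for word in tokens]
pairs_pooled = cooccurrence_pairs(pseudo_document)

print(len(pairs_separate), len(pairs_pooled))
```

Every pair seen in a single short text is still present after pooling, and pooling adds the pairs that span different texts in the same group.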
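  3. Graphical models: LDA [Blei+, 2003] and PTM (the proposed method)
     • K: number of topics
     • D: number of documents
     • N: number of words in a document
     • P: number of pseudo-documents
     Size relation: K < P << D
     [Figure: plate diagrams of LDA (plates D, N, K) and PTM (plates D, N, K, P)]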
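  4. Generative process of PTM
     • Pseudo-document: a hard cluster of documents
     • One topic distribution θ is defined per pseudo-document
     [Figure: PTM plate diagram (plates D, N, K, P), annotated with the pseudo-document ID l]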
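
The generative story can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the distribution `psi` over pseudo-documents, the per-topic word distributions `phi`, the hyperparameter names `lam` and `beta`, all default values, and the function names are assumptions that the slide does not state.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def sample_categorical(rng, probs):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, K, P, V,
                    alpha=0.5, beta=0.5, lam=1.0, seed=0):
    """Generate (pseudo-document ID, word IDs) pairs for num_docs short texts."""
    rng = random.Random(seed)
    psi = sample_dirichlet(rng, lam, P)                           # assumed: distribution over pseudo-documents
    theta = [sample_dirichlet(rng, alpha, K) for _ in range(P)]   # one topic distribution per pseudo-document
    phi = [sample_dirichlet(rng, beta, V) for _ in range(K)]      # assumed: one word distribution per topic
    corpus = []
    for _ in range(num_docs):
        l = sample_categorical(rng, psi)           # hard assignment of the text to a pseudo-document
        words = []
        for _ in range(doc_len):
            z = sample_categorical(rng, theta[l])  # topic drawn from the pseudo-document's theta
            words.append(sample_categorical(rng, phi[z]))
        corpus.append((l, words))
    return corpus

corpus = generate_corpus(num_docs=5, doc_len=4, K=3, P=2, V=10)
```

Note that a short text never gets its own θ here: every text assigned to the same pseudo-document shares one topic distribution.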
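  5. Inference by collapsed Gibbs sampling
     • Sample a pseudo-document $l$ for each document
     • Sample a topic $z$ for each word

     $$p(z_{s,i} = z \mid \text{rest}) \propto \left(N^{z}_{d_s} + \alpha\right) \frac{N^{w_{d_s,i}}_{z} + \beta}{N_{z} + V\beta} \quad \text{(LDA)}$$

     $$p(z_{s,i} = z \mid \text{rest}) \propto \left(N^{z}_{l_{d_s}} + \alpha\right) \frac{N^{w_{d_s,i}}_{z} + \beta}{N_{z} + V\beta} \quad \text{(PTM)}$$

     For short texts, $N^{z}_{d_s}$ is almost always 0 or a small value. In contrast, $N^{z}_{l_{d_s}}$ can draw on frequencies over the entire corresponding pseudo-document.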
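
The difference between the LDA and PTM update rules above can be made concrete with a small numeric sketch. The only change between the two calls is which count vector is passed as the context: the document's own topic counts (LDA) or those of its pseudo-document (PTM). The function name `topic_conditional`, the toy counts, and the treatment of alpha and beta as symmetric Dirichlet hyperparameters are assumptions made for illustration.

```python
def topic_conditional(n_topic_in_context, n_word_in_topic, n_topic_total,
                      V, alpha, beta):
    """Normalized p(z | rest) for one word token.

    All counts are assumed to exclude the token being resampled.
    n_topic_in_context[z]: tokens assigned to topic z in the document (LDA)
                           or in the document's pseudo-document (PTM).
    n_word_in_topic[z]:    times the current word type is assigned to topic z.
    n_topic_total[z]:      tokens assigned to topic z in the whole corpus.
    """
    weights = [
        (n_ctx + alpha) * (n_w + beta) / (n_z + V * beta)
        for n_ctx, n_w, n_z in zip(n_topic_in_context, n_word_in_topic, n_topic_total)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Toy counts, made up for illustration (K = 2 topics, V = 100 word types).
n_word_in_topic = [5, 5]
n_topic_total = [500, 500]
doc_counts = [0, 0]           # a short text: no other token provides evidence
pseudo_doc_counts = [40, 2]   # its pseudo-document pools many texts

p_lda = topic_conditional(doc_counts, n_word_in_topic, n_topic_total,
                          V=100, alpha=0.1, beta=0.01)
p_ptm = topic_conditional(pseudo_doc_counts, n_word_in_topic, n_topic_total,
                          V=100, alpha=0.1, beta=0.01)
print(p_lda, p_ptm)
```

With these toy counts the document-level context cannot separate the two topics, while the pseudo-document context clearly favors the first one.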
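  6. Extensions of PTM are also proposed
     • SPTM
       • Places a spike-and-slab prior on the topic distributions of the pseudo-documents
     • EPTM
       • Allows a document to belong to multiple pseudo-documents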
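  7. Experiments
     1. Document classification
        • Classify with an SVM based on the assigned topics
     2. UCI topic coherence
        • Computed using Wikipedia data
        • News and DBLP only
     3. Parameter comparison
        • Number of pseudo-documents, amount of training data, model comparison
     4. Examples of assigned topics
        • Omitted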
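
For readers unfamiliar with UCI topic coherence, it is commonly described as the average pointwise mutual information (PMI) over pairs of a topic's top words, with probabilities estimated from an external reference corpus such as Wikipedia. The sketch below is a simplified illustration, not the evaluation code used in the paper: it estimates probabilities from document-level co-occurrence instead of sliding windows, and the function name `uci_coherence`, the smoothing constant `eps`, and the toy reference corpus are all assumptions.

```python
import math
from itertools import combinations

def uci_coherence(top_words, reference_docs, eps=1e-12):
    """Average PMI over all pairs of a topic's top words.

    Probabilities are document frequencies in reference_docs, a
    simplification of the sliding-window counts that are commonly used.
    """
    doc_sets = [set(doc) for doc in reference_docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of reference documents containing all given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2 = prob(w1), prob(w2)
        if p1 == 0 or p2 == 0:
            continue  # skip words absent from the reference corpus
        scores.append(math.log((prob(w1, w2) + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical reference corpus standing in for Wikipedia.
reference_docs = [
    ["topic", "model"],
    ["topic", "model"],
    ["sleep", "pillow"],
    ["sleep", "pillow"],
]
coherent = uci_coherence(["topic", "model"], reference_docs)
incoherent = uci_coherence(["topic", "pillow"], reference_docs)
```

Word pairs that appear together in the reference corpus score higher than pairs that never co-occur, which is what makes the measure usable as a proxy for topic quality.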