

Improving Hypernymy Detection with an Integrated Path-based and Distributional Method

Journal Club at Kasuga Area (University of Tsukuba)
https://slis-ml.github.io/2016/10/journalclub

himkt

October 28, 2016
Transcript

  1. Improving Hypernymy Detection with an Integrated Path-based and Distributional Method
     2016/10/28 @ Kasuga Area
     himkt <[email protected]>
     Vered Shwartz, Yoav Goldberg, Ido Dagan (Bar-Ilan University)
     All figures are extracted from https://www.aclweb.org/anthology/P/P16/P16-1226.pdf
  2. Summary
     • A neural network to detect hyponymy relations
     • An RNN (LSTM) encodes dependency-path features
     • Word embeddings (GloVe) are used as word features
     • Able to take both distributional and dependency-path information into account
     • Comparable to the state of the art (path-based LSTM only)
     • Significantly improves upon the state of the art! (integrated method)
  3. Previous Hyponymy Relation Extraction
     • Distributional methods (Baroni+ [1], Roller+ [2], Weeds+ [3])
        • Given words x and y, decide whether x is a hyponym of y (isa(x, y))
        • Distributional vectors are commonly used
     • Path-based methods (Snow+ [4], Nakashole+ [5])
        • Hyponymy relations are expressed by dependency paths
  4. Dependency Path and Edge Representation
     • Each dependency path is represented as the sequence of edges that leads from x to y in the dependency tree
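The path construction above can be sketched in plain Python on a hand-made toy parse. This is not the authors' code: the tree encoding (token -> (head, dependency label)), the function name, and the "<"/">" direction markers are all illustrative assumptions.

```python
# Sketch (not the authors' code): extract the edge sequence between two
# words in a dependency tree, stored here as {token: (head, dep_label)}.
def path_between(tree, x, y):
    """Edges from x up to the lowest common ancestor, then down to y,
    each edge tagged with its direction ("<" = upward, ">" = downward)."""
    def ancestors(tok):
        chain = [tok]
        while tree[tok][0] is not None:
            tok = tree[tok][0]
            chain.append(tok)
        return chain

    up, down = ancestors(x), ancestors(y)
    lca = next(tok for tok in up if tok in down)  # lowest common ancestor
    path = [(tok, tree[tok][1], "<") for tok in up[: up.index(lca)]]
    path += [(tok, tree[tok][1], ">") for tok in reversed(down[: down.index(lca)])]
    return path

# toy parse of "parrot is a bird": is -> parrot (nsubj), is -> bird (attr)
tree = {"is": (None, "ROOT"),
        "parrot": ("is", "nsubj"),
        "a": ("bird", "det"),
        "bird": ("is", "attr")}
print(path_between(tree, "parrot", "bird"))
# -> [('parrot', 'nsubj', '<'), ('bird', 'attr', '>')]
```

In the paper the parses come from spaCy; the toy tree above just makes the edge-sequence idea concrete.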
  5. Edge and Path Representation
     • Edge representation: v_e = [v_l, v_pos, v_dep, v_dir]
        • l: lemma, pos: part of speech, dep: dependency label, dir: dependency direction
        • Lemma vectors: GloVe
        • Others (including out-of-vocabulary lemmas): randomly initialized
     • Path representation
        • A path p is composed of edges e_1, e_2, ..., e_k, with vectors v_{e_1}, v_{e_2}, ..., v_{e_k}
        • o_p: the vector the LSTM encoder produces for p
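A minimal sketch of the edge vector as a concatenation of the four component embeddings. All tables, names, and dimensions below are illustrative stand-ins; in the paper only the lemma vectors come from GloVe and the rest are randomly initialized.

```python
import numpy as np

# Sketch of the edge representation v_e = [v_l, v_pos, v_dep, v_dir].
rng = np.random.default_rng(0)
lemma_emb = {"bird": rng.normal(size=50)}   # stand-in for GloVe vectors
pos_emb   = {"NOUN": rng.normal(size=4)}    # randomly initialized in the paper
dep_emb   = {"attr": rng.normal(size=5)}
dir_emb   = {">": rng.normal(size=1)}

def edge_vector(lemma, pos, dep, direction):
    # an out-of-vocabulary lemma would also get a randomly initialized vector
    return np.concatenate([lemma_emb[lemma], pos_emb[pos],
                           dep_emb[dep], dir_emb[direction]])

v_e = edge_vector("bird", "NOUN", "attr", ">")
print(v_e.shape)  # (60,) = 50 + 4 + 5 + 1
```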
  6. LSTM-based Hypernymy Detection
     • Path-based method
        • f_{p,(x,y)}: the frequency of p in paths(x, y)
        • o_p: LSTM output for path p
        • v_xy = v_paths(x,y) = ( Σ_{p ∈ paths(x,y)} f_{p,(x,y)} · o_p ) / ( Σ_{p ∈ paths(x,y)} f_{p,(x,y)} )
     • Integrated network
        • v_wx, v_wy: word embeddings of x and y (GloVe)
        • v_xy = [v_wx, v_paths(x,y), v_wy]
     • Common classifier: c = softmax(W · v_xy)
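The frequency-weighted average and the common softmax classifier can be sketched with NumPy. The dimensions, the random stand-ins for the LSTM outputs o_p, and the two-class weight matrix W are assumptions for illustration, not the paper's actual sizes.

```python
import numpy as np

# Sketch of the path-based term-pair representation: a frequency-weighted
# average of the path encodings o_p, followed by a softmax classifier.
rng = np.random.default_rng(1)

o_p = rng.normal(size=(3, 8))     # encodings of 3 distinct paths for (x, y)
f_p = np.array([5.0, 2.0, 1.0])   # frequency of each path in paths(x, y)

v_paths = (f_p @ o_p) / f_p.sum()  # weighted average over paths

W = rng.normal(size=(2, 8))        # 2 classes: hypernym / not
def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

c = softmax(W @ v_paths)
print(c.shape, c.sum())            # (2,), probabilities sum to 1
```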
  7. Proposed Method - Architecture (Integrated Model)
     • Integrates the distributional approach and the path-based approach
     • Path LSTM + term-pair classifier
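The integration itself is just the concatenation v_xy = [v_wx, v_paths(x,y), v_wy]. A sketch with random stand-in vectors (the dimensions are illustrative, not the paper's):

```python
import numpy as np

# Sketch of the integrated term-pair representation: the GloVe embeddings
# of x and y concatenated around the averaged path vector.
rng = np.random.default_rng(2)
v_wx, v_wy = rng.normal(size=50), rng.normal(size=50)  # word embeddings
v_paths_xy = rng.normal(size=60)                       # path-based vector

v_xy = np.concatenate([v_wx, v_paths_xy, v_wy])
print(v_xy.shape)  # (160,) = 50 + 60 + 50
```

The same softmax classifier is then applied to v_xy, which is why the two variants can share one architecture.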
  8. Training Dataset
     • Hyponymy relations
        • Distant supervision: make use of already existing databases
        • WordNet, DBPedia, Wikidata, Yago
     • Corpus to train the LSTM path encoder
        • Wikipedia dump (May 2015)
        • Dependency parser: spaCy
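Distant supervision here means labeling candidate word pairs by lookup in an existing resource rather than by manual annotation. A toy sketch (the knowledge-base set below is a hand-made stand-in for WordNet/DBPedia/Wikidata/Yago):

```python
# Sketch of distant supervision: label candidate pairs from a corpus by
# checking membership in an existing hypernym database (toy stand-in here).
knowledge_base = {("parrot", "bird"), ("oak", "tree")}  # toy hypernym pairs

candidates = [("parrot", "bird"), ("bird", "parrot"), ("oak", "desk")]
labeled = [(x, y, (x, y) in knowledge_base) for x, y in candidates]
print(labeled)
# -> [('parrot', 'bird', True), ('bird', 'parrot', False), ('oak', 'desk', False)]
```

Note the direction matters: ("bird", "parrot") is not in the database, so the reversed pair is a negative example.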
  9. Experiment and Discussion 1 - Lexical Memorization
     • Comparable to the state of the art (path-based)
     • Improves on the state of the art (combined)
     • There was some lexical memorization
        • The score on the lexical split is worse than that on the random split
        • This means some kind of overfitting occurred (called lexical memorization)
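A lexical split keeps the train and test vocabularies disjoint, so a classifier cannot score well by memorizing individual words. A toy sketch of the idea (the deterministic half-split of the vocabulary is an illustrative simplification; a real split would assign words randomly):

```python
# Sketch of a lexical train/test split: each word goes to one side, and
# only pairs whose two words fall on the same side are kept, so no word
# is shared between train and test (unlike a random split of the pairs).
pairs = [("parrot", "bird"), ("oak", "tree"), ("parrot", "animal"),
         ("dog", "animal"), ("oak", "plant")]

vocab = sorted({w for p in pairs for w in p})
train_words = set(vocab[: len(vocab) // 2])  # toy deterministic assignment

train = [p for p in pairs if all(w in train_words for w in p)]
test  = [p for p in pairs if all(w not in train_words for w in p)]
print(train, test)
# -> [('dog', 'animal')] [('oak', 'tree'), ('oak', 'plant')]
```

Pairs that straddle the split (here ("parrot", "bird")) are discarded, which is why lexical splits shrink the dataset.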
  10. Discussion 2 - Noun Phrase Concepts
      • They do not detect noun phrases [maybe]
         • It seems x and y are words, not phrases
  11. References
      [1] Marco Baroni, Raffaella Bernardi, Ngoc-Quynh Do, and Chung-chieh Shan. 2012. Entailment above the word level in distributional semantics. In EACL, pages 23–32.
      [2] Stephen Roller, Katrin Erk, and Gemma Boleda. 2014. Inclusive yet selective: Supervised distributional hypernymy detection. In COLING, pages 1025–1036.
      [3] Julie Weeds, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. Learning to distinguish hypernyms and co-hyponyms. In COLING, pages 2249–2259.
      [4] Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic hypernym discovery. In NIPS.
      [5] Ndapandula Nakashole, Gerhard Weikum, and Fabian Suchanek. 2012. PATTY: A taxonomy of relational patterns with semantic types. In EMNLP-CoNLL, pages 1135–1145.