[Paper Reading] Decomposed Meta-Learning for Few-Shot Named Entity Recognition

tossy
June 05, 2023

Slides for my lab's paper reading group.

Transcript

  1. Decomposed Meta-Learning for Few-Shot Named Entity Recognition Tingting Ma1, Huiqiang

    Jiang2, Qianhui Wu2, Tiejun Zhao1, Chin-Yew Lin2 1 Harbin Institute of Technology, Harbin, China 2 Microsoft Research Asia ACL2022 Toshihiko Sakai 2023/6/5
  2. What is this paper about?
    Proposes few-shot span detection and few-shot entity typing for few-shot Named Entity Recognition
    Key point of the proposed method
    ・A decomposed meta-learning procedure to separately train the span detection model and the entity typing model
    Advantages compared with existing work
    ・Defines few-shot span detection as a sequence labeling problem
    ・Trains the span detector with MAML (model-agnostic meta-learning) to find a good model parameter initialization
    ・Proposes MAML-ProtoNet to find a good embedding space
    How to verify the advantage and effectiveness of the proposal
    Evaluates two groups of datasets and validates the design with an ablation study
    ・Evaluates the proposed method on a few datasets
    ・Compares the proposed method with other few-shot NER methods that use meta-learning
    Discussion points or remaining problems that should be improved
    Related papers that should be read afterwards
    ・Triantafillou+: Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples, ICLR '20
  3. Contents ▪ Meta-learning(Few-shot learning) ▪ Introduction ▪ Methodology Entity Span

    Detection Entity Typing ▪ Experiments ▪ Ablation Study ▪ Conclusion 3
  4. Meta-learning 4
    ▪ Learning to learn from few examples (few-shot learning)
    [figure: episodes, each consisting of a support set and a query set]
    O'Connor+: Meta-Learning, https://www.mit.edu/~jda/teaching/6.884/slides/nov_13.pdf
  5. Benefit of meta-learning 5
    1. Learn from a few examples (few-shot learning)
    2. Adapt to novel tasks quickly
    3. Build more generalizable systems
    Meta-Learning: https://meta-learning.fastforwardlabs.com/
  6. N-way K-shot setting 6
    N: the number of classes
    K: the number of examples per class
    [figure: meta-training and meta-testing episodes, each with a support set, a query set, and entity classes]
    Meta-Learning: https://meta-learning.fastforwardlabs.com/
  7. An example of a 2-way 1-shot setting in NER 7
    Two entity classes, and each class has one example (shot)
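As a concrete illustration of the episode construction above, the following sketch samples an N-way K-shot episode in Python. The `pool` dictionary (class → labeled examples) and the function name are assumptions for this sketch, not the paper's code:

```python
import random

def sample_episode(pool, n_way=2, k_shot=1, n_query=1, seed=0):
    """Build one N-way K-shot episode from a pool mapping class -> examples.
    Picks N classes, then K support and Q query examples per class."""
    rnd = random.Random(seed)
    classes = rnd.sample(sorted(pool), n_way)
    support, query = [], []
    for c in classes:
        picked = rnd.sample(pool[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query
```

In the 2-way 1-shot NER setting of this slide, each episode would thus contain one support sentence and one query sentence per entity class.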
  8. Methods of meta-learning 8
    1. Gradient-based (model-agnostic meta-learning)
    2. Black-box adaptation (neural process)
    3. Model-based (prototypical network)
    Tomoharu Iwata (NTT Communication Science Laboratories), Introduction to Meta-Learning: https://www.kecl.ntt.co.jp/as/members/iwata/ibisml2021.pdf
  9. MAML (model-agnostic meta-learning) 9
    ▪ The meta-learning objective is to help the model quickly adapt to a new task
    ▪ The key idea of MAML is to find, during the meta-training phase, initial model parameters that maximize performance on a new task after only a few gradient steps
    Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR '17
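The MAML idea above can be sketched on a toy regression family in pure NumPy (a first-order variant for simplicity; this is an illustration only — the paper applies MAML to a BERT-based span detector, not to linear regression):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Gradient of mean-squared error for the linear model X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def mse(w, X, y):
    return ((X @ w - y) ** 2).mean()

def make_task(a, n=8):
    """Toy task: regress y = a * x; first half = support, second half = query."""
    X = rng.normal(size=(n, 1))
    y = a * X[:, 0]
    return X[:4], y[:4], X[4:], y[4:]

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.05, inner_steps=1):
    """One meta-update (first-order MAML): adapt on each task's support set,
    then accumulate query-set gradients at the adapted parameters."""
    meta_grad = np.zeros_like(w)
    for Xs, ys, Xq, yq in tasks:
        w_task = w.copy()
        for _ in range(inner_steps):       # inner loop: task adaptation
            w_task -= inner_lr * loss_grad(w_task, Xs, ys)
        meta_grad += loss_grad(w_task, Xq, yq)
    return w - outer_lr * meta_grad / len(tasks)

# meta-training over a family of tasks with random slopes
w = np.zeros(1)
for _ in range(300):
    w = maml_step(w, [make_task(rng.uniform(-2, 2)) for _ in range(4)])
```

At test time, a few inner-loop steps on a new task's support set adapt the meta-learned initialization `w` to that task.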
  10. ProtoNet
    ▪ Learns a class prototype in a metric space
    ▪ Computes each class's prototype as the average of the feature vectors of that class's support examples
    ▪ Computes the distance between each query example and each prototype using a distance function
    ▪ Predicts the class with a softmax over the (negative) distances, and trains with cross-entropy 10
    Snell+: Prototypical networks for few-shot learning, NIPS '17
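The prototype computation and nearest-prototype classification described above can be sketched in NumPy (embeddings are assumed to be already computed by some encoder; function names are made up for this sketch):

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Prototype of class c = mean of the support embeddings labeled c."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def predict_proba(query_emb, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Training minimizes the cross-entropy of these probabilities over episodes, so that embeddings cluster around their class prototypes.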
  11. Contents ▪ Meta-learning(Few-shot learning) ▪ Introduction ▪ Methodology Entity Span

    Detection Entity Typing ▪ Experiments ▪ Ablation Study ▪ Conclusion 11
  12. Introduction 12
    Named Entity Recognition [Sang+ 2003], [Ratinov+ 2009]
    Example input: "morpa is a fully implemented parser for a text-to-speech system"
    Sang+: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, CoNLL '03
    Ratinov+: Design challenges and misconceptions in named entity recognition, CoNLL '09
  13. Introduction
    ▪ Deep neural architectures have shown great success in supervised NER when a large amount of labeled data is available
    ▪ In practical applications, NER systems are usually expected to rapidly adapt to new entity types unseen during training
    ▪ It is costly and inflexible to collect additional labeled data for these types
    ▪ Few-shot NER has therefore attracted increasing attention in recent years 13
  14. Previous studies on few-shot NER
    ▪ Token-level metric learning
    • ProtoNet [Snell+ 2017]: compare each query token to the prototype of each entity class
    • [Fritzler+ 2019]: compare each query token with each token of the support examples and assign the label according to their distances
    ▪ Span-level metric learning [Yu+ 2021]
    • a recent approach that bypasses the issue of token-wise label dependency while explicitly utilizing phrasal representations 14
    Snell+: Prototypical networks for few-shot learning, NIPS '17
    Fritzler+: Few-shot classification in named entity recognition task, ACM/SIGAPP '19
    Yu+: Few-shot intent classification and slot filling with retrieved examples, NAACL '21
  15. Challenges in Metric Learning
    Challenge 1: Limited effectiveness with large domain gaps
    ▪ Learned metrics are used directly, without target-domain adaptation
    ▪ Information in the support examples is insufficiently explored
    Challenge 2: Limitations of span-level metric learning methods
    ▪ Overlapping spans require careful handling during decoding
    ▪ Noisy class prototype for non-entities (e.g., "O")
    Challenge 3: Limited information for domain transfer and inference
    ▪ Insufficient information available for transfer to different target domains
    ▪ In previous methods, support examples are only used for similarity calculation during inference 15
  16. Challenges in Metric Learning
    Challenge 1: Limited effectiveness with large domain gaps
    ▪ Learned metrics are used directly, without target-domain adaptation
    ▪ Information in the support examples is insufficiently explored
    ☑ Few-shot span detection: MAML [Finn+ 2017] finds a good model parameter initialization that can quickly adapt to new entity classes
    ☑ Few-shot entity typing: MAML-ProtoNet narrows the gap between the source domains and the target domain
    Challenge 2: Limitations of span-level metric learning methods
    ▪ Overlapping spans require careful handling during decoding
    ▪ Noisy class prototype for non-entities (e.g., "O")
    Challenge 3: Limited information for domain transfer and inference
    ▪ Insufficient information available for transfer to different target domains 16
  17. Challenges in Metric Learning
    Challenge 1: Limited effectiveness with large domain gaps
    ▪ Learned metrics are used directly, without target-domain adaptation
    ▪ Information in the support examples is insufficiently explored
    Challenge 2: Limitations of span-level metric learning methods
    ▪ Overlapping spans require careful handling during decoding
    ▪ Noisy class prototype for non-entities (e.g., "O")
    ☑ Few-shot span detection: formulated as a sequence labeling problem, which avoids handling overlapping spans
    ☑ The span detection model locates named entities in a class-agnostic way and feeds the detected spans to the typing model for class inference, eliminating the noisy "O" prototype
    Challenge 3: Limited information for domain transfer and inference
    ▪ Insufficient information available for transfer to different target domains 17
  18. Challenges in Metric Learning
    Challenge 2: Limitations of span-level metric learning methods
    ▪ Overlapping spans require careful handling during decoding
    ▪ Noisy class prototype for non-entities (e.g., "O")
    Challenge 3: Limited information for domain transfer and inference
    ▪ Insufficient information available for transfer to different target domains
    ▪ In previous methods, support examples are only used for similarity calculation during inference
    ☑ Few-shot span detection: the model can better transfer to the target domain
    ☑ Few-shot entity typing: MAML-ProtoNet can find a better embedding space than ProtoNet to represent entity spans from different classes 18
  19. Contents ▪ Meta-learning(Few-shot learning) ▪ Introduction ▪ Methodology Entity Span

    Detection Entity Typing ▪ Experiments ▪ Ablation Study ▪ Conclusion 19
  20. Entity Span Detector
    ▪ The span detection model aims at locating all the named entities
    ▪ MAML [Finn+ 2017] promotes the learning of domain-invariant internal representations rather than domain-specific features
    ▪ The meta-learned model is expected to be more sensitive to target-domain support examples
    ▪ Only a few fine-tuning steps on new examples are expected to make rapid progress without overfitting 22
    Finn+: Model-agnostic meta-learning for fast adaptation of deep networks, PMLR '17
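Because the span detector is class-agnostic, the sequence labeling formulation amounts to tagging tokens with entity-boundary labels only. A BIO scheme is used below for illustration (the exact tag scheme and helper name are assumptions of this sketch, not taken from the paper):

```python
def spans_to_bio(n_tokens, spans):
    """Class-agnostic BIO tags for span detection: `spans` is a list of
    (start, end) token-index pairs with `end` exclusive. Every entity
    span gets B/I tags regardless of its class; all other tokens get O."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags
```

For example, a 5-token sentence with one entity covering tokens 1-2 is labeled `O B I O O`; the typing model later assigns a class to each detected span.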
  21. Entity Typing
    ▪ The entity typing model uses ProtoNet as its backbone
    ▪ It learns from training episodes and computes the probability that a span belongs to an entity class based on the distance between the span representation and the class prototype
    ▪ MAML-enhanced ProtoNet 25
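The MAML-ProtoNet inner loop, in which a few gradient steps on the support set refine the embedding space before prototypes are formed, can be sketched with a toy linear embedding. This is purely illustrative: the embedding, loss, and finite-difference gradient (used here instead of backpropagation for clarity) are assumptions of this sketch:

```python
import numpy as np

def prototypes(Z, y, n_classes):
    return np.stack([Z[y == c].mean(axis=0) for c in range(n_classes)])

def proto_nll(W, X, y, n_classes):
    """ProtoNet loss with a linear embedding Z = X @ W: cross-entropy of a
    softmax over negative squared distances to the class prototypes."""
    Z = X @ W
    P = prototypes(Z, y, n_classes)
    logits = -((Z[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    m = logits.max(axis=1, keepdims=True)
    log_softmax = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(len(y)), y].mean()

def adapt_embedding(W, X_sup, y_sup, n_classes, lr=0.01, steps=3, eps=1e-4):
    """Inner loop in the spirit of MAML-ProtoNet: a few gradient steps on the
    support set refine the embedding space before classification (finite
    differences for clarity; a real implementation would backpropagate)."""
    for _ in range(steps):
        g = np.zeros_like(W)
        for idx in np.ndindex(*W.shape):
            Wp, Wm = W.copy(), W.copy()
            Wp[idx] += eps
            Wm[idx] -= eps
            g[idx] = (proto_nll(Wp, X_sup, y_sup, n_classes)
                      - proto_nll(Wm, X_sup, y_sup, n_classes)) / (2 * eps)
        W = W - lr * g
    return W
```

The key difference from plain ProtoNet is that the embedding parameters themselves are updated on the target episode's support set, rather than being frozen after meta-training.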
  22. Contents ▪ Meta-learning(Few-shot learning) ▪ Introduction ▪ Methodology Entity Span

    Detection Entity Typing ▪ Experiments ▪ Ablation Study ▪ Conclusion 26
  23. Experiments 27
    ▪ Performance is evaluated with the micro F1-score over named entities
    ▪ Datasets
    Few-NERD
    Cross-dataset
    • CoNLL-2003
    • GUM
    • WNUT-2017
    • OntoNotes
    ※ two domains for training, one for validation, the remaining one for test
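The micro F1-score used here is the standard entity-level metric: an entity counts as correct only if its span and type both match. A minimal sketch (the tuple format for mentions is an assumption of this sketch):

```python
def micro_f1(pred, gold):
    """Micro F1 over entity mentions, where `pred` and `gold` are sets of
    (sentence_id, start, end, entity_type) tuples; exact match counts."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```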
  24. Contents ▪ Meta-learning(Few-shot learning) ▪ Introduction ▪ Methodology Entity Span

    Detection Entity Typing ▪ Experiments ▪ Ablation Study ▪ Conclusion 29
  25. Ablation Study 30 Validate the contributions of method components. 1)

    Ours w/o MAML 2) Ours w/o Span Detector 3) Ours w/o Span Detector w/o MAML 4) Ours w/o ProtoNet
  26. 1) Ours w/o MAML 31
    ▪ Train both the span detection model and the ProtoNet in a conventional supervised manner
    ▪ Fine-tune with few-shot examples
  27. 2)Ours w/o Span Detector 32 ▪ Remove the mention detection

    step and integrate MAML with token-level prototypical networks
  28. 3) Ours w/o Span Detector w/o MAML 33
    ▪ Eliminate the meta-learning procedure from 2) Ours w/o Span Detector
    ▪ This reduces to the conventional token-level prototypical networks
  29. 4)Ours w/o ProtoNet 34 ▪ Apply the original MAML algorithm

    ▪ Train a BERT-based tagger for few-shot NER
  30. Ablation Study Result 35
    Point 1: the meta-learning procedure is effective
    Exploring the information contained in the support examples with the proposed meta-learning procedure benefits few-shot transfer
  31. Ablation Study Result 36
    Point 2: the decomposed framework (span detection and entity typing) is effective
    It mitigates the problem of noisy prototypes for non-entities (Ours > 2), and 1) > 3))
  32. Ablation Study Result 37
    Point 3: ProtoNet is necessary
    Making the model adapt the top-most classification layer without sharing knowledge across training episodes leads to unsatisfactory results
  33. How does MAML promote the span detector? 38
    ▪ Sup-Span: a span detector trained on the full training data
    ▪ Sup-Span-f.t.: fine-tune the model learned by Sup-Span
    ▪ MAML-Span-f.t.: the span detector trained with MAML
    ▪ Sup-Span only predicts "Broadway", missing "New Century Theatre"
    → a fully supervised model cannot detect unseen entity spans
  34. How does MAML promote the span detector? 39
    ▪ Sup-Span-f.t. can successfully detect "New Century Theatre", but still wrongly detects "Broadway"
    → fine-tuning helps the supervised model on new entities, but it may be biased too much toward the training data
    ▪ MAML-Span-f.t. (Ours) detects both successfully
  35. How does MAML promote the span detector? 40 ▪ Proposed

    meta-learning procedure could better leverage support examples from novel episodes ▪ Help the model adapt to new episodes more effectively Few-NERD 5-way 1-2shot
  36. How does MAML enhance the ProtoNet? 41
    ▪ MAML-ProtoNet achieves superior performance to the conventional ProtoNet
    ▪ This verifies the effectiveness of leveraging the support examples to refine the learned embedding space at test time
    Analysis on entity typing under Few-NERD 5-way 1-2shot
  37. Contents ▪ Meta-learning(Few-shot learning) ▪ Introduction ▪ Methodology Entity Span

    Detection Entity Typing ▪ Experiments ▪ Ablation Study ▪ Conclusion 42
  38. Conclusion 43
    ▪ This paper proposed a decomposed meta-learning method for few-shot NER
    Entity span detection
    • formulates few-shot span detection as a sequence labeling problem
    • employs MAML to learn a good parameter initialization
    Entity typing
    • proposes MAML-ProtoNet
    • finds a better embedding space than the conventional ProtoNet to represent entity spans from different classes

  40. Meta-learning datasets 46
    ▪ Few-NERD
    Training: 20,000 episodes / Validation: 1,000 episodes / Test: 5,000 episodes
    ▪ Cross dataset
    Training episodes: 2 datasets; validation and test episodes: 1 dataset each
    5-shot: train/valid/test = 200/100/100
    1-shot: train/valid/test = 400/100/200
    • OntoNotes: train/valid/test = 400/200/100