A Frustratingly Easy Approach for Entity and Relation Extraction
Zexuan Zhong, Danqi Chen
NAACL 2021 (North American Chapter of the Association for Computational Linguistics)
URL: https://aclanthology.org/2021.naacl-main.5/  citations: 53

Outline: What is this paper about? / Paper contributions and key points / Validating advantages and effectiveness / Related work

Paper contributions
• Proposes a simple and effective approach for named entity recognition and relation extraction
 ・ Learns two independent encoders
 ・ Inserts typed entity markers when training the relation model
• Outperforms all previous joint models on three datasets (previous SOTA: Wadden+: Entity, relation, and event extraction with contextualized span representations, EMNLP '19)
• An efficient approximation: 8-16x speedup with a small accuracy drop
Background
• Named Entity Recognition [01], [02]
• Relation Extraction [03], [04]

[01] Tjong Kim Sang+: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, CoNLL '03
[02] Ratinov+: Design challenges and misconceptions in named entity recognition, CoNLL '09
[03] Zelenko+: Kernel methods for relation extraction, EMNLP '02
[04] Bunescu+: A shortest path dependency kernel for relation extraction, EMNLP '05

Problem definition
• Input: a sentence X of n tokens, e.g., "MORPA is a fully implemented parser for a text-to-speech system"
• S = {s_1, ..., s_m}: the set of all m candidate spans; a span s_i is an ordered sequence of tokens of X of up to length L = 8; START(i) and END(i) denote the start and end indices of s_i
• Output
 ・ Entity: Y_e = {(s_i, e)}, where s_i ∈ S is a span and e ∈ E an entity type
 ・ Relation: Y_r = {(s_i, s_j, r)}, where s_i, s_j ∈ S are subject/object spans and r ∈ R a relation type
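As a concrete illustration of the span enumeration above, here is a minimal sketch (the function name is mine; the paper fixes L = 8):

```python
def enumerate_spans(tokens, max_length=8):
    """All candidate spans s_i of up to max_length tokens.
    Returns (START(i), END(i)) pairs with inclusive indices."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start, min(start + max_length, len(tokens))):
            spans.append((start, end))
    return spans

# 7-token sentence -> 7+6+5+4+3+2+1 = 28 candidate spans
print(len(enumerate_spans("Bill Smith was in the hotel room".split())))
```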
Prior approaches
• Training the entity model and the relation model separately [05], [06], [07]
• Recently: end-to-end approaches
 ・ modeling the entity model and the relation model jointly [08]~[17]
 ・ joint models can better capture the interactions between entities and relations

[05] Zhou+: Exploring various knowledge in relation extraction, ACL '05
[06] Kambhatla+: Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction, ACL '04
[07] Chan+: Exploiting syntactico-semantic structures for relation extraction, ACL-HLT '11
[08] Li+: Incremental joint extraction of entity mentions and relations, ACL '14
[09] Miwa+: End-to-end relation extraction using LSTMs on sequences and tree structures, ACL '16
[10] Katiyar+: Going out on a limb: Joint extraction of entity mentions and relations without dependency trees, ACL '17
[11] Zhang(a)+: End-to-end neural relation extraction with global optimization, EMNLP '17
[12] Zhang(b)+: Position-aware attention and supervised data improve slot filling, EMNLP '17
[13] Li+: Entity-relation extraction as multi-turn question answering, ACL '19
[14] Luan+: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction, EMNLP '18
[15] Wadden+: Entity, relation, and event extraction with contextualized span representations, EMNLP '19
[16] Lin+: A joint neural model for information extraction with global features, ACL '20
[17] Wang+: Two are better than one: Joint entity and relation extraction with table sequence encoders, EMNLP '20
Joint models share span representations between the two tasks [18]
• The same output vectors serve as inputs to the relation propagation layer
• The same output vectors serve as inputs to the entity and relation prediction layers

[18] Luan+: A general framework for information extraction using dynamic span graphs, NAACL '19
Research question: should we share the span representation between the entity and relation models, like the joint models?
→ No. This work:
• Learns two independent encoders
• Inserts typed entity markers when training the relation model
• An efficient approximation: 8-16x speedup with a small accuracy drop

Insert markers: "Bill Smith was in the hotel room" → "[S:PER] Bill Smith [/S:PER] was in the [O:FAC] hotel [/O:FAC] room"
Running example
• Input: "Bill Smith was in the hotel room"
• Entity output: Bill Smith → PER, hotel → FAC, room → FAC
• Relation output: (Bill Smith: PER, room: FAC) → PHYS; (hotel: FAC, room: FAC) → PART-WHOLE
※ PER = Person, FAC = Facility, PHYS = Physical
Relation model: insert typed markers for each candidate pair of predicted entities (Bill Smith → PER, hotel → FAC, room → FAC), as sketched below
• (Bill Smith: PER, hotel: FAC) → "[S:PER] Bill Smith [/S:PER] was in the [O:FAC] hotel [/O:FAC] room"
• (Bill Smith: PER, room: FAC) → "[S:PER] Bill Smith [/S:PER] was in the hotel [O:FAC] room [/O:FAC]"
• (hotel: FAC, room: FAC) → "Bill Smith was in the [S:FAC] hotel [/S:FAC] [O:FAC] room [/O:FAC]"
• ……
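A minimal sketch of the per-pair marker insertion; the helper and its argument format are illustrative, not the authors' code:

```python
def insert_typed_markers(tokens, subj, obj):
    """Insert typed entity markers around subject and object spans.
    subj/obj are ((start, end), type) with inclusive token indices."""
    (s_start, s_end), s_type = subj
    (o_start, o_end), o_type = obj
    # Insert from right to left so earlier indices stay valid.
    inserts = sorted([
        (s_start, f"[S:{s_type}]"), (s_end + 1, f"[/S:{s_type}]"),
        (o_start, f"[O:{o_type}]"), (o_end + 1, f"[/O:{o_type}]"),
    ], reverse=True)
    out = list(tokens)
    for pos, marker in inserts:
        out.insert(pos, marker)
    return out

tokens = "Bill Smith was in the hotel room".split()
print(" ".join(insert_typed_markers(tokens, ((0, 1), "PER"), ((6, 6), "FAC"))))
# -> [S:PER] Bill Smith [/S:PER] was in the hotel [O:FAC] room [/O:FAC]
```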
Relation model
• Encoder: ・ albert-xxlarge-v1 ・ scibert-scivocab-uncased
• Contextualized representations: encode the marker-inserted sentence, e.g., "[S:PER] Bill Smith [/S:PER] was in the hotel [O:FAC] room [/O:FAC]"
• Span-pair representation (e.g., Bill Smith → room): concatenate the output embeddings of the two start markers, [S:PER] and [O:FAC]
• Dense layer + softmax over relation types → PHYS
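A hedged sketch of this prediction head: the hidden states at the two start markers are concatenated and passed through a linear layer and softmax. The class name, dimensions, and index-based lookup are my assumptions; only the concatenate-then-classify structure comes from the slide.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Span-pair representation = [h([S:*]); h([O:*])], then dense + softmax."""
    def __init__(self, hidden_size: int, num_relations: int):
        super().__init__()
        self.classifier = nn.Linear(2 * hidden_size, num_relations)

    def forward(self, hidden_states, subj_idx, obj_idx):
        # hidden_states: (batch, seq_len, hidden) from the pre-trained encoder;
        # subj_idx / obj_idx: positions of the [S:*] and [O:*] start markers
        rows = torch.arange(hidden_states.size(0))
        pair = torch.cat([hidden_states[rows, subj_idx],
                          hidden_states[rows, obj_idx]], dim=-1)
        return torch.softmax(self.classifier(pair), dim=-1)  # e.g., P(PHYS)
```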
Cross-sentence context
• Related work: add a 3-sentence context window [15], [18]
• This work: given an input sentence with n words, augment the input with (W − n) / 2 words each from the left context and the right context
 → Entity model: W = 300; relation model: W = 100 (see the sketch after the references)

[15] Wadden+: Entity, relation, and event extraction with contextualized span representations, EMNLP '19
[18] Luan+: A general framework for information extraction using dynamic span graphs, NAACL '19
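A sketch of the window extension under my reading of the slide: take (W − n) / 2 words from each side; clipping at the document boundary is an assumption.

```python
def extend_with_context(doc_words, sent_start, sent_end, window):
    """Extend sentence doc_words[sent_start:sent_end] toward ~window words
    by adding (window - n) // 2 words of left and right context."""
    n = sent_end - sent_start
    extra = max(0, (window - n) // 2)
    left = max(0, sent_start - extra)
    right = min(len(doc_words), sent_end + extra)
    return doc_words[left:right]

# Entity model:   extend_with_context(doc, s, e, window=300)
# Relation model: extend_with_context(doc, s, e, window=100)
```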
Evaluation
• Named Entity Recognition
 ・ the span boundaries and the predicted entity type must both be correct
• Relation extraction
 ・ (1) boundaries evaluation (Rel): the boundaries of the two spans and the predicted relation type are correct
 ・ (2) strict evaluation (Rel+): the predicted entity types must also be correct, in addition to the Rel criteria
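The two relation criteria, as a small set-based micro-F1 sketch (the tuple layout is mine):

```python
def relation_f1(gold, pred, strict=False):
    """Micro F1 for Rel (boundaries) and Rel+ (strict).
    Items: ((subj_span, subj_type), (obj_span, obj_type), rel_type),
    where spans are (start, end) boundaries."""
    def key(item):
        (s_span, s_type), (o_span, o_type), r = item
        # Rel: boundaries + relation type; Rel+: entity types must also match
        return (s_span, s_type, o_span, o_type, r) if strict \
            else (s_span, o_span, r)
    g, p = {key(x) for x in gold}, {key(x) for x in pred}
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```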
Results (1)
• Cross-sentence information is useful, and pre-trained transformer encoders can capture long-range dependencies from a large context
• Encoder legend: L = LSTM, L+E = LSTM+ELMo, Bb = BERT-base, Bl = BERT-large, ALB = ALBERT-xxlarge-v1, SciB = SciBERT; additional marks in the table denote cross-sentence information and training with additional data (e.g., coreference)
Results (2)
• The gains demonstrate the importance of learning distinct contextual representations for entities and for relations of different entity pairs, as well as early fusion of entity information in the relation model
Results (3)
• The proposed model outperforms the previous SOTA model even without using cross-sentence context
• This result clearly demonstrates the superiority of the proposed model
Ablation: entity markers (Relation F1)
• Typed markers (proposed method): 72.6%
 ・ "[S:PER] Bill Smith [/S:PER] was in the hotel [O:FAC] room [/O:FAC]" → PHYS
• Untyped markers: 70.5%
 ・ "[S] Bill Smith [/S] was in the hotel [O] room [/O]" → PHYS
• Untyped markers + entity auxiliary loss [15], [18]: 70.7%
 ・ "[S] Bill Smith [/S] was in the hotel [O] room [/O]" → PHYS, with PER/FAC predicted via the auxiliary loss
• No markers: 67.6%
 ・ "Bill Smith was in the hotel room" → PHYS

[15] Wadden+: Entity, relation, and event extraction with contextualized span representations, EMNLP '19
[18] Luan+: A general framework for information extraction using dynamic span graphs, NAACL '19
Why separate encoders?
• The two tasks have different input formats and require different features for predicting entity types and relations
→ separate encoders indeed learn better task-specific features
Shortcoming: efficiency
• Markers are inserted per candidate pair, so the relation model must be re-run for every pair of entities:
 ・ "[S:PER] Bill Smith [/S:PER] was in the [O:FAC] hotel [/O:FAC] room"
 ・ "[S:PER] Bill Smith [/S:PER] was in the hotel [O:FAC] room [/O:FAC]"
 ・ "Bill Smith was in the [S:FAC] hotel [/S:FAC] [O:FAC] room [/O:FAC]"
 ・ ……
Efficient approximation
• Add the markers of all candidate pairs ([S:PER] [/S:PER] [O:FAC] [/O:FAC], ...) to the same sentence and process every pair in one run of the relation model
 ・ each marker ties its position embedding to the start/end token of the corresponding span
 ・ text tokens do not attend to marker tokens; marker tokens attend to the text and to the markers of their own pair
→ 8-16x speedup with only a 1% accuracy drop
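A sketch of the attention constraint behind this batched approximation. The boolean-mask convention, the contiguous pair layout, and the helper name are my assumptions; the tying of marker position ids to span boundaries follows the paper's description.

```python
import torch

def build_attention_mask(num_text, num_markers, markers_per_pair=4):
    """mask[i, j] = True means token i may attend to token j."""
    n = num_text + num_markers
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:num_text, :num_text] = True   # text attends only to text
    mask[num_text:, :num_text] = True   # markers attend to all text
    for p in range(num_markers // markers_per_pair):
        s = num_text + p * markers_per_pair
        mask[s:s + markers_per_pair, s:s + markers_per_pair] = True  # own pair
    return mask

# Position ids (not shown): each marker reuses the position id of the span
# boundary token it marks, e.g., [S:PER] reuses START(subject)'s position.
```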
Conclusion
• Proposed a simple and effective approach for named entity recognition and relation extraction
 ・ Learns two independent encoders
 ・ Inserts typed entity markers when training the relation model
 ・ An efficient approximation: 8-16x speedup with a small accuracy drop
• Insert markers: "Bill Smith was in the hotel room" → "[S:PER] Bill Smith [/S:PER] was in the [O:FAC] hotel [/O:FAC] room"