Upgrade to Pro — share decks privately, control downloads, hide ads and more …

A paper review of 
'submodular optimization-bas...

A paper review of 
'submodular optimization-based diverse paraphrasing and 
 its effectiveness in data augmentation' (Kumar+, 2019)

Avatar for Shuntaro Yada

Shuntaro Yada

July 22, 2019
Tweet

More Decks by Shuntaro Yada

Other Decks in Research

Transcript

  1. Shuntaro Yada UTokyo LISLab CSIRO Data61 LASC 22 July 2019

    A Review of 
 ‘Submodular Optimization-based 
 Diverse Paraphrasing and 
 its Effectiveness in Data Augmentation’ 
 (Kumar+, ’19) 5BCMFTBOEpHVSFTJOUIJTTMJEFBSFCPSSPXFE GSPNUIFPSJHJOBMQBQFSBOEQSFTFOUBUJPO
  2. • ‘Good’ data augmentation: – Fidelity = preserve the original

    meaning – Diversity = increase lexical/syntactical variation • Existing work of paraphrasing is basically fidelity-based and word-level • This work proposed a way to take diversity into account 2 (Kumar+, ’19) Overview ⇤2 † Manik Bhandari1 Partha Talukdar1 ence, Bangalore, India gy and Science, Pilani, India 55, mbbhandarimanik}@gmail.com SOURCE – how do i increase body height ? REFERENCE – what do i do to increase my height ? BEAM SEARCH – how do i increase my height ? – how do i increase my body height ? – how do i increase the height ? – how would i increase my body height ? DIPS (OURS) – how could i increase my height ? – what should i do to increase my height ? – what are the fastest ways to increase my height ? – is there any proven method to increase height ? Table 1: Sample paraphrases generated by beam search and DiPS (our method). DiPS offers lexically diverse paraphrases without compromising on fidelity.
  3. • This work adopts a seq2seq-based paraphrasing model • The

    decoder uses a diversity-aware function to choose a good subset of candidates (instead of standard beam search algorithms) • To make it mathematically possible, the function is implemented as a submodular function 4 Proposed method Proposed method Subset Selection how do i increase my … how can i decrease the … how can i grow the … what ways exist to increase … how would I increase the … how do I decrease the … i am 17 , what … are there ways to increase … Vt how do i increase my … how can i grow the … what ways exist to increase … are there ways to increase … X argmaxX⊆Vt, |X|= k F(X) k
  4. Proposed method 5 DiPS Induce Diversity while not compromising on

    Fidelity <sos> Where can I get that movie? can Where can I get that film? I <eos> How can I get that picture? : 3k Candidate Subsequences find film? Where can I that Where can I Where How can can I I that that picture picture get find get movie? Where can I Where can I that k- sequences Synonym (similar embeddings) Diversity Components Fidelity Components where , can , film , I , How , find that , that picture , .. I get , can I , Where can I Rewards unique n-grams Rewards Structural Coverage Source Sentence Where ENCODER DECODER
  5. Proposed method 6 DiPS Induce Diversity while not compromising on

    Fidelity <sos> Where can I get that movie? can Where can I get that film? I <eos> How can I get that picture? : 3k Candidate Subsequences find film? Where can I that Where can I Where How can can I I that that picture picture get find get movie? Where can I Where can I that k- sequences Synonym (similar embeddings) Diversity Components Fidelity Components where , can , film , I , How , find that , that picture , .. I get , can I , Where can I Rewards unique n-grams Rewards Structural Coverage Source Sentence Where ENCODER DECODER Diversity Encoder Decoder Fidelity
  6. Fidelity components 7 Fidelity Components Where can I get that

    film? How can I get that picture? : 3k Candidate Subsequences find film? Where can I that Where can I that that picture picture et get movie? Where can I Where can I that k- sequences Synonym (similar embeddings) nents Fidelity Components How , ure , can I que n-grams uctural Coverage Source Sentence ∑ x∈X (x, s) (x, s) = 1 |x| ∑ w i ∈x maxwj ∈s ψ(v wi , v wj ) Embedding based Similarity ∑ x∈X N ∑ n = 1 βn |xn-gram ∩ sn-gram| Lexical Similarity The hidden state of a time step t in the seq2seq’s decoder
  7. Diversity components 8 Where can I get that movie? Where

    can I get that film? How can I get that picture? : 3k Candidate Subsequences find film? Where can I that Where can I Where How can can I I that that picture picture get find get movie? Where can I Where can I that k- sequences Synonym (similar embeddings) Diversity Components Fidelity Components where , can , film , I , How , find that , that picture , .. I get , can I , Where can I Rewards unique n-grams Rewards Structural Coverage Source Sentence N ∑ n = 1 βn ⋃ x∈X x n −gram N-gram uniqueness ∑ x i ∈V(t) ∑ x j ∈X ℛ(x i , x j ) ℛ(x i , x j ) = 1 − EditDistance(x i , x j ) |xi | + |xj | Structural Coverage <sos> an I get that movie? can Where can I get that film? I <eos> How can I get that picture? find film? Where can I that Where can I ere How can can I I that that picture picture get find get movie? Where can I Where can I that k- sequences Synonym (similar embeddings) I get , can I , Where can I Rewards unique n-grams Rewards Structural Coverage Where ENCODER DECODER 5IFTVCTFU9 UPCFDIPTFO  TIPVMEDPWFSBTNBOZTUSVDUVSFT JOBMMDBOEJEBUFT7UBTQPTTJCMF .BOZVOJRVFOHSBNTXPVMENFBOUIBU TFOUFODFTJOUIFTVCTFU9BSFEJWFSTF Subset Selection how do i increase my … how can i decrease the … how can i grow the … what ways exist to increase … how would I increase the … how do I decrease the … i am 17 , what … are there ways to increase … Vt how do i increase my … how can i grow the … what ways exist to increase … are there ways to increase … X argmaxX⊆Vt, |X|= k F(X) k
  8. • Functions that take a set as the argument and

    hold diminishing returns property • ‘You have more to gain from something new, if you have less to begin with’ • Useful for economics, networks, machine learning – See this slide for more introduction 10 A bit mathematics Submodular functions Sub-modularity # Items = 4 F = 2 # Items = 4 + 1 F = 2 + 1 # Items = 5 + 1 F = 3 + 0 Diminishing Returns F = # Unique Coloured items
  9. • The authors implement fidelity and diversity as submodular functions

    of paraphrase candidate selection – i.e., functions introduced earlier • They are formed as the optimisation problem – There is an efficient algorithm (the left list) to solve maximisation problems of submodular functions if the functions also meet monotonicity 11 A bit mathematics Usage of submod. f. Monotonicity A B A F F ≤ Algorithm 1: Greedy selection for submodular opti- mization (Cardinality constraint) Input: Ground Set: V Budget: k Submodular Function: F 1 S ; 2 N V 3 while |S| < k do 4 x⇤ argmaxx2N F(S [ {x}) 5 S S [ {x⇤} 6 N N \ {x⇤} 7 end 8 return S approaches try to boost the generalization abili- ties of downstream classification models through word-level substitutions. However, they are in- herently restrictive in terms of the diversity they can offer. Our work offers a data-augmentation scheme via high quality paraphrases. Algorithm 2: DiPS Input: Input Sentence: Sin Max. decoding length: T Submodular objective: F No. of paraphrases required: k 1 Process Sin using the encoder of SEQ2SEQ 2 Start the decoder with input symbol sos 3 t 0; P ; 4 while t < T do 5 Generate top 3k most probable subsequences 6 P Select k based on argmax X✓V (t) F(X) using Algorithm 1 7 t = t + 1 8 end 9 return P The second criteria which the function needs to satisfy for Algorithm 1 to be applicable is of monotonicity. A set function F is said to be mono- tone non-decreasing if 8X ✓ Y, F(X)  F(Y ). Submodular functions are relevant in a large
  10. • To check fidelity, this work measures BLEU (, METEOR,

    and TER+) of generated paraphrases against reference paraphrases • For diversity, the count of unique 1-4 grams in generated paraphrases is measured • The result looks better than baselines – though I doubt BLEU works well with diverse paraphrases 13 Experimental results Intrinsic evaluation Fidelity & Diversity (Quora Dataset) 27 30 33 36 SBS DBS VAE-SVG DPP SSR DiPS (Ours) BLEU (Fidelity) 30 39 48 57 66 Models SBS DBS VAE-SVG DPP SSR DiPS (Ours) 4-Distinct (Diversity) IUUQTXXXLBHHMFDPNDRVPSBRVFTUJPOQBJST
  11. • Solve downstream tasks (text classification) using additional data augmented

    by the proposed method • Two simple models are used – LogReg = Logistic regression with simple features – LSTM = 1-layer LSTM • No information about how much data was augmented (!) 14 Experimental results Extrinsic evaluation Data Augmentation for Intent Classification Dataset : SNIPS Accuracy 93 94 95 96 97 98 Models No. Aug SBS DBS Syn. Rep Cont. Aug DiPS (Ours) LogReg LSTM Dataset : Yahoo-L31 Accuracy 62 63 64 65 66 67 Models No Aug. SBS DBS Syn.Rep Cont. Aug. DiPS (Ours) LogReg LSTM Data Augmentation for Intent Classification