
Natural Language Processing Expert Briefing @ PyData Global 2021

Slides for the Expert Briefing session on Natural Language Processing at PyData Global 2021 https://pydata.org/global2021/expert-briefings/

Speaker: Marco Bonzanini https://twitter.com/marcobonzanini

Marco Bonzanini

October 20, 2021

Transcript

  1. Nice to meet you

    • Consulting, training and coaching on Python + Data Science
    • Chair @ PyData London

    © Bonzanini Consulting Ltd, BonzaniniConsulting.com
  2. Natural Language Processing

    • Natural Language Understanding
    • Natural Language Generation
  3. That that is is that that is not is not is that it it is

    (That’s proper English)
  4. That that is, is. That that is not, is not. Is that it? It is.

    More fun at: https://en.wikipedia.org/wiki/List_of_linguistic_example_sentences
    Pics: https://en.wikipedia.org/wiki/Socrates and https://en.wikipedia.org/wiki/Parmenides
  5. Language is challenging

    • Language is evolving
    • Language is ambiguous
    • (Understanding) Language requires context
  6. We need annotated data

    • Variability: domains and languages
    • Available data: sparse
  7. We need annotated data

    • Variability: domains and languages
    • Available data: sparse
    • Available data: bias
  8. We need annotated data

    • Variability: domains and languages
    • Available data: sparse
    • Available data: bias
    • Annotating data is a bottleneck
  9. (Incomplete) History of NLP

    • 1950s: Symbolic / rule-based
    • 1990s: Stats / annotated data / Machine Learning
  10. (Incomplete) History of NLP

    • 1950s: Symbolic / rule-based
    • 1990s: Stats / annotated data / Machine Learning
    • 2010s: Neural Nets / Deep Learning
  11. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
    • “Traditional” ML models
  12. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
    • “Traditional” ML models
    • RNN/LSTM (circa 2015)
  13. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
    • “Traditional” ML models
    • RNN/LSTM (circa 2015)
    • Transformers (circa 2017)
  14. Transformers

    • Parallelisation → training on bigger dataset
    • Fine-tuning on specific task
  15. Transformers

    • Parallelisation → training on bigger dataset
    • Fine-tuning on specific task
    • Bigger and bigger models
  16. Transformers

    • Parallelisation → training on bigger dataset
    • Fine-tuning on specific task
    • Bigger and bigger models
    • Pre-trained models
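
The "Evolution of Models" slides start from bag-of-words, and the "That that is" slides show how much meaning rests on punctuation and context. As a rough illustration (not taken from the talk), the sketch below builds a bag-of-words with the Python standard library and compares the two versions of the sentence. The function name `bag_of_words` and the simple regex tokeniser are assumptions made for this example; real pipelines typically use a proper tokeniser.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep runs of ASCII letters as tokens, count them.

    Hypothetical helper for illustration: punctuation and word order
    are deliberately discarded, which is what a bag-of-words does.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# The two versions of the sentence from the slides.
unpunctuated = "That that is is that that is not is not is that it it is"
punctuated = "That that is, is. That that is not, is not. Is that it? It is."

bow_unpunctuated = bag_of_words(unpunctuated)
bow_punctuated = bag_of_words(punctuated)

print(bow_unpunctuated)
print(bow_unpunctuated == bow_punctuated)  # True: the model cannot tell them apart

# Word order is lost as well: both phrases map to the same counts.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```

Both comparisons print `True`: under this representation the readable and the unreadable sentence are identical, which is one way to see why later model families in the list (embeddings, RNN/LSTM, Transformers) try to capture order and context.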