spaCy v3: State-of-the-art NLP from Prototype to Production

Video: https://www.youtube.com/watch?v=9k_EfV7Cns0

spaCy is a popular open-source library for industrial-strength Natural Language Processing in Python. spaCy v3.0 features new transformer-based pipelines that get spaCy’s accuracy right up to the current state-of-the-art, and a new training config and workflow system to help you take projects from prototype to production.

Ines Montani

February 01, 2021

Transcript

  1. Benchmarks (spacy.io/usage/facts-figures)

    en_core_web_trf (NEW): Parser 95.8, Tagger 98.1, NER 90.6, Speed 4k wps
    en_core_web_lg: Parser 92.0, Tagger 97.2, NER 87.0, Speed 30k wps
    SOTA: Parser 96.2, Tagger 98.3, NER 89.7, Speed -
  2. spacy.io/usage/embeddings-transformers; Diagram: Transformer (BERT, DistilBERT, RoBERTa, ...), Entity Recognizer, Dependency Parser, ...; easily use any transformer in your pipeline; packages with state-of-the-art trained pipelines (en_core_web_trf)
  3. spacy.io/usage/embeddings-transformers; Diagram: Transformer (BERT, DistilBERT, RoBERTa, ...), Entity Recognizer, Dependency Parser, ...; share one transformer across your whole pipeline; easily use any transformer in your pipeline; packages with state-of-the-art trained pipelines (en_core_web_trf)
  4. spacy.io/usage/embeddings-transformers; Diagram: Transformer (BERT, DistilBERT, RoBERTa, ...), Entity Recognizer, Dependency Parser, ...; share one transformer across your whole pipeline; easily use any transformer in your pipeline; update one transformer from multiple components; packages with state-of-the-art trained pipelines (en_core_web_trf)
  5. problems.py; problem #1: nested defaults become hidden defaults; problem #2: defaults will conflict; problem #3: difficult to iterate, swap, mix and match
  6. spacy.io/usage/training; generate starter config; components to train; optimize model settings for accuracy or efficiency
  7. step #1: train and package pipeline; step #2: get pip-installable Python package en_your_pipeline; step #3: ship and use pipeline
  8. config.cfg; built-in logger for tracking experiments; log all config settings & discover correlations
  9. config.cfg; choose which components to update during training; source components from trained pipelines
  10. Y: Floats3d; Incompatible return value type (got "Tuple[Floats3d, Callable[[Any], Any]]", expected return types; static analysis: catch errors as you type
  11. Relu: Relu; Layer outputs type (thinc.types.Floats2d) but the next layer expects (thinc.types.Ragged) as an input; mypy.ini: optional mypy plugin for more checks
  12. Relu: Relu; Layer outputs type (thinc.types.Floats2d) but the next layer expects (thinc.types.Ragged) as an input; static analysis: catch errors as you type; mypy.ini: optional mypy plugin for more checks
  13. Base support & trained pipelines for many languages; Multi-task learning with transformers like BERT; State-of-the-art speed; spacy.io
  14. Base support & trained pipelines for many languages; Multi-task learning with transformers like BERT; State-of-the-art speed; Components for NER, tagging, parsing, text classification, entity linking & more; spacy.io
  15. Base support & trained pipelines for many languages; Multi-task learning with transformers like BERT; State-of-the-art speed; Custom trainable & rule-based components; Components for NER, tagging, parsing, text classification, entity linking & more; spacy.io
  16. Base support & trained pipelines for many languages; Multi-task learning with transformers like BERT; State-of-the-art speed; Custom trainable & rule-based components; Components for NER, tagging, parsing, text classification, entity linking & more; Production-ready training system, model packaging & workflow management; spacy.io
  17. Base support & trained pipelines for many languages; Awesome ecosystem; Multi-task learning with transformers like BERT; State-of-the-art speed; Custom trainable & rule-based components; Components for NER, tagging, parsing, text classification, entity linking & more; Production-ready training system, model packaging & workflow management; spacy.io
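
The config.cfg slides in the transcript mention choosing which components to update during training and sourcing components from trained pipelines. A minimal sketch of how those two settings might look in a config file follows. The excerpt is hypothetical and heavily shortened (a real config.cfg has many more sections), and the component names and the source pipeline en_core_web_sm are illustrative assumptions. spaCy reads these files with its own config system; the standard-library parsing here is only a stand-in to show that every setting is explicit and inspectable, with no hidden defaults.

```python
import configparser
import json

# Hypothetical, shortened excerpt in the style of a spaCy v3 config.cfg.
# Component names and the source pipeline are illustrative assumptions.
CONFIG_TEXT = """
[nlp]
lang = "en"
pipeline = ["ner", "textcat"]

[components.ner]
source = "en_core_web_sm"

[components.textcat]
factory = "textcat"

[training]
frozen_components = ["ner"]
"""


def load_sections(text):
    """Parse the excerpt into nested dicts, decoding each value as JSON."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {key: json.loads(value) for key, value in parser[section].items()}
        for section in parser.sections()
    }


config = load_sections(CONFIG_TEXT)
pipeline = config["nlp"]["pipeline"]
frozen = set(config["training"]["frozen_components"])

# Components that would be updated during training: everything not frozen.
trainable = [name for name in pipeline if name not in frozen]

# Components copied from an existing trained pipeline instead of created fresh.
sourced = {
    name: config[f"components.{name}"]["source"]
    for name in pipeline
    if "source" in config[f"components.{name}"]
}

print(trainable)  # ['textcat']
print(sourced)    # {'ner': 'en_core_web_sm'}
```

In this sketch the named entity recognizer is taken from a trained pipeline and kept frozen, while the text classifier is created from scratch and is the only component updated.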