spaCy v3: Design concepts explained

Video: https://www.youtube.com/watch?v=BWhh3r6W-qE
Blog: https://explosion.ai/blog/spacy-design-concepts

spaCy is a popular open-source library for industrial-strength Natural Language Processing in Python. spaCy v3.0 features new transformer-based pipelines that get spaCy’s accuracy right up to the current state-of-the-art, and a new training config and workflow system to help you take projects from prototype to production. In this video, I’ll show you some of the new design concepts and explain what’s going on under the hood, how we’ve implemented them and most importantly, why. I’ll also share some lessons we’ve learned about developer experience along the way.

Ines Montani

February 01, 2021

Resources

spaCy behind the scenes: library patterns & design concepts explained

https://explosion.ai/blog/spacy-design-concepts

Developer productivity has been central to our design of spaCy, both in smaller decisions and some of the bigger architectural questions. Read on to learn some of the design patterns within the library, how we've implemented them, and most importantly, why.

Transcript

  1. global function registry system; programmable user-facing APIs; type-based data validation; “bottom-up” configuration system; type hints & static analysis for model definitions
  2. advanced workflows for modern NLP & deep learning (SCENARIO #1); ease of use with pre-configured building blocks & good defaults (SCENARIO #2)
  3. serialized model; save; load; custom code & settings; “How should I reconstruct this object?”; define how to create custom objects
  4. we always need to know how an object expects to be created (spacy.io/usage/processing-pipelines)
  5. Y: Floats3d; Incompatible return value type (got "Tuple[Floats3d, Callable[[Any], Any]]", expected return types; static analysis: catch errors as you type
  6. Relu: Relu; Layer outputs type (thinc.types.Floats2d) but the next layer expects (thinc.types.Ragged) as an input; mypy.ini; optional mypy plugin for more checks
  7. Relu: Relu; Layer outputs type (thinc.types.Floats2d) but the next layer expects (thinc.types.Ragged) as an input; static analysis: catch errors as you type; mypy.ini; optional mypy plugin for more checks
  8. Developer Productivity: extensive documentation; user-focused error handling & validation; consistent naming; avoid redundant shortcuts & competing abstractions
  9. Developer Productivity: extensive documentation; user-focused error handling & validation; consistent naming; avoid redundant shortcuts & competing abstractions; smooth path from prototype to production
  10. Developer Productivity: extensive documentation; user-focused error handling & validation; consistent naming; avoid redundant shortcuts & competing abstractions; smooth path from prototype to production; provide building blocks to program with, not just abstractions
  11. Developer Productivity: extensive documentation; user-focused error handling & validation; consistent naming; avoid redundant shortcuts & competing abstractions; smooth path from prototype to production; provide building blocks to program with, not just abstractions
  12. NLP
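
The slides mention a global function registry and a "bottom-up" configuration system as the answer to "How should I reconstruct this object?". The idea can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not spaCy's actual implementation: the `register` and `resolve` helpers, the `@factory` key, and the names `optimizer.v1` and `linear_decay.v1` are all hypothetical and chosen only for this example.

```python
from typing import Any, Callable, Dict

# Hypothetical, simplified registry: maps string names to factory functions.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a factory function under a string name."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        if name in _REGISTRY:
            raise ValueError(f"Name already registered: {name}")
        _REGISTRY[name] = func
        return func

    return decorator


def resolve(config: Any) -> Any:
    """Build objects bottom-up.

    Nested blocks are resolved first, so a factory always receives
    fully constructed arguments. A block with an "@factory" key
    (a made-up convention for this sketch) is replaced by the result
    of calling the registered function with the remaining keys.
    """
    if not isinstance(config, dict):
        return config
    resolved = {key: resolve(value) for key, value in config.items()}
    if "@factory" not in resolved:
        return resolved
    name = resolved.pop("@factory")
    if name not in _REGISTRY:
        raise KeyError(f"Unknown factory: {name}")
    return _REGISTRY[name](**resolved)


@register("linear_decay.v1")
def make_linear_decay(start: float, step: float) -> Callable[[int], float]:
    """Return a schedule that decreases linearly and never goes below zero."""

    def schedule(i: int) -> float:
        return max(start - step * i, 0.0)

    return schedule


@register("optimizer.v1")
def make_optimizer(name: str, learn_rate: Callable[[int], float]) -> Dict[str, Any]:
    """Stand-in for a real optimizer: just bundles its settings."""
    return {"name": name, "learn_rate": learn_rate}


# The config holds only plain data, so it can be saved with a model
# and used later to recreate the same objects.
CONFIG = {
    "optimizer": {
        "@factory": "optimizer.v1",
        "name": "sgd",
        "learn_rate": {"@factory": "linear_decay.v1", "start": 0.1, "step": 0.01},
    }
}

objects = resolve(CONFIG)
```

Because the config stores a registered name rather than the object itself, custom code only has to be registered before loading. The inner `learn_rate` block is resolved before the optimizer that depends on it, which is what "bottom-up" refers to in this sketch.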