spaCy meets LLMs: Using Generative AI for Structured Data

Large Language Models (LLMs) have enormous potential, but they also challenge existing workflows in industry that require modularity, transparency and structured data. In this talk, I'll present pragmatic and practical approaches for using the latest generative models beyond just chatbots. I'll dive deeper into spaCy's LLM integration, which lets you plug in open-source and proprietary models and provides a robust framework for extracting structured information from text, distilling large models into smaller task-specific components, and closing the gap between prototype and production.

Ines Montani

June 11, 2024

Resources

Using LLMs for structured data in spaCy

https://spacy.io/usage/large-language-models

The spacy-llm package integrates LLMs into spaCy pipelines, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks.
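As a rough sketch of what this looks like in code: the registered task and model names below follow the spacy-llm documentation but vary between releases, and API-backed models expect the provider's API key as an environment variable.

```python
import spacy

nlp = spacy.blank("en")
# The generic "llm" factory from spacy-llm: the task builds the prompt and
# parses the free-text response into doc.ents; the model selects the backend.
nlp.add_pipe(
    "llm",
    config={
        "task": {
            "@llm_tasks": "spacy.NER.v3",
            "labels": ["PERSON", "ORG", "PRODUCT"],
        },
        # Assumed registered model name; requires OPENAI_API_KEY to be set.
        # Swapping in another registered model (including open-source ones)
        # leaves the rest of the pipeline unchanged.
        "model": {"@llm_models": "spacy.GPT-4.v2"},
    },
)

doc = nlp("Ines Montani is a co-founder of Explosion, the makers of spaCy.")
print([(ent.text, ent.label_) for ent in doc.ents])
```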

A practical guide to human-in-the-loop distillation

https://explosion.ai/blog/human-in-the-loop-distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.
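A minimal sketch of the idea, with placeholder file names: the LLM-backed prototype produces draft annotations, a human reviews and corrects them, and the corrected data is used to train a small task-specific component.

```python
from spacy.tokens import DocBin
from spacy_llm.util import assemble

# LLM-backed prototype pipeline, assembled from a spacy-llm config
# ("llm_config.cfg" is a placeholder path).
llm_nlp = assemble("llm_config.cfg")

texts = [
    "Order #1234 arrived damaged.",
    "Great support from the team at Explosion.",
]
# Draft ("silver") annotations produced by the large model.
docs = [llm_nlp(text) for text in texts]

# Export for human review and correction (e.g. in Prodigy), then train a
# small, fully private component on the corrected data, for example with:
#   python -m spacy train small_config.cfg --paths.train train.spacy --paths.dev dev.spacy
DocBin(docs=docs).to_disk("silver.spacy")
```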

Using LLMs for human-in-the-loop distillation in Prodigy

https://prodi.gy/docs/large-language-models

Prodigy comes with preconfigured workflows for using LLMs to speed up and automate annotation and create datasets for distilling large generative models into more accurate, smaller, faster and fully private task-specific components.

Transcript

  1. carefully designed and consistent API · extensible and programmable · serializable data structures · good error handling · designed for production

  2. carefully designed and consistent API · extensible and programmable · serializable data structures · good error handling · pipeline approach to combine techniques and share data · designed for production

  3. carefully designed and consistent API · extensible and programmable · serializable data structures · good error handling · pipeline approach to combine techniques and share data · designed for production · “just works” pre-configured solutions
  4. How to avoid the prototype plateau: standardize inputs and outputs · start with evaluation · assess utility, not just accuracy · explosion.ai/blog/applied-nlp-thinking

  5. How to avoid the prototype plateau: standardize inputs and outputs · start with evaluation · assess utility, not just accuracy · work iteratively · explosion.ai/blog/applied-nlp-thinking

  6. How to avoid the prototype plateau: standardize inputs and outputs · start with evaluation · assess utility, not just accuracy · consider structure and properties of language · work iteratively · explosion.ai/blog/applied-nlp-thinking
  7. Doc → llm → … (other components) · Prompt Template → Model → Response → Parser → Structured Attributes · unified, model-agnostic API · config.cfg · spacy.io/usage/large-language-models

  8. Doc → llm → … (other components: LLM, trained model, rules) · Prompt Template → Model → Response → Parser → Structured Attributes · unified, model-agnostic API · config.cfg · spacy.io/usage/large-language-models

  9. Doc → llm → … (other components: LLM, trained model, rules) · Prompt Template → Model → Response → Parser → Structured Attributes · unified, model-agnostic API · config.cfg · tasks: Entity Linking, Summarization, Entity Recognition, Span Categorization, Relation Extraction, Text Categorization, Sentiment Analysis, Translation · spacy.io/usage/large-language-models (a config sketch follows this transcript)
  10. prototype: Doc → llm · production: Doc → Entity Recognizer, Text Categorizer, … · human-in-the-loop distillation · continuous evaluation · baseline

  11. prototype: Doc → llm · production: Doc → Entity Recognizer, Text Categorizer, … · human-in-the-loop distillation · continuous evaluation · baseline · distilled model
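The llm component sketched on slides 7–9 is typically declared in a config.cfg. Below is a minimal, self-contained sketch assuming spacy-llm's registered NER task and an open-source Dolly model run via Hugging Face; the registered names, versions and model name are assumptions that vary between releases, and the Hugging Face backend needs spacy-llm's transformers extras installed. The config is written from Python only to keep the example in one file.

```python
from pathlib import Path

from spacy_llm.util import assemble

# Hypothetical config.cfg: the task supplies the prompt template and the
# parser that turns the raw response into structured attributes; the model
# block picks the backend behind the same model-agnostic API.
CONFIG = """
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["PERSON", "ORG", "PRODUCT"]

[components.llm.model]
@llm_models = "spacy.Dolly.v1"
name = "dolly-v2-3b"
"""

Path("config.cfg").write_text(CONFIG)
nlp = assemble("config.cfg")
doc = nlp("Ines Montani is a co-founder of Explosion.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Because the API is model-agnostic, replacing the `[components.llm.model]` block with a proprietary API-backed model (or a trained component later on) leaves the rest of the pipeline and the downstream code untouched.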