
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

As the field of natural language processing advances and new ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. Large Language Models (LLMs) have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and data privacy. In this talk, I'll show some practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

I'll share real-world case studies and approaches for using large generative models at development time instead of runtime, curating their structured predictions with an efficient human-in-the-loop workflow, and distilling task-specific components as small as 6 MB that run cheaply, privately and reliably, and that you can compose into larger NLP systems.

If you’re trying to build a system that does a particular thing, you don’t need to transform your request into arbitrary language and call into the largest model that understands arbitrary language the best. The people developing those models are telling that story, but the rest of us aren’t obliged to believe them.

Case Study #1: https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt
Case Study #2: https://explosion.ai/blog/sp-global-commodities

Ines Montani

June 15, 2024

Resources

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

https://explosion.ai/blog/sp-global-commodities

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment using human-in-the-loop distillation.

Half hour of labeling power: Can we beat GPT?

https://speakerdeck.com/inesmontani/workshop-half-hour-of-labeling-power-can-we-beat-gpt

A case study using LLMs to create data and beating the few-shot baseline with a distilled task-specific model for extracting dishes, ingredients and equipment from r/cooking Reddit posts.

Applied NLP Thinking: How to Translate Problems into Solutions

https://explosion.ai/blog/applied-nlp-thinking

This blog post discusses some of the biggest challenges for applied NLP and translating business problems into machine learning solutions, including the distinction between utility and accuracy.

Using LLMs for structured data in spaCy

https://spacy.io/usage/large-language-models

The spacy-llm package integrates LLMs into spaCy pipelines, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks.
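
As a rough illustration of the template, model and parser flow described above, here is a minimal sketch in plain Python. It does not use the spacy-llm API: `PROMPT_TEMPLATE`, `fake_llm`, `parse_response` and `extract` are hypothetical names, the label scheme is made up for the example, and the model is a stub that returns a canned response so the code runs offline.

```python
import re

LABELS = ("PRODUCT", "ATTRIBUTE")  # illustrative label scheme, not from the talk

# Hypothetical prompt template; a real one would usually add few-shot examples.
PROMPT_TEMPLATE = (
    "Extract entities from the text. Use one line per label.\n"
    "Labels: {labels}\n"
    "Text: {text}\n"
)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned, slightly messy response."""
    return "PRODUCT: SpacePhone Nebula\nattribute: camera, battery life\nOTHER: ignored"


def parse_response(raw: str, text: str) -> list[dict]:
    """Turn the raw response into character-offset spans, dropping anything unverifiable."""
    spans = []
    for line in raw.splitlines():
        label, sep, values = line.partition(":")
        label = label.strip().upper()
        if not sep or label not in LABELS:
            continue  # ignore labels the task does not define
        for value in (v.strip() for v in values.split(",")):
            match = re.search(re.escape(value), text, flags=re.IGNORECASE) if value else None
            if match:  # keep only phrases that actually occur in the text
                spans.append({"start": match.start(), "end": match.end(), "label": label})
    return spans


def extract(text: str, model=fake_llm) -> list[dict]:
    """Fill the template, call the model, and parse its raw output into spans."""
    prompt = PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=text)
    return parse_response(model(prompt), text)


text = "Just got the SpacePhone Nebula. The camera quality is amazing and the battery life is incredible."
spans = extract(text)
```

Because the parser only keeps spans it can locate in the input, the output has the same shape whether it comes from a prompted model or from a trained component, which is what makes it possible to swap one for the other later.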

Using LLMs for human-in-the-loop distillation in Prodigy

https://prodi.gy/docs/large-language-models

Prodigy comes with preconfigured workflows for using LLMs to speed up and automate annotation and create datasets for distilling large generative models into more accurate, smaller, faster and fully private task-specific components.
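
The human-in-the-loop step can be pictured with a small sketch in plain Python. This is not Prodigy's API: `llm_suggest` is a hypothetical stub standing in for an LLM pre-annotation step, and the hard-coded `decisions` list stands in for what an annotator would click in an annotation interface.

```python
import json


def llm_suggest(text: str) -> dict:
    """Stub for LLM pre-annotation (hypothetical); deliberately naive so it makes mistakes."""
    label = "POSITIVE" if "amazing" in text.lower() else "NEGATIVE"
    return {"text": text, "label": label}


def review(suggestions: list[dict], decisions: list[str]) -> list[dict]:
    """Apply human decisions: 'accept', 'reject', or a corrected label string."""
    dataset = []
    for example, decision in zip(suggestions, decisions):
        if decision == "reject":
            continue  # drop examples the annotator considers unusable
        label = example["label"] if decision == "accept" else decision
        dataset.append({"text": example["text"], "label": label})
    return dataset


texts = [
    "The camera quality is amazing.",
    "Night mode is not amazing at all.",  # the stub gets this one wrong
    "asdf qwerty",                        # noise that should be rejected
]
suggestions = [llm_suggest(t) for t in texts]

# Simulated annotator input: accept the first, correct the second, reject the third.
decisions = ["accept", "NEGATIVE", "reject"]
dataset = review(suggestions, decisions)

# One JSON object per line, a common format for training data.
jsonl = "\n".join(json.dumps(example) for example in dataset)
```

The curated `dataset` is what a smaller task-specific model would then be trained on; the large model is only involved while the data is being created.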

Transcript

  1. Modern scriptable annotation tool for machine learning developers PRODIGY 900+

    companies prodigy.ai Alex Smith Developer Kim Miller Analyst 10k+ users
  2. Exceeds expectations kinda meh, really Just got the SpacePhone Nebula

    and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  3. Exceeds expectations kinda meh, really find mentions of products Just

    got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  4. Exceeds expectations kinda meh, really find mentions of products link

    mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  5. Exceeds expectations kinda meh, really extract sentiment for different

    attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  6. add results to database Exceeds expectations kinda meh, really extract

    sentiment for different attributes Battery Camera Performance Design camera battery design battery camera find mentions of products link mentions to catalog SpacePhone Nebula Released: June 2024 P3204-W2130 Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge. the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  9. distilled task-specific model transfer learning ELECTRA T5 in-context learning Falcon

    MIXTRAL GPT-4 BERT-base is still very competitive! large generative model
  10. 📖 text 🔮 model raw output ⚙ parser task output

    💬 template prompt WORKflow in-context learning
  11. 📖 text 🔮 model raw output ⚙ parser task output

    💬 template prompt WORKflow in-context learning ⚗ distillation 🎯 annotation task dataset task-specific model transfer learning
  12. CLOSE THE GAP BETWEEN prototype AND production standardize inputs and

    outputs start with evaluation assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking
  13. CLOSE THE GAP BETWEEN prototype AND production standardize inputs and

    outputs start with evaluation work on data iteratively assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking
  14. CLOSE THE GAP BETWEEN prototype AND production standardize inputs and

    outputs start with evaluation work on data iteratively assess utility, not just accuracy explosion.ai/blog/applied-nlp-thinking consider structure and ambiguity of natural language
  15. processing pipeline prototype processing pipeline in production structured machine-facing Doc

    object github.com/explosion/spacy-llm prompt model & transform output to structured data structured machine-facing Doc object
  16. kinda meh, really the nebula surely looks nice and all

    but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  17. Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance null

    Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  18. prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance

    null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  19. prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance

    null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark! selection by generative model GPT-4 API
  20. prodigy.ai Relevant Mention "nebula" Catalog ID P3204-W2130 Battery Camera Performance

    null Design structured data kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark! selection by generative model GPT-4 API can be faster, not slower!
  21. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts
  22. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation
  23. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • beat few-shot LLM baseline of 0.74 with task-specific model
  24. CASE STUDY #1 400mb model size 2k+ words/second 8hr data

    dev time spacy.fyi/pydata-nyc • PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts • used LLM during annotation • beat few-shot LLM baseline of 0.74 with task-specific model • 20× inference time speedup
  25. • S&P Global: real-time commodities trading insights by extracting structured

    attributes explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  26. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  27. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  28. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  29. • S&P Global: real-time commodities trading insights by extracting structured

    attributes • high-security environment • used LLM during annotation • 10× data development speedup with humans and model in the loop • 8 market pipelines in production explosion.ai/blog/sp-global-commodities CASE STUDY #2 6mb model size 16k+ words/second 99% F-score
  31. THINK OF IT AS A refactoring PROCESS break down larger

    problems reassess dependencies make problem easier
  32. THINK OF IT AS A refactoring PROCESS break down larger

    problems reassess dependencies choose the best techniques make problem easier
  33. THINK OF IT AS A refactoring PROCESS factor out business

    logic break down larger problems reassess dependencies choose the best techniques make problem easier
  34. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎
  35. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research
  36. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge
  37. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations
  38. MAKE PROBLEM easier less operational complexity means less can go

    wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel
  39. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel
  40. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge
  41. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge • align evaluation to project goals
  42. 🛠 application MAKE PROBLEM easier less operational complexity means less

    can go wrong development complexity beginner 🤓 intermediate 🥸 advanced 😎 🎓 research • build a commons of knowledge • make direct comparisons using standard evaluations • standardize what isn’t novel • learn from commons of knowledge • align evaluation to project goals • do whatever works
  43. FACTOR OUT business LOGIC SpacePhone Nebula Released: June 2024 P3204-W2130

    kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  44. FACTOR OUT business LOGIC result = business_logic(classification(text)) SpacePhone Nebula Released:

    June 2024 P3204-W2130 kinda meh, really the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  45. FACTOR OUT business LOGIC result = business_logic(classification(text)) SpacePhone Nebula Released:

    June 2024 P3204-W2130 kinda meh, really products model phone comparison the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  46. FACTOR OUT business LOGIC result = business_logic(classification(text)) latest model catalog

    reference touchscreen worse than SpacePhone Nebula Released: June 2024 P3204-W2130 kinda meh, really products model phone comparison the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!
  47. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process.
  48. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong.
  49. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. Expect surprises from the data, and plan for change.
  50. REALITY IS NOT AN end-to-end PREDICTION PROBLEM Iteration and the

    right tooling can get you past the prototype plateau. Human-in-the-loop distillation is a refactoring process. Less operational complexity means less can go wrong. Expect surprises from the data, and plan for change. There’s no need to compromise on development best practices or privacy.
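
The "factor out business logic" slides show the expression `result = business_logic(classification(text))`. One way to read it is sketched below in plain Python. Everything except that expression and the catalog ID from the slides is an assumption: the `Mention` type, the `CATALOG` dictionary and the keyword-matching stand-in for a trained component are hypothetical and only illustrate the separation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    label: str  # illustrative label scheme


# Hypothetical product catalog; in practice this would live in a database
# and change far more often than the model does.
CATALOG = {"nebula": "P3204-W2130"}


def classification(text: str) -> list[Mention]:
    """Stand-in for a trained component: generic predictions, no catalog knowledge."""
    lowered = text.lower()
    known_phrases = ("nebula", "iphone 13")  # a real model would predict these spans
    return [Mention(phrase, "PRODUCT") for phrase in known_phrases if phrase in lowered]


def business_logic(mentions: list[Mention]) -> list[dict]:
    """Deterministic rules applied on top of the predictions, e.g. catalog linking."""
    return [
        {"mention": m.text, "catalog_id": CATALOG.get(m.text.lower())}
        for m in mentions
    ]


text = (
    "the nebula surely looks nice and all but for that price tag i expected more tbh… "
    "never had to carry a powerbank with my old iphone 13"
)
result = business_logic(classification(text))
```

With this split, adding a product to the catalog is a data change rather than a reason to retrain or re-prompt a model, and the two pieces can be tested separately.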