Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation

Ines Montani, Explosion

spaCy (spacy.io): open-source library for industrial-strength natural language processing
250m+ downloads
ChatGPT can write spaCy code!

Prodigy (prodigy.ai): modern scriptable annotation tool for machine learning developers
900+ companies, 10k+ users
Example users: Alex Smith (Developer), Kim Miller (Analyst), GPT-4 API

BACK TO OUR ROOTS (explosion.ai/blog/back-to-our-roots)
We’re back to running Explosion as a smaller, independent-minded and self-sufficient company.
Consulting, open source, developer tools
Ines Montani, Founder · Matthew Honnibal, Founder

SOFTWARE IN Industry
modular, transparent, explainable, data-private, reliable, affordable
black-box models, third-party APIs

Review 1, "Exceeds expectations": Just got the SpacePhone Nebula and I’m honestly blown away! The camera quality is amazing. And the battery life is incredible, easily lasting me a full day on a single charge.

Review 2, "kinda meh, really": the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

• find mentions of products
• link mentions to catalog (SpacePhone Nebula, Released: June 2024, P3204-W2130)
• extract sentiment for different attributes (Battery, Camera, Performance, Design)
• add results to database

large generative model: in-context learning (Falcon, MIXTRAL, GPT-4)
distilled task-specific model: transfer learning (ELECTRA, T5)
BERT-base is still very competitive!

WORKFLOW
in-context learning: 📖 text + 💬 template → prompt → 🔮 model → raw output → ⚙ parser → task output
⚗ distillation: 🎯 annotation → task dataset → task-specific model (transfer learning)
explosion.ai/blog/human-in-the-loop-distillation
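The in-context learning half of this workflow can be sketched in a few lines of Python. This is only an illustration under assumptions: the template wording, the function names and the `fake_model` stand-in are hypothetical and not taken from the talk or from spacy-llm. A real setup would call an actual LLM where the stub is.

```python
from typing import Callable, List

# Hypothetical prompt template; the wording is an assumption for illustration.
TEMPLATE = (
    "Extract product mentions from the review below.\n"
    "Answer with one mention per line, or NONE.\n\n"
    "Review: {text}\n"
)


def build_prompt(text: str) -> str:
    """text + template -> prompt"""
    return TEMPLATE.format(text=text)


def parse_output(raw: str) -> List[str]:
    """raw output -> task output (a clean list of mentions)"""
    lines = [line.strip(" -•\t") for line in raw.splitlines()]
    return [line for line in lines if line and line.upper() != "NONE"]


def run_task(text: str, model: Callable[[str], str]) -> List[str]:
    """Full round trip: prompt the model, then parse its raw answer."""
    prompt = build_prompt(text)
    raw = model(prompt)
    return parse_output(raw)


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call, so the sketch runs offline.
    return "- nebula\n- iphone 13\n"


mentions = run_task("the nebula surely looks nice and all ...", fake_model)
print(mentions)
```

Keeping the parser separate from the model call means the task output has the same shape no matter which model produced it, which is what makes it possible to swap in a distilled task-specific model later.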

CLOSE THE GAP BETWEEN prototype AND production
• standardize inputs and outputs
• start with evaluation
• assess utility, not just accuracy
• work on data iteratively
• consider structure and ambiguity of natural language
explosion.ai/blog/applied-nlp-thinking
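To make "start with evaluation" concrete, here is a minimal sketch of scoring predicted spans against a small hand-labelled gold set. The span format, the `prf` helper and the example offsets are assumptions for illustration and are not the evaluation code used in the talk.

```python
from typing import Set, Tuple

# A span is (start_char, end_char, label); this format is an assumption.
Span = Tuple[int, int, str]


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-score over two sets of spans."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up example: two gold spans, three predictions, one of them spurious.
gold = {(4, 10, "PRODUCT"), (120, 129, "PRODUCT")}
pred = {(4, 10, "PRODUCT"), (50, 59, "PRODUCT"), (120, 129, "PRODUCT")}

precision, recall, f_score = prf(gold, pred)
print(round(precision, 2), round(recall, 2), round(f_score, 2))
```

Running the same function over the output of a prompted LLM and of a task-specific model gives directly comparable numbers, which is the kind of baseline comparison the case studies report.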

processing pipeline prototype: prompt model & transform output to structured data (github.com/explosion/spacy-llm) → structured machine-facing Doc object
processing pipeline in production: → structured machine-facing Doc object

human IN THE LOOP
continuous evaluation: baseline → prompting → transfer learning → distilled model

"kinda meh, really": the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

structured data (prodigy.ai): Relevant · Mention: "nebula" · Catalog ID: P3204-W2130 · Attributes: Battery, Camera, Performance (null), Design
selection by generative model (GPT-4 API)
can be faster, not slower!
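The idea of a generative model pre-selecting answers that a human then confirms or corrects can be sketched as follows. This is not Prodigy's API: `preannotate`, `review`, `fake_llm_suggest` and `fake_human` are hypothetical names, and both the model and the human are replaced by keyword stubs so the example runs on its own.

```python
from typing import Callable, Dict, List, Optional

Task = Dict[str, object]


def preannotate(texts: List[str], suggest: Callable[[str], List[str]]) -> List[Task]:
    """Attach the model's suggested labels to each text."""
    return [{"text": text, "suggestion": suggest(text)} for text in texts]


def review(tasks: List[Task], decide: Callable[[Task], Optional[List[str]]]) -> List[Task]:
    """Let a human accept or correct each suggestion; None means skip."""
    dataset = []
    for task in tasks:
        final = decide(task)
        if final is not None:
            dataset.append({"text": task["text"], "labels": final})
    return dataset


def fake_llm_suggest(text: str) -> List[str]:
    # Stand-in for an LLM call: a crude keyword lookup (assumption).
    keywords = {"Battery": "powerbank", "Camera": "pics"}
    return [label for label, kw in keywords.items() if kw in text.lower()]


def fake_human(task: Task) -> Optional[List[str]]:
    # Stand-in for the annotator: keeps the suggestion, adds a missed label.
    labels = list(task["suggestion"])
    if "looks nice" in str(task["text"]).lower() and "Design" not in labels:
        labels.append("Design")
    return labels


texts = [
    "the nebula surely looks nice and all but now i need a powerbank "
    "all the time and my pics are way too dark!"
]
dataset = review(preannotate(texts, fake_llm_suggest), fake_human)
print(dataset[0]["labels"])
```

The human only has to fix what the model got wrong, which is why this kind of loop can be faster than annotating from scratch. The corrected examples become the dataset for training the task-specific model.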

CASE STUDY #1 (spacy.fyi/pydata-nyc)
400mb model size · 2k+ words/second · 8hr data dev time
• PyData NYC 2023 workshop: extracting dishes, ingredients and equipment from r/cooking Reddit posts
• used LLM during annotation
• beat few-shot LLM baseline of 0.74 with task-specific model
• 20× inference time speedup

CASE STUDY #2 (explosion.ai/blog/sp-global-commodities)
6mb model size · 16k+ words/second · 99% F-score
• S&P Global: real-time commodities trading insights by extracting structured attributes
• high-security environment
• used LLM during annotation
• 10× data development speedup with humans and model in the loop
• 8 market pipelines in production

THINK OF IT AS A refactoring PROCESS
• break down larger problems
• make problem easier
• factor out business logic
• reassess dependencies
• choose the best techniques

MAKE PROBLEM easier
less operational complexity means less can go wrong
development complexity: beginner 🤓 · intermediate 🥸 · advanced 😎

🎓 research
• build a commons of knowledge
• make direct comparisons using standard evaluations
• standardize what isn’t novel

🛠 application
• learn from commons of knowledge
• align evaluation to project goals
• do whatever works

FACTOR OUT business LOGIC
result = business_logic(classification(text))

Catalog entry: SpacePhone Nebula, Released: June 2024, P3204-W2130

"kinda meh, really": the nebula surely looks nice and all but for that price tag i expected more tbh… never had to carry a powerbank with my old iphone 13 but now i need it all the time 🙃 and night mode doesn’t really work. my pics are way too dark!

Labels shown: products, model, phone, comparison
Labels shown: latest model, catalog reference, touchscreen, worse than
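The composition `result = business_logic(classification(text))` can be sketched like this. Everything apart from that one line and the catalog ID is an assumption for illustration: the `Classification` fields, the keyword-based stand-in for a trained model, and the shape of the result are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Business data lives outside the model; the ID is the one from the example.
CATALOG: Dict[str, str] = {"nebula": "P3204-W2130"}

# Hypothetical list of product names the stand-in classifier can spot.
KNOWN_PRODUCTS: List[str] = ["nebula", "iphone 13"]


@dataclass
class Classification:
    mentions: List[str]   # general-purpose: which products are mentioned
    is_comparison: bool   # general-purpose: does the text compare products


def classification(text: str) -> Classification:
    # Stand-in for a trained model: simple keyword matching (assumption).
    lowered = text.lower()
    mentions = [name for name in KNOWN_PRODUCTS if name in lowered]
    return Classification(mentions=mentions, is_comparison=len(mentions) >= 2)


def business_logic(pred: Classification) -> Dict[str, object]:
    # Product-specific rules: catalog lookup and what counts as "ours".
    own = [CATALOG[m] for m in pred.mentions if m in CATALOG]
    other = [m for m in pred.mentions if m not in CATALOG]
    return {
        "catalog_ids": own,
        "other_products": other,
        "compared_to_other_product": pred.is_comparison and bool(own) and bool(other),
    }


text = (
    "the nebula surely looks nice and all but never had to carry "
    "a powerbank with my old iphone 13"
)
result = business_logic(classification(text))
print(result)
```

Because the catalog and the comparison rule sit in `business_logic`, they can change (a new product, a new business question) without retraining or re-prompting the classifier.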

CASE STUDY #3 (explosion.ai/blog/gitlab-support-insights)
1 year of support tickets · 6× speedup
• GitLab: extract actionable insights from support tickets and usage questions
• high-security environment
• easy to adapt to new scenarios and business questions
• separated general-purpose features from product-specific logic

REALITY IS NOT AN end-to-end PREDICTION PROBLEM
Human-in-the-loop distillation is a refactoring process.
Iteration and the right tooling can get you past the prototype plateau.
Less operational complexity means less can go wrong.
Expect surprises from the data, and plan for change.
There’s no need to compromise on development best practices or privacy.
explosion.ai/blog/human-in-the-loop-distillation

Explosion: explosion.ai · spaCy: spacy.io · Prodigy: prodigy.ai
Twitter: @_inesmontani · Mastodon: @[email protected] · Bluesky: @inesmontani.bsky.social · LinkedIn


Resources

A practical guide to human-in-the-loop distillation

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

Half hour of labeling power: Can we beat GPT?

Applied NLP Thinking: How to Translate Problems into Solutions

How GitLab uses spaCy to analyze support tickets and empower their community

Using LLMs for structured data in spaCy

Using LLMs for human-in-the-loop distillation in Prodigy

InfoQ Dev Summit presentation
