Reality is not an End-to-End Prediction Problem: Applied NLP in the Age of Generative AI

Ines Montani, Explosion

Evolution of definitions

programming & rules: rules or instructions ✍
machine learning: examples 📝 (supervised learning)
in-context learning: rules or instructions ✍ (LLM prompt engineering)
LLM: ? ?

✍ instructions: human-shaped, easy for non-experts, risk of data drift
📝 examples: nuanced and intuitive behaviors, specific to use case, labor-intensive

LLMs: Falcon, Mixtral, GPT-4
• good contextual results
• easy to use & configure
• fast prototyping
⚠ transparency
⚠ efficiency
⚠ data privacy

Prototype
📖 text → 💬 prompt → LLM (GPT-4 API) → task-specific output
prompt model & transform output to structured data: github.com/explosion/spacy-llm

Production
📖 text → distilled task-specific components → task-specific output
✅ modular
✅ small & fast
✅ data-private
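To make "transform output to structured data" concrete, here is a minimal sketch in plain Python. It assumes, hypothetically, that the prompt asked the model to reply with one `LABEL: text` pair per line; the response format and the `parse_llm_output` function are illustrative assumptions, not spacy-llm's actual parsing code.

```python
import re
from typing import List, Tuple

# Hypothetical response format: one "LABEL: entity text" pair per line.
LINE_PATTERN = re.compile(r"^\s*([A-Z_]+)\s*:\s*(.+?)\s*$")


def parse_llm_output(text: str, response: str) -> List[Tuple[int, int, str]]:
    """Turn a free-text LLM answer into sorted (start, end, label) character spans.

    Lines that don't match the expected format, and mentions that can't be
    found in the original text, are skipped instead of trusted.
    """
    spans = []
    for line in response.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # e.g. a chatty preamble such as "Sure! Here are..."
        label, mention = match.groups()
        start = text.find(mention)  # only the first occurrence is used
        if start == -1:
            continue  # the model returned something that isn't in the text
        spans.append((start, start + len(mention), label))
    return sorted(spans)


text = "Jane Doe joined ACME Inc. in Berlin."
response = "Sure! Here are the entities:\nPERSON: Jane Doe\nORG: ACME Inc.\nLOC: Paris"
print(parse_llm_output(text, response))  # [(0, 8, 'PERSON'), (16, 25, 'ORG')]
```

Validating the output against the input text (here, dropping "Paris") is the kind of step that makes LLM output usable as structured data downstream.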

Human in the loop
explosion.ai/blog/human-in-the-loop-distillation
• LLM
• continuous evaluation
• baseline
• LLM prompting
• transfer learning
• COMPONENT: distilled model
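The slide pairs an LLM baseline with continuous evaluation. A minimal sketch of what such a check might look like, assuming spans are scored by exact (start, end, label) match; the deck does not specify the metric, and the data and the `beats_baseline` helper are hypothetical.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-score over labelled spans."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def beats_baseline(gold: Iterable[Span], candidate: Iterable[Span], baseline: Iterable[Span]) -> bool:
    """Hypothetical check: True if the candidate's F-score is at least the baseline's."""
    gold = list(gold)  # materialize so both evaluations see the same gold spans
    return span_prf(gold, candidate)[2] >= span_prf(gold, baseline)[2]


# Hypothetical annotations for "Jane Doe joined ACME Inc. in Berlin."
gold = [(0, 8, "PERSON"), (16, 25, "ORG"), (29, 35, "LOC")]
llm_baseline = [(0, 8, "PERSON")]                                 # F = 0.5
distilled = [(0, 8, "PERSON"), (16, 25, "ORG"), (29, 35, "ORG")]  # F = 2/3
print(beats_baseline(gold, distilled, llm_baseline))  # True
```

Running a check like this on every iteration, rather than once at the end, is one way to read "continuous evaluation".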

Case Study: S&P Global
• real-time commodities trading insights by extracting structured attributes
• high-security environment
• used LLM during annotation
• 10× data development speedup with humans and model in the loop
• 8 market pipelines in production
99% F-score · 6 MB model size · 16k+ words/second
explosion.ai/blog/sp-global-commodities

Refactor your code and data.

Software 1.0: 📄 code → compiler → 💾 program
✅ tests · refactoring

Software 2.0: 📊 data → algorithm → 🔮 model
📈 evaluation · iteration

I love cats. / I hate cats. SIMILAR OR NOT?
Your application context always matters!
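Whether these two sentences count as "similar" depends on what the application needs. A minimal sketch using bag-of-words cosine similarity (a deliberately simple stand-in chosen for illustration, not a method from the talk) shows why: they share two of their three words, so a topic-oriented measure scores them as fairly similar even though their sentiment is opposite.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase and keep alphabetic tokens only; punctuation is dropped.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    vec_a, vec_b = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vec_b[word] for word, count in vec_a.items())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("I love cats.", "I hate cats."), 3))  # 0.667
```

For a pet-forum topic classifier that score may be exactly right; for sentiment analysis it is exactly wrong.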

Case Study: GitLab
• extract actionable insights from support tickets and usage questions
• high-security environment
• easy to adapt to new scenarios and business questions
• separated general-purpose features from product-specific logic
1 year · 6× speedup of support tickets
explosion.ai/blog/gitlab-support-insights
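One way to picture "separated general-purpose features from product-specific logic" is a minimal sketch in which a reusable extraction step feeds a small, swappable mapping. Everything here is hypothetical: the keyword matcher stands in for a trained component, and `PRODUCT_RULES` is an invented example, not GitLab's actual setup.

```python
from typing import Dict, List


def extract_terms(text: str, vocabulary: List[str]) -> List[str]:
    """General-purpose step: find which known terms occur in a text.

    A naive case-insensitive substring match, standing in for a trained
    component (e.g. an entity recognizer) that could be reused across products.
    """
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


# Product-specific logic (hypothetical): maps generic terms to the business
# categories one team cares about. Editing this mapping adapts the pipeline
# to a new business question without touching the general-purpose step.
PRODUCT_RULES: Dict[str, str] = {
    "runner": "CI/CD",
    "pipeline": "CI/CD",
    "merge request": "Code review",
    "login": "Authentication",
}


def categorize_ticket(text: str, rules: Dict[str, str]) -> List[str]:
    terms = extract_terms(text, list(rules))
    # Deduplicate categories while keeping first-seen order.
    return list(dict.fromkeys(rules[term] for term in terms))


print(categorize_ticket("Login fails when I open a merge request", PRODUCT_RULES))
# ['Code review', 'Authentication']
```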

Language is just another interface.

The Window-Knocking Machine Test
ines.io/blog/window-knocking-machine-test
“knocker-uppers”
Are you designing a window-knocking machine or an alarm clock?

Hello, I’m Toni’s virtual assistant and I help schedule appointments. Do you have time at 1pm on Monday?
No, but Tuesday would work for me.
Okay, please confirm: Tuesday at 1pm?
1pm is unideal but 3pm would work.
Toni doesn’t have availability at 3pm but I could offer a slot at 4pm or 5:30pm.
Which time zone is this by the way? I’m in CET.

Chat assistant: “window-knocking machine” · Calendly: “alarm clock”
ines.io/blog/window-knocking-machine-test

What’s the total services revenue from 2023? $2,923,531
How many clients is that in total? 29
Retrieval-Augmented Generation: 🔮 LLM · 📚 database · 🤖 agents · ⚙ query

Kim Miller, Analyst
Year: 2023 · Type: Services
Clients (28)    Revenue
ACME Inc.       432,032
FooBar GmbH      82,000
NLPCorp           1,500
XKCD Ltd.       193,000
Python AG        91,320
Total        $2,625,032

The chat answers ($2,923,531 from 29 clients) don’t match the source table ($2,625,032 from 28 clients).
AI still needs product decisions!
ines.io/blog/window-knocking-machine-test
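Figures like these can be computed by an explicit query instead of being generated, which makes them checkable. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical `revenue` table and loading only the five rows legible on the slide, so the totals are partial by construction and will not equal the slide's $2,625,032.

```python
import sqlite3

# The five rows legible on the slide; the full table lists 28 clients.
ROWS = [
    (2023, "Services", "ACME Inc.", 432_032),
    (2023, "Services", "FooBar GmbH", 82_000),
    (2023, "Services", "NLPCorp", 1_500),
    (2023, "Services", "XKCD Ltd.", 193_000),
    (2023, "Services", "Python AG", 91_320),
]


def services_summary(rows, year: int):
    """Return (total revenue, number of distinct clients) for services in a year."""
    conn = sqlite3.connect(":memory:")  # hypothetical in-memory stand-in for the database
    try:
        conn.execute(
            "CREATE TABLE revenue (year INTEGER, type TEXT, client TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO revenue VALUES (?, ?, ?, ?)", rows)
        total, clients = conn.execute(
            "SELECT SUM(amount), COUNT(DISTINCT client) FROM revenue "
            "WHERE year = ? AND type = 'Services'",
            (year,),
        ).fetchone()
        return total, clients
    finally:
        conn.close()


print(services_summary(ROWS, 2023))  # (799852, 5)
```

Whether a user should see a chat answer, the query result, or the underlying table is exactly the kind of product decision the slide refers to.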

Summary: Applied NLP & Gen AI
• Reason and refactor. The key to success lies in your data and may surprise you!
• Think beyond chatbots. You don’t want to build a “window-knocking machine”.
• Stay ambitious. Don’t compromise on best practices, efficiency and privacy.

Explosion: explosion.ai · spaCy: spacy.io · Prodigy: prodigy.ai
Twitter: @_inesmontani · Bluesky: @inesmontani.bsky.social · Mastodon · LinkedIn


Resources

A practical guide to human-in-the-loop distillation

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

How GitLab uses spaCy to analyze support tickets and empower their community

Applied NLP Thinking: How to Translate Problems into Solutions

The Window-Knocking Machine Test

Using LLMs for human-in-the-loop distillation in Prodigy
