The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs (QCon London)

With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black-box era, with larger and larger models obscured behind APIs controlled by big tech monopolies?

I don’t think so, and in this talk, I’ll show you why. I’ll take a closer look at the open-source model ecosystem, some common misconceptions about use cases for LLMs in industry, practical real-world examples, and how basic principles of software development such as modularity, testability and flexibility still apply. LLMs are a great new tool in our toolkit, but the end goal remains to create a system that does what you want it to do. Explicit is still better than implicit, and composable building blocks still beat huge black boxes.

As ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control. In this talk, I'll share some practical approaches that you can apply today. If you’re trying to build a system that does a particular thing, you don’t need to transform your request into arbitrary language and call into the largest model that understands arbitrary language the best. The people developing those models are telling that story, but the rest of us aren’t obliged to believe them.

Ines Montani

April 08, 2024

Resources

Behind the scenes

https://speakerdeck.com/inesmontani/the-ai-revolution-will-not-be-monopolized-behind-the-scenes

A more in-depth look at the concepts and ideas behind the talk, including academic literature, related experiments and preliminary results for distilled task-specific models.

Transcript

  1. EXPLOSION: developer tools company specializing in AI, machine learning
     and NLP. explosion.ai
     Ines Montani, Founder and CEO · Matthew Honnibal, Founder and CTO
  2. PRODIGY: modern, scriptable annotation tool for machine learning
     developers. prodigy.ai
     9k+ users · 800+ companies
  3. WHY OPEN SOURCE? Transparent · no lock-in · up to date · programmable ·
     extensible · community-vetted · runs in-house · easy to get started ·
     also free!
  4. OPEN-SOURCE MODELS
     task-specific models: small, often fast, cheap to run, don’t always
     generalize well, need data to fine-tune
     encoder models (ELECTRA, T5): relatively small and fast, affordable to
     run, generalize & adapt well, need data to fine-tune
     large generative models (Falcon, Mixtral): very large, often slower,
     expensive to run, generalize & adapt well, need little to no data
  5. ENCODING & DECODING TASKS
     encoder models: a network trained for a specific task, using the model
     to encode the input: text → model → vectors → task network (trained
     with labels) → task output
     large generative models: the model generates text that can be parsed
     into task-specific output: prompt template + text → model → raw output
     → parser → task output
     (Both patterns are sketched in code below.)
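To make the two patterns concrete, here is a minimal sketch using Hugging Face transformers. This is an illustration, not code from the talk: the model names are assumptions, and gpt2 is just a stand-in for a stronger instruction-tuned model.

```python
# Minimal sketch of the two patterns (model names are illustrative
# assumptions, not from the talk).
from transformers import pipeline

# Encoding: an encoder plus a task head maps text directly to labels.
encoder_clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(encoder_clf("The new release is fantastic."))  # [{'label': 'POSITIVE', ...}]

# Decoding: a generative model produces free text, which a parser must
# turn into structured, machine-facing output.
generator = pipeline("text-generation", model="gpt2")
prompt = (
    "Decide if the sentiment is POSITIVE or NEGATIVE.\n"
    "Text: The new release is fantastic.\nSentiment:"
)
raw = generator(prompt, max_new_tokens=3)[0]["generated_text"]
completion = raw[len(prompt):].upper()  # the parser step, reduced to a substring check
label = "POSITIVE" if "POSITIVE" in completion else "NEGATIVE"
print(label)
```

The encoder path returns structured output directly; the generative path always needs that extra parsing step before anything downstream can use it.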
  6. ECONOMIES OF SCALE
     Output costs: OpenAI and Google vs. you. Access to talent, compute
     etc., and API request batching: high traffic keeps batches full, low
     traffic leaves them mostly empty. (A toy calculation follows below.)
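As a rough illustration of the batching point, here is a toy calculation; the numbers are my own assumptions, not figures from the talk.

```python
# Toy model of batching economics (numbers are assumptions, not from the
# talk): a GPU forward pass costs about the same whether or not the batch
# is full, so providers with enough traffic to fill batches pay far less
# per request.
BATCH_SIZE = 32       # requests processed together in one forward pass
COST_PER_BATCH = 1.0  # arbitrary cost unit per forward pass

def cost_per_request(concurrent_requests: int) -> float:
    """Average cost per request when batches are filled only with your own traffic."""
    filled = min(concurrent_requests, BATCH_SIZE)
    return COST_PER_BATCH / filled

print(cost_per_request(32))  # high traffic: full batch, ~0.031 per request
print(cost_per_request(2))   # low traffic: nearly empty batch, 0.5 per request
```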
  7. AI PRODUCTS ARE MORE THAN JUST A MODEL
     human-facing systems (ChatGPT): the most important differentiation is
     product, not just technology: UI/UX, marketing, customization
     machine-facing models (GPT-4): swappable components based on research,
     with quantifiable impacts: cost, speed, accuracy, latency
     But what about the data? User data is an advantage for product, not
     the foundation for machine-facing tasks. You don’t need specific data
     to gain general knowledge.
  8. USE CASES IN INDUSTRY
     predictive tasks (structured data): entity recognition, relation
     extraction, coreference resolution, grammar & morphology, semantic
     parsing, discourse structure, text classification
     generative tasks: single/multi-doc summarization, reasoning, problem
     solving, paraphrasing, style transfer, question answering
     Many industry problems have remained the same; they’ve just changed
     in scale.
  9. EVOLUTION OF PROBLEM DEFINITIONS
     programming & rules: rules or instructions · machine learning
     (supervised learning): examples · in-context learning (prompt
     engineering): rules or instructions
     instructions: human-shaped, easy for non-experts, risk of data drift
     examples: nuanced and intuitive behaviors, specific to use case,
     labor-intensive (see the sketch after this item)
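To spell out the contrast, here is a small sketch of the same task defined both ways; the task and labels are hypothetical.

```python
# The same (hypothetical) support-ticket task defined two ways.

# Instructions: human-shaped and easy to write, but the mapping from
# wording to behavior can drift as real inputs change.
PROMPT_TEMPLATE = (
    "Label the support ticket as BUG, BILLING or OTHER.\n"
    "Ticket: {text}\nLabel:"
)

# Examples: capture nuanced, use-case-specific behavior, but each one has
# to be collected and labeled; this is the training data for a supervised
# classifier.
EXAMPLES = [
    ("App crashes when I upload a photo", "BUG"),
    ("I was charged twice this month", "BILLING"),
    ("Do you have a student discount?", "OTHER"),
]
```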
 10. WORKFLOW EXAMPLE: iterative model-assisted data annotation
     Prompt a large general-purpose model to pre-annotate, correct its
     output to build domain-specific data, evaluate continuously against a
     baseline, then use transfer learning to train a distilled
     task-specific model. (A schematic loop follows below.)
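Schematically, the loop might look like this. This is my reading of the slide, not code from the talk, and every helper passed in is a hypothetical stand-in: an LLM API call, a human review step in an annotation tool such as Prodigy, and a spaCy or transformers training run.

```python
# Schematic distillation loop (my interpretation of the slide, not code
# from the talk). All helper callables are hypothetical stand-ins.
from typing import Any, Callable

def distill(
    raw_texts: list[str],
    eval_set: list[Any],
    prompt_llm: Callable[[str], Any],     # large general-purpose model pre-annotates
    correct: Callable[[str, Any], Any],   # human reviews/corrects the draft
    train: Callable[[list[Any]], Any],    # transfer learning on a small model
    evaluate: Callable[[Any, list[Any]], float],
    target_score: float,
) -> Any:
    dataset: list[Any] = []
    model = None
    for text in raw_texts:
        draft = prompt_llm(text)              # model-assisted annotation
        dataset.append(correct(text, draft))  # corrected, domain-specific data
        model = train(dataset)                # distilled task-specific model
        if evaluate(model, eval_set) >= target_score:  # continuous evaluation
            break                             # good enough: stop annotating
    return model
```

In practice you would retrain per batch of corrections rather than per example, but the shape of the loop is the point: annotation, training and evaluation feed each other until the small model is good enough.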
 11. PROTOTYPE TO PRODUCTION (github.com/explosion/spacy-llm)
     Prototype: prompt the model & transform its output to structured data,
     a structured, machine-facing Doc object.
     Production: a processing pipeline where you can swap, replace and mix
     components. (A minimal example follows below.)
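A minimal example along the lines of the spacy-llm README; the exact registry names (e.g. "spacy.NER.v2", "spacy.GPT-3-5.v1") depend on the installed version.

```python
# Minimal spacy-llm sketch, following the project README; registry names
# may differ between versions.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm",
    config={
        # Task: defines the prompt and how to parse raw output into a Doc
        "task": {
            "@llm_tasks": "spacy.NER.v2",
            "labels": "PERSON,ORGANISATION,LOCATION",
        },
        # Model: the backend being prompted; swappable independently of the task
        "model": {"@llm_models": "spacy.GPT-3-5.v1"},
    },
)
doc = nlp("Ines Montani is a co-founder of Explosion, based in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

Because the output is a standard Doc object, the llm component can later be swapped for a distilled task-specific component without changing any downstream code.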
 12. RESULTS & CASE STUDIES: CoNLL 2003 Named Entity Recognition

                                            F-Score   Speed (words/s)
     GPT-3.5 [1]                              78.6         < 100
     GPT-4 [1] (SOTA on few-shot prompting)   83.5         < 100
     spaCy (RoBERTa-base)                     91.6         4,000
     Flair                                    93.1         1,000
     SOTA 2023 [2]                            94.6         1,000
     SOTA 2003 [3]                            88.8      > 20,000

     [1] Ashok and Lipton (2023) · [2] Wang et al. (2021) ·
     [3] Florian et al. (2003)
 13. [Chart: accuracy on FabNER vs. number of training examples for a
     task-specific model, compared against Claude 2; annotated at 20
     examples.]
     Annotations: “we don’t need crowd workers” and “says more about crowd
     worker methodology than LLMs”.
 14. DISTILLED TASK-SPECIFIC COMPONENTS: modular · testable · flexible ·
     predictable · transparent · no lock-in · programmable · extensible ·
     run in-house · cheap to run
 15. MONOPOLY STRATEGIES: control of a resource · regulation · compounding
     economies of scale · network effects
     human-facing products vs. machine-facing models
 16. THE AI REVOLUTION WON’T BE MONOPOLIZED
     The software industry does not run on secret sauce. Knowledge gets
     shared and published. Secrets won’t give anyone a monopoly.
     Usage data is great for improving a product, but it doesn’t
     generalize. Data won’t give anyone a monopoly.
     LLMs can be one part of a product or process, and swapped for
     different approaches. Interoperability is the opposite of monopoly.
     Regulation could give someone a monopoly, if we let it. It should
     focus on products and actions, not components.