Applied NLP in the Age of Generative AI: Future-Proof Strategies for Banking and Finance

Ines Montani, Explosion

Open-source library for industrial-strength natural language processing
spacy.io
390m+ downloads
Generative AI can write spaCy code!

Modern scriptable annotation tool for machine learning developers
prodigy.ai
900+ companies
10k+ users
(Illustration: Alex Smith, Developer; Kim Miller, Analyst; GPT-4 API)

LLM: MIXTRAL, GPT-4
good contextual results
easy to use & configure
fast prototyping
⚠ data privacy
⚠ transparency
⚠ efficiency

Language is just another interface.

“knocker-uppers”
Are you designing a window-knocking machine or an alarm clock?
ines.io/blog/window-knocking-machine-test

ines.io/blog/window-knocking-machine-test
Assistant: Hello, I’m Toni’s virtual assistant and I help schedule appointments. Do you have time at 1pm on Monday?
User: No, but Tuesday would work for me.
Assistant: Okay, please confirm: Tuesday at 1pm?
User: 1pm is unideal but 3pm would work.
Assistant: Toni doesn’t have availability at 3pm but I could offer a slot at 4pm or 5:30pm.
User: Which time zone is this by the way? I’m in CET.
“window-knocking machine”: the chat assistant
“alarm clock”: Calendly

ines.io/blog/window-knocking-machine-test
What’s the total services revenue from 2023?
$2,923,531
How many clients is that in total?
29
🔮 LLM 📚 database 🤖 agents ⚙ query
Retrieval-Augmented Generation

Year: 2023   Type: Services
Clients (28)      Revenue
ACME Inc.         432,032
FooBar GmbH        82,000
NLPCorp             1,500
XKCD Ltd.         193,000
Python AG          91,320
Total         $ 2,625,032

AI still needs product decisions!
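The revenue table on this slide suggests that the two chat questions could also be answered by a plain filter-and-aggregate query. The following is a minimal sketch of that idea, not the system shown in the talk. The `RevenueRow` type and the `summarize` helper are hypothetical names, the records cover only the five clients visible on the slide (the full table lists 28), and the pairing of client names with amounts is assumed from the order in which they appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevenueRow:
    client: str
    year: int
    kind: str
    revenue: int  # US dollars


# Hypothetical records: only the five clients visible on the slide, and the
# client-to-amount pairing is assumed from the order in which they appear.
ROWS = [
    RevenueRow("ACME Inc.", 2023, "Services", 432_032),
    RevenueRow("FooBar GmbH", 2023, "Services", 82_000),
    RevenueRow("NLPCorp", 2023, "Services", 1_500),
    RevenueRow("XKCD Ltd.", 2023, "Services", 193_000),
    RevenueRow("Python AG", 2023, "Services", 91_320),
]


def summarize(rows, year, kind):
    """Filter by year and type, then count distinct clients and sum revenue."""
    selected = [r for r in rows if r.year == year and r.kind == kind]
    return {
        "clients": len({r.client for r in selected}),
        "revenue": sum(r.revenue for r in selected),
    }


summary = summarize(ROWS, 2023, "Services")
print(f"Clients ({summary['clients']}): ${summary['revenue']:,}")
```

Because the answer is computed from the records themselves, the same filter always returns the same count and total, which is the property a dashboard with Year and Type filters provides.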

explosion.ai/blog/history-web-future-ai
WEB: static pages, then dynamic pages, then static pages (compile static data at build time)
AI: custom models, then pretrained models, then custom models (distill models into smaller, faster and private components)

explosion.ai/blog/sp-global-commodities
99% F-score
6 MB model size
16k+ words/second
• S&P Global: real-time commodities trading insights by extracting structured attributes
• high-security environment
• 10× data development speedup with humans and LLM in the loop
• 8+ market pipelines in production

explosion.ai/blog/sp-global-commodities
“heards”: trading activities
⚠ data-private
⚠ real-time
📖 text
model + rules
🏷 entities

explosion.ai/blog/sp-global-commodities
🔮 suggestions from LLM ➕ 🧑‍💻 human experts in the loop
📚 task-specific data
📦 model package
🚀 structured data

explosion.ai/blog/human-in-the-loop-distillation
LLM
prompting
baseline
continuous evaluation
transfer learning
COMPONENT

distilled model
deploy 🚀
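One way to read “continuous evaluation” against a “baseline” is to score both the LLM baseline and the distilled component on the same expert-annotated examples before deploying. The sketch below illustrates that comparison under stated assumptions: entity annotations are represented as `(doc_id, start, end, label)` tuples, all data is invented for illustration, and the `prf` helper and the deploy rule (distilled F-score at least as high as the baseline) are hypothetical choices rather than the workflow described in the talk.

```python
def prf(gold, predicted):
    """Precision, recall and F-score over sets of (doc_id, start, end, label)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical gold annotations, as they might come from subject matter experts.
gold = {
    (0, 0, 5, "COMMODITY"),
    (0, 10, 14, "PRICE"),
    (1, 3, 9, "COMMODITY"),
    (1, 20, 26, "LOCATION"),
}
# Invented LLM baseline predictions: three correct spans, one wrong boundary.
baseline = {
    (0, 0, 5, "COMMODITY"),
    (0, 10, 14, "PRICE"),
    (1, 3, 9, "COMMODITY"),
    (1, 19, 26, "LOCATION"),
}
# Invented distilled-component predictions: three correct spans, one missed.
distilled = {
    (0, 0, 5, "COMMODITY"),
    (0, 10, 14, "PRICE"),
    (1, 20, 26, "LOCATION"),
}

baseline_f = prf(gold, baseline)[2]
distilled_f = prf(gold, distilled)[2]
# Assumed rule: only deploy once the component matches or beats the baseline.
ready_to_deploy = distilled_f >= baseline_f
print(f"baseline F={baseline_f:.2f} distilled F={distilled_f:.2f} deploy={ready_to_deploy}")
```

Running the same check after every round of annotation keeps the baseline as a fixed reference point while the task-specific data grows.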

explosion.ai/blog/sp-global-commodities
GPT-4 API
30min per attribute
10× faster
reduce cognitive load

🧑‍💻 developer tooling
🧡 open source
👩‍🔬 subject matter experts
🤖 development support

🤖 development support
🔄 iteration
🛠 refactoring
🧠 mindset
🏢 in-house development

explosion.ai/blog/pdfs-nlp-structured-data
⚠ PDFs are a bad “source of truth”
Businesses want electronic copies that map 1:1 to paper documents.

github.com/explosion/spacy-layout
spaCy + Docling
process and create a spaCy Doc
text-based contents
document layout
layout sections
content, tokens, offsets
section type
bounding box
annotation in context

At their core, many NLP systems consist of flat classifications. You can shove them into a single prompt, or you can decompose them into smaller pieces. Many classification tasks are straightforward to solve nowadays – but they become vastly more complicated if one model needs to do them all at once.
explosion.ai/blog/human-in-the-loop-distillation
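To make “decompose them into smaller pieces” concrete, here is a minimal sketch in which each component answers exactly one flat classification question and the results are combined into one structured record. The keyword checks are toy stand-ins for trained models, and the label sets, function names and example sentences are all invented for illustration.

```python
def classify_side(text):
    """Toy stand-in for a small model answering one question: bid or offer?"""
    lowered = text.lower()
    if "bid" in lowered:
        return "BID"
    if "offer" in lowered:
        return "OFFER"
    return "UNKNOWN"


def classify_commodity(text):
    """Toy stand-in for a second model: which commodity is mentioned?"""
    lowered = text.lower()
    for name in ("wheat", "crude", "copper"):
        if name in lowered:
            return name.upper()
    return "UNKNOWN"


def classify_urgency(text):
    """Toy stand-in for a third model: is the message marked as urgent?"""
    return "URGENT" if "urgent" in text.lower() else "NORMAL"


# Each entry is one flat classification task that can be developed,
# evaluated and replaced independently of the others.
COMPONENTS = {
    "side": classify_side,
    "commodity": classify_commodity,
    "urgency": classify_urgency,
}


def analyze(text, components=COMPONENTS):
    """Run every component on the text and collect one label per task."""
    return {name: component(text) for name, component in components.items()}


result = analyze("Heard an urgent bid for wheat")
print(result)
```

Since every task has its own component, a weak spot in one of them can be measured and fixed without touching the rest.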

Reason and refactor. The key to success lies in your data and may surprise you!
Think beyond chat bots. You don’t want to build a “window-knocking machine”.
Stay ambitious. Don’t compromise on best practices, efficiency and privacy.

Explosion: explosion.ai
spaCy: spacy.io
Prodigy: prodigy.ai
Bluesky: @inesmontani.bsky.social
Mastodon: @[email protected]
LinkedIn


Resources

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

The Window-Knocking Machine Test

What the history of the web can teach us about the future of AI

A practical guide to human-in-the-loop distillation

From PDFs to AI-ready structured data: a deep dive

Using LLMs for human-in-the-loop distillation in Prodigy
