Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
0
160
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
Tweet
Share
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.5k
From idea to production with NLP, Scala and Spark
niekbartho
3
470
Going DevOps with BMC
niekbartho
0
180
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
450
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
ネットワーク保護はどう変わるのか?re:Inforce 2025最新アップデート解説
tokushun
0
160
KubeCon + CloudNativeCon Japan 2025 に行ってきた! & containerd の新機能紹介
honahuku
0
120
Flutter向けPDFビューア、pdfrxのpdfium WASM対応について
espresso3389
0
110
使いたいMCPサーバーはWeb APIをラップして自分で作る #QiitaBash
bengo4com
0
1.4k
KiCadでPad on Viaの基板作ってみた
iotengineer22
0
210
Lazy application authentication with Tailscale
bluehatbrit
0
130
「良さそう」と「とても良い」の間には 「良さそうだがホンマか」がたくさんある / 2025.07.01 LLM品質Night
smiyawaki0820
1
450
GitHub Copilot の概要
tomokusaba
1
150
事業成長の裏側:エンジニア組織と開発生産性の進化 / 20250703 Rinto Ikenoue
shift_evolve
PRO
2
6.8k
論文紹介:LLMDet (CVPR2025 Highlight)
tattaka
0
250
「Chatwork」の認証基盤の移行とログ活用によるプロダクト改善
kubell_hr
1
240
Beyond Kaniko: Navigating Unprivileged Container Image Creation
f30
0
110
Featured
See All Featured
Reflections from 52 weeks, 52 projects
jeffersonlam
351
20k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Build The Right Thing And Hit Your Dates
maggiecrowley
36
2.8k
4 Signs Your Business is Dying
shpigford
184
22k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.7k
How to Think Like a Performance Engineer
csswizardry
24
1.7k
Faster Mobile Websites
deanohume
307
31k
A Modern Web Designer's Workflow
chriscoyier
694
190k
RailsConf 2023
tenderlove
30
1.1k
What’s in a name? Adding method to the madness
productmarketing
PRO
23
3.5k
Into the Great Unknown - MozCon
thekraken
39
1.9k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com