Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
0
170
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
Tweet
Share
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.6k
From idea to production with NLP, Scala and Spark
niekbartho
3
480
Going DevOps with BMC
niekbartho
0
200
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
470
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
ローカルLLM基礎知識 / local LLM basics 2025
kishida
23
8.6k
ABEMAのCM配信を支えるスケーラブルな分散カウンタの実装
hono0130
4
1.1k
AI開発の定着を推進するために揃えるべき前提
suguruooki
1
330
持続可能なアクセシビリティ開発
azukiazusa1
6
330
技術広報のOKRで生み出す 開発組織への価値 〜 カンファレンス協賛を通して育む学びの文化 〜 / Creating Value for Development Organisations Through Technical Communications OKRs — Nurturing a Culture of Learning Through Conference Sponsorship —
pauli
5
550
Kubernetesと共にふりかえる! エンタープライズシステムのインフラ設計・テストの進め方大全
daitak
0
470
Android Studio Otter の最新 Gemini 機能 / Latest Gemini features in Android Studio Otter
yanzm
0
380
不確実性に備える ABEMA の信頼性設計とオブザーバビリティ基盤
nagapad
4
7.8k
AI × クラウドで シイタケの収穫時期を判定してみた
lamaglama39
1
400
2025 DORA Reportから読み解く!AIが映し出す、成果を出し続ける組織の共通点 #開発生産性_findy
takabow
0
130
Building AI Applications with Java, LLMs, and Spring AI
thomasvitale
1
240
リアーキテクティングのその先へ 〜品質と開発生産性の壁を越えるプラットフォーム戦略〜 / architecture-con2025
visional_engineering_and_design
0
7.1k
Featured
See All Featured
How STYLIGHT went responsive
nonsquared
100
5.9k
Agile that works and the tools we love
rasmusluckow
331
21k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.2k
Six Lessons from altMBA
skipperchong
29
4.1k
Thoughts on Productivity
jonyablonski
73
4.9k
The Pragmatic Product Professional
lauravandoore
36
7k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
Into the Great Unknown - MozCon
thekraken
40
2.2k
Faster Mobile Websites
deanohume
310
31k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1k
Mobile First: as difficult as doing things right
swwweet
225
10k
jQuery: Nuts, Bolts and Bling
dougneiner
65
8k
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com