Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
0
170
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
Tweet
Share
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.5k
From idea to production with NLP, Scala and Spark
niekbartho
3
470
Going DevOps with BMC
niekbartho
0
190
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
460
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
ハードウェアとソフトウェアをつなぐ全てを内製している企業の E2E テストの作り方 / How to create E2E tests for a company that builds everything connecting hardware and software in-house
bitkey
PRO
1
130
未経験者・初心者に贈る!40分でわかるAndroidアプリ開発の今と大事なポイント
operando
5
590
Automating Web Accessibility Testing with AI Agents
maminami373
0
1.3k
AIのグローバルトレンド2025 #scrummikawa / global ai trend
kyonmm
PRO
1
280
ZOZOマッチのアーキテクチャと技術構成
zozotech
PRO
4
1.5k
今!ソフトウェアエンジニアがハードウェアに手を出すには
mackee
12
4.8k
【初心者向け】ローカルLLMの色々な動かし方まとめ
aratako
7
3.5k
roppongirb_20250911
igaiga
1
230
Evolución del razonamiento matemático de GPT-4.1 a GPT-5 - Data Aventura Summit 2025 & VSCode DevDays
lauchacarro
0
190
COVESA VSSによる車両データモデルの標準化とAWS IoT FleetWiseの活用
osawa
1
280
複数サービスを支えるマルチテナント型Batch MLプラットフォーム
lycorptech_jp
PRO
0
370
エラーとアクセシビリティ
schktjm
1
1.3k
Featured
See All Featured
Being A Developer After 40
akosma
90
590k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Rails Girls Zürich Keynote
gr2m
95
14k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
How to Ace a Technical Interview
jacobian
279
23k
Fireside Chat
paigeccino
39
3.6k
How GitHub (no longer) Works
holman
315
140k
The Straight Up "How To Draw Better" Workshop
denniskardys
236
140k
Making Projects Easy
brettharned
117
6.4k
Docker and Python
trallard
45
3.6k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
920
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com