Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
0
170
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
Tweet
Share
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.5k
From idea to production with NLP, Scala and Spark
niekbartho
3
470
Going DevOps with BMC
niekbartho
0
200
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
460
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
BirdCLEF+2025 Noir 5位解法紹介
myso
0
200
AIAgentの限界を超え、 現場を動かすWorkflowAgentの設計と実践
miyatakoji
0
140
How to achieve interoperable digital identity across Asian countries
fujie
0
120
pprof vs runtime/trace (FlightRecorder)
task4233
0
170
コンテキストエンジニアリングとは? 考え方と応用方法
findy_eventslides
4
900
定期的な価値提供だけじゃない、スクラムが導くチームの共創化 / 20251004 Naoki Takahashi
shift_evolve
PRO
3
310
SOC2取得の全体像
shonansurvivors
1
390
extension 現場で使えるXcodeショートカット一覧
ktombow
0
210
生成AIとM5Stack / M5 Japan Tour 2025 Autumn 東京
you
PRO
0
220
[2025-09-30] Databricks Genie を利用した分析基盤とデータモデリングの IVRy の現在地
wxyzzz
0
480
KMP の Swift export
kokihirokawa
0
330
Optuna DashboardにおけるPLaMo2連携機能の紹介 / PFN LLM セミナー
pfn
PRO
1
880
Featured
See All Featured
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.6k
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Gamification - CAS2011
davidbonilla
81
5.5k
Code Review Best Practice
trishagee
72
19k
We Have a Design System, Now What?
morganepeng
53
7.8k
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.9k
Optimizing for Happiness
mojombo
379
70k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Faster Mobile Websites
deanohume
310
31k
Statistics for Hackers
jakevdp
799
220k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
Facilitating Awesome Meetings
lara
56
6.6k
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com