Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
0
160
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
Tweet
Share
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.5k
From idea to production with NLP, Scala and Spark
niekbartho
3
470
Going DevOps with BMC
niekbartho
0
190
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
450
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
Amazon Q Developerを活用したアーキテクチャのリファクタリング
k1nakayama
2
210
形式手法特論:位相空間としての並行プログラミング #kernelvm / Kernel VM Study Tokyo 18th
ytaka23
3
1.3k
Oracle Exadata Database Service on Cloud@Customer X11M (ExaDB-C@C) サービス概要
oracle4engineer
PRO
2
6.3k
Jamf Connect ZTNAとMDMで実現! 金融ベンチャーにおける「デバイストラスト」実例と軌跡 / Kyash Device Trust
rela1470
1
200
アカデミーキャンプ 2025 SuuuuuuMMeR「燃えろ!!ロボコン」 / Academy Camp 2025 SuuuuuuMMeR "Burn the Spirit, Robocon!!" DAY 1
ks91
PRO
0
140
Amazon Bedrock AgentCoreのフロントエンドを探す旅 (Next.js編)
kmiya84377
1
140
生成AIによるソフトウェア開発の収束地点 - Hack Fes 2025
vaaaaanquish
23
11k
ファッションコーディネートアプリ「WEAR」における、Vertex AI Vector Searchを利用したレコメンド機能の開発・運用で得られたノウハウの紹介
zozotech
PRO
0
270
「AIと一緒にやる」が当たり前になるまでの奮闘記
kakehashi
PRO
3
140
【CEDEC2025】『Shadowverse: Worlds Beyond』二度目のDCG開発でゲームをリデザインする~遊びやすさと競技性の両立~
cygames
PRO
1
360
LLMで構造化出力の成功率をグンと上げる方法
keisuketakiguchi
0
800
Findy Freelance 利用シーン別AI活用例
ness
0
480
Featured
See All Featured
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
440
A better future with KSS
kneath
239
17k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Building an army of robots
kneath
306
45k
Being A Developer After 40
akosma
90
590k
Unsuck your backbone
ammeep
671
58k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.1k
Optimizing for Happiness
mojombo
379
70k
How GitHub (no longer) Works
holman
314
140k
What’s in a name? Adding method to the madness
productmarketing
PRO
23
3.6k
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com