Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
openthebox.be - smart publications
Search
Niek Bartholomeus
October 02, 2019
Technology
190
0
Share
openthebox.be - smart publications
Extracting deep insights from boring documents: a real-life story
Niek Bartholomeus
October 02, 2019
More Decks by Niek Bartholomeus
See All by Niek Bartholomeus
openthebox.be
niekbartho
1
2.6k
From idea to production with NLP, Scala and Spark
niekbartho
3
510
Going DevOps with BMC
niekbartho
0
210
Orchestration in meatspace
niekbartho
4
2k
Self-organization vs. global optimization - a comparison between traditional and modern organizations
niekbartho
2
500
DevOps for Dinosaurs
niekbartho
12
3k
Other Decks in Technology
See All in Technology
プラットフォームエンジニアリングの実践 - AWS コンテナサービスで構築する社内プラットフォーム / AWS Containers Platform Meetup #1
literalice
1
210
Arcana: Production-Ready RAG in Elixir @ ElixirConf EU 2026
georgeguimaraes
0
120
Standards et agents IA : un tour d’horizon de MCP, A2A, ADK et plus encore
glaforge
0
210
AzureのIaC管理からログ調査まで、随所に役立つSkillsとCustom-Instructions / Boosting IaC and Log Analysis with Skills
aeonpeople
0
280
"おまじない"を卒業する ボイラープレート再入門
shunsuke_1b
1
110
Class.new is all you need
riseshia
1
190
Pure Intonation on Browser: Building a Sequencer with Ruby
nagachika
0
170
MLOps導入のための組織作りの第一歩
akasan
0
390
「責任あるAIエージェント」こそ自社で開発しよう!
minorun365
10
2.3k
AI バイブコーティングでキーボード不要?!
samakada
0
640
Oracle Cloud Infrastructure:2026年4月度サービス・アップデート
oracle4engineer
PRO
0
120
Percolatorを廃止し、マルチ検索サービスへ刷新した話 / Search Engineering Tech Talk 2026 Spring
visional_engineering_and_design
0
170
Featured
See All Featured
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1k
Balancing Empowerment & Direction
lara
6
1.1k
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.5k
Claude Code のすすめ
schroneko
67
220k
GitHub's CSS Performance
jonrohan
1032
470k
Designing for Timeless Needs
cassininazir
0
210
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.4k
RailsConf 2023
tenderlove
30
1.4k
The SEO Collaboration Effect
kristinabergwall1
1
430
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
62
53k
YesSQL, Process and Tooling at Scale
rocio
174
15k
Accessibility Awareness
sabderemane
1
100
Transcript
openthebox.be Extracting deep insights from 'boring' documents: a real-life story
Me Niek Bartholomeus @niekbartho • Background as a software developer
• Switched to data science and natural language processing in 2016 • Founded openthebox.be in 2017
openthebox.be
openthebox.be Open data KBO NBB Belgian Official Gazette http://kbopub.economie.fgov.be/kbopub https://cri.nbb.be/bc9/web/catalog
http://www.ejustice.just.fgov.be/ tsv/tsvn.htm
knowledge graph Visualization Analytics Machine learning Knowledge graph Structured data
Unstructured data KBO NBB Belgian Official Gazette
Unstructured data - pipeline
Unstructured data - pipeline steps 1] OCR 2] NER 4]
Entity linking 3] Relation extraction
Unstructured data - pipeline steps 1] OCR
Unstructured data - pipeline steps 2] NER
Unstructured data - pipeline steps 2] NER Pre-processing rules: [“1.Jan”,
“Janssens”] 1.Jan Janssens [“Marktstraat”, “54,8450”, “Bredene”] Marktstraat 54,8450 Bredene
Unstructured data - pipeline steps 2] NER Post-processing rules: +
= General rules Legal rules Historic probabilities Faulty publication Context Improved publication
Unstructured data - pipeline steps 2] NER Organization Person Inheritance:
Notary Owner Representative Proxy holder Administrator Author : “is a” relationship Base labels Subclass labels
Unstructured data - pipeline steps 2] NER Gentstraat 69 Niek
Roger Camiel Bartholomeus Sub entity extraction: First name: Niek Middle names: Roger, Camiel Last name: Bartholomeus 9170 Sint-Pauwels Street: Gentstraat Number: 69 Zip code: 9170 City: Sint-Pauwels
Unstructured data - pipeline steps 3] Relation extraction
Unstructured data - pipeline steps 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Niek
Bartholomeus N. Bartholomeus Bartholomeus } Niek Roger Camiel Bartholomeus Deduplication: 4] Entity linking
Unstructured data - pipeline steps Niek Roger Camiel Bartholomeus Link
with knowledge graph: Gentstraat 69 9170 Sint-Pauwels 4] Entity linking
openthebox.be
openthebox.be Bigger picture
openthebox.be http://wpmlabs.com/ Academia Industry https://www.filter-concept.com/ +
openthebox.be https://opensenselabs.com