Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Technology
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
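The reporter's request (search documents for certain keywords, then tag the matches) can be illustrated with regular expressions, one of the terms listed here. The following is only a minimal sketch, not a method from the talk: the keywords, tag names, file names and the `tag_documents` helper are all hypothetical stand-ins, and it assumes the documents have already been converted to plain text (for example by OCR).

```python
import re

# Hypothetical keyword -> tag mapping; stand-ins for the terms
# the reporter redacted as XXX.
KEYWORD_TAGS = {
    "invoice": "finance",
    "offshore": "finance",
    "minister": "politics",
}


def tag_documents(docs, keyword_tags=KEYWORD_TAGS):
    """Return {document name: sorted list of tags} for documents that mention a keyword."""
    # One case-insensitive, whole-word pattern per keyword.
    patterns = [
        (re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE), tag)
        for word, tag in keyword_tags.items()
    ]
    tagged = {}
    for name, text in docs.items():
        tags = {tag for pattern, tag in patterns if pattern.search(text)}
        if tags:
            tagged[name] = sorted(tags)
    return tagged


# Hypothetical sample texts standing in for extracted document content.
docs = {
    "a.txt": "Invoice 42 was routed through an offshore account.",
    "b.txt": "The Minister declined to comment.",
    "c.txt": "Nothing of interest here.",
}
result = tag_documents(docs)
print(result)
```

Documents without any keyword hit are left out of the result. At the scale mentioned in the quote (40 GB), a search index would likely be a better fit than scanning every file in memory; this sketch only shows the matching and tagging idea.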
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo