Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
420
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
100
data.occrp.org
pudo
0
160
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
290
Dr. Freezefile
pudo
2
420
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
マルチエージェントのチームビルディング_2025-10-25
shinoyamada
0
230
abema-trace-sampling-observability-cost-optimization
tetsuya28
0
380
dbtとAIエージェントを組み合わせて見えたデータ調査の新しい形
10xinc
7
1.5k
Oracle Database@Google Cloud:サービス概要のご紹介
oracle4engineer
PRO
0
390
ソースを読む時の思考プロセスの例-MkDocs
sat
PRO
1
340
Observability — Extending Into Incident Response
nari_ex
1
590
【SORACOM UG Explorer 2025】さらなる10年へ ~ SORACOM MVC 発表
soracom
PRO
0
180
ViteとTypeScriptのProject Referencesで 大規模モノレポのUIカタログのリリースサイクルを高速化する
shuta13
3
230
AIを使ってテストを楽にする
kworkdev
PRO
0
320
Amazon Q Developer CLIをClaude Codeから使うためのベストプラクティスを考えてみた
dar_kuma_san
0
160
アウトプットから始めるOSSコントリビューション 〜eslint-plugin-vueの場合〜 #vuefes
bengo4com
3
1.9k
251029 JAWS-UG AI/ML 退屈なことはQDevにやらせよう
otakensh
0
110
Featured
See All Featured
Understanding Cognitive Biases in Performance Measurement
bluesmoon
31
2.7k
Balancing Empowerment & Direction
lara
5
700
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Thoughts on Productivity
jonyablonski
71
4.9k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Embracing the Ebb and Flow
colly
88
4.9k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Done Done
chrislema
185
16k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
Docker and Python
trallard
46
3.6k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None