Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
410
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
100
data.occrp.org
pudo
0
160
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
400
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
Goss: New Production-Ready Go Binding for Faiss #coefl_go_jp
bengo4com
0
1.1k
実践アプリケーション設計 ②トランザクションスクリプトへの対応
recruitengineers
PRO
2
140
[CVPR2025論文読み会] Linguistics-aware Masked Image Modelingfor Self-supervised Scene Text Recognition
s_aiueo32
0
210
Evolution on AI Agent and Beyond - AGI への道のりと、シンギュラリティの3つのシナリオ
masayamoriofficial
0
170
株式会社ARAV 採用案内
maqui
0
340
Yahoo!広告ビジネス基盤におけるバックエンド開発
lycorptech_jp
PRO
1
270
Amazon Bedrock AgentCore でプロモーション用動画生成エージェントを開発する
nasuvitz
6
420
人と組織に偏重したEMへのアンチテーゼ──なぜ、EMに設計力が必要なのか/An antithesis to the overemphasis of people and organizations in EM
dskst
5
600
Webアクセシビリティ入門
recruitengineers
PRO
1
220
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
kzykmyzw
0
310
AIが住民向けコンシェルジュに?Amazon Connectと生成AIで実現する自治体AIエージェント!
yuyeah
0
260
事業価値と Engineering
recruitengineers
PRO
1
190
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
A designer walks into a library…
pauljervisheath
207
24k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.6k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
480
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
The Art of Programming - Codeland 2020
erikaheidi
55
13k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Reflections from 52 weeks, 52 projects
jeffersonlam
351
21k
GraphQLとの向き合い方2022年版
quramy
49
14k
Rails Girls Zürich Keynote
gr2m
95
14k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None