Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
410
Getting started with OCCRP Data
pudo
0
1.5k
#nr16: Recherche-Tools
pudo
1
98
data.occrp.org
pudo
0
150
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
390
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
アジャイル脅威モデリング#1(脅威モデリングナイト#8)
masakane55
3
160
10分でわかるfreeeのQA
freee
1
12k
入社後SREチームのミッションや課題の整理をした話
morix1500
1
240
.mdc駆動ナレッジマネジメント/.mdc-driven knowledge management
yodakeisuke
24
11k
YOLOv10~v12
tenten0727
3
860
バックオフィス向け toB SaaS バクラクにおけるレコメンド技術活用 / recommender-systems-in-layerx-bakuraku
yuya4
2
280
TopAppBar Composableをカスタムする
hunachi
0
170
DuckDB MCPサーバーを使ってAWSコストを分析させてみた / AWS cost analysis with DuckDB MCP server
masahirokawahara
0
590
改めて学ぶ Trait の使い方 / phpcon odawara 2025
meihei3
1
560
20250408 AI Agent workshop
sakana_ai
PRO
15
3.5k
All You Need Is Kusa 〜Slackデータで始めるデータドリブン〜
jonnojun
0
140
Vision Pro X Text to 3D Model ~How Swift and Generative Al Unlock a New Era of Spatial Computing~
igaryo0506
0
260
Featured
See All Featured
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Optimising Largest Contentful Paint
csswizardry
36
3.2k
Agile that works and the tools we love
rasmusluckow
328
21k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Side Projects
sachag
452
42k
KATA
mclloyd
29
14k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
32
5.1k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
30
2.3k
GraphQLの誤解/rethinking-graphql
sonatard
71
10k
For a Future-Friendly Web
brad_frost
176
9.7k
Imperfection Machines: The Place of Print at Facebook
scottboms
267
13k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None