Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs.”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
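As a rough illustration of the last item, and of the reporter's request to search documents for keywords and tag them, here is a minimal sketch in Python. It assumes the documents have already been converted to plain text (for example by OCR). The function name `tag_documents`, the file names, and the sample keywords are hypothetical and chosen only for this example; they do not come from the talk.

```python
import re


def tag_documents(docs, keywords):
    """Return {document name: sorted list of keywords found in its text}.

    `docs` maps a document name to its plain text. Both the names and
    the keywords below are hypothetical sample data.
    """
    # Compile one case-insensitive, whole-word pattern per keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    tags = {}
    for name, text in docs.items():
        found = [kw for kw, pattern in patterns.items() if pattern.search(text)]
        if found:
            tags[name] = sorted(found)
    return tags


docs = {
    "memo.txt": "The offshore company was registered in 2009.",
    "letter.txt": "Nothing of interest here.",
    "invoice.txt": "Payment to the Offshore account via a shell COMPANY.",
}
tags = tag_documents(docs, ["offshore", "company"])
print(tags)
```

Documents with no matching keyword are left out of the result. A plain loop like this suits a small collection; for tens of gigabytes a search index would likely be the better fit.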
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo