Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Technology
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs.”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
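As a rough illustration of the last item, the reporter's request (search documents for keywords, then tag the matches) could be sketched with regular expressions in Python. This is a minimal sketch, not a tool from the talk: the function name `tag_documents`, the sample file names, and the sample keywords are all hypothetical, and it assumes the documents have already been converted to plain text (for scans, that would mean running OCR first).

```python
import re

def tag_documents(docs, keywords):
    """Map each document name to the sorted keywords found in its text.

    Hypothetical helper: matches whole words, case-insensitively.
    Documents with no matching keyword are left out of the result.
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    tags = {}
    for name, text in docs.items():
        found = sorted(kw for kw, pat in patterns.items() if pat.search(text))
        if found:
            tags[name] = found
    return tags

# Invented sample data standing in for text extracted from real documents.
docs = {
    "memo.txt": "Payment routed through an offshore account in March.",
    "letter.txt": "Thanks for the invitation to the conference.",
    "contract.txt": "The Offshore entity signed the contract; payment is due.",
}
tags = tag_documents(docs, ["offshore", "payment"])
```

Plain keyword matching like this only finds exact words; spotting names of people or companies that are not known in advance is where NER comes in.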
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo