Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
- An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
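The reporter's request (search documents for keywords, then tag them) can be sketched with regular expressions, the simplest item in the lingo list. This is only an illustration: the keywords, tag names, sample documents and the `tag_documents` helper are all hypothetical, since the slide redacts the real ones as XXX, and it assumes the documents have already been converted to plain text (for example by OCR).

```python
import re

# Hypothetical keyword -> tag mapping; the real keywords are redacted in the slide.
KEYWORD_TAGS = {
    "offshore": "finance",
    "tender": "procurement",
}


def tag_documents(docs, keyword_tags=KEYWORD_TAGS):
    """Return {document name: sorted list of tags} for documents with a keyword hit."""
    # One case-insensitive, whole-word pattern per keyword.
    patterns = {
        keyword: re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for keyword in keyword_tags
    }
    tagged = {}
    for name, text in docs.items():
        tags = {
            keyword_tags[keyword]
            for keyword, pattern in patterns.items()
            if pattern.search(text)
        }
        if tags:
            tagged[name] = sorted(tags)
    return tagged


# Made-up sample documents standing in for extracted text.
docs = {
    "a.txt": "The Offshore company won the tender.",
    "b.txt": "Nothing of interest here.",
    "c.txt": "Tenderness is not a keyword match.",
}
result = tag_documents(docs)
print(result)
```

Whole-word matching keeps "Tenderness" from counting as a hit for "tender". At 40 GB, a search index would likely be a better fit than scanning every file on each query.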
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise: http://tabula.nerdpower.org
Docs in a cloud: http://documentcloud.org
Clustering, tagging, mining: http://overview.ap.org
Let them eat PDF: https://github.com/CrowData
All the visuals: Jigsaw
Spoken word magic: http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
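As a rough idea of what "configurable pipelines" could mean in practice, here is a minimal sketch in which each processing step is a plain function and the pipeline is assembled from a list of step names. The stage names, the `STAGES` registry and `build_pipeline` are assumptions made for illustration and do not come from the talk or from any of the tools listed above.

```python
def normalize_whitespace(doc):
    # Collapse runs of spaces and newlines into single spaces.
    doc["text"] = " ".join(doc["text"].split())
    return doc


def lowercase(doc):
    doc["text"] = doc["text"].lower()
    return doc


def count_words(doc):
    doc["word_count"] = len(doc["text"].split())
    return doc


# Hypothetical registry mapping configuration names to stage functions.
STAGES = {
    "normalize_whitespace": normalize_whitespace,
    "lowercase": lowercase,
    "count_words": count_words,
}


def build_pipeline(stage_names):
    """Build a callable that runs the named stages in order."""
    stages = [STAGES[name] for name in stage_names]  # KeyError on unknown names

    def run(doc):
        for stage in stages:
            doc = stage(dict(doc))  # copy so the caller's dict is not mutated
        return doc

    return run


# The list of names could just as well be loaded from a config file.
config = ["normalize_whitespace", "count_words"]
pipeline = build_pipeline(config)
out = pipeline({"text": "  Data   doesn't grow\nin tables "})
print(out)
```

Real stages would be things like OCR, entity extraction or indexing. The point of the design is that reordering or swapping steps only means editing the list of names.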
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo