Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Technology
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
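As a rough illustration of the last item, and of the reporter's request to search documents for keywords and tag them, here is a minimal Python sketch using regular expressions. The keyword patterns, tag names, and sample documents are all hypothetical; it assumes the text has already been extracted (for example by OCR) and is not the method used in the talk.

```python
import re

# Hypothetical tags and patterns; a real investigation would supply its own.
KEYWORDS = {
    "offshore": re.compile(r"\boffshore\b", re.IGNORECASE),
    "invoice": re.compile(r"\binvoices?\b", re.IGNORECASE),
}


def tag_document(text):
    """Return the sorted list of tags whose pattern matches the text."""
    return sorted(tag for tag, pattern in KEYWORDS.items() if pattern.search(text))


def tag_corpus(docs):
    """Map each document name to its tags, skipping documents with no match."""
    tagged = {}
    for name, text in docs.items():
        tags = tag_document(text)
        if tags:
            tagged[name] = tags
    return tagged


# Hypothetical sample documents, standing in for already-extracted text.
docs = {
    "memo.txt": "Payment was routed through an Offshore account.",
    "letter.txt": "Please find the invoices attached.",
    "note.txt": "Lunch at noon?",
}

print(tag_corpus(docs))
```

A plain in-memory scan like this only suits small collections; for tens of gigabytes a search index would be the more likely choice.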
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What’s missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo