Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
280
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
420
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
110
data.occrp.org
pudo
0
170
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
250
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
290
Dr. Freezefile
pudo
2
440
Intro presentation for Naivasha
pudo
1
170
Other Decks in Technology
See All in Technology
Frontier Agents (Kiro autonomous agent / AWS Security Agent / AWS DevOps Agent) の紹介
msysh
3
150
今日から始めるAmazon Bedrock AgentCore
har1101
4
400
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
3.8k
GitLab Duo Agent Platform × AGENTS.md で実現するSpec-Driven Development / GitLab Duo Agent Platform × AGENTS.md
n11sh1
0
130
Claude_CodeでSEOを最適化する_AI_Ops_Community_Vol.2__マーケティングx_AIはここまで進化した.pdf
riku_423
2
510
CDKで始めるTypeScript開発のススメ
tsukuboshi
1
340
顧客の言葉を、そのまま信じない勇気
yamatai1212
1
340
ファインディの横断SREがTakumi byGMOと取り組む、セキュリティと開発スピードの両立
rvirus0817
1
1.2k
Ruby版 JSXのRuxが気になる
sansantech
PRO
0
130
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
1.7k
データ民主化のための LLM 活用状況と課題紹介(IVRy の場合)
wxyzzz
2
690
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
10k
Featured
See All Featured
We Have a Design System, Now What?
morganepeng
54
8k
Into the Great Unknown - MozCon
thekraken
40
2.2k
GraphQLとの向き合い方2022年版
quramy
50
14k
Fashionably flexible responsive web design (full day workshop)
malarkey
408
66k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.4k
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
61
52k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.8k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.6k
Odyssey Design
rkendrick25
PRO
1
490
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
10
1.1k
Are puppies a ranking factor?
jonoalderson
1
2.7k
Producing Creativity
orderedlist
PRO
348
40k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None