Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Transcript
Data doesn’t grow in tables
Dealing with large sets of documents
“We're working with 40 GB of XXX and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
– An investigative reporter
Some lingo
• OCR (Optical Character Recognition)
• NLP (Natural Language Processing)
• NER (Named Entity Recognition)
• Regular Expressions
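To make the last item of the lingo concrete, here is a minimal sketch of how regular expressions could serve the reporter's request: search documents for keywords and tag the ones that match. It is not taken from the talk. The tag names, patterns, and sample documents are hypothetical placeholders, and it assumes the documents are already plain text (for scans, after OCR).

```python
import re

# Hypothetical keyword-to-pattern mapping; both tags and patterns are
# illustrative placeholders, not terms from the talk.
TAG_PATTERNS = {
    "offshore": re.compile(r"\boffshore\b", re.IGNORECASE),
    "invoice": re.compile(r"\binvoices?\b", re.IGNORECASE),
}


def tag_document(text):
    """Return the sorted list of tags whose pattern occurs in the text."""
    return sorted(
        tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(text)
    )


def tag_corpus(docs):
    """Map each document name to its tags, skipping documents with no match."""
    tagged = {}
    for name, text in docs.items():
        tags = tag_document(text)
        if tags:
            tagged[name] = tags
    return tagged


# Made-up sample documents standing in for text extracted from files.
docs = {
    "a.txt": "Payment routed through an Offshore holding company.",
    "b.txt": "See invoice 2041 for details.",
    "c.txt": "Minutes of the annual meeting.",
}
tagged = tag_corpus(docs)
print(tagged)
```

A loop like this only scales to modest collections. For something on the order of the 40 GB in the quote, a search index would be the more likely fit.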
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
What's missing?
• Easy-to-use ElasticSearch
• Commercial-grade OCR
• Configurable pipelines
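One possible reading of "configurable pipelines" is a document-processing chain whose steps are chosen by configuration rather than hard-coded. The sketch below illustrates that idea only. The step names, the registry, and the document structure are assumptions made for illustration, not a design from the talk or from any existing tool.

```python
# Hypothetical processing steps; each takes and returns a document dict.
def normalize_whitespace(doc):
    """Collapse runs of whitespace, as might be needed after OCR."""
    doc["text"] = " ".join(doc["text"].split())
    return doc


def count_words(doc):
    """Record a simple word count on the document."""
    doc["word_count"] = len(doc["text"].split())
    return doc


# Registry mapping configuration names to step functions (an assumption).
STEPS = {
    "normalize_whitespace": normalize_whitespace,
    "count_words": count_words,
}


def run_pipeline(doc, step_names):
    """Apply the named steps to the document in the configured order."""
    for name in step_names:
        doc = STEPS[name](doc)
    return doc


# The configuration is plain data, so it could just as well be loaded
# from a file that a non-programmer edits.
config = ["normalize_whitespace", "count_words"]
result = run_pipeline({"text": "  scanned   page \n text "}, config)
print(result)
```

Keeping the step list as data is the point of the sketch: reordering or dropping a step would not require touching the code.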
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer
Friedrich Lindenberg, codeforafrica.org, @pudo