Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
410
Getting started with OCCRP Data
pudo
0
1.5k
#nr16: Recherche-Tools
pudo
1
98
data.occrp.org
pudo
0
150
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
400
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
Redmineの意外と知らない便利機能 (Redmine 6.0対応版)
vividtone
0
1.2k
OTel meets Wasm: プラグイン機構としてのWebAssemblyから見る次世代のObservability
lycorptech_jp
PRO
1
300
ゴリラ.vim #36 ~ Vim x SNS ~ スポンサーセッション
yasunori0418
1
360
libsyncrpcってなに?
uhyo
0
160
うちの会社の評判は?SNSの投稿分析にAIを使ってみた
doumae
0
290
Houtou.pm #1
papix
0
670
プラットフォームとしての Datadog / Datadog as Platforms
aoto
PRO
1
340
SmartHRの複数のチームにおけるMCPサーバーの活用事例と課題
yukisnow1823
2
1.2k
Swiftは最高だよの話
yuukiw00w
2
290
カンファレンスのつくりかた / The Conference Code: What Makes It All Work
tomzoh
8
930
AIのための オンボーディングドキュメントを整備する - hirotea
hirotea
9
2.3k
エンジニアが組織に馴染むために勉強会を主催してチームの壁を越える
ohmori_yusuke
2
120
Featured
See All Featured
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
750
The World Runs on Bad Software
bkeepers
PRO
68
11k
Reflections from 52 weeks, 52 projects
jeffersonlam
349
20k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.7k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
15
890
Faster Mobile Websites
deanohume
307
31k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
21k
Fireside Chat
paigeccino
37
3.5k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
123
52k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
32
5.8k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None