Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Data doesn't grow in tables
Search
Friedrich Lindenberg
July 16, 2014
Technology
2
270
Data doesn't grow in tables
Friedrich Lindenberg
July 16, 2014
Tweet
Share
More Decks by Friedrich Lindenberg
See All by Friedrich Lindenberg
Introducción a OCCRP Data
pudo
0
420
Getting started with OCCRP Data
pudo
0
1.6k
#nr16: Recherche-Tools
pudo
1
100
data.occrp.org
pudo
0
160
Tools for Data Journalism | MediaLab Prado DDJ Workshop
pudo
0
240
Digitial Research Tools for Investigative Reporters
pudo
0
11k
Grano: A Python tool for investigating influence
pudo
1
280
Dr. Freezefile
pudo
2
410
Intro presentation for Naivasha
pudo
1
160
Other Decks in Technology
See All in Technology
メルカリIBISの紹介
0gm
0
920
「何となくテストする」を卒業するためにプロダクトが動く仕組みを理解しよう
kawabeaver
0
450
Generative AI Japan 第一回生成AI実践研究会「AI駆動開発の現在地──ブレイクスルーの鍵を握るのはデータ領域」
shisyu_gaku
0
350
なぜテストマネージャの視点が 必要なのか? 〜 一歩先へ進むために 〜
moritamasami
0
250
Modern Linux
oracle4engineer
PRO
0
170
LLM時代のパフォーマンスチューニング:MongoDB運用で試したコンテキスト活用の工夫
ishikawa_pro
0
180
スクラムガイドに載っていないスクラムのはじめかた - チームでスクラムをはじめるときに知っておきたい勘所を集めてみました! - / How to start Scrum that is not written in the Scrum Guide 2nd
takaking22
2
270
スタートアップこそ全員で Platform Engineering スピードと持続性を両立する文化の作り方
anizozina
1
410
エンジニアが主導できる組織づくり ー 製品と事業を進化させる体制へのシフト
ueokande
1
130
Snowflake Intelligenceにはこうやって立ち向かう!クラシルが考えるAI Readyなデータ基盤と活用のためのDataOps
gappy50
0
290
品質視点から考える組織デザイン/Organizational Design from Quality
mii3king
0
220
エンジニアがデザインまで担うための AI駆動UIデザイン/フロントエンド開発実践
kitami
2
140
Featured
See All Featured
Docker and Python
trallard
46
3.6k
Designing for Performance
lara
610
69k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Writing Fast Ruby
sferik
628
62k
Done Done
chrislema
185
16k
Balancing Empowerment & Direction
lara
3
630
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
131
19k
We Have a Design System, Now What?
morganepeng
53
7.8k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
3k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Transcript
Data doesn’t grow in tables Dealing with large sets of
documents
–An investigative reporter “We're working with 40 GB of XXX
and would like to search within the documents for certain keywords (like XXX) so we can identify XXX. Ideally we should be able to tag the docs..”
Some lingo • OCR (Optical Character Recognition) • NLP (Natural
Language Processing) • NER (Named Entity Recognition) • Regular Expressions
Cases
Exhibit A
Exhibit B
Exhibit C
Exhibit D
Tools
Tables in disguise http://tabula.nerdpower.org
Docs in a cloud http://documentcloud.org
Clustering, tagging, mining http://overview.ap.org
Let them eat PDF https://github.com/CrowData
All the visuals Jigsaw
Spoken word magic http://sayit.mysociety.org/
Whats missing? Easy-to-use ElasticSearch Commercial-grade OCR Configurable pipelines
Stefan Wehrmeyer, correctiv.org, @stefanwehrmeyer ! ! ! ! ! !
! Friedrich Lindenberg, codeforafrica.org, @pudo
None
None