Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
330
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
250
The troubles of modern dependency management and what to do about them
gousiosg
0
480
Mining Repositories with Apache Spark
gousiosg
0
600
My adventures with open everything
gousiosg
0
250
Structure and Evolution of Package Dependency Networks
gousiosg
0
710
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
880
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
250
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
PHPからGoへのマイグレーション for DMMアフィリエイト
yabakokobayashi
1
170
KnowledgeBaseDocuments APIでベクトルインデックス管理を自動化する
iidaxs
1
270
大幅アップデートされたRagas v0.2をキャッチアップ
os1ma
2
540
マルチプロダクト開発の現場でAWS Security Hubを1年以上運用して得た教訓
muziyoshiz
3
2.4k
継続的にアウトカムを生み出し ビジネスにつなげる、 戦略と運営に対するタイミーのQUEST(探求)
zigorou
0
590
NW-JAWS #14 re:Invent 2024(予選落ち含)で 発表された推しアップデートについて
nagisa53
0
270
kargoの魅力について伝える
magisystem0408
0
210
UI State設計とテスト方針
rmakiyama
2
620
20241220_S3 tablesの使い方を検証してみた
handy
4
610
オプトインカメラ:UWB測位を応用したオプトイン型のカメラ計測
matthewlujp
0
180
WACATE2024冬セッション資料(ユーザビリティ)
scarletplover
0
210
プロダクト開発を加速させるためのQA文化の築き方 / How to build QA culture to accelerate product development
mii3king
1
270
Featured
See All Featured
Optimizing for Happiness
mojombo
376
70k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
2
170
VelocityConf: Rendering Performance Case Studies
addyosmani
326
24k
GitHub's CSS Performance
jonrohan
1030
460k
Producing Creativity
orderedlist
PRO
341
39k
Fashionably flexible responsive web design (full day workshop)
malarkey
405
66k
Imperfection Machines: The Place of Print at Facebook
scottboms
266
13k
Git: the NoSQL Database
bkeepers
PRO
427
64k
A Philosophy of Restraint
colly
203
16k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
232
17k
A better future with KSS
kneath
238
17k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis