Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
410
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
320
The troubles of modern dependency management and what to do about them
gousiosg
0
600
Mining Repositories with Apache Spark
gousiosg
0
690
My adventures with open everything
gousiosg
0
330
Structure and Evolution of Package Dependency Networks
gousiosg
0
830
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
950
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
320
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
2.9k
VRTと真面目に向き合う
hiragram
1
200
Introduction to Bill One Development Engineer
sansan33
PRO
0
350
Git Training GitHub
yuhattor
1
220
GitHub Copilot CLI 現状確認会議
torumakabe
10
3.3k
Riverpod3.xで実現する実践的UI実装
fumiyasac0921
1
130
エンジニアとして長く走るために気づいた2つのこと_大賀愛一郎
nanaism
0
210
Eight Engineering Unit 紹介資料
sansan33
PRO
0
6.3k
さくらのクラウドでのシークレット管理を考える/tamachi.sre#2
fujiwara3
1
210
たかがボタン、されどボタン ~button要素から深ぼるボタンUIの定義について~ / BuriKaigi 2026
yamanoku
1
290
AWS Network Firewall Proxyで脱Squid運用⁈
nnydtmg
1
140
BiDiってなんだ?
tomorrowkey
1
200
Featured
See All Featured
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
2
83
Art, The Web, and Tiny UX
lynnandtonic
304
21k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.2k
Context Engineering - Making Every Token Count
addyosmani
9
620
Mobile First: as difficult as doing things right
swwweet
225
10k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1.1k
The Curious Case for Waylosing
cassininazir
0
220
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
1
1.4k
ラッコキーワード サービス紹介資料
rakko
1
2.1M
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.7k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis