Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
430
0
Share
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
340
The troubles of modern dependency management and what to do about them
gousiosg
0
670
Mining Repositories with Apache Spark
gousiosg
0
710
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
890
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
"SQLは書けません"から始まる データドリブン
kubell_hr
2
450
Introduction to Bill One Development Engineer
sansan33
PRO
0
410
🀄️ on swiftc
giginet
PRO
0
380
CDK Insightsで見る、AIによるCDKコード静的解析(+AI解析)
k_adachi_01
2
170
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
6
74k
Code Interpreter で、AIに安全に コードを書かせる。
yokomachi
0
6.4k
終盤で崩壊させないAI駆動開発
j5ik2o
2
2.2k
JEDAI in Osaka 2026イントロ
taka_aki
0
230
Azure Static Web Apps の自動ビルドがタイムアウトしやすくなった状況に対応した件/global-azure2026
thara0402
0
320
2026年に相応しい 最先端プラグインホストの設計<del>と実装</del>
atsushieno
0
120
ある製造業の会社全体のAI化に1エンジニアが挑んだ話
kitami
2
990
Eight Engineering Unit 紹介資料
sansan33
PRO
3
7.2k
Featured
See All Featured
Reflections from 52 weeks, 52 projects
jeffersonlam
356
21k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
190
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
inesmontani
PRO
3
2.1k
Exploring anti-patterns in Rails
aemeredith
3
320
What’s in a name? Adding method to the madness
productmarketing
PRO
24
4k
Mind Mapping
helmedeiros
PRO
1
150
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
3
3.3k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
130
KATA
mclloyd
PRO
35
15k
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
100
Color Theory Basics | Prateek | Gurzu
gurzu
0
290
Making Projects Easy
brettharned
120
6.6k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis