Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
350
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
270
The troubles of modern dependency management and what to do about them
gousiosg
0
510
Mining Repositories with Apache Spark
gousiosg
0
630
My adventures with open everything
gousiosg
0
270
Structure and Evolution of Package Dependency Networks
gousiosg
0
740
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
910
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
270
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
更新系と状態
uhyo
8
2.1k
意思決定を支える検索体験を目指してやってきたこと
hinatades
PRO
0
360
2025-04-14 Data & Analytics 井戸端会議 Multi tenant log platform with Iceberg
kamijin_fanta
0
140
AIにおけるソフトウェアテスト_ver1.00
fumisuke
1
310
もう難しくない!誰でもカンタンDocker入門 〜30分であなたのPCにアプリを立ち上げる〜
devops_vtj
0
160
Microsoft の SSE の現在地
skmkzyk
0
260
Perl歴約10年のエンジニアがフルスタックTypeScriptに出会ってみた
papix
1
230
DjangoCon Europe 2025 Keynote - Django for Data Science
wsvincent
0
330
OpenLane-V2ベンチマークと代表的な手法
kzykmyzw
0
130
AWS全冠芸人が見た世界 ~資格取得より大切なこと~
masakiokuda
6
6.5k
Oracle Cloud Infrastructure:2025年4月度サービス・アップデート
oracle4engineer
PRO
0
250
Web Intelligence and Visual Media Analytics
weblyzard
PRO
1
5.9k
Featured
See All Featured
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
770
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
32
5.4k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Build your cross-platform service in a week with App Engine
jlugia
230
18k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
280
13k
GraphQLとの向き合い方2022年版
quramy
46
14k
4 Signs Your Business is Dying
shpigford
183
22k
What's in a price? How to price your products and services
michaelherold
245
12k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
14
1.4k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis