GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
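A minimal sketch of how a collector might consume the public events endpoint named on this slide. It assumes the endpoint is polled repeatedly and that consecutive responses can overlap, so events are de-duplicated by their `id` field. The helper names `fetch_events` and `new_events` and the `User-Agent` value are hypothetical and are not taken from GHTorrent.

```python
import json
from urllib.request import Request, urlopen

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events (performs a network call)."""
    request = Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def new_events(batch, seen_ids):
    """Return only the events whose id has not been seen before.

    `seen_ids` is a set that is updated in place, so it can be
    reused across successive polls.
    """
    fresh = [event for event in batch if event["id"] not in seen_ids]
    seen_ids.update(event["id"] for event in fresh)
    return fresh
```

In use, a loop would call `new_events(fetch_events(), seen)` at an interval that respects the API rate limits and hand the result to the next stage.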
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
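The condensed event on this slide links to a user, a repository, a commit and an organization. The sketch below shows one way to collect those links from a single event so that each entity can be retrieved afterwards. It assumes the usual event layout (`actor`, `repo`, `org`, and `payload.commits` entries carrying a `url`). The sample event and every name in it are made up for illustration.

```python
def linked_urls(event):
    """Collect the API URLs of the entities an event refers to."""
    urls = []
    # Top-level entities: who acted, on which repo, in which org.
    for key in ("actor", "repo", "org"):
        entity = event.get(key)
        if entity and "url" in entity:
            urls.append(entity["url"])
    # Push events list their commits inside the payload.
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls


# Hypothetical, heavily condensed push event (not real data).
sample_event = {
    "id": "1",
    "type": "PushEvent",
    "actor": {"login": "example-user",
              "url": "https://api.github.com/users/example-user"},
    "repo": {"name": "example-org/example-repo",
             "url": "https://api.github.com/repos/example-org/example-repo"},
    "org": {"login": "example-org",
            "url": "https://api.github.com/orgs/example-org"},
    "payload": {"commits": [
        {"sha": "abc123",
         "url": "https://api.github.com/repos/example-org/example-repo/commits/abc123"},
    ]},
}
```

Calling `linked_urls(sample_event)` yields the user, repository, organization and commit URLs, in that order.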
Entities
GHTorrent architecture
[Diagram. Labels: GitHub API, Event Retrieval, queues for events, commits and projects, Data Retrieval (several workers), routing keys evt.commit, evt.watch, evt.fork, Mirroring Cluster]
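The architecture slide shows events being routed to data-retrieval workers under the keys evt.commit, evt.watch and evt.fork. A rough sketch of that routing step follows, using in-memory queues in place of a real message broker. The mapping from GitHub event types to routing keys is an assumption made for illustration, and the `route` function is hypothetical; the slide does not say how GHTorrent implements this.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events, queues=None):
    """Append each event to the queue for its routing key.

    Events whose type has no routing key are skipped.
    """
    if queues is None:
        queues = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is not None:
            queues[key].append(event)
    return queues
```

Each worker would then drain one queue, for example `queues["evt.commit"].popleft()`, and fetch the entities that event refers to.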
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis