Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
370
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
290
The troubles of modern dependency management and what to do about them
gousiosg
0
530
Mining Repositories with Apache Spark
gousiosg
0
650
My adventures with open everything
gousiosg
0
290
Structure and Evolution of Package Dependency Networks
gousiosg
0
780
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
920
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
290
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Infrastructure as Prompt実装記 〜Bedrock AgentCoreで作る自然言語インフラエージェント〜
yusukeshimizu
1
140
夏休みWebアプリパフォーマンス相談室/web-app-performance-on-radio
hachi_eiji
0
220
マルチプロダクト×マルチテナントを支えるモジュラモノリスを中心としたアソビューのアーキテクチャ
disc99
1
580
Kiroでインフラ要件定義~テスト を実施してみた
nagisa53
3
370
Agent Development Kitで始める生成 AI エージェント実践開発
danishi
0
150
Rubyの国のPerlMonger
anatofuz
3
740
Bet "Bet AI" - Accelerating Our AI Journey #BetAIDay
layerx
PRO
4
1.8k
はじめての転職講座/The Guide of First Career Change
kwappa
5
4.2k
OPENLOGI Company Profile for engineer
hr01
1
38k
PL/pgSQLの基本と使い所
tameguro
2
210
【CEDEC2025】『Shadowverse: Worlds Beyond』二度目のDCG開発でゲームをリデザインする~遊びやすさと競技性の両立~
cygames
PRO
1
370
UDDのススメ - 拡張版 -
maguroalternative
1
580
Featured
See All Featured
Code Review Best Practice
trishagee
69
19k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Statistics for Hackers
jakevdp
799
220k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
283
13k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
Code Reviewing Like a Champion
maltzj
524
40k
Thoughts on Productivity
jonyablonski
69
4.8k
Making the Leap to Tech Lead
cromwellryan
134
9.5k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis