GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
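
As a rough illustration of the starting point named on this slide, the sketch below reads one page of the public events feed and tallies events by type. It is a minimal sketch, not GHTorrent's actual retrieval code: the function names `fetch_events` and `count_event_types` and the `User-Agent` value are made up for illustration, and it assumes the endpoint returns a JSON array of event objects that each carry a `type` field.

```python
import json
import urllib.request
from collections import Counter

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events (hypothetical helper, needs network access)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            # Illustrative value; the API expects some User-Agent header.
            "User-Agent": "events-sketch",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def count_event_types(events):
    """Tally events by their 'type' field, e.g. PushEvent or WatchEvent."""
    return Counter(event.get("type", "unknown") for event in events)


if __name__ == "__main__":
    # Unauthenticated requests are rate limited, so this is for one-off exploration only.
    for event_type, count in count_event_types(fetch_events()).most_common():
        print(event_type, count)
```

A long-running mirror would also need to respect rate limits and avoid re-reading events it has already seen; those parts are left out here.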
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
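
The example event points at four kinds of API resource: a user, a repository, a commit, and an organization. The sketch below shows one way to collect those links from a single event so that each can be fetched later. The helper name `linked_urls` and the sample values are hypothetical; the sketch assumes the usual event layout, in which `actor`, `repo`, and `org` each carry a `url` and a push event lists its commits under `payload.commits`.

```python
def linked_urls(event):
    """Return the API URLs an event refers to, in the order they are found."""
    urls = []
    # actor, repo and org are optional objects that each hold a 'url'.
    for key in ("actor", "repo", "org"):
        url = (event.get(key) or {}).get("url")
        if url:
            urls.append(url)
    # Push events carry a list of commits, each with its own API URL.
    commits = (event.get("payload") or {}).get("commits") or []
    for commit in commits:
        if commit.get("url"):
            urls.append(commit["url"])
    return urls


# Hypothetical event with placeholder names, not the one shown on the slide.
example_event = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/example-user"},
    "repo": {"url": "https://api.github.com/repos/example-org/example-repo"},
    "org": {"url": "https://api.github.com/orgs/example-org"},
    "payload": {
        "commits": [
            {"url": "https://api.github.com/repos/example-org/example-repo/commits/abc123"}
        ]
    },
}

for url in linked_urls(example_event):
    print(url)
```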
Entities
GHTorrent architecture
[Diagram: GitHub API, event retrieval, events queue (evt.commit, evt.watch, evt.fork), data retrieval workers, commits and projects queues, mirroring cluster]
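
The architecture slide describes events being placed on queues under keys such as evt.commit, evt.watch, and evt.fork, with separate data retrieval workers consuming them. The sketch below imitates that routing in-process with plain Python deques. It is only an illustration of the idea, not the GHTorrent implementation: the `ROUTING` table that maps GitHub event types to the slide's keys is an assumption, and `route` and `drain` are hypothetical names.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events):
    """Place each event on the queue for its routing key; skip unmapped types."""
    queues = defaultdict(deque)
    for event in events:
        key = ROUTING.get(event.get("type"))
        if key is not None:
            queues[key].append(event)
    return queues


def drain(queues, handlers):
    """Let each handler consume its queue in arrival order; return the results."""
    results = []
    for key, handler in handlers.items():
        queue = queues.get(key, deque())
        while queue:
            results.append(handler(queue.popleft()))
    return results


if __name__ == "__main__":
    incoming = [{"type": "PushEvent"}, {"type": "ForkEvent"}, {"type": "IssuesEvent"}]
    queues = route(incoming)
    # Stand-in workers; real ones would fetch the linked data from the API.
    handlers = {
        "evt.commit": lambda e: "retrieve commits",
        "evt.fork": lambda e: "retrieve fork",
    }
    print(drain(queues, handlers))
```

Keeping one queue per key lets each kind of retrieval work be scaled or restarted on its own, which is presumably why the diagram shows several data retrieval workers.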
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis