Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
370
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
290
The troubles of modern dependency management and what to do about them
gousiosg
0
540
Mining Repositories with Apache Spark
gousiosg
0
660
My adventures with open everything
gousiosg
0
300
Structure and Evolution of Package Dependency Networks
gousiosg
0
780
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
920
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
290
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Browser
recruitengineers
PRO
8
2.2k
20250903_1つのAWSアカウントに複数システムがある環境におけるアクセス制御をABACで実現.pdf
yhana
2
280
Kiroと学ぶコンテキストエンジニアリング
oikon48
6
7.3k
JavaScript 研修
recruitengineers
PRO
6
1.4k
Kubernetes における cgroup v2 でのOut-Of-Memory 問題の解決
pfn
PRO
0
450
【 LLMエンジニアがヒューマノイド開発に挑んでみた 】 - 第104回 Machine Learning 15minutes! Hybrid
soneo1127
0
250
進捗
ydah
2
230
AI エージェントとはそもそも何か? - 技術背景から Amazon Bedrock AgentCore での実装まで- / AI Agent Unicorn Day 2025
hariby
3
590
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
260
DuckDB-Wasmを使って ブラウザ上でRDBMSを動かす
hacusk
1
140
Snowflakeの生成AI機能を活用したデータ分析アプリの作成 〜Cortex AnalystとCortex Searchの活用とStreamlitアプリでの利用〜
nayuts
0
150
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
30k
Featured
See All Featured
How to Ace a Technical Interview
jacobian
279
23k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Speed Design
sergeychernyshev
32
1.1k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
6.1k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
We Have a Design System, Now What?
morganepeng
53
7.8k
A Tale of Four Properties
chriscoyier
160
23k
Become a Pro
speakerdeck
PRO
29
5.5k
Site-Speed That Sticks
csswizardry
10
800
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
131
19k
Designing for humans not robots
tammielis
253
25k
It's Worth the Effort
3n
187
28k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis