Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
380
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
290
The troubles of modern dependency management and what to do about them
gousiosg
0
550
Mining Repositories with Apache Spark
gousiosg
0
660
My adventures with open everything
gousiosg
0
310
Structure and Evolution of Package Dependency Networks
gousiosg
0
790
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
930
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
290
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
生成AIで「お客様の声」を ストーリーに変える 新潮流「Generative ETL」
ishikawa_satoru
1
250
LLMアプリケーション開発におけるセキュリティリスクと対策 / LLM Application Security
flatt_security
7
1.6k
Goのビルドシステムの変遷 / The history of Go's build system
ymotongpoo
12
3.5k
Geospatialの世界最前線を探る [2025年版]
dayjournal
2
450
Go Conference 2025: GoのinterfaceとGenericsの内部構造と進化 / Go type system internals
ryokotmng
3
560
DataOpsNight#8_Terragruntを用いたスケーラブルなSnowflakeインフラ管理
roki18d
1
310
PyCon JP 2025 DAY1 「Hello, satellite data! ~Pythonではじめる衛星データ解析~」
ra0kley
0
870
後進育成のしくじり〜任せるスキルとリーダーシップの両立〜
matsu0228
1
610
[2025-09-30] Databricks Genie を利用した分析基盤とデータモデリングの IVRy の現在地
wxyzzz
0
420
Azure Well-Architected Framework入門
tomokusaba
0
120
BtoBプロダクト開発の深層
16bitidol
0
150
実装で解き明かす並行処理の歴史
zozotech
PRO
1
140
Featured
See All Featured
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Navigating Team Friction
lara
189
15k
Making Projects Easy
brettharned
118
6.4k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Six Lessons from altMBA
skipperchong
28
4k
The Language of Interfaces
destraynor
162
25k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
358
30k
Being A Developer After 40
akosma
90
590k
We Have a Design System, Now What?
morganepeng
53
7.8k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
570
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
30
2.9k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis