Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
360
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
280
The troubles of modern dependency management and what to do about them
gousiosg
0
520
Mining Repositories with Apache Spark
gousiosg
0
640
My adventures with open everything
gousiosg
0
290
Structure and Evolution of Package Dependency Networks
gousiosg
0
760
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
910
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
280
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Prox Industries株式会社 会社紹介資料
proxindustries
0
210
標準技術と独自システムで作る「つらくない」SaaS アカウント管理 / Effortless SaaS Account Management with Standard Technologies & Custom Systems
yuyatakeyama
2
1k
2年でここまで成長!AWSで育てたAI Slack botの軌跡
iwamot
PRO
2
250
ひとり情シスなCTOがLLMと始めるオペレーション最適化 / CTO's LLM-Powered Ops
yamitzky
0
380
VISITS_AIIoTビジネス共創ラボ登壇資料.pdf
iotcomjpadmin
0
150
Clineを含めたAIエージェントを 大規模組織に導入し、投資対効果を考える / Introducing AI agents into your organization
i35_267
4
1.4k
より良いプロダクトの開発を目指して - 情報を中心としたプロダクト開発 #phpcon #phpcon2025
bengo4com
1
370
AWS Summit Japan 2025 Community Stage - App workflow automation by AWS Step Functions
matsuihidetoshi
1
140
CI/CDとタスク共有で加速するVibe Coding
tnbe21
0
230
Amazon S3標準/ S3 Tables/S3 Express One Zoneを使ったログ分析
shigeruoda
2
390
AI技術トレンド勉強会 #1MCPの基礎と実務での応用
nisei_k
1
240
AWS アーキテクチャ作図入門/aws-architecture-diagram-101
ma2shita
29
9.5k
Featured
See All Featured
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
137
34k
Stop Working from a Prison Cell
hatefulcrawdad
270
20k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
53k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
How to train your dragon (web standard)
notwaldorf
92
6.1k
Automating Front-end Workflow
addyosmani
1370
200k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.8k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.2k
KATA
mclloyd
29
14k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
920
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis