Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
390
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
300
The troubles of modern dependency management and what to do about them
gousiosg
0
560
Mining Repositories with Apache Spark
gousiosg
0
670
My adventures with open everything
gousiosg
0
310
Structure and Evolution of Package Dependency Networks
gousiosg
0
800
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
300
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
フライトコントローラPX4の中身(制御器)を覗いてみた
santana_hammer
1
140
Flutterコントリビューションのススメ
d_r_1009
1
350
コンピューティングリソース何を使えばいいの?
tomokusaba
1
130
ステートレスなLLMでステートフルなAI agentを作る - YAPC::Fukuoka 2025
gfx
1
380
“それなりに”安全なWebアプリケーションの作り方
xryuseix
0
290
よくわからない人向けの IAM Identity Center とちょっとした落とし穴
kazzpapa3
2
710
エンタープライズ企業における開発効率化のためのコンテキスト設計とその活用
sergicalsix
1
330
メタプログラミングRuby問題集の活用
willnet
2
770
ユーザーストーリー x AI / User Stories x AI
oomatomo
0
170
なぜThrottleではなくDebounceだったのか? 700並列リクエストと戦うサーバーサイド実装のすべて
yoshiori
9
3.3k
AWS資格は取ったけどIAMロールを腹落ちできてなかったので、年内に整理してみた
hiro_eng_
0
190
品質保証の取り組みを広げる仕組みづくり〜スキルの移譲と自律を支える実践知〜
tarappo
2
840
Featured
See All Featured
Building Flexible Design Systems
yeseniaperezcruz
329
39k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
Git: the NoSQL Database
bkeepers
PRO
432
66k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Bash Introduction
62gerente
615
210k
Documentation Writing (for coders)
carmenintech
76
5.1k
It's Worth the Effort
3n
187
28k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
How to Ace a Technical Interview
jacobian
280
24k
How to train your dragon (web standard)
notwaldorf
97
6.4k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.2k
[RailsConf 2023] Rails as a piece of cake
palkan
57
6k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis