Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
380
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
290
The troubles of modern dependency management and what to do about them
gousiosg
0
550
Mining Repositories with Apache Spark
gousiosg
0
660
My adventures with open everything
gousiosg
0
310
Structure and Evolution of Package Dependency Networks
gousiosg
0
790
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
300
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Biz職でもDifyでできる! 「触らないAIワークフロー」を実現する方法
igarashikana
3
1.6k
OSSで50の競合と戦うためにやったこと
yamadashy
3
940
Railsの話をしよう
yahonda
0
170
今この時代に技術とどう向き合うべきか
gree_tech
PRO
2
2.1k
Zephyr(RTOS)にEdge AIを組み込んでみた話
iotengineer22
1
270
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
3
20k
Azureコストと向き合った、4年半のリアル / Four and a half years of dealing with Azure costs
aeonpeople
1
250
FinOps について (ちょっと) 本気出して考えてみた
skmkzyk
0
190
難しいセキュリティ用語をわかりやすくしてみた
yuta3110
0
370
Linux カーネルが支えるコンテナの仕組み / LF Japan Community Days 2025 Osaka
tenforward
1
110
「改善」ってこれでいいんだっけ?
ukigmo_hiro
0
400
AI駆動で進める依存ライブラリ更新 ─ Vue プロジェクトの品質向上と開発スピード改善の実践録
sayn0
1
110
Featured
See All Featured
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.2k
Speed Design
sergeychernyshev
32
1.2k
Practical Orchestrator
shlominoach
190
11k
Why Our Code Smells
bkeepers
PRO
340
57k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
What's in a price? How to price your products and services
michaelherold
246
12k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.7k
Build your cross-platform service in a week with App Engine
jlugia
232
18k
Optimizing for Happiness
mojombo
379
70k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Building a Scalable Design System with Sketch
lauravandoore
463
33k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis