Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
430
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
340
The troubles of modern dependency management and what to do about them
gousiosg
0
660
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
880
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Phase09_自動化_仕組み化
overflowinc
0
1.6k
開発チームとQAエンジニアの新しい協業モデル -年末調整開発チームで実践する【QAリード施策】-
qa
0
270
Datadog で実現するセキュリティ対策 ~オブザーバビリティとセキュリティを 一緒にやると何がいいのか~
a2ush
0
120
Physical AI on AWS リファレンスアーキテクチャ / Physical AI on AWS Reference Architecture
aws_shota
1
130
夢の無限スパゲッティ製造機 #phperkaigi
o0h
PRO
0
360
Phase10_組織浸透_データ活用
overflowinc
0
1.6k
テストプロセスにおけるAI活用 :人間とAIの共存
hacomono
PRO
0
160
Phase05_ClaudeCode入門
overflowinc
0
2.1k
スピンアウト講座06_認証系(API-OAuth-MCP)入門
overflowinc
0
1.1k
スピンアウト講座02_ファイル管理
overflowinc
0
1.3k
Copilot 宇宙へ 〜生成AIで「専門データの壁」を壊す方法〜
nakasho
0
180
スケールアップ企業でQA組織が機能し続けるための組織設計と仕組み〜ボトムアップとトップダウンを両輪としたアプローチ〜
tarappo
4
370
Featured
See All Featured
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.9k
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
150
エンジニアに許された特別な時間の終わり
watany
106
240k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
86
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.2k
How to build a perfect <img>
jonoalderson
1
5.3k
A designer walks into a library…
pauljervisheath
210
24k
Noah Learner - AI + Me: how we built a GSC Bulk Export data pipeline
techseoconnect
PRO
0
150
Skip the Path - Find Your Career Trail
mkilby
1
88
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
300
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
0
180
Optimising Largest Contentful Paint
csswizardry
37
3.6k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis