Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
420
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
320
The troubles of modern dependency management and what to do about them
gousiosg
0
640
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
340
Structure and Evolution of Package Dependency Networks
gousiosg
0
840
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
960
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
330
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
量子クラウドサービスの裏側 〜Deep Dive into OQTOPUS〜
oqtopus
0
150
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
2.1k
会社紹介資料 / Sansan Company Profile
sansan33
PRO
15
400k
Cosmos World Foundation Model Platform for Physical AI
takmin
0
980
AIが実装する時代、人間は仕様と検証を設計する
gotalab555
1
650
今日から始めるAmazon Bedrock AgentCore
har1101
4
420
ECS障害を例に学ぶ、インシデント対応に備えたAIエージェントの育て方 / How to develop AI agents for incident response with ECS outage
iselegant
4
460
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
2
340
日本の85%が使う公共SaaSは、どう育ったのか
taketakekaho
1
250
配列に見る bash と zsh の違い
kazzpapa3
3
170
We Built for Predictability; The Workloads Didn’t Care
stahnma
0
150
ファインディの横断SREがTakumi byGMOと取り組む、セキュリティと開発スピードの両立
rvirus0817
1
1.7k
Featured
See All Featured
The SEO Collaboration Effect
kristinabergwall1
0
360
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.7k
Un-Boring Meetings
codingconduct
0
200
Claude Code のすすめ
schroneko
67
210k
Testing 201, or: Great Expectations
jmmastey
46
8.1k
Into the Great Unknown - MozCon
thekraken
40
2.3k
GitHub's CSS Performance
jonrohan
1032
470k
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
2
430
Chasing Engaging Ingredients in Design
codingconduct
0
120
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
160
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.6k
First, design no harm
axbom
PRO
2
1.1k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis