GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft), Georgios Gousios (Delft University of Technology, TU Delft), Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
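As a rough illustration of the starting point named on this slide, the sketch below reads one page of the public events feed and tallies events by type. It is not GHTorrent's own code: the function names `fetch_events` and `count_by_type` and the `User-Agent` value are assumptions made for the example, and the `type` field is assumed to be present on each event as in the GitHub events API.

```python
import json
import urllib.request
from collections import Counter

# Public events feed named on the slide (HTTPS form).
EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events. Performs a network request."""
    request = urllib.request.Request(
        url,
        # Hypothetical client name; GitHub expects some User-Agent header.
        headers={"User-Agent": "events-sketch"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def count_by_type(events):
    """Tally events by their 'type' field; events without one count as 'unknown'."""
    return Counter(event.get("type", "unknown") for event in events)
```

Calling `count_by_type(fetch_events())` would give a snapshot of what is happening on the feed at that moment; unauthenticated requests are rate limited, so a real collector would need tokens and paging.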
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
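The URLs in the example event are what make further retrieval possible: each one points at an entity (user, repository, commit, organization) that can be fetched in turn. A minimal sketch of pulling those links out of an event follows. The helper name `linked_urls` is hypothetical, and the field layout (`actor.url`, `repo.url`, `payload.commits[].url`, `org.url`) is an assumption based on the shape of push events in the GitHub events API; the sample values are the ones shown on the slide, copied as-is.

```python
def linked_urls(event):
    """Collect the API URLs an event refers to, in actor, repo, commits, org order."""
    urls = []
    for key in ("actor", "repo"):
        url = (event.get(key) or {}).get("url")
        if url:
            urls.append(url)
    # Push-style events are assumed to list their commits under payload.commits.
    for commit in (event.get("payload") or {}).get("commits", []):
        if commit.get("url"):
            urls.append(commit["url"])
    org_url = (event.get("org") or {}).get("url")
    if org_url:
        urls.append(org_url)
    return urls


# Condensed event assembled from the URLs on the slide; all other fields omitted.
sample_event = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "payload": {
        "commits": [
            {
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim"
                       "/commits/c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
}
```

Each returned URL is a candidate for a follow-up request, which is how a single event fans out into several entities.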
Entities
GHTorrent architecture [diagram; labels: GitHub API, Event Retrieval, Commits Queue, Project Events Queue, Events, Data Retrieval, Projects, Commits, evt.commit, evt.watch, evt.fork, Mirroring Cluster]
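The architecture slide suggests that retrieved events are routed by kind (evt.commit, evt.watch, evt.fork) to queues that data retrieval workers consume. The sketch below shows that routing idea only. It uses Python's in-process `queue.Queue` as a stand-in for whatever message broker the real system uses, and the `Dispatcher` class and the mapping from GitHub event types to routing keys are assumptions made for illustration, not taken from the talk.

```python
import queue
from collections import defaultdict

# Hypothetical mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


class Dispatcher:
    """Route events to one queue per routing key (in-process stand-in for a broker)."""

    def __init__(self):
        self.queues = defaultdict(queue.Queue)

    def publish(self, event):
        """Enqueue the event under its routing key; return the key, or None if unrouted."""
        key = ROUTING_KEYS.get(event.get("type"))
        if key is None:
            return None
        self.queues[key].put(event)
        return key

    def drain(self, key, handler):
        """Apply handler to every event currently queued under key, in arrival order."""
        results = []
        while True:
            try:
                event = self.queues[key].get_nowait()
            except queue.Empty:
                return results
            results.append(handler(event))
```

In a distributed setup each routing key would be consumed by separate worker processes, so slow retrieval for one kind of event would not block the others.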
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis