Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
320
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
250
The troubles of modern dependency management and what to do about them
gousiosg
0
470
Mining Repositories with Apache Spark
gousiosg
0
560
My adventures with open everything
gousiosg
0
240
Structure and Evolution of Package Dependency Networks
gousiosg
0
700
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
880
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
240
The #issue32 incident
gousiosg
2
15k
Other Decks in Technology
See All in Technology
いまならこう作りたい AWSコンテナ[本格]入門ハンズオン 〜2024年版 ハンズオンの構想〜
horsewin
9
2.1k
サイバーエージェントにおける生成AIのリスキリング施策の取り組み / cyber-ai-reskilling
cyberagentdevelopers
PRO
2
200
急成長中のWINTICKETにおける品質と開発スピードと向き合ったQA戦略と今後の展望 / winticket-autify
cyberagentdevelopers
PRO
1
160
小規模に始めるデータメッシュとデータガバナンスの実践
kimujun
3
590
「最高のチューニング」をしないために / hack@delta 24.10
fujiwara3
21
3.5k
【技術書典17】OpenFOAM(自宅で極める流体解析)2次元円柱まわりの流れ
kamakiri1225
0
220
ネット広告に未来はあるか?「3rd Party Cookie廃止とPrivacy Sandboxの効果検証の裏側」 / third-party-cookie-privacy
cyberagentdevelopers
PRO
1
130
分布で見る効果検証入門 / ai-distributional-effect
cyberagentdevelopers
PRO
4
700
ABEMA のコンテンツ制作を最適化!生成 AI x クラウド映像編集システム / abema-ai-editor
cyberagentdevelopers
PRO
1
180
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
5
49k
バクラクにおける可観測性向上の取り組み
yuu26
3
420
初心者に Vue.js を 教えるには
tsukuha
5
390
Featured
See All Featured
Done Done
chrislema
181
16k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
41
2.1k
What’s in a name? Adding method to the madness
productmarketing
PRO
22
3.1k
Put a Button on it: Removing Barriers to Going Fast.
kastner
59
3.5k
For a Future-Friendly Web
brad_frost
175
9.4k
How to Ace a Technical Interview
jacobian
275
23k
Into the Great Unknown - MozCon
thekraken
31
1.5k
What's new in Ruby 2.0
geeforr
342
31k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
37
1.8k
How GitHub (no longer) Works
holman
311
140k
Keith and Marios Guide to Fast Websites
keithpitt
408
22k
A Philosophy of Restraint
colly
203
16k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis