GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
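The slide only names the public events endpoint. As a rough sketch of what polling it could look like, the Python below fetches one page of events and keeps only those not seen before. The function names, the `seen_ids` set, and the de-duplication step are assumptions for illustration, not GHTorrent's actual code; the sketch uses the HTTPS form of the URL.

```python
import json
import urllib.request
from typing import Callable

EVENTS_URL = "https://api.github.com/events"


def http_get(url: str) -> bytes:
    """Plain unauthenticated GET; subject to GitHub's rate limits."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_new_events(
    seen_ids: set[str],
    get: Callable[[str], bytes] = http_get,
) -> list[dict]:
    """Return events from one poll whose ids are not yet in seen_ids.

    seen_ids is updated in place, so repeated polls skip old events.
    """
    events = json.loads(get(EVENTS_URL))
    fresh = [event for event in events if event["id"] not in seen_ids]
    seen_ids.update(event["id"] for event in fresh)
    return fresh
```

Passing the fetcher in as `get` keeps the polling logic testable without network access; calling `fetch_new_events(set())` with the default performs a real request.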
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
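The URLs on this slide are the links an event carries to its actor, repository, commits, and organization. A minimal sketch of pulling those links out of an event follows. The sample event is hypothetical: it assumes a push event, with field names following the public events API (`actor`, `repo`, `org`, `payload.commits`), and the commit hash is copied as it appears on the slide.

```python
from typing import Any


def linked_urls(event: dict[str, Any]) -> list[str]:
    """Collect the API URLs an event points to: actor, repo, org, then commits."""
    urls = []
    for key in ("actor", "repo", "org"):
        entity = event.get(key)
        if entity and "url" in entity:
            urls.append(entity["url"])
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    return urls


# Hypothetical condensed event assembled from the URLs on the slide.
sample_event = {
    "type": "PushEvent",  # assumption: the slide does not state the event type
    "actor": {"login": "Cephei", "url": "https://api.github.com/users/Cephei"},
    "repo": {
        "name": "PowerDMS/Owin.Scim",
        "url": "https://api.github.com/repos/PowerDMS/Owin.Scim",
    },
    "org": {"login": "PowerDMS", "url": "https://api.github.com/orgs/PowerDMS"},
    "payload": {
        "commits": [
            {
                "url": "https://api.github.com/repos/PowerDMS/Owin.Scim/"
                "commits/c751014f634d73e0b72f78a53c8cf137888b3"
            }
        ]
    },
}

for url in linked_urls(sample_event):
    print(url)
```

Each returned URL identifies a further resource that a mirroring tool would have to retrieve to resolve the event fully.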
Entities
GHTorrent architecture (diagram; labels: Github API, Event Retrieval, Commits Queue, Project Events Queue, Events, Data Retrieval, Projects, Commits, evt.commit, evt.watch, evt.fork, Mirroring Cluster)
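The architecture slide shows retrieved events going to queues under keys such as evt.commit, evt.watch, and evt.fork, with separate data retrieval workers consuming them. The in-memory sketch below illustrates that routing idea only. The mapping from GitHub event types to those keys is an assumption, and a real deployment would use a message broker rather than Python deques.

```python
from collections import defaultdict, deque
from typing import Any, Iterable

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events: Iterable[dict[str, Any]]) -> dict[str, deque]:
    """Place each event on the queue for its routing key.

    Events whose type has no routing key are skipped.
    """
    queues: dict[str, deque] = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type", ""))
        if key is not None:
            queues[key].append(event)
    return queues


def drain(queue: deque, handler) -> int:
    """Stand-in for one data retrieval worker: process a queue until empty."""
    handled = 0
    while queue:
        handler(queue.popleft())
        handled += 1
    return handled
```

Keeping one queue per key lets each kind of retrieval work scale independently, which is the apparent point of the separate Data Retrieval boxes in the diagram.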
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis