Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
430
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
340
The troubles of modern dependency management and what to do about them
gousiosg
0
660
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
880
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Phase12_総括_自走化
overflowinc
0
1.4k
「AIエージェントで変わる開発プロセス―レビューボトルネックからの脱却」
lycorptech_jp
PRO
0
110
20260320_JaSST26_Tokyo_登壇資料.pdf
mura_shin
0
120
脳が溶けた話 / Melted Brain
keisuke69
1
980
Bref でサービスを運用している話
sgash708
0
190
Phase11_戦略的AI経営
overflowinc
0
1.5k
20年以上続く PHP 大規模プロダクトを Kubernetes へ ── クラウド基盤刷新プロジェクトの4年間
oogfranz
PRO
0
180
スピンアウト講座04_ルーティン処理
overflowinc
0
1.2k
スピンアウト講座03_CLAUDE-MDとSKILL-MD
overflowinc
0
1.2k
Kiroで見直す開発プロセスとAI-DLC
k_adachi_01
0
130
Physical AI on AWS リファレンスアーキテクチャ / Physical AI on AWS Reference Architecture
aws_shota
1
130
Agent Skill 是什麼?對軟體產業帶來的變化
appleboy
0
230
Featured
See All Featured
The Curse of the Amulet
leimatthew05
1
10k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.2k
Embracing the Ebb and Flow
colly
88
5k
Navigating Team Friction
lara
192
16k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Redefining SEO in the New Era of Traffic Generation
szymonslowik
1
250
Chasing Engaging Ingredients in Design
codingconduct
0
150
Getting science done with accelerated Python computing platforms
jacobtomlinson
2
150
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
0
240
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis