GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
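The core vs community split on the slides above can be sketched as a simple count. This is only an illustration: the slides do not define "core", so the sketch assumes it means membership in a known list of team logins, and `split_by_origin`, `core_team`, and the sample logins are hypothetical names.

```python
def split_by_origin(authors, core_team):
    """Count contributions by core members versus everyone else.

    `authors` holds one login per commit or comment; `core_team` is an
    assumed set of logins treated as core. Neither comes from the talk.
    """
    counts = {"core": 0, "community": 0}
    for author in authors:
        bucket = "core" if author in core_team else "community"
        counts[bucket] += 1
    return counts


# Hypothetical data: one entry per commit, identified by author login.
commit_authors = ["alice", "bob", "alice", "carol"]
print(split_by_origin(commit_authors, core_team={"alice"}))
```

The same function works for comments if the list holds one login per comment instead of one per commit.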
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
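The condensed event above links to a user, a repository, a commit, and an organization. As a minimal sketch of how those links might be pulled out of one event, the function below reads the `actor`, `repo`, `org`, and `payload.commits` fields that GitHub event objects carry. The function name `linked_urls` and the sample event are hypothetical and hand-written; this is not GHTorrent's own code.

```python
def linked_urls(event):
    """Return the API URLs one event points to: actor, repo, commits, org."""
    urls = []
    for key in ("actor", "repo"):
        entity = event.get(key) or {}
        if entity.get("url"):
            urls.append(entity["url"])
    # Push events list their commits inside the payload.
    for commit in (event.get("payload") or {}).get("commits", []):
        if commit.get("url"):
            urls.append(commit["url"])
    # The org field is only present for repositories owned by an organization.
    org = event.get("org") or {}
    if org.get("url"):
        urls.append(org["url"])
    return urls


# Hypothetical event, trimmed to the fields the sketch reads.
SAMPLE_EVENT = {
    "type": "PushEvent",
    "actor": {"login": "octocat", "url": "https://api.github.com/users/octocat"},
    "repo": {"name": "octo-org/hello", "url": "https://api.github.com/repos/octo-org/hello"},
    "payload": {
        "commits": [
            {"sha": "abc123", "url": "https://api.github.com/repos/octo-org/hello/commits/abc123"}
        ]
    },
    "org": {"login": "octo-org", "url": "https://api.github.com/orgs/octo-org"},
}

for url in linked_urls(SAMPLE_EVENT):
    print(url)
```

Each URL returned is a further API resource that a mirroring tool could fetch to fill in the entities the event only references.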
Entities
GHTorrent architecture
[Diagram. Components shown: GitHub API, event retrieval, queues for events, commits, and projects, data retrieval workers, mirroring cluster. Routing keys shown: evt.commit, evt.watch, evt.fork.]
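The architecture slide shows events being routed by key (evt.commit, evt.watch, evt.fork) to separate data retrieval workers. The sketch below imitates that idea with in-memory queues. It rests on assumptions: the mapping from GitHub event types to the slide's routing keys is a guess, and `route`, `drain`, and the handlers are hypothetical names rather than GHTorrent's implementation.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def route(events):
    """Place each event on the in-memory queue named by its routing key."""
    queues = defaultdict(deque)
    for event in events:
        key = ROUTING_KEYS.get(event.get("type"))
        if key is not None:  # event types without a key are skipped here
            queues[key].append(event)
    return queues


def drain(queues, handlers):
    """Run the handler registered for each routing key over its queue."""
    results = []
    for key, queue in queues.items():
        handler = handlers.get(key)
        while handler is not None and queue:
            results.append(handler(queue.popleft()))
    return results


# Hypothetical handlers standing in for the data retrieval workers.
handlers = {
    "evt.commit": lambda e: "fetch commits for " + e["repo"],
    "evt.watch": lambda e: "fetch watchers for " + e["repo"],
    "evt.fork": lambda e: "fetch forks for " + e["repo"],
}

events = [
    {"type": "PushEvent", "repo": "octo-org/hello"},
    {"type": "ForkEvent", "repo": "octo-org/hello"},
    {"type": "IssuesEvent", "repo": "octo-org/hello"},
]
print(drain(route(events), handlers))
```

Separating routing from handling means each kind of retrieval can be scaled or restarted on its own, which is the property a queue-based design of this kind is usually chosen for.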
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis