Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
360
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
270
The troubles of modern dependency management and what to do about them
gousiosg
0
510
Mining Repositories with Apache Spark
gousiosg
0
640
My adventures with open everything
gousiosg
0
280
Structure and Evolution of Package Dependency Networks
gousiosg
0
750
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
910
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
280
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
人とAIとの共創を夢見た2か月 #共創AIミートアップ / Co-Creation with Keito-chan
kondoyuko
1
720
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
25k
mnt_data_とは?ChatGPTコード実行環境を深堀りしてみた
icck
0
210
Swiftは最高だよの話
yuukiw00w
2
290
SmartHRの複数のチームにおけるMCPサーバーの活用事例と課題
yukisnow1823
2
1.2k
会社紹介資料 / Sansan Company Profile
sansan33
PRO
6
360k
コードの考古学 〜労務システムから発掘した成長の糧〜
kenta_smarthr
1
1.2k
積み上げられた技術資産と向き合いながら、プロダクトの信頼性をどう守るか
plaidtech
PRO
0
940
Java で学ぶ 代数的データ型
ysknsid25
1
540
FastMCPでSQLをチェックしてくれるMCPサーバーを自作してCursorから動かしてみた
nayuts
1
220
CSSDay, Amsterdam
brucel
0
140
Zero Data Loss Autonomous Recovery Service サービス概要
oracle4engineer
PRO
2
7.2k
Featured
See All Featured
Unsuck your backbone
ammeep
671
58k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
34
2.3k
Embracing the Ebb and Flow
colly
85
4.7k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
750
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
For a Future-Friendly Web
brad_frost
178
9.7k
Typedesign – Prime Four
hannesfritz
41
2.6k
Raft: Consensus for Rubyists
vanstee
137
7k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
123
52k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
137
34k
Writing Fast Ruby
sferik
628
61k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis