Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Georgios Gousios
May 19, 2016
Technology
0
420
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
330
The troubles of modern dependency management and what to do about them
gousiosg
0
650
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
850
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
トップマネジメントとコンピテンシーから考えるエンジニアリングマネジメント
zigorou
4
550
型を書かないRuby開発への挑戦
riseshia
0
190
EMからVPoEを経てCTOへ:マネジメントキャリアパスにおける葛藤と成長
kakehashi
PRO
7
920
モブプログラミング再入門 ー 基本から見直す、AI時代のチーム開発の選択肢 ー / A Re-introduction of Mob Programming
takaking22
1
120
Startups on Rails: 2026 at RubyConf Thailand
irinanazarova
0
130
Evolution of Claude Code & How to use features
oikon48
1
340
Ultra Ethernet (UEC) v1.0 仕様概説
markunet
3
210
白金鉱業Meetup_Vol.22_Orbital Senseを支える衛星画像のマルチモーダルエンベディングと地理空間のあいまい検索技術
brainpadpr
2
220
OpenClawで回す組織運営
jacopen
2
560
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
44k
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
10k
Featured
See All Featured
エンジニアに許された特別な時間の終わり
watany
106
240k
Building Applications with DynamoDB
mza
96
6.9k
Rebuilding a faster, lazier Slack
samanthasiow
85
9.4k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.9k
Site-Speed That Sticks
csswizardry
13
1.1k
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
Speed Design
sergeychernyshev
33
1.6k
Rails Girls Zürich Keynote
gr2m
96
14k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Why Your Marketing Sucks and What You Can Do About It - Sophie Logan
marketingsoph
0
99
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
68
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
360
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis