GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Technology
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source. @jeffmcaffer, Microsoft; Georgios Gousios, Delft University of Technology (TU Delft); Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed), showing the entity URLs it references: https://api.github.com/users/Cephei, https://api.github.com/repos/PowerDMS/Owin.Scim, https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3, https://api.github.com/orgs/PowerDMS
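The event slide only shows the URLs that survive condensing, so a small sketch may help show where they sit in an event. It assumes the event is a PushEvent (the slide does not name the type) and that the payload carries a `commits` list, as the public events feed did around the time of the talk. The function names `fetch_events` and `entity_urls` are made up for illustration and are not part of GHTorrent.

```python
import json
import urllib.request

EVENTS_URL = "https://api.github.com/events"


def fetch_events(url=EVENTS_URL):
    """Fetch one page of public events (unauthenticated, so heavily rate limited)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def entity_urls(event):
    """Collect the API URLs of the entities an event points to.

    Order: actor, repo, commits (if the payload has any), org (if present).
    """
    urls = []
    for key in ("actor", "repo"):
        entity = event.get(key) or {}
        if "url" in entity:
            urls.append(entity["url"])
    # Assumption: push-style payloads list commits, each with its own URL.
    for commit in event.get("payload", {}).get("commits", []):
        if "url" in commit:
            urls.append(commit["url"])
    org = event.get("org") or {}
    if "url" in org:
        urls.append(org["url"])
    return urls
```

Each returned URL is a further API call a mirroring tool would have to make to get the full user, repository, commit, or organization record. Calling `entity_urls` on every item returned by `fetch_events()` yields that follow-up list for one page of the feed.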
Entities
GHTorrent architecture (diagram; labels only): GitHub API, Event Retrieval, Events queue, Commits queue, Projects queue, Data Retrieval workers (evt.commit, evt.watch, evt.fork), Mirroring Cluster
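The architecture slide suggests events are sorted by type onto queues that separate data-retrieval workers consume. A minimal in-process sketch of that routing idea follows. It uses Python's standard `queue` module as a stand-in for a real message broker, and the mapping from GitHub event types to the `evt.*` keys is an assumption inferred from the slide labels, not taken from the GHTorrent source.

```python
import queue

# Assumed mapping from GitHub event types to the routing keys named on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


def make_queues():
    """Create one in-memory queue per routing key."""
    return {key: queue.Queue() for key in ROUTING_KEYS.values()}


def route(event, queues):
    """Put an event on the queue for its type.

    Returns the routing key used, or None if the type is not handled.
    """
    key = ROUTING_KEYS.get(event.get("type"))
    if key is None:
        return None
    queues[key].put(event)
    return key


def drain(q, handler):
    """Apply handler to everything currently queued and return the results."""
    results = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return results
        results.append(handler(item))
```

In this sketch a worker is just a `drain` call with a handler function. Keeping one queue per event type means a slow handler for one kind of event does not hold up the others.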
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources: http://ghtorrent.org, https://github.com/Microsoft/ghinsights, @gousiosg, @jeffmcaffer, @kelewis