GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer (Microsoft)
Georgios Gousios (Delft University of Technology, TU Delft)
Kevin Lewis (Microsoft)
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach: http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed):
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
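The slide condenses the event to the URLs it references. As a minimal sketch of how a client might pull those references out of one event, assuming the JSON shape returned by the public GitHub events API (`type`, `actor.url`, `repo.url`, `org.url`): the sample payload below is hand-written for illustration, only its URLs come from the slide, its `type` value is an assumption, and `summarize_event` is a hypothetical helper rather than GHTorrent code.

```python
import json

# Hand-written sample, NOT a real API response. Only the URLs come from the
# slide; the "type" value is an assumption made for illustration.
SAMPLE_EVENT = json.loads("""
{
  "type": "PushEvent",
  "actor": {"url": "https://api.github.com/users/Cephei"},
  "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
  "org": {"url": "https://api.github.com/orgs/PowerDMS"}
}
""")


def summarize_event(event):
    """Return the event type and the API URLs of the entities it references."""
    return {
        "type": event.get("type"),
        "actor": event.get("actor", {}).get("url"),
        "repo": event.get("repo", {}).get("url"),
        # "org" is missing when the repository is not owned by an organization.
        "org": event.get("org", {}).get("url"),
    }


summary = summarize_event(SAMPLE_EVENT)
```

Each URL in the summary points at an entity (user, repository, organization) that a mirroring tool would then need to fetch separately.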
Entities
GHTorrent architecture (diagram). Labels recoverable from the figure: GitHub API, Event Retrieval, Data Retrieval (several instances), Mirroring Cluster, queues (Commits, Project, Events), routing keys (evt.commit, evt.watch, evt.fork).
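The architecture slide names an events queue, the routing keys evt.commit, evt.watch and evt.fork, and several data retrieval workers. The sketch below illustrates that general pattern with an in-process stand-in, assuming events are routed to workers by those keys. The `EventRouter` class and the mapping from GitHub event types to routing keys are hypothetical, and nothing here reflects how GHTorrent itself is implemented.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the routing keys on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


class EventRouter:
    """In-process stand-in for an event queue: one FIFO per routing key."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, event):
        """Queue the event under its routing key; return the key, or None if unrouted."""
        key = ROUTING_KEYS.get(event.get("type"))
        if key is not None:
            self.queues[key].append(event)
        return key

    def drain(self, key, handler):
        """Pass every queued event for `key` to a data retrieval handler, in order."""
        handled = 0
        while self.queues[key]:
            handler(self.queues[key].popleft())
            handled += 1
        return handled


router = EventRouter()
for event in [
    {"type": "PushEvent", "id": 1},
    {"type": "ForkEvent", "id": 2},
    {"type": "PushEvent", "id": 3},
]:
    router.publish(event)

# A worker subscribed to evt.commit only sees the push events.
seen = []
router.drain("evt.commit", lambda e: seen.append(e["id"]))
```

Keeping one queue per key lets each kind of retrieval work run independently, so a slow consumer for one event type does not block the others.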
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources:
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis