GitHub Insights: Understanding Open Source
Georgios Gousios
May 19, 2016
Talk given at OSCON 2016
Transcript
GitHub Insights: Understanding Open Source
@jeffmcaffer, Microsoft
Georgios Gousios, Delft University of Technology (TU Delft)
Kevin Lewis, Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How many devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach: Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed)
https://api.github.com/users/Cephei
https://api.github.com/repos/PowerDMS/Owin.Scim
https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3
https://api.github.com/orgs/PowerDMS
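The event stream above can be explored with a few lines of Python. This is a minimal sketch, not GHTorrent's own code: the helper names (`fetch_events`, `linked_urls`, `count_by_type`) are made up for illustration, and `SAMPLE_EVENT` is a hand-built stand-in that reuses the URLs from the slide. Its `PushEvent` type is an assumption, since the condensed slide does not show the event type.

```python
import json
import urllib.request
from collections import Counter

EVENTS_URL = "https://api.github.com/events"  # public events endpoint named on the slide

# Hand-built stand-in for one condensed event. The URLs come from the slide;
# the "PushEvent" type is an assumption (the slide does not show the type).
SAMPLE_EVENT = {
    "type": "PushEvent",
    "actor": {"url": "https://api.github.com/users/Cephei"},
    "repo": {"url": "https://api.github.com/repos/PowerDMS/Owin.Scim"},
    "org": {"url": "https://api.github.com/orgs/PowerDMS"},
}


def fetch_events(url=EVENTS_URL, timeout=10):
    """Download one page of public events (network call, rate limited when unauthenticated)."""
    request = urllib.request.Request(
        url,
        headers={"Accept": "application/vnd.github+json", "User-Agent": "events-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def linked_urls(event):
    """Collect the API URLs an event points at (actor, repo, org), skipping absent ones."""
    urls = []
    for key in ("actor", "repo", "org"):
        entity = event.get(key) or {}
        if "url" in entity:
            urls.append(entity["url"])
    return urls


def count_by_type(events):
    """Count events per type; events without a type are counted as 'unknown'."""
    return Counter(event.get("type", "unknown") for event in events)
```

Calling `count_by_type(fetch_events())` gives a quick picture of what the live stream contains, while `linked_urls` lists the entities a mirror would have to retrieve next for each event.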
Entities
GHTorrent architecture
[Diagram. Labels: GitHub API; Event Retrieval; queues for events, projects, and commits; evt.commit, evt.watch, evt.fork; Data Retrieval (several instances); Mirroring Cluster]
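The queue-based design on this slide can be illustrated with an in-memory sketch. It assumes that the labels `evt.commit`, `evt.watch`, and `evt.fork` act as per-type routing keys and that they correspond to GitHub's `PushEvent`, `WatchEvent`, and `ForkEvent`; the slide does not confirm either point. The `EventRouter` class is hypothetical, and plain `deque` objects stand in for whatever message broker the real system uses.

```python
from collections import defaultdict, deque

# Assumed mapping from GitHub event types to the labels shown on the slide.
ROUTING_KEYS = {
    "PushEvent": "evt.commit",
    "WatchEvent": "evt.watch",
    "ForkEvent": "evt.fork",
}


class EventRouter:
    """Toy router: one FIFO queue per routing key, drained by retrieval handlers."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, event):
        """Queue the event under its routing key; return the key, or None if unrouted."""
        key = ROUTING_KEYS.get(event.get("type"))
        if key is None:
            return None
        self.queues[key].append(event)
        return key

    def drain(self, key, handler):
        """Apply handler to every queued event for key, in arrival order."""
        results = []
        queue = self.queues[key]
        while queue:
            results.append(handler(queue.popleft()))
        return results
```

Keeping one queue per event type lets each kind of retrieval worker scale independently, which is one plausible reason for the separation shown in the diagram.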
GHTorrent by the numbers
Using the data: You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources
http://ghtorrent.org
https://github.com/Microsoft/ghinsights
@gousiosg @jeffmcaffer @kelewis