$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
390
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
300
The troubles of modern dependency management and what to do about them
gousiosg
0
570
Mining Repositories with Apache Spark
gousiosg
0
680
My adventures with open everything
gousiosg
0
310
Structure and Evolution of Package Dependency Networks
gousiosg
0
800
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
310
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
一億総業務改善を支える社内AIエージェント基盤の要諦
yukukotani
8
2.8k
段階的に進める、 挫折しない自宅サーバ入門
yu_kod
5
2.2k
pmconf2025 - データを活用し「価値」へ繋げる
glorypulse
0
440
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
37k
プロダクトマネージャーが押さえておくべき、ソフトウェア資産とAIエージェント投資効果 / pmconf2025
i35_267
2
340
生成AI・AIエージェント時代、データサイエンティストは何をする人なのか?そして、今学生であるあなたは何を学ぶべきか?
kuri8ive
2
1.8k
Data Hubグループ 紹介資料
sansan33
PRO
0
2.3k
なぜフロントエンド技術を追うのか?なぜカンファレンスに参加するのか?
sakito
9
1.9k
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
9.9k
Microsoft Agent 365 を 30 分でなんとなく理解する
skmkzyk
1
290
AI/MLのマルチテナント基盤を支えるコンテナ技術
pfn
PRO
5
720
知っていると得する!Movable Type 9 の新機能を徹底解説
masakah
0
200
Featured
See All Featured
Building Applications with DynamoDB
mza
96
6.8k
Music & Morning Musume
bryan
46
7k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
690
Statistics for Hackers
jakevdp
799
230k
Facilitating Awesome Meetings
lara
57
6.7k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
Unsuck your backbone
ammeep
671
58k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Building an army of robots
kneath
306
46k
How to Think Like a Performance Engineer
csswizardry
28
2.3k
Mobile First: as difficult as doing things right
swwweet
225
10k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis