$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Insights: Understanding Open Source
Search
Georgios Gousios
May 19, 2016
Technology
0
400
GitHub Insights: Understanding Open Source
Talk given at OSCON 2016
Georgios Gousios
May 19, 2016
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
310
The troubles of modern dependency management and what to do about them
gousiosg
0
570
Mining Repositories with Apache Spark
gousiosg
0
680
My adventures with open everything
gousiosg
0
320
Structure and Evolution of Package Dependency Networks
gousiosg
0
810
Mining Github for fun and profit
gousiosg
9
63k
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
940
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
310
The #issue32 incident
gousiosg
2
16k
Other Decks in Technology
See All in Technology
Oracle Database@Google Cloud:サービス概要のご紹介
oracle4engineer
PRO
1
760
事業の財務責任に向き合うリクルートデータプラットフォームのFinOps
recruitengineers
PRO
2
180
Snowflake だけで実現する “自立的データ品質管理” ~Data Quality Monitoring 解説 ~@ BUILD Meetup: TOKYO 2025
ryo_suzuki
0
130
マイクロサービスへの5年間 ぶっちゃけ何をしてどうなったか
joker1007
18
7.5k
AWSの新機能をフル活用した「re:Inventエージェント」開発秘話
minorun365
2
420
Identity Management for Agentic AI 解説
fujie
0
440
AIエージェント開発と活用を加速するワークフロー自動生成への挑戦
shibuiwilliam
4
810
AWS運用を効率化する!AWS Organizationsを軸にした一元管理の実践/nikkei-tech-talk-202512
nikkei_engineer_recruiting
0
170
Bedrock AgentCore Evaluationsで学ぶLLM as a judge入門
shichijoyuhi
2
190
20251222_サンフランシスコサバイバル術
ponponmikankan
2
140
【U/Day Tokyo 2025】Cygames流 最新スマートフォンゲームの技術設計 〜『Shadowverse: Worlds Beyond』におけるアーキテクチャ再設計の挑戦~
cygames
PRO
2
1.4k
Amazon Bedrock Knowledge Bases × メタデータ活用で実現する検証可能な RAG 設計
tomoaki25
6
2.2k
Featured
See All Featured
Reality Check: Gamification 10 Years Later
codingconduct
0
1.9k
Large-scale JavaScript Application Architecture
addyosmani
515
110k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
A Soul's Torment
seathinner
1
2k
Building an army of robots
kneath
306
46k
A Modern Web Designer's Workflow
chriscoyier
698
190k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
65
Being A Developer After 40
akosma
91
590k
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
1
25
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.2k
How GitHub (no longer) Works
holman
316
140k
Transcript
GitHub Insights Understanding Open Source @jeffmcaffer–Microsoft Georgios Gousios –Delft University
of Technology (TU Delft) Kevin Lewis – Microsoft
Snapshot overview
Inspire confidence
How open is a project? http://ghtorrent.org/pullreq-perf/
Commits (core vs community)
Commits (origin)
Comments (core vs community)
PR lifelines
Are we using git in a distributed way?
How may devs are there per country?
Insights
Business insights
Research insights
Cross-domain insights
Operational insights
Approach Data for the masses
GitHub by the numbers (Mid 2016)
Approach http://ghtorrent.org
How does it work? http://api.github.com/events
Example event (condensed) https://api.github.com/users/Cephei https://api.github.com/repos/PowerDMS/Owin.Scim https://api.github.com/repos/PowerDMS/Owin.Scim/commits/c751014f634d73e0b72f78a53c8cf137888b3 https://api.github.com/orgs/PowerDMS
Entities
GHTorrent architecture Github API Event Retrieval Commits Queue Project Events
Queue Events Data Retrieval Projects Commits evt.commit evt.watch evt.fork Data Retrieval Data Retrieval Data Retrieval Mirroring Cluster
GHTorrent by the numbers
Using the data You can do it too!
Using the data: Hosted http://ghtorrent.org
Using the data: Download
Using the data: Self-service https://github.com/ghtorrent/ghtorrent-webhook
Using the data: Azure Data Lake
Resources http://ghtorrent.org https://github.com/Microsoft/ghinsights @gousiosg @jeffmcaffer @kelewis