Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
The GHTorrent dataset and toolsuite
Search
Georgios Gousios
May 17, 2013
Technology
4
130k
The GHTorrent dataset and toolsuite
MSR2013 data paper presentation
Georgios Gousios
May 17, 2013
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
250
The troubles of modern dependency management and what to do about them
gousiosg
0
480
Mining Repositories with Apache Spark
gousiosg
0
590
My adventures with open everything
gousiosg
0
250
Structure and Evolution of Package Dependency Networks
gousiosg
0
700
Mining Github for fun and profit
gousiosg
9
63k
GitHub Insights: Understanding Open Source
gousiosg
0
330
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
880
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
250
Other Decks in Technology
See All in Technology
あなたの知らない Function.prototype.toString() の世界
mizdra
PRO
4
1.1k
DynamoDB でスロットリングが発生したとき_大盛りver/when_throttling_occurs_in_dynamodb_long
emiki
1
480
データプロダクトの定義からはじめる、データコントラクト駆動なデータ基盤
chanyou0311
3
370
生成AIが変えるデータ分析の全体像
ishikawa_satoru
0
200
20241120_JAWS_東京_ランチタイムLT#17_AWS認定全冠の先へ
tsumita
2
320
アジャイルでの品質の進化 Agile in Motion vol.1/20241118 Hiroyuki Sato
shift_evolve
0
190
【LT】ソフトウェア産業は進化しているのか? #Agilejapan
takabow
0
110
10XにおけるData Contractの導入について: Data Contract事例共有会
10xinc
7
720
Adopting Jetpack Compose in Your Existing Project - GDG DevFest Bangkok 2024
akexorcist
0
120
Platform Engineering for Software Developers and Architects
syntasso
1
530
リンクアンドモチベーション ソフトウェアエンジニア向け紹介資料 / Introduction to Link and Motivation for Software Engineers
lmi
4
300k
SRE×AIOpsを始めよう!GuardDutyによるお手軽脅威検出
amixedcolor
1
230
Featured
See All Featured
Building Flexible Design Systems
yeseniaperezcruz
327
38k
Designing on Purpose - Digital PM Summit 2013
jponch
115
7k
Done Done
chrislema
181
16k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
250
21k
[RailsConf 2023] Rails as a piece of cake
palkan
52
4.9k
Building Your Own Lightsaber
phodgson
103
6.1k
Documentation Writing (for coders)
carmenintech
65
4.4k
Building a Scalable Design System with Sketch
lauravandoore
459
33k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5k
How to train your dragon (web standard)
notwaldorf
88
5.7k
Speed Design
sergeychernyshev
25
620
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
1.9k
Transcript
The GHTorrent Dataset and Tool Suite Georgios Gousios Software Engineering
Research Group TU Delft
All data from Github
Ready to be queried
ghtorrent.org
Repositories
Commits
Pull requests
Issues
Users and Organizations
Mirror event stream
<<event>> PushEvent <<api>> /users/:user ensure_user <<api>> /repos/:user/:repo/ ensure_repo <<api>> /repos/:user/:repo/commits
ensure_commits ensure_user <<api>> /:user/:repo/sha ensure_commit ensure_user <<api>> /users/:user/ followers ensure_followers <<api>> /repos/:user/:repo/ commits/:sha/comments ensure_commit_comments <<api>> /users/:user/orgs ensure_orgs <<api>> /orgs/:org/teams ensure_teams Recursive dependency retrieval
Build relational database to query
repositories users organizations issues /users/:user /user/repos /repos/:user/:repo/issues /orgs/:org { 88"type":8"User",
88"public_gists":80, 88"login":8"gousiosg", 88"followers":88, 88"name":8"Georgios8Gousios", 88"public_repos":84, 88"created_at":8..., 88"id":8386172, 88"following":84, } { . . . CoSQL database as cache
Periodic dumps of DBs online
Query relational DB online
$ gem install sqlite3 ghtorrent $ ght-retrieve-repo mojombo jekyll $
(edit config.yaml) Roll your own tools
Research !
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Network analysis
Distributed development Text Text TUD-SERG-2013-10 An Exploratory Study of the
Pull- based Software Development model
None
None
None
None
None
ghtorrent.org Octicons font: courtesy Github