Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
The GHTorrent dataset and toolsuite
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Georgios Gousios
May 17, 2013
Technology
130k
3
Share
The GHTorrent dataset and toolsuite
MSR2013 data paper presentation
Georgios Gousios
May 17, 2013
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
350
The troubles of modern dependency management and what to do about them
gousiosg
0
680
Mining Repositories with Apache Spark
gousiosg
0
710
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
900
Mining Github for fun and profit
gousiosg
9
63k
GitHub Insights: Understanding Open Source
gousiosg
0
440
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
Other Decks in Technology
See All in Technology
雑談は、センサーだった
bitkey
PRO
2
190
AgentCore×VPCでの設計パターンn選と勘所
har1101
4
370
AWS Transform CustomでIaCコードを自由自在に変換しよう
duelist2020jp
0
240
FessのAI検索モード:検索システムとLLMへの取り組み
marevol
0
240
AIの揺らぎに“コシ”を与える階層化品質設計
ickx
0
210
ボトムアップの改善の火を灯し続けろ!〜支援現場で学んだ、消えないための3つの打ち手〜 / 20260509 Kazuki Mori
shift_evolve
PRO
2
420
UIライブラリに依存しすぎないReact Native設計を目指して
grandbig
0
190
コードや知識を組み込む / Incorporate Code and Knowledge
ks91
PRO
0
210
エージェントスキルを作って自分のインプットに役立てよう
tsubakimoto_s
0
530
AIはハッカーを減らすのか、増やすのか?──現役ホワイトハッカーから見るAI時代のリアル【MEGU-Meet】
cscengineer
PRO
0
270
M5Stack CoreS3とZephyr(RTOS)で Edge AIっぽいことしてみた
iotengineer22
0
430
Google Cloud Next '26 の裏でこっそりリリースされたCloud Number Registry & Cloud Hub コスト分析 を試してみた
hikaru1001
0
150
Featured
See All Featured
My Coaching Mixtape
mlcsv
0
110
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.9k
世界の人気アプリ100個を分析して見えたペイウォール設計の心得
akihiro_kokubo
PRO
70
39k
ラッコキーワード サービス紹介資料
rakko
1
3.2M
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.2k
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
130
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
10k
Building Adaptive Systems
keathley
44
3k
Typedesign – Prime Four
hannesfritz
42
3k
How to Talk to Developers About Accessibility
jct
2
190
Color Theory Basics | Prateek | Gurzu
gurzu
0
300
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Transcript
The GHTorrent Dataset and Tool Suite Georgios Gousios Software Engineering
Research Group TU Delft
All data from Github
Ready to be queried
ghtorrent.org
Repositories
Commits
Pull requests
Issues
Users and Organizations
Mirror event stream
<<event>> PushEvent <<api>> /users/:user ensure_user <<api>> /repos/:user/:repo/ ensure_repo <<api>> /repos/:user/:repo/commits
ensure_commits ensure_user <<api>> /:user/:repo/sha ensure_commit ensure_user <<api>> /users/:user/ followers ensure_followers <<api>> /repos/:user/:repo/ commits/:sha/comments ensure_commit_comments <<api>> /users/:user/orgs ensure_orgs <<api>> /orgs/:org/teams ensure_teams Recursive dependency retrieval
Build relational database to query
repositories users organizations issues /users/:user /user/repos /repos/:user/:repo/issues /orgs/:org { 88"type":8"User",
88"public_gists":80, 88"login":8"gousiosg", 88"followers":88, 88"name":8"Georgios8Gousios", 88"public_repos":84, 88"created_at":8..., 88"id":8386172, 88"following":84, } { . . . CoSQL database as cache
Periodic dumps of DBs online
Query relational DB online
$ gem install sqlite3 ghtorrent $ ght-retrieve-repo mojombo jekyll $
(edit config.yaml) Roll your own tools
Research !
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Network analysis
Distributed development Text Text TUD-SERG-2013-10 An Exploratory Study of the
Pull- based Software Development model
None
None
None
None
None
ghtorrent.org Octicons font: courtesy Github