Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
The GHTorrent dataset and toolsuite
Search
Georgios Gousios
May 17, 2013
Technology
4
130k
The GHTorrent dataset and toolsuite
MSR2013 data paper presentation
Georgios Gousios
May 17, 2013
Tweet
Share
More Decks by Georgios Gousios
See All by Georgios Gousios
NLP + SE = ❤️
gousiosg
0
330
The troubles of modern dependency management and what to do about them
gousiosg
0
650
Mining Repositories with Apache Spark
gousiosg
0
700
My adventures with open everything
gousiosg
0
350
Structure and Evolution of Package Dependency Networks
gousiosg
0
850
Mining Github for fun and profit
gousiosg
9
63k
GitHub Insights: Understanding Open Source
gousiosg
0
420
Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective
gousiosg
0
970
Big Data in Software Engineering panel and Privacy: Should we care?
gousiosg
0
340
Other Decks in Technology
See All in Technology
白金鉱業Meetup_Vol.22_Orbital Senseを支える衛星画像のマルチモーダルエンベディングと地理空間のあいまい検索技術
brainpadpr
2
220
Kiro のクレジットを使い切る!
otanikohei2023
0
110
型を書かないRuby開発への挑戦
riseshia
0
190
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
わたしがセキュアにAWSを使えるわけないじゃん、ムリムリ!(※ムリじゃなかった!?)
cmusudakeisuke
0
170
EMからVPoEを経てCTOへ:マネジメントキャリアパスにおける葛藤と成長
kakehashi
PRO
7
920
元エンジニアPdM、IDEが恋しすぎてCursorに全業務を集約したら、スライド作成まで爆速になった話
doiko123
1
200
Exadata Fleet Update
oracle4engineer
PRO
0
1.3k
タスク管理も1on1も、もう「管理」じゃない ― KiroとBedrock AgentCoreで変わった"判断の仕事"
yusukeshimizu
0
220
ビズリーチにおける検索・推薦の取り組み / DEIM2026
visional_engineering_and_design
1
110
LLM のプロダクト導入における開発の裏側と技術的挑戦
recruitengineers
PRO
1
110
EMからICへ、二周目人材としてAI全振りのプロダクト開発で見つけた武器
yug1224
4
430
Featured
See All Featured
Tell your own story through comics
letsgokoyo
1
830
Google's AI Overviews - The New Search
badams
0
930
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
100
The Invisible Side of Design
smashingmag
302
51k
What does AI have to do with Human Rights?
axbom
PRO
1
2k
Ethics towards AI in product and experience design
skipperchong
2
210
Mind Mapping
helmedeiros
PRO
1
110
Git: the NoSQL Database
bkeepers
PRO
432
66k
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
110
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
68
Everyday Curiosity
cassininazir
0
150
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
62
51k
Transcript
The GHTorrent Dataset and Tool Suite Georgios Gousios Software Engineering
Research Group TU Delft
All data from Github
Ready to be queried
ghtorrent.org
Repositories
Commits
Pull requests
Issues
Users and Organizations
Mirror event stream
<<event>> PushEvent <<api>> /users/:user ensure_user <<api>> /repos/:user/:repo/ ensure_repo <<api>> /repos/:user/:repo/commits
ensure_commits ensure_user <<api>> /:user/:repo/sha ensure_commit ensure_user <<api>> /users/:user/ followers ensure_followers <<api>> /repos/:user/:repo/ commits/:sha/comments ensure_commit_comments <<api>> /users/:user/orgs ensure_orgs <<api>> /orgs/:org/teams ensure_teams Recursive dependency retrieval
Build relational database to query
repositories users organizations issues /users/:user /user/repos /repos/:user/:repo/issues /orgs/:org { 88"type":8"User",
88"public_gists":80, 88"login":8"gousiosg", 88"followers":88, 88"name":8"Georgios8Gousios", 88"public_repos":84, 88"created_at":8..., 88"id":8386172, 88"following":84, } { . . . CoSQL database as cache
Periodic dumps of DBs online
Query relational DB online
$ gem install sqlite3 ghtorrent $ ght-retrieve-repo mojombo jekyll $
(edit config.yaml) Roll your own tools
Research !
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Single developer identities
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Source tracking
Network analysis
Distributed development Text Text TUD-SERG-2013-10 An Exploratory Study of the
Pull- based Software Development model
None
None
None
None
None
ghtorrent.org Octicons font: courtesy Github