A dataset for pull-based development research
Georgios Gousios
May 31, 2014
MSR 2014 best data paper award presentation
Transcript
A dataset for pull request research
Georgios Gousios and Andy Zaidman
@gousiosg
900 projects
350,000 pull reqs
40 features (patch size, code reviews, testing, social)
lifetime_minutes, mergetime_minutes, num_commits, src_churn, test_churn, files_added, files_modified, files_changed, src_files, doc_files, other_files, num_commit_comments, num_issue_comments, num_comments, num_participants, sloc, team_size, perc_external_contribs, commits_on_files_touched, test_lines_per_kloc, test_cases_per_kloc, asserts_per_kloc, watchers, prev_pullreqs, requester_succ_rate, followers
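As a rough illustration of how per-pull-request feature columns like these can be consumed, here is a minimal Python sketch. It is not part of the authors' tooling (the slides describe a suite in R). The sample rows are invented for illustration, the `project` column name is hypothetical, and treating an empty or `NA` value in `mergetime_minutes` as "not merged" is an assumption about the file layout.

```python
import csv
import io
import statistics

# Hypothetical sample rows, invented for illustration only.
# They are NOT taken from the dataset; only the feature column
# names (lifetime_minutes, mergetime_minutes, ...) come from the slides.
SAMPLE_CSV = """project,lifetime_minutes,mergetime_minutes,num_commits,src_churn,test_churn
example/alpha,120,120,1,40,10
example/alpha,4300,NA,3,250,0
example/beta,35,35,1,5,5
example/beta,900,880,2,60,30
"""


def summarize(rows):
    """Count pull requests and summarize merge times.

    Assumption: an empty or "NA" mergetime_minutes marks a pull
    request that was never merged.
    """
    rows = list(rows)
    merge_times = [
        int(r["mergetime_minutes"])
        for r in rows
        if r["mergetime_minutes"] not in ("", "NA")
    ]
    return {
        "pull_requests": len(rows),
        "merged": len(merge_times),
        "merge_ratio": len(merge_times) / len(rows) if rows else 0.0,
        "median_mergetime_minutes": (
            statistics.median(merge_times) if merge_times else None
        ),
    }


summary = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(summary)
```

To try this on real data, the in-memory string could be swapped for an open file handle, after checking the actual column names and missing-value marker in the downloaded files.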
Suite of tools in R (manipulation, selection, machine learning)
Wed, Jun 4, 17:30, Hall 3
Software Engineering Research Group
http://swerl.tudelft.nl/
Delft University of Technology
A dataset for pull-based development research
Georgios Gousios and Andy Zaidman
{g.gousios, a.e.zaidman}@tudelft.nl
Diamond Markus fuel mongo-python-driver openproject pentaho-reporting hazelcast DIRAC rails-i18n gxa hibernate-ogm celery homebrew-php www.gittip.com flask serverspec pybossa head cyder cython maraschino Vanilla Synapse-Repository-Services growstuff oozie Socorro-Tests RxJava ome-documentation s-ramp Terasology mrjob firefox-flicks onepercentclub-site virt-test zanata-server paperclip heroapp-LetsHire cylc cryptography Resteasy conductor culture-hub openengsb-framework pivotal_workstation PyBitmessage topaz linguist jbosstools-openshift sunpy XPrivacy DSpace rspec-rails middleman-guides shinken nagios github-services ncs_navigator_core katello-api django-cms cucumber-jvm veewee jbosstools-integration-tests retrofit metrics gaffer nrv scikit-image Silverpeas-Core mev mne-python infinispan unknown-horizons iris ra1stats lorsource scalding forma-clj marketplace-tests neutrino bosh modeshape hector mcom-tests refinerycms-blog spokenvote Sick-Beard pyon geotrellis ztltest jboss-eap-quickstarts fail2ban gatein-portal bioformats simple_form imagefactory goldrush frontend sequel metasploit-framework CocoaPods platform logstash scrapy docs SciELO-Manager sidekiq chef wakame-vdc ggrc-core hibernate-orm omniauth ckan stacktach sunspot nova brooklyn teiid-designer sentry grape missionhub playframework okhttp pry 24pullrequests progit origin-server OpenSlides scala-ide commcare-hq sbt data-access spree website phoenix Catacomb-Snatch summingbird ralph activerecord-jdbc-adapter rootpy jekyll pelican matplotlib geotools dynmap qiime Spout cobbler compass pyramid tatami biopython hornetq gatling rails_admin raven-python reddit lims-core basho_docs liferay-portal asciidoctor SalesforceMobileSDK-Android the-blue-alliance vumi-go tire META-SHARE GravityBox emesene web2py developer.github.com rubocop bitHopper
dagger configuration ninja-ide ruby fog vumi rudder chiliproject netty bedrock zf2-documentation play active_merchant ka-lite HiggsAnalysis-HiggsToTauTau ursula youtube-dl buildbot nltk pyes Addon-Tests tools grails-core psychopy erpnext karma-exchange alaveteli totalfinder-i18n WMCore coworfing CouchPotatoServer rose ecms webpay sympy nexus-oss Osmand active_admin Equivalent-Exchange-3 iSENSE-Hardware zamboni spray gitlabhq errbit usergrid-stack Printrun rspec-core celluloid homebrew-science candlepin core gunicorn hy travis-core socorro BPSF railo geocoder k-9 ADL_LRS otter OTM2 katello openmicroscopy oi-userland zipkin geoserver foreman django-rest-framework django pandas mezzanine whitehall mifosx pyzmq cookbooks mongoengine pulp_rpm werkzeug middleman pentaho-commons-gwt-modules nikola right_link dropwizard rosdistro ycp-killer rspec-expectations salt miro rstat.us components eden Bukkit basex engine maven-android-plugin django-social-auth appscale kotlin hibernate-validator padrino-framework rSENSE refinerycms amu_automata_2011 chillingeffects SynapseWebClient spree_i18n riak-java-client wonder socorro-crashstats homebrew vagrant loomio scalaz sagecell neo4j wildfly jagger formtastic aws-sdk-ruby ipython sufia cas storm exercism.io heroku android-sync kuma carrierwave c2cgeoportal bundler pylearn2 meniscus rspec-mocks hibernate-search jboss-as-quickstart junit Theano jruby oq-engine mcMMO Essentials repose smart-answers androidannotations groovy-core akka elasticsearch ssGWT-lib homebrew-cask sveditor mongo-ruby-driver c-geo-opensource EMS ENdoSnipe mopidy rails adhocracy chef-cookbooks salt-cloud formhub SimpleCV bitmask_client requests ActionBarSherlock jbosstools-jst dcache OpenGenesis subscription-manager sensu-community-plugins socode nipype linkit appscale-tools addon-sdk sumo-tests tornado capybara Superdesk pentaho-platform resque sql-layer open-build-service draper spring-batch otwarchive berkshelf shoulda-matchers rubygems unisubs edx-platform
liquibase fpm play1 capybara-webkit www.ruby-lang.org pip TFCraft floodlight qtile lims-api sched.do nexus rubinius octopress fuse Minecraft-Overviewer reddeer opennaas mail django-timepiece picketlink leap_client sequencescape xwiki-platform spring-integration MinecraftForge cgeo OpenMDAO-Framework boto django-tastypie ESP-Website devise orbisgis sensu Catroid MyJobs kitsune django-extensions active_model_serializers statsmodels calcentral rhc jbosstools-base Silverpeas-Components autotest pyload autopsy addmeto.cc coi-services spring-framework remo quickstarts elephant-bird Razor brakeman atlas mule andlytics diaspora jclouds drools druid dipy mongoid CONNECT django-oscar bigbluebutton BuildCraft python-guide spark scala narayana twitter-bootstrap-rails kivy cucumber molgenis diffa Play20 CraftBukkit pulp regulations-site OWD_TEST_TOOLKIT droidplanner openstreetmap-website
900 projects
350,000 pull reqs
40 features:
Churn (src, test)
Participants
Project popularity
Commits
Files (src, doc)
Forward links
Pull request submitter (followers, track record)
Code reviews
Merges (also those outside GitHub)
https://github.com/gousiosg/pullreqs
ML suite in R
@gousiosg