
A dataset for pull-based development research

Georgios Gousios
May 31, 2014

MSR 2014 best data paper award presentation

Transcript

  1. 40 features! (patch size, code reviews, testing, social)

    Feature names shown on the slide: lifetime_minutes, mergetime_minutes, num_commits, src_churn, test_churn, files_added, files_modified, files_changed, src_files, doc_files, other_files, num_commit_comments, num_issue_comments, num_comments, num_participants, sloc, team_size, perc_external_contribs, commits_on_files_touched, test_lines_per_kloc, test_cases_per_kloc, asserts_per_kloc, watchers, prev_pullreqs, requester_succ_rate, followers
  2. Wed, Jun 4! 17:30! Hall 3

    Software Engineering Research Group, http://swerl.tudelft.nl/, Delft University of Technology

    A dataset for pull-based development research. Georgios Gousios and Andy Zaidman, {g.gousios, a.e.zaidman}@tudelft.nl

    Word cloud of project names: Diamond, Markus, fuel, mongo-python-driver, openproject, pentaho-reporting, hazelcast, DIRAC, rails-i18n, gxa, hibernate-ogm, celery, homebrew-php, www.gittip.com, flask, serverspec, pybossa, head, cyder, cython, maraschino, Vanilla, Synapse-Repository-Services, growstuff, oozie, Socorro-Tests, RxJava, ome-documentation, s-ramp, Terasology, mrjob, firefox-flicks, onepercentclub-site, virt-test, zanata-server, paperclip, heroapp-LetsHire, cylc, cryptography, Resteasy, conductor, culture-hub, openengsb-framework, pivotal_workstation, PyBitmessage, topaz, linguist, jbosstools-openshift, sunpy, XPrivacy, DSpace, rspec-rails, middleman-guides, shinken, nagios, github-services, ncs_navigator_core, katello-api, django-cms, cucumber-jvm, veewee, jbosstools-integration-tests, retrofit, metrics, gaffer, nrv, scikit-image, Silverpeas-Core, mev, mne-python, infinispan, unknown-horizons, iris, ra1stats, lorsource, scalding, forma-clj, marketplace-tests, neutrino, bosh, modeshape, hector, mcom-tests, refinerycms-blog, spokenvote, Sick-Beard, pyon, geotrellis, ztltest, jboss-eap-quickstarts, fail2ban, gatein-portal, bioformats, simple_form, imagefactory, goldrush, frontend, sequel, metasploit-framework, CocoaPods, platform, logstash, scrapy, docs, SciELO-Manager, sidekiq, chef, wakame-vdc, ggrc-core, hibernate-orm, omniauth, ckan, stacktach, sunspot, nova, brooklyn, teiid-designer, sentry, grape, missionhub, playframework, okhttp, pry, 24pullrequests, progit, origin-server, OpenSlides, scala-ide, commcare-hq, sbt, data-access, spree, website, phoenix, Catacomb-Snatch, summingbird, ralph, activerecord-jdbc-adapter, rootpy, jekyll, pelican, matplotlib, geotools, dynmap, qiime, Spout, cobbler, compass, pyramid, tatami, biopython, hornetq, gatling, rails_admin, raven-python, reddit, lims-core, basho_docs, liferay-portal, asciidoctor, SalesforceMobileSDK-Android, the-blue-alliance, vumi-go, tire, META-SHARE, GravityBox, emesene, web2py, developer.github.com, rubocop, bitHopper, dagger, configuration, ninja-ide, ruby, fog, vumi, rudder, chiliproject, netty, bedrock, zf2-documentation, play, active_merchant, ka-lite, HiggsAnalysis-HiggsToTauTau, ursula, youtube-dl, buildbot, nltk, pyes, Addon-Tests, tools, grails-core, psychopy, erpnext, karma-exchange, alaveteli, totalfinder-i18n, WMCore, coworfing, CouchPotatoServer, rose, ecms, webpay, sympy, nexus-oss, Osmand, active_admin, Equivalent-Exchange-3, iSENSE-Hardware, zamboni, spray, gitlabhq, errbit, usergrid-stack, Printrun, rspec-core, celluloid, homebrew-science, candlepin, core, gunicorn, hy, travis-core, socorro, BPSF, railo, geocoder, k-9, ADL_LRS, otter, OTM2, katello, openmicroscopy, oi-userland, zipkin, geoserver, foreman, django-rest-framework, django, pandas, mezzanine, whitehall, mifosx, pyzmq, cookbooks, mongoengine, pulp_rpm, werkzeug, middleman, pentaho-commons-gwt-modules, nikola, right_link, dropwizard, rosdistro, ycp-killer, rspec-expectations, salt, miro, rstat.us, components, eden, Bukkit, basex, engine, maven-android-plugin, django-social-auth, appscale, kotlin, hibernate-validator, padrino-framework, rSENSE, refinerycms, amu_automata_2011, chillingeffects, SynapseWebClient, spree_i18n, riak-java-client, wonder, socorro-crashstats, homebrew, vagrant, loomio, scalaz, sagecell, neo4j, wildfly, jagger, formtastic, aws-sdk-ruby, ipython, sufia, cas, storm, exercism.io, heroku, android-sync, kuma, carrierwave, c2cgeoportal, bundler, pylearn2, meniscus, rspec-mocks, hibernate-search, jboss-as-quickstart, junit, Theano, jruby, oq-engine, mcMMO, Essentials, repose, smart-answers, androidannotations, groovy-core, akka, elasticsearch, ssGWT-lib, homebrew-cask, sveditor, mongo-ruby-driver, c-geo-opensource, EMS, ENdoSnipe, mopidy, rails, adhocracy, chef-cookbooks, salt-cloud, formhub, SimpleCV, bitmask_client, requests, ActionBarSherlock, jbosstools-jst, dcache, OpenGenesis, subscription-manager, sensu-community-plugins, socode, nipype, linkit, appscale-tools, addon-sdk, sumo-tests, tornado, capybara, Superdesk, pentaho-platform, resque, sql-layer, open-build-service, draper, spring-batch, otwarchive, berkshelf, shoulda-matchers, rubygems, unisubs, edx-platform, liquibase, fpm, play1, capybara-webkit, www.ruby-lang.org, pip, TFCraft, floodlight, qtile, lims-api, sched.do, nexus, rubinius, octopress, fuse, Minecraft-Overviewer, reddeer, opennaas, mail, django-timepiece, picketlink, leap_client, sequencescape, xwiki-platform, spring-integration, MinecraftForge, cgeo, OpenMDAO-Framework, boto, django-tastypie, ESP-Website, devise, orbisgis, sensu, Catroid, MyJobs, kitsune, django-extensions, active_model_serializers, statsmodels, calcentral, rhc, jbosstools-base, Silverpeas-Components, autotest, pyload, autopsy, addmeto.cc, coi-services, spring-framework, remo, quickstarts, elephant-bird, Razor, brakeman, atlas, mule, andlytics, diaspora, jclouds, drools, druid, dipy, mongoid, CONNECT, django-oscar, bigbluebutton, BuildCraft, python-guide, spark, scala, narayana, twitter-bootstrap-rails, kivy, cucumber, molgenis, diffa, Play20, CraftBukkit, pulp, regulations-site, OWD_TEST_TOOLKIT, droidplanner, openstreetmap-website

    900 projects, 350,000 pull requests

    Feature names shown on the slide: lifetime_minutes, mergetime_minutes, num_commits, src_churn, test_churn, files_added, files_modified, files_changed, src_files, doc_files, other_files, num_commit_comments, num_issue_comments, num_comments, num_participants, sloc, team_size, perc_external_contribs, commits_on_files_touched, test_lines_per_kloc, test_cases_per_kloc, asserts_per_kloc, watchers, prev_pullreqs, requester_succ_rate, followers

    40 features; Churn (src, test); Participants; Project popularity; Commits; Files (src, doc); Forward links; Pull request submitter (followers, track record); Code reviews; Merges (also those outside GitHub)

    https://github.com/gousiosg/pullreqs

    ML suite in R

    @gousiosg
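
To make the feature names on the slides more concrete, here is a minimal Python sketch of how a table with a few of those columns might be summarized. It is an illustration only, not the authors' tooling (the slides mention an ML suite in R). The CSV layout, the sample values, the `summarize` helper, and the convention that an empty `mergetime_minutes` means "never merged" are all assumptions made for this example; only the column names come from the slides.

```python
import csv
import io
from statistics import median

# Hypothetical sample rows: the column names come from the slides,
# the values are invented purely for illustration.
SAMPLE_CSV = """lifetime_minutes,mergetime_minutes,num_commits,src_churn,test_churn
120,120,1,35,10
4300,,3,210,0
45,45,2,8,12
"""


def summarize(rows):
    """Return (share of merged pull requests, median lifetime in minutes)."""
    rows = list(rows)
    # Assumption: an empty mergetime_minutes marks a pull request
    # that was closed without being merged.
    merged = [r for r in rows if r["mergetime_minutes"] != ""]
    lifetimes = [int(r["lifetime_minutes"]) for r in rows]
    return len(merged) / len(rows), median(lifetimes)


# Parse the in-memory sample; a real file would be opened with open(...).
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
merged_share, median_lifetime = summarize(rows)
print(f"merged share: {merged_share:.2f}, median lifetime: {median_lifetime} min")
```

The same pattern extends to the other columns, for example filtering on `test_churn` to separate pull requests that touch tests from those that do not.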