GitHub API
• Since Feb 2012
• 5.6 TB in MongoDB
• 600M rows in MySQL
• 1 GB per hour
Georgios Gousios: The GHTorrent dataset and tool suite. MSR 2013: 233-236
1 day; are briefly discussed
• are merged when they affect a hot project area
• are processed fast when project has test suite
• are processed fast when contributor has good track record
• are rejected mostly due to insufficient task articulation
Georgios Gousios, Martin Pinzger and Arie van Deursen: An exploratory study of the pull-based software development model. ICSE 2014: 345-355
• 25 questions, 7 open-ended
• By personal invitation to 3,400 projects
Georgios Gousios, Andy Zaidman, Margaret-Anne Storey and Arie van Deursen: Work practices and challenges in pull-based development: The integrator’s perspective. Tech Report TUD-SERG-2014-13
[Figure: challenges reported by project owners, bars split by rank (Top, Second, Third). Categories: more work, bikeshedding, hit 'n' run PRs, poor documentation, age, syncing, feature isolation, developer availability, conflicts, differences in opinion, motivating contributors, generalizing solutions, tools, git knowledge, size, review tools, testing, responsiveness, maintain vision, volume, explaining rejection, reviewing, maintaining quality, time]
LOT of features and fixes ALL AT ONCE!’ that are hell to review and that I’d like to *partially* reject if only the parts were in any way separable.
R42: Lack of knowledge of git from contributors; most don’t know how to resolve a merge conflict.
R514: Sifting through the GitHub information flood to find what, if any, I should address.
R635: I worry about alienating our valued contributors.
R449: Dealing with loud and trigger-happy developers.
[Figure: challenges reported by contributors, bars split by rank (Top, Second, Third). Categories: predicting acceptance, correctness, fairness, personal skills, time, code quality, appreciation, turtle, awareness, testing, politics, fear of rejection, code review, impact analysis, tools, desirability, communication, explain rationale, conflicts, understanding code base, project compliance, responsiveness]
actually being accepted
R202: Ensuring that my pull request doesn't have unintended side effects due to not being intimately familiar with the entire code base.
R711: The supercommitter issue. The code you are modifying was probably written by the gatekeeper for the pull requests.
R635: Navigating through forks of an abandoned project to see if someone implemented that already.
R145: Facing nazi project maintainers.
R299: Cockyness
branches
• Coding style
• Commit guidelines
• Communication options. PRs are post-hoc communication.
• Include an updated list of low-hanging fruit for beginners
People generally expect a CONTRIBUTING.md file
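As a rough illustration of the checklist above, the sketch below scans the text of a CONTRIBUTING.md file for the topics named on the slide. The topic names follow the slide; the keyword lists and the function name `missing_topics` are hypothetical and would need tuning per project. This is a plain substring check, not a tool from the study.

```python
# Hypothetical keyword lists, one per topic named on the slide.
# These are assumptions for illustration; adjust them to the project.
TOPICS = {
    "coding style": ["coding style", "code style", "style guide"],
    "commit guidelines": ["commit message", "commit guidelines"],
    "communication options": ["mailing list", "irc", "chat", "issue tracker"],
    "beginner tasks": ["good first issue", "low hanging fruit", "low-hanging fruit"],
}


def missing_topics(contributing_text):
    """Return the topics for which no keyword occurs in the given text."""
    text = contributing_text.lower()
    return [
        topic
        for topic, keywords in TOPICS.items()
        if not any(keyword in text for keyword in keywords)
    ]


# Toy input, not taken from any real project.
sample = """
# Contributing
Follow the coding style in STYLE.md.
Write a descriptive commit message for every change.
"""

print(missing_topics(sample))  # ['communication options', 'beginner tasks']
```

Substring matching is deliberately crude: a short keyword such as "irc" can also match inside unrelated words, so a real check would likely match on word boundaries or section headings.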
[Figure: What is the biggest challenge with PRs? Two bar-chart panels; x-axis: percentage of responses (0.0 to 7.5); bars split by rank (Top, Second, Third).
Contributors panel categories: infrastructure setup, accept design changes, guidelines, predicting acceptance, correctness, fairness, personal skills, time, code quality, appreciation, turtle, awareness, testing, politics, fear of rejection, code review, impact analysis, tools, desirability, communication, explain rationale, conflicts, understanding code base, project compliance, responsiveness.
Project owners panel categories: accepting blame, communicating goals and standards, context switching, multiple communication channels, reaching consensus, poor notifications, project speed, process ignorance, timezones, coordination among contributors, coordination among integrators, impact, politeness, asking more work, bikeshedding, hit 'n' run PRs, poor documentation, age, syncing, feature isolation, developer availability, conflicts, differences in opinion, motivating contributors, generalizing solutions, tools, git knowledge, size, review tools, testing, responsiveness, maintain vision, volume, explaining rejection, reviewing, maintaining quality, time.]
@gousiosg