gully
January 31, 2014

UTexas GSPS Jan 31, 2014 (revised). Data Science in Astronomy; git/GitHub

A presentation for the University of Texas at Austin Department of Astronomy Graduate Student Postdoc Seminar (GSPS). The topic is building skills in modern computing, specifically git and GitHub. The goal is to generate a discussion on the tradeoffs of investing in software version control and collaboration. This revised version has more content, specifically resources, and some pedagogical/graphical improvements.

Transcript

  1. Michael Gully-Santiago

    Graduate student at UTexas Astronomy, Aug 25, 2008 to May 2015 (projected). Advisor: Dan Jaffe. I make diffraction gratings from single crystal silicon. I work on brown dwarfs, and have broad interests in star and planet formation. Lately, I’ve been building my skills in statistics, data mining, machine learning, and modern computing. Why?
  2. The volume of data

    [Figure: Data rates in astronomy and elsewhere. Data rate (TB / year, axis ticks from 0.01 to 100000) versus year (1995 to 2025) for SDSS, 2MASS, gully’s data, HETDEX, NYSE, Facebook, and LSST.]
    Sources:
    SDSS: Bill Howe (UW)
    2MASS: http://spider.ipac.caltech.edu/staff/roc/2mass/archive/data.profile.v3.html
    My data set: MGS
    HETDEX: http://hetdex.org/pdfs/research/Hill1.pdf
    LSST: Bill Howe (UW)
    NYSE: http://marciaconner.com/blog/data-on-big-data/
    Facebook: http://gigaom.com/2012/08/22/facebook-is-collecting-your-data-500-terabytes-a-day/
  3. Here is a 94-second segment from a Coursera video.

    It’s from 0:30 to 2:14 of ‘eScience’ in Bill Howe’s Introduction to Data Science: https://class.coursera.org/datasci-001/lecture/19
  5. Key idea. The skills that will be useful for astronomy are already useful for data science.

    databases, Python, git & GitHub, NoSQL, cloud computing, machine learning, R, SQL, MapReduce/Hadoop, visualizations, automated analysis
  6. Key insight. Let’s build data science skills, because they will make our astronomy better, and better prepare us for NAPs*. It’s a win-win.

    *NAPs: Non-Academic Professions (C. Lindner talk from GSPS, Jan. 17, 2014)
  9. Our strategy. Let’s follow Brian Mulligan’s advice, and focus on just a few things.

    Topics: databases, Python, git & GitHub, NoSQL, cloud computing, machine learning, R, SQL, MapReduce/Hadoop, visualizations, automated analysis
  10. Our strategy. Let’s follow Brian Mulligan’s advice, and focus on just a few things.

    Python, git & GitHub, machine learning
  11. Python, machine learning

    These are the main topics of our data science in astronomy meetup: gigayear.weebly.com/data-science.html
    Mailing list: http://eepurl.com/LdArH
  12. git and GitHub demo: pull request to the astroML code base

    Visit the astroML GitHub page: https://github.com/astroML
    1) Update the README.md file with this new text: “Page 130: The denominator of the argument of the exponential of Eq. (4.11) should be sigma squared, not sigma, to better match Eq. (3.43) and lead to Eq. (4.13).”
    2) Run git status, git add, git commit, git push
    3) Open a pull request on GitHub
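A minimal sketch of steps 1 and 2 from the demo, written so it can run without network access. Several things here are assumptions that do not appear in the slides: a local bare repository (`fork.git`) stands in for a GitHub fork of astroML, the author name and email are placeholders, and the commit message is made up for illustration. Step 3, the pull request itself, happens in the GitHub web interface and is not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: a throwaway local bare repo plays the role of your GitHub fork.
work=$(mktemp -d)
git init --quiet --bare "$work/fork.git"
git clone --quiet "$work/fork.git" "$work/astroML"
cd "$work/astroML"

# Placeholder identity (hypothetical), set only for this sandbox repo.
git config user.name "Example Author"
git config user.email "author@example.com"

# 1) Update README.md with the erratum text from the slide.
cat >> README.md <<'EOF'
Page 130: The denominator of the argument of the exponential of Eq. (4.11)
should be sigma squared, not sigma, to better match Eq. (3.43) and lead to
Eq. (4.13).
EOF

# 2) git status, git add, git commit, git push
git status --short        # shows README.md as a new, untracked file
git add README.md         # stage the change
git commit --quiet -m "Add erratum for Eq. (4.11) on page 130"
git push --quiet origin HEAD   # publish the current branch to the stand-in fork
```

Against a real fork, the `git init --bare` and `git clone` lines would be replaced by cloning your own fork's URL from GitHub; the status, add, commit, and push sequence stays the same.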
  13. Thank you. This presentation is available for download on Speaker Deck.

    Open questions for discussion: Is this all worth it? Will this put more papers in the ApJ? When is the best time to invest? Is it still useful if I’m not collaborating? Are we getting what we want from the Dept.? How do we build synergies within the Dept.? How do we build momentum and overcome inertia?
    [email protected] | astronomer and engineer
    Attribution to: Pierre TORET, from The Noun Project; Sá Ferreira - Purple Matter, from The Noun Project
  14. Global Resources

    codeschool.com is a great way to quickly learn git
    try.github.io is a great way to try the basics of git
    astroml.org contains astronomy-specific machine learning code
    coursera.org/course/datasci has free online videos
  15. aas.org/posts/story/2014/01/astrophysics-code-sharing-ii-sequel

    Making Your Work More Valuable by Giving It Away, Benjamin Weiner (University of Arizona)
    NSF Policies on Software and Data Sharing, Daniel Katz (National Science Foundation)
    The Astropy Project’s Self-Herding Cats Development Model, Erik Tollerud (Yale University)
    Costs and Benefits of Developing Out in the Open, David W. Hogg (New York University)
  16. Local Resources

    UT Austin data science in astronomy meetup: times vary
    Next week’s grad student town hall (& proposal to astro faculty): Friday, Feb 7 at 1pm in the classroom
    UT Austin Astronomy GitHub organization: OttoStruve
  17. The data science in astronomy meetup: times vary

    Next week’s grad student town hall: Friday, Feb 7 at 1pm in the classroom
    UT Austin Astronomy GitHub organization: OttoStruve