From the lab to the factory - Data Day Texas
Slides: http://www.slideshare.net/joshwills/production-machine-learninginfrastructure
Video: https://www.youtube.com/watch?v=v-91JycaKjc
to learn
• Rich ecosystem of libraries: high in quality and quantity, and growing at a rapid pace.
• Developer community - conferences: SciPy, PyData…
• Mature core scientific computing libraries (bindings to C/C++ or Fortran)
• Glue language
• Diverse users: sysadmins, web developers, scientists, statisticians… which enables cross-team collaboration
• Analysis -> Production (vs R, Matlab…)
• R: "The best thing about R is that it was written by statisticians. The worst thing about R is that it was written by statisticians." Bo Cowgill
• Matlab: $$$, not open
http://nbviewer.ipython.org/github/twiecki/pydata_ninja/blob/master/PyData%20Ninja.ipynb
+ pip…
IDE: Spyder, PyCharm, Atom, Rodeo, Sublime
DATA MUNGING: numpy, xray
DATA VISUALIZATION: matplotlib, seaborn, pyxley, plotly, lightning, Bokeh
Sat - 12:30 p.m. Introduction to interactive visualizations with Bokeh (Alejandro Vidal)
Sat - 3:40 p.m. Data structures beyond dicts and lists (Sergi Sorribas)
DEEP LEARNING: Caffe, Keras, TensorFlow…
BIG DATA
- dask, bcolz
- Hadoop, Spark, Impala, Ibis
Sun - 11:50 a.m. Trolling Detection with Scikit-learn and NLTK (Rafa Haro)
Sun - 12:30 p.m. Handling data beyond the limits of memory (Francesc Alted)
Medium Data and Distributed computing
+ Lasagne +
/ PIPELINES: Airflow, Luigi
NLP: spaCy, Gensim, NLTK
STATISTICS: PyMC, PyMC3…
Sat - 3 p.m. Know your models - Statsmodels! (Israel Saeta Pérez and Miquel Camprodon)
Sun - 1:10 p.m. Dive into Scrapy (Juan Riaza)
Another time
https://jakevdp.github.io/blog/2015/10/17/analyzing-pronto-cycleshare-data-with-python-and-pandas/
BLOGPOST: Analyzing Pronto CycleShare Data with Python and Pandas, Jake VanderPlas
https://github.com/jreback/PyDataNYC2015/
De Ven, Sarah Bird PyDataLDN 2015
https://www.youtube.com/watch?v=XBiS0oBzX3o
http://nbviewer.ipython.org/github/bokeh/bokeh-notebooks/blob/master/index.ipynb#Tutorial
Regression

categorical, quantitative

id  gender  age  job_id
1   F       67   1
2   M       32   2
3   M       45   1
4   F       18   2

group similar individuals together

id  gender  age  job_id  buy/click_ad  money_spent
1   F       67   1       Yes           $1,000
2   M       32   2       No            -
3   M       45   1       No            -
4   F       18   2       Yes           $300

Classification: predict whether an individual is going to buy/click or not
Regression: predict how much the individual is going to spend
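
To make the three framings of the toy table above concrete, here is a minimal sketch in plain Python. The slides do not name an algorithm, so a nearest-neighbour rule is swapped in purely for illustration. The feature encoding, the function names (`encode`, `nearest`, `classify`, `predict_spend`, `cluster`) and the `new_person` row are all hypothetical. In practice one would likely use a library such as scikit-learn instead.

```python
import math

# Toy table from the slide. "-" in money_spent is stored as None.
ROWS = [
    {"id": 1, "gender": "F", "age": 67, "job_id": 1, "buy": "Yes", "money_spent": 1000.0},
    {"id": 2, "gender": "M", "age": 32, "job_id": 2, "buy": "No", "money_spent": None},
    {"id": 3, "gender": "M", "age": 45, "job_id": 1, "buy": "No", "money_spent": None},
    {"id": 4, "gender": "F", "age": 18, "job_id": 2, "buy": "Yes", "money_spent": 300.0},
]


def encode(row):
    """Turn one row into a numeric feature vector (assumed encoding)."""
    return [
        1.0 if row["gender"] == "F" else 0.0,  # categorical -> 0/1
        row["age"] / 100.0,                    # quantitative, rescaled to roughly 0..1
        1.0 if row["job_id"] == 1 else 0.0,    # job_id is a category, not a quantity
    ]


def nearest(query, candidates):
    """Return the candidate row whose features are closest to the query."""
    q = encode(query)
    return min(candidates, key=lambda r: math.dist(q, encode(r)))


def classify(query):
    """Classification: predict the Yes/No label of the nearest known row."""
    return nearest(query, ROWS)["buy"]


def predict_spend(query):
    """Regression: predict the amount spent by the nearest known buyer."""
    buyers = [r for r in ROWS if r["money_spent"] is not None]
    return nearest(query, buyers)["money_spent"]


def cluster(seeds):
    """Clustering: assign each row id to its closest seed row.

    No target column is used. This is a single assignment step,
    not a full k-means loop.
    """
    groups = {}
    for row in ROWS:
        idx = min(
            range(len(seeds)),
            key=lambda i: math.dist(encode(row), encode(seeds[i])),
        )
        groups.setdefault(idx, []).append(row["id"])
    return groups


# Hypothetical new individual, not part of the slide's table.
new_person = {"gender": "F", "age": 60, "job_id": 1}
print(classify(new_person))
print(predict_spend(new_person))
print(cluster([ROWS[0], ROWS[1]]))
```

The same four columns feed all three tasks. What changes is the target: none for clustering, `buy/click_ad` for classification, and `money_spent` for regression.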