Data Science LA
November 05, 2014

John Lin - IPython Notebook - PyDSLA meetup - Nov 2014

Transcript

  1. IPython Notebook for Data Analysis
     John Lin
     johnclin.com (TrueCar Data Scientist – we’re hiring!)
  2. A little about me …
     • Working at TrueCar on both data analytics and data engineering projects.
     • Experimental economist by training
       – Caltech and University of Michigan.
       – Lots of stats/econometrics.
       – Game theory/mechanism design.
     • Programmer
       – Built web-based financial markets at Caltech and Michigan.
       – Building robust analytical data ETLs at TrueCar, with small and big data. Lots and lots of data …
  3. BIG Picture
     • IPython Notebook is:
       – Easy to install.
       – A powerful environment in its own right.
       – A convenient working environment for other Python data packages:
         • Pandas
         • Matplotlib
       – A very good tool.
  4. Overview
     • Diving straight in.
     • Installing the IPython Notebook.
     • Some interesting features.
     • Pros and gotchas of using the IPython Notebook.
     • How to learn more?
  5. Installing and Running
     • Installing the IPython Notebook
       – pip install ipython pyzmq jinja2 tornado
         • pyzmq takes a bit longer to build on a Mac.
     • Running/launching the IPython Notebook
       – ipython notebook
       – Note that ipython is a shell; ipython notebook is a browser-based interface.
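Before launching, it can help to confirm from Python that the packages installed by the pip command above are importable. The sketch below is a minimal version of that check. Note that pyzmq is imported as `zmq` and IPython's module name is capitalized. The helper name `check_notebook_deps` is hypothetical. The package list matches the 2014-era IPython releases; from IPython 4.0 onward, the notebook moved into the separate Jupyter project.

```python
import importlib.util

# Map pip package names (as in the install command) to their import names.
# pyzmq is imported as "zmq"; IPython's module name is capitalized.
REQUIRED = {
    "ipython": "IPython",
    "pyzmq": "zmq",
    "jinja2": "jinja2",
    "tornado": "tornado",
}


def check_notebook_deps(required=REQUIRED):
    """Return {pip name: bool}: True if the module can be found, else False.

    find_spec only locates the top-level module; it does not import it.
    """
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in required.items()
    }


if __name__ == "__main__":
    for name, ok in check_notebook_deps().items():
        print(f"{name:8s} {'ok' if ok else 'MISSING'}")
```

Any package reported as MISSING can be installed again with pip before running `ipython notebook`.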
  6. Pros of Using the IPython Notebook
     – The IPython Notebook is interactive. Great for data analysis!
       • This may not seem like a big deal at first if you haven’t done a lot of data processing work, but it is!
       • Imagine the alternative:
         – Edit the program file.
         – Run the program and look at the output text in a text editor.
         – Repeat endless times.
         – And how do you visualize the data? Output to a file and open it in a browser?
     – The IPython Notebook, along with pandas and matplotlib, provides a powerful combination of tools to iteratively examine, process, and visualize data.
  7. Gotchas of Using the IPython Notebook
     • The raw notebook file (.ipynb) is not very readable. It is JSON, and it stores cell outputs (HTML, images) alongside the code.
     • Hard to read the code on GitHub.
       – Though it is easy to convert an IPython notebook to other formats (HTML, Python code) using ‘ipython nbconvert’.
     • Diffs (‘diff’ or ‘git diff’) are a lot less helpful when comparing IPython notebooks.
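For real conversions, ‘ipython nbconvert’ is the tool to use. Still, it helps to know why raw diffs are noisy: the .ipynb file is plain JSON, and outputs sit next to the code. The minimal sketch below uses only the standard library. It pulls out just the code cells, and it clears outputs so that a diff shows code changes only.

It assumes the nbformat 4 layout (top-level "cells", code under "source"). It falls back to the older nbformat 3 layout used by 2014-era notebooks ("worksheets", code under "input", counter in "prompt_number"). All function names are hypothetical and are not part of IPython.

```python
import json


def load_notebook(path):
    """Read an .ipynb file; it is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _cells(nb):
    """Yield cells from nbformat 4 (top-level "cells") or nbformat 3 ("worksheets")."""
    if "cells" in nb:
        yield from nb["cells"]
    else:
        for ws in nb.get("worksheets", []):
            yield from ws.get("cells", [])


def _text(field):
    # Cell text may be stored as one string or as a list of lines.
    return "".join(field) if isinstance(field, list) else field


def code_only(nb):
    """Concatenate the source of all code cells into one readable string."""
    chunks = []
    for cell in _cells(nb):
        if cell.get("cell_type") == "code":
            # nbformat 4 uses "source"; nbformat 3 used "input" for code cells.
            chunks.append(_text(cell.get("source", cell.get("input", ""))))
    return "\n\n".join(chunks)


def strip_outputs(nb):
    """Clear outputs and execution counters in place, so diffs show only code."""
    for cell in _cells(nb):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            if "execution_count" in cell:  # nbformat 4
                cell["execution_count"] = None
            cell.pop("prompt_number", None)  # nbformat 3
    return nb
```

For example, `print(code_only(load_notebook("analysis.ipynb")))` prints a script-like view of a notebook (the file name is a placeholder). Writing the result of `strip_outputs` back with `json.dump` before committing keeps outputs out of version control.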
  8. Gotchas of Using the IPython Notebook
     • Because it encourages interactive coding, it is easy to pollute the namespace.
     • This makes the code hard to debug, because you may have overwritten a variable and forgotten about it.
       – When in doubt, restart the kernel and run the process through one step at a time from the top.
       – Rename variables after a transformation step.
       – Break your code into separate cells.
       – Leverage methods and classes as appropriate.
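Here is a small illustration of the "rename variables after a transformation step" and "leverage methods" advice. It uses plain Python lists so it runs anywhere. `add_tax`, `rerun_in_place`, and `pipeline` are hypothetical stand-ins for real transformation steps.

The danger is a cell that rebinds a name to its own transformed value: re-running it silently applies the step again. Giving the result a new name keeps the cell safe to re-run, and wrapping steps in a function keeps intermediates out of the notebook namespace.

```python
def add_tax(prices, rate=0.1):
    """Hypothetical step: return a NEW list with tax applied; input is untouched."""
    return [round(p * (1 + rate), 2) for p in prices]


def rerun_in_place(prices, times):
    """Simulate re-running a cell containing `prices = add_tax(prices)`.

    Each re-run rebinds `prices`, so the tax compounds without any warning.
    """
    for _ in range(times):
        prices = add_tax(prices)
    return prices


def pipeline(raw_prices):
    """Hypothetical multi-step transform wrapped in a function: intermediates
    such as `deduped` stay local instead of cluttering the notebook namespace."""
    deduped = sorted(set(raw_prices))
    return add_tax(deduped)


prices = [10.0, 20.0]
# Safer: the transformed data gets its own name, so this cell can be re-run freely.
prices_with_tax = add_tax(prices)
```

Running `rerun_in_place([10.0, 20.0], 2)` returns `[12.1, 24.2]`, which shows the compounding that a forgotten re-run causes. Restarting the kernel and running from the top, as the slide suggests, is the reliable way to catch this.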
  9. How to Learn More
     • http://iPython.org (The Mothership.)
     • https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks (Repository of IPython notebooks.)
     • http://continuum.io/wakari (Online hosting of IPython notebooks.)