Jos
September 16, 2016

A Homespun Decentralised DIY Data Science Research Pipeline for the Internet of *Your* Things

A very brief intro to data science from a computer scientist's point of view. Slides for a talk at DevDay Poland 2016 #abbdevday

Transcript

  1. JOSE DOMINGUEZ @JOSMASFLORES

    This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.
  2. CONTRIBUTIONS
  3. AGENDA

    ▸ Homespun
    ▸ *Your* things in IoT
    ▸ *Your* Data
    ▸ Data Science
    ▸ Tools: End User Development
    ▸ Data Collection and Communication
    ▸ Backend and Storage
    ▸ Analysis and Visualisation
  9. WHAT KIND OF STUFF CAN YOU DO WITH *YOUR* THINGS?

    ▸ Grab raw data from the accelerometer
    ▸ Zero-crossing analysis can give you a rough step count
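A minimal sketch of the zero-crossing idea, not the method used in the talk. It assumes samples arrive as (x, y, z) tuples in m/s², removes the mean of the acceleration magnitude (roughly gravity), and counts upward crossings. The `min_swing` threshold is a hypothetical noise guard added for illustration; with it, a step is only counted once the signal has dipped below `-min_swing` and then risen above `+min_swing`.

```python
import math


def magnitude(sample):
    """Length of one (x, y, z) acceleration vector."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def count_steps(samples, min_swing=0.5):
    """Rough step estimate from raw accelerometer samples.

    min_swing is an assumed threshold (same unit as the samples) that
    stops small jitters around zero from being counted as steps.
    """
    mags = [magnitude(s) for s in samples]
    if not mags:
        return 0
    mean = sum(mags) / len(mags)
    centred = [m - mean for m in mags]  # signal now oscillates around zero

    steps = 0
    armed = False  # True once the signal has dipped below -min_swing
    for value in centred:
        if value < -min_swing:
            armed = True
        elif value > min_swing and armed:
            steps += 1  # one full negative-to-positive crossing
            armed = False
    return steps


if __name__ == "__main__":
    # Synthetic walk: 5 s at 50 Hz, 2 steps per second, phone held flat.
    walk = [(0.0, 0.0, 9.81 - 2.0 * math.sin(2 * math.pi * i / 25))
            for i in range(250)]
    print(count_steps(walk))
```

Real accelerometer traces are much noisier than this synthetic sine wave, so the result is only ever a rough count.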
  10. GOOGLE APIS. SECTION 5: CONTENT

    b. Submission of Content

    Some of our APIs allow the submission of content. Google does not acquire any ownership of any intellectual property rights in the content that you submit to our APIs through your API Client, except as expressly provided in the Terms. For the sole purpose of enabling Google to provide, secure, and improve the APIs (and the related service(s)) and only in accordance with the applicable Google privacy policies, you give Google a perpetual, irrevocable, worldwide, sublicensable, royalty-free, and non-exclusive license to Use content submitted, posted, or displayed to or from the APIs through your API Client. “Use” means use, host, store, modify, communicate, and publish. Before you submit content to our APIs through your API Client, you will ensure that you have the necessary rights (including the necessary rights from your end users) to grant us the license.
  12. POSTING YOUR CONTENT ON THE FITBIT SERVICE

    ▸ You may post photos, exercise regimens, food logs, recipes, comments, and other content (“Your Content”) to the Fitbit Service. You retain all rights to Your Content that you post to the Fitbit Service. By making Your Content available on or through the Fitbit Service you grant to Fitbit a non-exclusive, transferable, sublicensable, worldwide, royalty-free license to use, copy, modify, publicly display, publicly perform and distribute Your Content only in connection with operating and providing the Fitbit Service.
  13. WITH A SMALL NUMBER OF GEOLOCATION DATA POINTS (1 DAY'S WORTH), LOCATION CAN BE INFERRED, POTENTIALLY LEADING TO PRIVACY DISCLOSURES

    Liccardi et al., 2016. I Know Where You Live: Inferring Details of People's Lives by Visualizing Publicly Shared Location Data (CHI '16).
  14. AN ANONYMISED MEDICAL DATABASE WAS SUCCESSFULLY COMBINED WITH A VOTER LIST TO EXTRACT THE HEALTH RECORD OF THE GOVERNOR OF MASSACHUSETTS

    Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems 10, 557–570 (2002).
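To make the k-anonymity idea from the cited paper concrete, here is a small sketch under stated assumptions: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The records, field names and values below are invented for illustration and are not taken from any real dataset.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Smallest group size; the table is k-anonymous for this k."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_records(records, quasi_identifiers):
    """Records that stand alone and could be linked to an outside list."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


# Invented toy records: names removed, but quasi-identifiers kept.
records = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "diagnosis": "A"},
    {"zip": "02139", "birth_year": 1960, "sex": "F", "diagnosis": "B"},
    {"zip": "02139", "birth_year": 1960, "sex": "F", "diagnosis": "C"},
    {"zip": "02139", "birth_year": 1971, "sex": "M", "diagnosis": "D"},
    {"zip": "02139", "birth_year": 1971, "sex": "M", "diagnosis": "E"},
]

quasi = ["zip", "birth_year", "sex"]
print(k_anonymity(records, quasi))          # smallest group size
print(len(unique_records(records, quasi)))  # records exposed to linkage
```

Any record in a group of size one can be matched against a second list that carries the same fields plus a name, which is the kind of combination the slide describes.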
  15. USING PUBLICLY AVAILABLE DATA, THE TYPES OF LOCATIONS CAN BE USED TO ESTIMATE SOMEONE’S AVERAGE INCOME […], AVERAGE HOUSING COST, DEBT, […] POLITICAL VIEWS ETC.

    Liccardi et al., 2016. I Know Where You Live: Inferring Details of People's Lives by Visualizing Publicly Shared Location Data (CHI '16).
  16. HELM AND GEORGATOS (2014), PRIVACY AND HEALTH: HOW MOBILE HEALTH ‘APPS’ FIT INTO A PRIVACY FRAMEWORK NOT LIMITED TO HIPAA

    ▸ Surveillance (unauthorised collection)
    ▸ Identification
    ▸ Insecurity (lack of encryption)
    ▸ Disclosure (of sensitive data to third parties)
    ▸ Aggregation (consumer profiles)
  17. A FEW MORE REASONS IN CASE YOU STILL DON’T CARE

    Steven Spann (2016). Wearable Fitness Devices: Health Data Privacy in Washington State
  18. DATA COULD BE USED TO LEGALLY OR ILLEGALLY RESTRICT AN

    INDIVIDUAL’S ABILITY TO ACCESS CERTAIN MARKETS Steven Spann
  19. STEVEN SPANN (2016): THROUGH THE USE OF PERSONAL DATA, ENTITIES COULD DISCRIMINATE AGAINST AN INDIVIDUAL IN:

    ▸ Employment
    ▸ Health Care and Insurance
    ▸ Credit-based Lending and other life necessities and options
  20. TERMS AND CONDITIONS IN RESEARCH CONTEXTS

    ▸ T&C concerns can inadvertently break Ethics agreements.
  23. DATA SCIENCE: HOW TO READ THE DATA SCIENCE VENN DIAGRAM

    http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  24. CS109 - HARVARD

    ▸ The prerequisite for this class is programming knowledge at the level of CS 50 (or above), and statistics knowledge at the level of Stat 100 (or above).
    ▸ Stats 110: sample spaces, naive definition of probability, counting, sampling, random variables, CDFs, PMFs, discrete vs. continuous, Hypergeometric, Poisson distribution, Poisson approximation, standard Normal, Normal normalizing constant, Markov chains, transition matrix, stationary distribution, Chi-Square, Student-t, Multivariate Normal
  25. DATA SCIENCE: BE WARY OF TERMINOLOGY

    ▸ Is it Machine Learning or Data Mining? Or is it Statistical Learning?
  26. DATA SCIENCE: WHAT ARE YOU TRYING TO DO?

    ▸ Figure something out
    ▸ Predict something
  27. DATA SCIENCE: TYPES OF LEARNING

    ▸ Supervised: the dataset comes with tags (labels)
    ▸ Unsupervised: the dataset has no tags
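The difference can be sketched in a few lines of plain Python. This is an illustration under assumptions, not code from the talk: the two-number feature vectors and the "rest"/"walk" tags are invented, the supervised part is a nearest-centroid classifier, and the unsupervised part is a bare-bones k-means that naively starts from the first k points (real implementations seed more carefully).

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def nearest(point, centroids):
    """Key of the centroid closest to point."""
    return min(centroids, key=lambda k: math.dist(point, centroids[k]))


def fit_supervised(points, tags):
    """Uses the tags: one centroid per tag."""
    groups = {}
    for point, tag in zip(points, tags):
        groups.setdefault(tag, []).append(point)
    return {tag: centroid(members) for tag, members in groups.items()}


def fit_unsupervised(points, k, iterations=10):
    """No tags: k-means finds k groups on its own."""
    centroids = {i: points[i] for i in range(k)}  # naive initialisation
    for _ in range(iterations):
        groups = {i: [] for i in centroids}
        for point in points:
            groups[nearest(point, centroids)].append(point)
        centroids = {i: centroid(members) if members else centroids[i]
                     for i, members in groups.items()}
    return centroids


# Invented features, e.g. (mean magnitude, variation) per time window.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
tags = ["rest", "walk", "rest", "walk"]

model = fit_supervised(points, tags)
print(nearest((4.5, 4.5), model))      # answers with one of the tags

clusters = fit_unsupervised(points, k=2)
print(nearest((4.5, 4.5), clusters))   # answers with an anonymous cluster id
```

The supervised model answers with a tag you supplied; the unsupervised one can only say which anonymous group a point falls into, and naming that group is left to you.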
  28. DIY

  30. A SET OF METHODS, TECHNIQUES AND TOOLS THAT ALLOW USERS OF SOFTWARE SYSTEMS, WHO ARE ACTING AS NON-PROFESSIONAL SOFTWARE DEVELOPERS, AT SOME POINT TO CREATE, MODIFY, OR EXTEND A SOFTWARE ARTIFACT.

    Lieberman et al. 2006
  32. COMMUNICATION: DATA COMMUNICATION

    ‣ Use HTTP
    ‣ Save data to a file and send it to a server
    ‣ Do analysis directly on the phone; this is necessary if you need real-time results: see Jan Machacek's Exercise Analysis talk
    ‣ You probably want to use MQTT
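The "save to a file and send it over HTTP" option could look roughly like the sketch below, which uses only the Python standard library. The CSV column names and the upload URL are hypothetical placeholders, and the request is only built here, not sent.

```python
import csv
import io
import urllib.request


def samples_to_csv(samples):
    """Serialise (timestamp_ms, x, y, z) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp_ms", "x", "y", "z"])  # assumed column names
    writer.writerows(samples)
    return buffer.getvalue()


def build_upload_request(csv_text, url):
    """Prepare an HTTP POST carrying the whole batch in its body."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


batch = [(0, 0.1, 0.2, 9.8), (20, 0.0, 0.3, 9.7)]
request = build_upload_request(
    samples_to_csv(batch),
    "http://localhost:8080/upload",  # hypothetical endpoint
)
# To actually send it: urllib.request.urlopen(request)
```

Batching like this is simple, but every upload is a full request and response cycle, which is one reason a lightweight publish/subscribe protocol such as MQTT tends to suit a steady stream of small sensor readings better.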
  33. COMMUNICATION AND STORAGE: APP INVENTOR AND NODE-RED

    ‣ Collect accelerometer data in App Inventor
    ‣ MQTT to a broker
    ‣ Subscribe in Node-RED:
      ‣ send to storage
      ‣ analyse
      ‣ visualise
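The talk wires this pipeline up with App Inventor and Node-RED blocks. As a rough Python illustration of the data that flows through it, the sketch below encodes one accelerometer sample as a JSON message (what would be published to the broker) and converts it to InfluxDB line protocol on the subscriber side (what would be written to storage). The JSON field names, the measurement name and the choice of InfluxDB as the store are assumptions; the MQTT publish and subscribe calls themselves are left out because they depend on the client library used.

```python
import json


def encode_sample(device_id, timestamp_ms, x, y, z):
    """JSON payload for one reading (field names are assumed)."""
    return json.dumps(
        {"device": device_id, "ts": timestamp_ms, "x": x, "y": y, "z": z}
    )


def to_line_protocol(payload, measurement="accelerometer"):
    """Convert a JSON payload to one InfluxDB line protocol entry.

    Assumes the device id contains no spaces, commas or equals signs,
    which would otherwise need escaping in a tag value.
    """
    message = json.loads(payload)
    fields = ",".join(
        f"{axis}={float(message[axis])}" for axis in ("x", "y", "z")
    )
    timestamp_ns = int(message["ts"]) * 1_000_000  # ms to ns
    return f"{measurement},device={message['device']} {fields} {timestamp_ns}"


payload = encode_sample("phone1", 1474012800000, 0.12, -0.05, 9.81)
print(payload)                    # what the phone would publish
print(to_line_protocol(payload))  # what the subscriber would store
```

Keeping the payload as self-describing JSON means the same message can be sent to storage, analysis and visualisation without each consumer needing to know the others.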
  34. COMMUNICATION AND INTERACTION: APP INVENTOR AND NODE-RED

    ‣ Send a notification from Node-RED with AeroGear Push Server
    ‣ Get it on the phone with App Inventor

    Work In Progress!
    Node-RED node for Notifications: https://github.com/CLDTio/node-red-contrib-aerogear-notifications
    App Inventor Component for AeroGear: https://github.com/josmas/app-inventor/tree/aerogear
  35. TOOLS: BACKEND AND STORAGE

    ▸ Use a commercial solution:
      ▸ there are plenty, and now you know better, so you can make an informed decision about whether to use one!
    ▸ iSense (read the T&Cs first)
    ▸ Your own backend with Parse, MIT Solid, or any other solution you like (including rolling your own)
  36. CONTAINERS: DOCKER FOR EVERYTHING

    ‣ https://github.com/CLDTio/docker-influxdb
    ‣ https://github.com/CLDTio/appinventor-env-docker
    ‣ https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker
    ‣ Browse the Docker Hub and GitHub for many more!
  37. TOOLS: DATA ANALYSIS AND VISUALISATION

    ▸ iSense
    ▸ Weka
    ▸ Machine learning
    ▸ Activity recognition
    ▸ Visualisation tools
    ▸ Python notebooks, R, Julia, Spyder, and so forth.
  38. PICTURE CREDITS

    ▸ Tablet: http://siliconangle.com/files/2012/02/Android_Portrait_Overview.jpg
    ▸ Smart Watch: https://lh3.ggpht.com/ElwMg8bubiVB33euotYaD_mpKxSrr7SXTsrMwamk3_SRZx1VYkqVT8-HvkQDqXvLWw=h900
    ▸ HeartRate monitor: http://i00.i.aliimg.com/img/pb/355/386/400/400386355_864.jpg
    ▸ Arduino: https://cdn.instructables.com/FDW/WCKV/HKBG733D/FDWWCKVHKBG733D.MEDIUM.jpg
    ▸ Core motion axes: http://blog.denivip.ru/wp-content/uploads/2013/07/CoreMotionAxes.png
    ▸ Zero cross: https://c2.staticflickr.com/8/7269/7866678792_b375ae7b26.jpg
    ▸ Step counter: https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/40876/versions/8/previews/sensorgroup/Examples/html/StepCounter_03.png
    ▸ Rewriting: http://www.boyter.org/wp-content/uploads/2016/04/Ce5nYp0W4AAxp5Z.jpg
    ▸ Machine learning NG: http://cdn.usefulstuff.io/2016/01/machine-learning-ng.jpg
    ▸ Researcher: http://images.clipartpanda.com/researcher-clipart-cartoons_laboratory_201533_tnb.png
    ▸ Ethics: http://tcrc.eu/en/wp-content/uploads/2013/06/ethics.png
    ▸ Developer: http://www.grapessoftware.com/wp-content/uploads/2014/02/Hire-Developer-Sprite.png
    ▸ Señor developer: http://startupmyway.com/wp-content/uploads/2016/02/senordeveloper.gif
    ▸ API integrations: http://www.surf2host.com/images/api_integrations.png
    ▸ Data Science: http://static1.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png
    ▸ DreamWeaver: http://getintopc.com/wp-content/uploads/2014/02/dreamweaver-cs6-jquery-mobile.png
    ▸ Spreadsheet: http://www.openoffice.us.com/cmsimages/software/calc2.gif
    ▸ Advanced spreadsheet: http://www.comfsm.fm/~dleeling/statistics/sc3/cover.png
    ▸ LabView: http://sql-lv.sourceforge.net/new_sql_LV.png
    ▸ E-prime2: http://www.pstnet.com/internal/kbimage/1801-1.gif
    ▸ Blockchain: http://www.cbronline.com/Uploads/NewsArticle/4978988/main.jpg
    ▸ Ethereum: https://ethereum.org/images/wallpaper-homestead.jpg
    ▸ Types of networks: https://upload.wikimedia.org/wikipedia/en/b/ba/Centralised-decentralised-distributed.png
    ▸ AeroGear logo: https://yt3.ggpht.com/-9bdWbHB80Og/AAAAAAAAAAI/AAAAAAAAAAA/2VbWVkS5CmY/s88-c-k-no-mo-rj-c0xffffff/photo.jpg
    ▸ Iris dataset: http://5047-presscdn.pagely.netdna-cdn.com/wp-content/uploads/2015/04/iris_petal_sepal.png
    ▸ Jupyter: http://jupyter.org/assets/main-logo.svg
    ▸ Pandas: http://pandas.pydata.org/_static/pandas_logo.png