Machine Data Explosion

How to amplify the value of IoT data

Mark Chmarny

May 10, 2016

Transcript

  1. DATA DYNAMICS

    Data to grow from 4.4ZB in 2015 to 44ZB in 2020: 10x growth, 90% of that machine-generated (new device)
    …but the amount of data from which we can derive value is to increase only from 22% to 35%
    Sources: IDC
  2. DATA GROWTH

    DEVICES
    Gartner: 26B
    Cisco: 50B
    McKinsey: 200B
    IDC: 212B
    Sources: Gartner | Cisco | Intel | IDC

    VERTICALS
    Manufacturing: 40%
    Health Care: 30%
    Retail: 8%
    Security: 7%
    Transportation: 4%
    Source: Intel

    OPPORTUNITY
    25+ Million New Apps
    $4+ Trillion Business
    Source: IDC
  3. DATA VALUE

    • Value increases when data is related (the more context, the better)
    • A single raw data source can drive many value chains (different process, different service)
    • Some data has no value until integrated, or even until delivered as a service (noise vs signal)

    VALUE CHAIN
    Raw Data, Processed Data, Integrated Data, Data Services
    Symbol, What, Where, When, How, Why
    Sources: SVDS

    INHIBITORS
    Lack of trust: fear of sharing with partners, common perception of incompetency to protect “their” data
    Knee-jerk reaction: 67% would rather lose the opportunity to monetize than risk losing control
    Gray market: yet over 60% of service-delivery companies already monetize collected data without the original providers’ consent
    Source: Accenture

    Sources: Nate Silver’s book
  4. DATA TYPES

    TRADITIONAL EDW DATA
    Sales Fact, surrounded by: Customer Dimension, Supplier Dimension, Store Dimension, Geography Dimension, Product Dimension
    Dimension is the context, so this is efficient: get sales where product = ‘x’ and supplier = ‘y’

    NEW DEVICE DATA
    EVENT FACT: 14 53939807 2657 ABC 0.034 X: Y:Z…
    EVENT FACT fields: When, Where, What, Value
    § Data (most) born in an absence of context (narcissistic device?)
    § Observations, by default, are immutable (don’t change after reading)
    § Individual events insignificant, more interesting the longer observed (series)

    NEW CONSUMPTION MODEL
    Latency Attributes: Observation, Actuation, Persistence
    Ingestion bandwidth important, but “total latency” most critical

    VERTICALS
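The device-data points on this slide (events born without context, immutable once read, meaningful mainly as a series) can be illustrated with a minimal Python sketch. The field names mirror the When/Where/What/Value labels; the units, device identifiers, and readings are assumptions made up for illustration, not values from the deck.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)  # frozen: an observation cannot change after it is read
class Event:
    when: int     # epoch seconds (assumed unit)
    where: str    # hypothetical device identifier
    what: str     # hypothetical metric name
    value: float


def series_by_source(events):
    """Group events into time-ordered value series keyed by (where, what)."""
    series = defaultdict(list)
    for event in sorted(events, key=lambda e: e.when):
        series[(event.where, event.what)].append(event.value)
    return dict(series)


# Illustrative readings only; a single event says little, the series says more.
events = [
    Event(1120, "dev-1", "temp", 25.0),
    Event(1000, "dev-1", "temp", 20.0),
    Event(1060, "dev-1", "temp", 21.0),
    Event(1000, "dev-2", "temp", 18.0),
]

series = series_by_source(events)
averages = {key: mean(values) for key, values in series.items()}
```

Unlike the dimensional query on the EDW side, nothing here carries context beyond the four fields, so any relation to customers, sites, or products would have to be added later.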
  5. DATA METRIC (VERTICAL)

    Stages: Raw (as is), Prepared (cleansed, standardized), Processed (augmented, related), Consumed (acquired, served)

    Health (genomic)
    • Text-based files in columnar structure
    • Standardized formats (VCF, WDL, CDL…)
    • Small data variation across sources (deltas consumed)

    Finance (market)
    • Primarily transactional
    • RDBMS managed
    • Diverse data structures (schema, codes, relation)
    • Requires transformation, standardization
    • Comes with a lot of context (relationships)
    • May benefit from out-of-domain links
    • Batch (file) or service (API)
    • Parameterized queries (question/answer)

    Industrial (machine)
    • Machine generated, minimal context
    • Already highly standardized data per device type
    • Immutable (doesn’t change after reading)
    • Individual events insignificant, long series need management
    • Need relation
    • Derived value service (trends, anomalies…)
    • Best consumed as stream vs batch

    Data exchange format standardization opportunity
  6. DATA ACCESS

    DISTRIBUTED
    § Federated queries return only summary/deltas
    § Best on common-format data sets
    § Always delivers the latest data, no duplication
    § Demands support from individual partners
    § Better for async/batch requests due to latency

    CENTRALIZED
    § Aggregates all data prior to query (duplication)
    § Queries over combined/indexed data
    § Perception of data out of provider’s control
    § Enables query by context not available at source
    § Supports real-time queries

    MODEL CONSIDERATIONS
    § Data “schema” or format commonality (standard)
    § Consumer usage demands (async query)
    § Network bandwidth/latency, consistency tolerance
    § Context locality demand
    § Skillset, willingness to absorb opex (all providers)
    § Geofencing requirements (compliance)

    NOT mutually exclusive: the ability to facilitate both is an advantage.

    [Diagram: partners, each a provider with its own store, connected through an exchange to consumers]
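As a rough illustration of the two access models, the Python sketch below computes the same average both ways. The `Provider` class, its method names, and the readings are hypothetical; the point is only what leaves the provider in each model (a summary versus a full copy).

```python
class Provider:
    """Hypothetical partner that keeps its own store of readings."""

    def __init__(self, name, readings):
        self.name = name
        self._readings = list(readings)

    def summary(self):
        # Distributed model: only a summary leaves the provider.
        return {"count": len(self._readings), "total": sum(self._readings)}

    def export(self):
        # Centralized model: the full data set is copied to the exchange.
        return list(self._readings)


def federated_mean(providers):
    """Combine per-provider summaries without moving the raw data."""
    parts = [p.summary() for p in providers]
    return sum(s["total"] for s in parts) / sum(s["count"] for s in parts)


def centralized_mean(providers):
    """Aggregate all raw data first (duplication), then query the combined set."""
    combined = [r for p in providers for r in p.export()]
    return sum(combined) / len(combined)


providers = [Provider("a", [1.0, 2.0, 3.0]), Provider("b", [4.0, 5.0])]
federated = federated_mean(providers)
centralized = centralized_mean(providers)
```

Both calls return the same value here, but only the centralized copy could also be indexed or joined with context that no single provider holds, which is the trade-off the slide lists.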
  7. DATA EXCHANGE

    LOWER OPEX
    Minimize data-sharing OPEX through automation. Make it convenient. No data will be shared if the cost of its exchange is higher than its market value
    § Reusable connectors (Drivers)
    § Gateway API for Scheduling, Validation, Alerts, Audit

    DIVERSE APIS
    Create an information abstraction layer to deliver data in ready-to-consume formats optimized for specific use cases to assure maximum stickability
    § API management, bindings
    § Federated & granular ACLs
    § Deep metering & telemetry

    ADD CONTEXT
    Build new data views by connecting related sets to expose otherwise not obvious insights. Invest in becoming the birthplace of organic data
    § Mine for links & associations
    § Deliver data curation service
    § Augment on-read context

    CREATE BAZAAR
    Create an insight bazaar: services beyond data, bi-directional exchange, sampling for value prior to use or purchase
    § Model & service gamification
    § On-demand data scientists
    § Hackathons & competitions
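The "granular ACLs" and "deep metering" items could look roughly like the following Python sketch, in which a gateway checks a per-data-set access list and counts every granted read. The `Gateway` class, the data-set name, and the consumer identifiers are all hypothetical; the deck does not describe an implementation.

```python
from collections import Counter


class Gateway:
    """Hypothetical exchange gateway: check a per-data-set ACL, then meter the read."""

    def __init__(self, acl):
        self._acl = acl          # {data_set: set of consumer ids allowed to read it}
        self.usage = Counter()   # (consumer, data_set) -> number of granted reads

    def read(self, consumer, data_set):
        if consumer not in self._acl.get(data_set, set()):
            raise PermissionError(f"{consumer} may not read {data_set}")
        self.usage[(consumer, data_set)] += 1
        return f"{data_set} served to {consumer}"


gateway = Gateway({"turbines": {"acme"}})
gateway.read("acme", "turbines")
gateway.read("acme", "turbines")
```

After the two reads, `gateway.usage[("acme", "turbines")]` is 2, which is the kind of per-consumer, per-data-set count that metering and billing could build on.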
  8. USAGE PATTERNS

    Roles: Data Provider, Data Exchange, Data Consumer
    Modes: Batch, Stream, POSTPROCESSING

    File: Data Set, 1-1, Data Set. Pricing: $/File Download
    Query: Query, 1-n sets, Answer. Pricing: $/Query or $/Query Plan (Time)
    Stream: Service, 1-n sets, Data Events, 1-n provider. Pricing: $/Event or $/Subscription (Time)
    Job: Job Exec, Distributed Exec, 1-n provider, n:1, Job. Pricing: $/Job (* Target) or $/Job Exec Plan

    Cardinality labels: n:n, 1:n, n:n, n:n
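The per-pattern pricing units can be sketched as a small Python lookup. The slide names only the units ($/File Download, $/Query, $/Event, $/Job); the rates below are invented placeholders, and the plan-based and subscription variants are left out.

```python
# Hypothetical unit prices in dollars; the deck gives the units, not the amounts.
RATES = {
    "file": 5.00,     # $/File Download
    "query": 0.10,    # $/Query
    "stream": 0.001,  # $/Event
    "job": 2.00,      # $/Job
}


def charge(pattern, units):
    """Return the charge in dollars for `units` of the given usage pattern."""
    if pattern not in RATES:
        raise ValueError(f"unknown usage pattern: {pattern}")
    if units < 0:
        raise ValueError("units must be non-negative")
    return round(RATES[pattern] * units, 2)


invoice = {
    "file": charge("file", 2),
    "query": charge("query", 30),
    "stream": charge("stream", 5000),
}
```

A real exchange would more likely price in integer minor units or with `decimal.Decimal`; floats rounded to cents are used here only to keep the sketch short.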
  9. DATA PLATFORM

    Data Provider: Batch Upload, Data Stream
    Data Consumer: Batch Download, Data Stream, Data Query

    Federation (Registration, ACL)
    Gateway (Management, Audit & Push/Pull)
    Operations (Management, Metering): Manage, Monitor
    Ingestion (Push/Pull)
    Preparation (Tokenization, Standardization, Validation)
    Processing (Link Mining, Catalog & Service Creation)
    Service (Batch, Exec, Query, Management)
    Service (API)
    Persistent (Distribution, Durability, Optimization)
    Billing (Invoice, Collect)
    Partner Services

    • Deliver data encryption at rest and in motion
    • Accommodate both distributed and centralized models
    • Enable specialized access & format requirements
    • Lower provider OPEX with turn-key gateway or SDK
    • Deliver granular utilization & access metrics to expand monetization surface