44ZB in 2020
• 10x growth, 90% of that machine-generated, new device …
• …but the amount of data from which we can derive value is expected to increase only from 22% to 35%
Source: IDC
DATA VALUE
VALUE CHAIN
• A single raw data source can drive many value chains (different processes, different services)
• Some data has no value until it is integrated, or even until it is delivered as a service (noise vs. signal)
Raw Data → Processed Data → Integrated Data → Data Services
Symbol, What, Where, When, How, Why
Sources: SVDS
INHIBITORS
• Lack of trust: fear of sharing with partners, common perception of incompetence in protecting “their” data
• Knee-jerk reaction: 67% would rather lose the opportunity to monetize than risk losing control
• Gray market: yet over 60% of service-delivery companies already monetize collected data without the original providers' consent
Source: Accenture
Sources: Nate Silver’s book
TRADITIONAL EDW DATA / NEW DEVICE DATA
Product Dimension
EVENT FACT: 14 53939807 2657 ABC 0.034 X: Y:Z…
When, Where, What, Value
EVENT FACT
Dimension is the context, so this is efficient: get sales where product = 'x' and supplier = 'y'
§ Most data is born in the absence of context (narcissistic device?)
§ Observations are immutable by default (they don't change after reading)
§ Individual events are insignificant; they become more interesting the longer they are observed (series)
Observation, Actuation, Persistence, Latency, Attributes
Ingestion bandwidth is important, but “total latency” is most critical
NEW CONSUMPTION MODEL
VERTICALS
DATA TYPES
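A minimal sketch of the contrast drawn above, under stated assumptions: the sample rows, the `Observation` field names and the `series_delta` helper are hypothetical and are not taken from the slide. It shows the dimension-filtered query from the slide next to a context-free, immutable device observation whose value only emerges from the series.

```python
from dataclasses import dataclass, FrozenInstanceError

# Traditional EDW: the fact row carries dimension keys, which are its context.
sales_facts = [
    {"product": "x", "supplier": "y", "amount": 120.0},
    {"product": "x", "supplier": "z", "amount": 80.0},
    {"product": "w", "supplier": "y", "amount": 45.0},
]


def get_sales(facts, product, supplier):
    """get sales where product = 'x' and supplier = 'y'"""
    return sum(f["amount"] for f in facts
               if f["product"] == product and f["supplier"] == supplier)


# New device data: a bare observation (when, where, what, value).
# frozen=True makes it immutable, so it cannot change after reading.
@dataclass(frozen=True)
class Observation:
    when: int     # hypothetical timestamp (seconds)
    where: str    # hypothetical device id
    what: str     # measured quantity
    value: float


readings = [Observation(t, "dev-1", "temp", v)
            for t, v in enumerate([20.0, 20.5, 21.0, 24.0])]


def series_delta(observations):
    """One event says little; the ordered series shows the change over time."""
    ordered = sorted(observations, key=lambda o: o.when)
    return ordered[-1].value - ordered[0].value


edw_total = get_sales(sales_facts, "x", "y")
temp_rise = series_delta(readings)
```

Any context for the device readings (which machine, which site) would have to be joined in at read time, since the observation itself carries none.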
DATA METRIC (VERTICAL)
(acquired, served)
Health (genomic)
• Text-based files in columnar structure
• Standardized formats (VCF, WDL, CDL…)
• Small data variation across sources (deltas consumed)
Finance (market)
• Primarily transactional
• RDBMS managed
• Diverse data structures (schema, codes, relation)
• Requires transformation, standardization
• Comes with a lot of context (relationships)
• May benefit from out-of-domain links
• Batch (file) or service (API)
• Parameterized queries (question/answer)
Industrial (machine)
• Machine generated, minimal context
• Already highly standardized data per device type
• Immutable (doesn’t change after reading)
• Individual events insignificant, long series need management
• Needs relation
• Derived value service (trends, anomalies…)
• Best consumed as stream vs batch
Data exchange format standardization opportunity
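For the industrial (machine) vertical, a derived value service over a stream could look roughly like the sketch below. It is only an illustration: the rolling-window rule, the `window` and `threshold` parameters and the sample readings are assumptions, not something the slide specifies.

```python
from collections import deque
from statistics import mean, pstdev


def anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag values more than `threshold` standard deviations from the
            # window mean; the floor on sigma keeps a flat window usable.
            if abs(value - mu) > threshold * max(sigma, 1e-9):
                yield i, value
        recent.append(value)


# Hypothetical sensor series with one obvious spike.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0, 9.8]
flagged = list(anomalies(readings))
```

Because the function is a generator, it consumes events one at a time, which matches stream rather than batch consumption.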
DATA ACCESS
… common format data-sets
§ Always delivers the latest data, no duplication
§ Demands support from individual partners
§ Better for async/batch requests due to latency
CENTRALIZED
§ Aggregates all data prior to query (duplication)
§ Queries over combined/indexed data
§ Perception of data out of provider’s control
§ Enables query by context not available at source
§ Supports real-time queries
MODEL CONSIDERATIONS
§ Data “schema” or format commonality (standard)
§ Consumer usage demands (async query)
§ Network bandwidth/latency, consistency tolerance
§ Context locality demand
§ Skillset, willingness to absorb opex (all providers)
§ Geofencing requirements (compliance)
NOT mutually exclusive: the ability to facilitate both is an advantage.
(Diagram labels: Partner, Partners; store / provider ×3; exchange; consumer ×3)
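The two access models above can be sketched side by side. This is a toy illustration under assumptions: the first model is called "federated" here (its label does not appear in the text above), and the provider names, rows and function names are hypothetical.

```python
# Hypothetical provider stores: each partner keeps its own data.
providers = {
    "partner_a": [{"id": 1, "region": "eu", "value": 10}],
    "partner_b": [{"id": 2, "region": "us", "value": 20},
                  {"id": 3, "region": "eu", "value": 30}],
}


def federated_query(stores, predicate):
    """Ask every provider at request time: latest data, no duplication,
    but every provider has to answer each query."""
    results = []
    for name, rows in stores.items():
        results.extend({**r, "provider": name} for r in rows if predicate(r))
    return results


class CentralizedExchange:
    """Aggregate (duplicate) all provider data up front, then query the copy."""

    def __init__(self, stores):
        self.rows = [{**r, "provider": name}
                     for name, rows in stores.items() for r in rows]

    def query(self, predicate):
        return [r for r in self.rows if predicate(r)]


def is_eu(row):
    return row["region"] == "eu"


exchange = CentralizedExchange(providers)  # snapshot is taken here
providers["partner_a"].append({"id": 4, "region": "eu", "value": 40})

fed_ids = sorted(r["id"] for r in federated_query(providers, is_eu))
cen_ids = sorted(r["id"] for r in exchange.query(is_eu))
```

The record added after the snapshot is visible only to the query that goes back to the providers. That is the freshness versus duplication trade-off the slide lists.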
DATA EXCHANGE
LOWER OPEX
… data will be shared if the cost of its exchange is higher than market value
§ Reusable connectors (Drivers)
§ Gateway API for Scheduling, Validation, Alerts, Audit
DIVERSE APIS
Create an information abstraction layer to deliver data in ready-to-consume formats optimized for a specific use case, to assure maximum stickiness
§ API management, bindings
§ Federated & granular ACLs
§ Deep metering & telemetry
ADD CONTEXT
Build new data views by connecting related sets to expose otherwise non-obvious insights. Invest in becoming the birthplace of organic data
§ Mine for links & associations
§ Deliver data curation service
§ Augment on-read context
CREATE BAZAAR
Create an insight bazaar: services beyond data, bi-directional exchange, and sampling for value prior to use or purchase
§ Model & service gamification
§ On-demand data scientists
§ Hackathons & competitions
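As one possible reading of "granular ACLs" and "deep metering", the sketch below puts both behind a single gateway call. The `Gateway` class, its method names and the sample data are hypothetical. The slide names the capabilities but does not describe an implementation.

```python
from collections import Counter


class Gateway:
    """Hypothetical exchange gateway: checks ACLs, then meters each read."""

    def __init__(self, datasets, acl):
        self.datasets = datasets   # dataset name -> list of records
        self.acl = acl             # consumer -> {dataset: allowed field names}
        self.meter = Counter()     # (consumer, dataset) -> records served

    def read(self, consumer, dataset):
        allowed = self.acl.get(consumer, {}).get(dataset)
        if allowed is None:
            raise PermissionError(f"{consumer} may not read {dataset}")
        # Granular ACL: expose only the fields granted to this consumer.
        rows = [{k: v for k, v in row.items() if k in allowed}
                for row in self.datasets[dataset]]
        # Metering: count every record served, per consumer and dataset.
        self.meter[(consumer, dataset)] += len(rows)
        return rows


gw = Gateway(
    datasets={"trades": [{"symbol": "ABC", "price": 1.5, "trader": "t1"}]},
    acl={"acme": {"trades": {"symbol", "price"}}},
)
rows = gw.read("acme", "trades")
```

The meter is the kind of record that per-use billing and audit would draw on, and the field filter lets a provider share a data set without exposing every attribute.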
USAGE PATTERNS
File: Set 1-1 → Data Set; $/File Download
Query: Query 1-n sets → Answer; $/Query or $/Query Plan (Time)
Stream: Service 1-n sets → Data Events; $/Event or $/Subscription (Time)
Job: Job Exec → Distributed Exec 1-n provider, n:1; $/Job (* Target) or $/Job Exec Plan
Diagram cardinality labels: 1-n provider; n:n, 1:n, n:n, n:n
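The per-pattern pricing units above ($/File Download, $/Query, $/Event, $/Job) can be turned into a simple charge calculation. The unit prices and usage figures below are invented for illustration; only the pricing units come from the slide.

```python
# Hypothetical unit prices; the slide names only the pricing units.
RATES = {
    "file": 5.00,     # $ per file download
    "query": 0.10,    # $ per query
    "stream": 0.001,  # $ per event
    "job": 2.00,      # $ per job
}


def charge(pattern, units):
    """Return the charge for `units` of usage under one usage pattern."""
    if pattern not in RATES:
        raise ValueError(f"unknown usage pattern: {pattern}")
    if units < 0:
        raise ValueError("units must be non-negative")
    return round(RATES[pattern] * units, 2)


usage = [("file", 2), ("query", 150), ("stream", 20000), ("job", 1)]
invoice = {pattern: charge(pattern, units) for pattern, units in usage}
total = round(sum(invoice.values()), 2)
```

The time-based alternatives on the slide ($/Query Plan, $/Subscription) would replace the per-unit multiplication with a flat charge per period.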