Data loses value quickly over time (Mike Gualtieri, Forrester)
[Chart: value of data to decision-making plotted against time (real time, seconds, minutes, hours, days, months). Labels along the curve: Preventive/Predictive, Actionable, Reactive, Historical. Regions: time-critical decisions; traditional “batch” business intelligence. Caption: Information half-life in decision-making.]
sources
CUSTOMER EVENT STORE: WHY
THE PROBLEM: OUR CHALLENGES ON PROCESSING EVENT DATA
Events: interactions (touchpoints) of (potential) customers with ABN AMRO, across devices and channels
• Error-prone process: bad records
• Huge increase in data volumes
• Fast-changing sources
• Limitations in consuming capabilities
• Diversity in data from different sources
to handle important events in the life of the customer that impact their relation with ABN AMRO in an adequate way.”
Bernard Faber, Solution Architect, ABN AMRO
“Strong increase of the digitalized touching points with our customers (called events), from a growing number of sources.”
Charles Van Kints, Product Owner, ABN AMRO
“The continuous growth in event data sources and volume, the increasing demand towards using event data and the current solution within the Marketing Intelligence data warehouse.”
Peter Kromhout, Engineering Lead, ABN AMRO
CUSTOMER EVENT STORE: WHAT
Building insights into customer behaviour, the customer journey, and customer interactions with ABN AMRO, in order to act in a personal and relevant way.
[Diagram labels: Volumes, Consuming Capabilities, Customer Interactions, Real-Time, Future State, Customer Event Store]
CUSTOMER EVENT STORE: WHEN
✓ Prepare for Go-Live
March 2018
Prototype
✓ Develop Prototype
✓ Initiate License to Public
April 2018
Technical Go-Live
✓ Product Stack deployed
✓ 2 Sources Live
✓ Tune product for Business Go-Live
Business Go-Live
✓ Add new sources
✓ Consuming Capabilities
✓ Enable data usage
Approach
✓ Successful prototype
✓ Co-creation: Business & IT
✓ 2 Event Sources
Go!
August 2018, September 2018, December 2018
BATCH
[Architecture diagram. Component labels: Batch Bucket, Nano-Batch Bucket, EMR, Glue, Step Function, Lambda, SNS, Enterprise Raw Data Store, Auto-Scaling Group, Snowplow Collector, Fargate]
STREAM & BATCH
[Architecture diagram. Component labels: Nano-Batch Bucket, Snowplow Collector (Auto-Scaling Group, Fargate), Snowplow Enricher (Auto-Scaling Group, Fargate), Kinesis Data Streams (Raw, Good, Bad), Kinesis Data Firehose, Schema Bucket, Bad Events Bucket, Enterprise Raw Data Store, Batch Bucket]
Data Store
[Architecture diagram. Component labels: Snowplow Collector (Auto-Scaling Group, Fargate), Snowplow Enricher (Auto-Scaling Group, Fargate), Kinesis Data Streams (Raw, Good, Bad, Standardized), Kinesis Data Firehose (ORC, JSON), Standard Bucket, Schema Bucket, Glue Crawler, Athena, DynamoDB, CloudWatch, Alarm (twice), Rule]
Complex Workflows: Complex workflows involving iteration of Lambda functions can be implemented quickly.
Debug Friendly: Clear intermediate results.
Restartability: A new state machine can be created for only the failed states.
State Management: Preserves state between subsequent API calls.
Serverless Orchestration: Lambda, Glue, ECS, SageMaker.
Error Handling: Retries can be triggered for specific errors. Other actions can also be configured.
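The error-handling point can be illustrated with a small sketch, assuming the slide describes AWS Step Functions (the services it lists suggest so). The Python below builds an Amazon States Language definition in which a task retries on specific errors and routes any other failure to a separate state. The state names, the account number, and the Lambda ARN are hypothetical placeholders, not part of the original architecture.

```python
import json

# Hypothetical Lambda ARN used only for illustration; replace with a real one.
PROCESS_BATCH_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:process-batch"


def build_state_machine(task_arn: str) -> dict:
    """Return an Amazon States Language definition with Retry and Catch."""
    return {
        "Comment": "Sketch: retry specific errors, send all others to a failure state",
        "StartAt": "ProcessBatch",
        "States": {
            "ProcessBatch": {
                "Type": "Task",
                "Resource": task_arn,
                # Retries are triggered only for the listed error names.
                "Retry": [
                    {
                        "ErrorEquals": ["States.Timeout", "Lambda.ServiceException"],
                        "IntervalSeconds": 5,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                # Any error that is not retried (or exhausts its retries)
                # triggers another action: a transition to NotifyFailure.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
                "End": True,
            },
            "NotifyFailure": {
                "Type": "Fail",
                "Error": "BatchFailed",
                "Cause": "ProcessBatch failed after retries",
            },
        },
    }


definition = build_state_machine(PROCESS_BATCH_ARN)
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document that could be supplied as a state machine definition; in a real workflow the failure state might instead publish to SNS or invoke a clean-up function.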
EMR
AWS EMR
• Can be placed in custom VPC
• Define cluster: choose applications and customize as you wish
• Vertically & horizontally scalable
• More actions than just Spark
• Spin-up time
AWS Glue
• Horizontally scalable
• Serverless & pre-configured
• Limited customization
• Fully managed public service
• Only Spark
and data streams in real-time
Amazon Kinesis Video Streams: Capture, process, and store video streams
Amazon Kinesis Data Firehose: Load data streams into data stores
Amazon Kinesis Data Analytics: Analyze data streams with SQL
Amazon Kinesis Data Streams: Capture, process, and store data streams
Transactions, ERP, web logs/cookies, connected devices
AWS SDKs
• Publish directly from application code via APIs
• AWS Mobile SDK
• Managed AWS sources: CloudWatch Logs, AWS IoT, Kinesis Data Analytics, and more
• RDS Aurora via Lambda
Kinesis Agent
• Monitors log files and forwards lines as messages to Kinesis Data Streams
Kinesis Producer Library (KPL)
• Background process aggregates and batches messages
3rd party and open source
• Log4j appender
• Apache Kafka
• Flume, fluentd, and more …
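To make the idea of batching messages before publishing concrete, here is a minimal Python sketch that groups records into request-sized batches. It is not the KPL's own aggregation format; it only illustrates request-level batching. The limits used (500 records and 5 MiB per PutRecords request) are assumptions based on the documented service quotas and should be checked against the current documentation. The function name and the sample events are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed PutRecords quotas; verify against the current service documentation.
MAX_RECORDS_PER_REQUEST = 500
MAX_BYTES_PER_REQUEST = 5 * 1024 * 1024  # partition keys count towards the size

Record = Tuple[str, bytes]  # (partition_key, data)


def batch_records(records: Iterable[Record]) -> Iterator[List[Record]]:
    """Group records into batches that respect the count and size limits."""
    batch: List[Record] = []
    batch_bytes = 0
    for key, data in records:
        size = len(key.encode("utf-8")) + len(data)
        # Start a new batch when adding this record would break a limit.
        if batch and (
            len(batch) == MAX_RECORDS_PER_REQUEST
            or batch_bytes + size > MAX_BYTES_PER_REQUEST
        ):
            yield batch
            batch, batch_bytes = [], 0
        batch.append((key, data))
        batch_bytes += size
    if batch:
        yield batch


# Hypothetical sample: 1,200 small events spread over 10 partition keys.
events = [(f"customer-{i % 10}", b'{"event": "page_view"}') for i in range(1200)]
batches = list(batch_records(events))
print([len(b) for b in batches])  # [500, 500, 200]
```

Each batch could then be sent with a single PutRecords call, for example through the `put_records` method of a boto3 Kinesis client, with retry handling for individual records that fail.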
for real-time processing of streaming data
• Cost-effective: $0.014 per 1,000,000 PUT Payload Units
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication and authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers
• Readers: Amazon Kinesis Client Library on EC2, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Lambda
Kinesis Data Stream: Shard 3, Shard n
Data producer: up to 1 MB or 1,000 records per second, per shard
Consumer application A: GetRecords()
• GetRecords(): five transactions per second, per shard
• Data: 2 MB per second, per shard
With only one consumer application, records can be retrieved every 200 ms
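The per-shard limits above lend themselves to a quick back-of-the-envelope calculation. The following Python sketch uses only the figures quoted on the slide; the function names and the example workload are hypothetical, and it assumes standard (polling) consumers that share each shard's read capacity evenly.

```python
import math

# Per-shard limits as quoted on the slide.
WRITE_MB_PER_SHARD = 1.0         # MB per second
WRITE_RECORDS_PER_SHARD = 1000   # records per second
READ_MB_PER_SHARD = 2.0          # MB per second, shared by all standard consumers
GET_RECORDS_CALLS_PER_SHARD = 5  # GetRecords transactions per second


def shards_needed(ingest_mb_per_s: float, records_per_s: float) -> int:
    """Smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(ingest_mb_per_s / WRITE_MB_PER_SHARD)
    by_count = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    return max(1, by_bytes, by_count)


def polling_interval_ms(consumers: int) -> float:
    """How often each consumer can call GetRecords on one shard."""
    return consumers * 1000 / GET_RECORDS_CALLS_PER_SHARD


def read_mb_per_consumer(consumers: int) -> float:
    """Read throughput per consumer when the shard limit is split evenly."""
    return READ_MB_PER_SHARD / consumers


# Hypothetical workload: 3.5 MB/s made up of 2,500 records/s.
print(shards_needed(3.5, 2500))   # 4
print(polling_interval_ms(1))     # 200.0
print(polling_interval_ms(4))     # 800.0
print(read_mb_per_consumer(4))    # 0.5
```

With one consumer the result matches the 200 ms figure on the slide; every additional polling consumer lengthens the interval and shrinks the share of read throughput.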
poll. Messages are pushed to the consumer as they arrive
[Diagram labels: Data producer, Kinesis Data Stream (Shard 1), Consumer application A]
SubscribeToShard()
• Uses HTTP/2
• Connection lasts up to five minutes
• Data is pushed to the consumer over a persistent connection
Data Stream
• Default limit of five registered consuming applications. More can be supported with a service limit increase request
• Low-latency requirements for data processing
• Messages are typically delivered to a consumer in less than 70 ms
Amazon Kinesis Data Streams Consumers
Standard
• Total number of consuming applications is low
• Consumers are not latency-sensitive
• Minimize cost
take granular real-time data and turn it into insights
Aggregation Windows: Data is continuously processed, so you need to tell the application when you want results
CREATE OR REPLACE PUMP calls_per_ip_pump AS
  INSERT INTO calls_per_ip_stream
  SELECT STREAM source_ip_address, COUNT(*)
  FROM source_sql_stream_001
  WINDOWED BY STAGGER(
    PARTITION BY source_ip_address RANGE INTERVAL '1' MINUTE);
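To show what a one-minute stagger window per IP address does, here is a simplified, single-process simulation in Python. It is a sketch of the windowing idea only, not how the service is implemented: it assumes events arrive in time order, that a window opens when the first event for a key arrives, and that it closes once it is 60 seconds old. The function name and the sample events are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 60  # mirrors RANGE INTERVAL '1' MINUTE


def stagger_counts(events: Iterable[Tuple[float, str]]) -> List[Tuple[str, float, int]]:
    """Count events per source IP in stagger windows.

    events: (arrival_time_in_seconds, source_ip), ordered by arrival time.
    Returns (source_ip, window_open_time, count) in the order windows close.
    """
    open_windows: Dict[str, List[float]] = {}  # ip -> [open_time, count]
    results: List[Tuple[str, float, int]] = []
    for ts, ip in events:
        # Close every window whose age has reached the range interval.
        expired = [k for k, (opened, _) in open_windows.items()
                   if ts - opened >= WINDOW_SECONDS]
        for key in expired:
            opened, count = open_windows.pop(key)
            results.append((key, opened, int(count)))
        # The first event for a key opens a new window; later ones are counted.
        if ip in open_windows:
            open_windows[ip][1] += 1
        else:
            open_windows[ip] = [ts, 1]
    # End of input: flush the windows that are still open.
    for key, (opened, count) in open_windows.items():
        results.append((key, opened, int(count)))
    return results


sample = [(0, "10.0.0.1"), (10, "10.0.0.2"), (30, "10.0.0.1"),
          (59, "10.0.0.1"), (65, "10.0.0.1"), (75, "10.0.0.2")]
for row in stagger_counts(sample):
    print(row)
```

Unlike a tumbling window aligned to fixed clock minutes, each key here gets its own window start, which is why the three calls from 10.0.0.1 between seconds 0 and 59 are counted together even though the window for 10.0.0.2 opened ten seconds later.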
with SAP and Oracle Supply Chain
Accurate time-series forecasting service, based on the same technology used at Amazon.com
• Custom forecasts with 3 clicks
• 50% more accurate
• 1/10th the cost
• Integrates with Amazon Timestream
Generate forecasts for: retail demand, travel demand, AWS usage, revenue forecasts, web traffic, advertising demand