[DataSciCon] Divide, Distribute and Conquer: Stream v. Batch
Data is flowing everywhere around us: from phones, credit cards, sensor-equipped buildings, vending machines, thermostats, trains, buses, planes, posts to social media, digital pictures and video, and so on.
events
Most commonly needed: time windows, session windows
Examples:
• Real-time monitoring: 5-minute averages
• Reader behavior on a website: user browsing sessions
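The two window types named above can be sketched in a few lines of plain Python. This is a minimal illustration and not any particular streaming engine's API: the function names, the 5-minute window size, and the representation of events as (timestamp in seconds, value) pairs are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # assumed window size: 5-minute tumbling windows


def tumbling_averages(events, window_seconds=WINDOW_SECONDS):
    """Average values per fixed, non-overlapping time window.

    events: iterable of (event_time_seconds, value) pairs (assumed shape).
    Returns {window_start_seconds: average_value}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [total, count]
    for ts, value in events:
        start = ts - ts % window_seconds  # align timestamp to its window start
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


def session_windows(timestamps, gap_seconds):
    """Group timestamps into sessions; a gap longer than gap_seconds starts a new session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_seconds:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])  # inactivity gap exceeded: open a new session
    return sessions
```

For instance, `tumbling_averages([(0, 10), (120, 20), (300, 30), (599, 50)])` puts the first two readings in the window starting at 0 and the last two in the window starting at 300, while `session_windows([0, 60, 100, 2000, 2030], gap_seconds=1800)` splits a user's page views into two browsing sessions.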
phones enter airplane, lose Internet connectivity
Emails are being written during the 10h flight
Internet connectivity is restored, phones will now send the queued emails
to get computation results in real time
• Windows – finite view of infinite data
• Based on temporal characteristics of the event
• Late event processing
• You choose how long to wait
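The "you choose how long to wait" trade-off can be sketched as a count per event-time window with a configurable allowed lateness. This is a simplified illustration under stated assumptions: the function name is hypothetical, events are given as event times (in seconds) in the order they arrive, and the watermark is approximated as the highest event time seen so far, which is cruder than what production streaming engines typically do.

```python
from collections import defaultdict


def count_with_allowed_lateness(event_times, window_seconds, allowed_lateness):
    """Count events per event-time window, dropping events that arrive too late.

    event_times: event timestamps in seconds, listed in ARRIVAL order (assumed shape).
    allowed_lateness: how long (in event time) a window stays open after its end.
    Returns (counts per window start, list of dropped event times).
    """
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")  # assumption: watermark = max event time seen so far
    for event_time in event_times:
        watermark = max(watermark, event_time)
        window_start = event_time - event_time % window_seconds
        window_end = window_start + window_seconds
        if watermark >= window_end + allowed_lateness:
            dropped.append(event_time)  # window already closed: event is too late
        else:
            counts[window_start] += 1  # on time, or late but within the allowed wait
    return dict(counts), dropped


# Emails written in the first hour but delivered only after a long flight.
arrivals = [100, 3700, 200, 40000, 300]
short_wait = count_with_allowed_lateness(arrivals, 3600, allowed_lateness=2 * 3600)
long_wait = count_with_allowed_lateness(arrivals, 3600, allowed_lateness=12 * 3600)
```

With a 2-hour wait, the event stamped 300 arrives after its window has closed and is dropped; with a 12-hour wait, it is still counted in the first window. A longer wait gives more complete results at the cost of keeping window state around longer.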