I have given this talk (with minor variations) at the following venues:
• ApacheCon EU, Budapest, Hungary, 18 November 2014. http://apacheconeu2014.sched.org/event/3633e195715f88c3357749d57b7b3b8c
• Unified Log London Meetup, London, UK, 2 December 2014. http://www.meetup.com/unified-log-london/events/218025352/
• Jfokus, Stockholm, Sweden, 4 February 2015. https://martin.kleppmann.com/2015/02/04/samza-at-jfokus.html
Abstract:
Samza, an Apache Incubator project, is a framework for processing and analysing high-volume data streams. It is built upon Apache Kafka and Hadoop YARN (the resource manager introduced in Hadoop 2.0). You can think of Samza as a real-time, continuously running version of MapReduce.
In this talk, Martin will show why stream processing is becoming an important part of the architecture of data-intensive applications, alongside storage and batch processing. We will explore how Samza works, and show how it reliably processes millions of messages per second. We will also examine what kinds of applications would benefit from using Samza.