Apache Kafka is a streaming data platform. It enables integration of data across the enterprise, and ships with its own stream processing capabilities. But how do we get data in and out of Kafka in an easy, scalable, and standardised manner? Enter Kafka Connect.
Part of Apache Kafka since version 0.9, Kafka Connect defines an API that enables the integration of data from multiple sources, including MQTT, common NoSQL stores, and change data capture (CDC) from relational databases such as Oracle. By "turning the database inside out" we can enable an event-driven architecture in our business that reacts to changes made by applications writing to a database, without having to modify those applications themselves. As well as ingest, Kafka Connect provides connectors for numerous targets, including HDFS, S3, and Elasticsearch.
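For a flavour of how this looks in practice, a connector is defined declaratively as JSON and submitted to the Kafka Connect REST API rather than written as code. The sketch below assumes Confluent's JDBC source connector plugin is installed on the Connect worker; the connection details, table, and column names are illustrative placeholders.

```json
{
  "name": "jdbc-source-orders",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@db-host:1521/ORCL",
    "connection.user": "connect_user",
    "connection.password": "********",
    "mode": "incrementing",
    "incrementing.column.name": "ORDER_ID",
    "table.whitelist": "ORDERS",
    "topic.prefix": "oracle-",
    "tasks.max": "1"
  }
}
```

POSTing this JSON to a Connect worker's REST endpoint (for example, http://localhost:8083/connectors) starts streaming new rows from the ORDERS table into a Kafka topic; sink connectors for targets such as HDFS, S3, or Elasticsearch are configured in exactly the same declarative way.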
This presentation will briefly recap the purpose of Kafka before diving into Kafka Connect, with practical examples of data pipelines built with it that are already in production at companies around the world. We'll also look at the Single Message Transform (SMT) capabilities introduced in Kafka 0.10.2 and how they can make Kafka Connect even more flexible and powerful.
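To give a sense of what SMTs offer, a connector's configuration can chain one or more transforms that modify each record as it passes through Connect. The fragment below is a sketch of settings added to a connector's config: the RegexRouter and InsertField transform classes ship with Apache Kafka, while the aliases, topic pattern, and field values are illustrative.

```json
{
  "transforms": "routeTopic,addSource",
  "transforms.routeTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.routeTopic.regex": "oracle-(.*)",
  "transforms.routeTopic.replacement": "dc1-$1",
  "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
  "transforms.addSource.static.field": "source_system",
  "transforms.addSource.static.value": "oracle-erp"
}
```

Here the first transform renames topics on the fly and the second stamps every record with a static field identifying its source, all without writing any code or touching the upstream application.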