Abstract:
As the saying goes: nothing is older than yesterday’s news, uhm, data. Join us for an immersive hands-on lab to explore real-time ETL using the triumphant trio of Apache Flink, Debezium, and LangChain4j.
Participants will gain practical experience in setting up several end-to-end real-time data pipelines that stream data from an operational database to an analytics data store continuously, efficiently, and with very low latency. This enables use cases such as full-text search and live dashboarding, enriched with LLM-derived metadata.
In the lab, you will learn how to:
* Build a real-time data pipeline from Postgres to OpenSearch, based on Apache Flink and Debezium for change data capture (CDC)
* Use Flink's connector capabilities to set up seamless real-time ETL pipelines between various data sources and sinks
* Implement data transformations, filtering, and aggregations on top of CDC streams in real time with the help of streaming SQL
* Integrate a large language model (LLM) for sentiment analysis based on LangChain4j, enabling deeper insights into the processed data
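To give a feel for what "on top of CDC streams" means, here is a minimal, dependency-free Java sketch; it is not taken from the lab material. It assumes a simplified change event that carries Debezium's operation codes (`c` for create, `u` for update, `d` for delete, `r` for snapshot read) together with before and after row images, and it replays such events into an in-memory map standing in for a search index. The names `CdcSketch`, `ChangeEvent`, and `apply` are hypothetical; in the lab, Flink and its connectors take care of this step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CdcSketch {

    // Hypothetical, simplified change event. Only the op codes mirror
    // Debezium's envelope; real events carry full row images and metadata.
    record ChangeEvent(String op, Integer key, String before, String after) {}

    // Replays change events in order into a keyed view, the way a sink
    // keeps a downstream store in sync with the source table.
    static Map<Integer, String> apply(List<ChangeEvent> events) {
        Map<Integer, String> view = new LinkedHashMap<>();
        for (ChangeEvent e : events) {
            switch (e.op()) {
                // Inserts, snapshot reads, and updates all upsert the new row image.
                case "c", "r", "u" -> view.put(e.key(), e.after());
                // Deletes remove the row from the view.
                case "d" -> view.remove(e.key());
                default -> throw new IllegalArgumentException("unknown op: " + e.op());
            }
        }
        return view;
    }

    public static void main(String[] args) {
        List<ChangeEvent> events = List.of(
            new ChangeEvent("c", 1, null, "great product"),
            new ChangeEvent("c", 2, null, "arrived late"),
            new ChangeEvent("u", 2, "arrived late", "arrived late, but works"),
            new ChangeEvent("d", 1, "great product", null));
        System.out.println(apply(events)); // prints {2=arrived late, but works}
    }
}
```

Because every event is applied by primary key, replaying the same stream twice yields the same view, which is what makes upsert-style sinks a natural fit for CDC pipelines.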
Join this lab to advance your skills in working with real-time data and learn how robust, leading open-source technologies can support your business-critical stream processing workloads.
Repository: https://github.com/decodableco/oss-streaming-lab
Recording: hands-on labs haven't been recorded