A classic data lakehouse is built on open-source table formats such as Delta Lake, Apache Iceberg, or Apache Hudi and integrates seamlessly with big data platforms like Apache Spark and event buses like Apache Kafka or Amazon Kinesis. The data lakehouse owes its popularity to combining the quality, speed, and simple SQL access of data warehouses with the cost-effectiveness, scalability, and support for unstructured data of data lakes.
With the advent of generative AI models, and the potential of combining techniques such as retrieval-augmented generation (RAG) with fine-tuning or pre-training custom LLMs, a new paradigm emerged in 2023: AI-infused lakehouses. These platforms use generative AI for code generation, natural language queries, semantic search, and LLM calls from SQL, as well as to enhance governance and automate documentation.
How do lakehouses adapt to the integration of new AI capabilities?
This talk is for data architects who are not afraid of some code, for data engineers who love open source and cloud services, and for practitioners who enjoy a fun end-to-end demo. The Databricks Lakehouse is used for the demos.