Gen AI using Airflow 3 | Airflow Summit 2024

Kaxil Naik
September 13, 2024

Gen AI has taken the computing world by storm. As enterprises and startups have started to experiment with LLM applications, it has become clear that providing the right context to these applications is critical.

This process, known as Retrieval-Augmented Generation (RAG), relies on adding custom data to the large language model so that the quality of its responses improves. Processing custom data and integrating with enterprise applications is a strength of Apache Airflow.

This talk details a vision for enhancing Apache Airflow to support RAG more intuitively, with additional capabilities and patterns, described in the transcript below.

Airflow Summit: https://airflowsummit.org/sessions/2024/gen-ai-using-airflow-3-a-vision-for-airflow-rags/

Transcript

  1. Introduction
     - Kaxil Naik: Airflow Committer & PMC Member, Engineering Leader @ Astronomer
     - Ash Berlin-Taylor: Airflow Committer & PMC Member, Engineering Leader @ Astronomer
  2. Evolving AI Landscape
     - Explosion of AI Models
     - Increased Focus on Data Privacy & Control
     - GPUs are easily accessible
     - Cost Optimization
     - Growing Complexity of AI Workflows
     - Increasing Need for Experimentation
  3. What is RAG? Typical architecture for a Q&A use case using an LLM
     [Diagram: document loading (PDFs, URLs) → splitting → storage of splits in a vector store database → retrieval of relevant splits for a query (<Question>) → LLM prompt producing the <Answer>]
  4. RAG (Ingestion) as an Airflow DAG
     - Large data sets of unstructured data
     - Generate and store embeddings
     - Dynamic Task Mapping for a large number of incoming datasets (website content, directories of files, ...)
     - Reading, chunking, and transformation: Python libraries and frameworks for the above, e.g. Unstructured, LangChain, etc.
     - Embeddings using AI providers: OpenAI, Cohere, etc.
     - Store into Weaviate, pgvector, ...
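The reading/chunking/embedding steps above can be sketched in plain Python. This is a hypothetical helper, not code from the talk: `embed` and the chunker are stand-ins for a real provider call (OpenAI, Cohere, ...) and a LangChain-style splitter, and in an actual DAG the per-document fan-out would use Dynamic Task Mapping.

```python
# Hypothetical sketch of the ingestion steps: read, chunk, embed, store.

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks, as LangChain-style
    splitters do, so each chunk fits the embedding model's context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk_text: str) -> list[float]:
    """Stand-in embedding: a real task would call the provider's API
    and write the vector to Weaviate / pgvector."""
    return [float(ord(c)) for c in chunk_text[:4]]

def ingest(documents: list[str]) -> list[tuple[str, list[float]]]:
    """One (chunk, vector) pair per chunk; in Airflow this fan-out is a
    natural fit for Dynamic Task Mapping (.expand over documents)."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

pairs = ingest(["some long document " * 20])
```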
  5. Ask Astro: Data Ingestion, Processing, and Embedding
     ▪ Airflow gives a framework to load data from APIs & other sources into LangChain
     ▪ LangChain helps pre-process and split documents into smaller chunks depending on content type
     ▪ After content is split into chunks, each chunk is embedded into vectors (semantic representations)
     ▪ Those vectors are written to Weaviate for later retrieval
     [Diagram: sources (docs .md files, Slack messages, GitHub issues) → pre-process and split into chunks with LangChain → embed chunks → write to Weaviate]
  6. Challenges
     - Python dependencies: supporting varied Python configurations and dependencies between tasks
     - Selective GPU execution: keeping main execution on CPUs, only selectively calling out to GPUs on remote clusters
     - Dynamic model choice: changing the LLM model in response to cost/performance/new features
  7. Solution part 1: Task Execution Interface
     - Python dependencies: different Python dependencies for different tasks
     - Cost-optimal task execution: data cleaning and data transformation with CPUs; model training with GPUs as needed (less than 10% of tasks in a DAG)
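A loose, runnable sketch of the idea behind per-task dependency isolation: each task runs under its own interpreter environment, which is what Airflow's existing `@task.virtualenv` and `ExternalPythonOperator` already provide. In real DAGs you would declare `requirements=[...]` and let Airflow build the environment; this offline sketch skips pip entirely.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

def run_in_isolated_env(task_source: str) -> str:
    """Build a fresh virtual environment, run task_source inside it with
    that environment's interpreter, and return the task's stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "task_env"
        venv.EnvBuilder(with_pip=False).create(env_dir)  # no pip: offline sketch
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        python = env_dir / bin_dir / "python"
        script = Path(tmp) / "task.py"
        script.write_text(task_source)
        result = subprocess.run(
            [str(python), str(script)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

# The task sees the isolated environment's prefix, not the host's.
print(run_in_isolated_env("import sys; print(sys.prefix)"))
```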
  8. Architectural decoupling: Task Execution Interface
     [Diagram of the Airflow 3.0 architecture: DAG File Processor(s), Scheduler, Worker(s), Airflow meta-database, Web Server, API Server, and the Task SDK, with workers reaching the API Server through the Task Execution Interface]
  9. Solution part 2: common.llm
     - Selective model choice: different model performance & accuracy; complexity vs. cost & response-time trade-off; dynamic selection based on task requirements and constraints
     - AI provider selection: based on execution environment (e.g., GPUs, CPUs); data-security constraints for external vs. local models
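common.llm is a proposal, so no real API exists yet. As a purely illustrative sketch, dynamic model choice under quality, cost, and data-security constraints could look like the following; the model names, scores, and costs are made up for the example.

```python
# Hypothetical sketch of common.llm-style dynamic model selection.
MODELS = [
    # (name, relative cost, quality score, runs locally)
    ("local-small", 0.0, 0.4, True),
    ("gpt-3.5-turbo", 1.0, 0.7, False),
    ("gpt-4", 10.0, 0.95, False),
]

def pick_model(min_quality: float, external_allowed: bool = True) -> str:
    """Cheapest model that meets the quality bar and the data-security
    constraint (external provider vs. local execution)."""
    candidates = [
        (cost, name)
        for name, cost, quality, local in MODELS
        if quality >= min_quality and (external_allowed or local)
    ]
    if not candidates:
        raise ValueError("no model satisfies the constraints")
    return min(candidates)[1]

print(pick_model(min_quality=0.6))  # cheap step, e.g. rewording a prompt
print(pick_model(min_quality=0.9))  # final, high-accuracy answer
```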
  10. Example: Inference as an Airflow DAG
     - Rephrase the question; use both the original and re-phrased versions
     - Query all versions of the question; submit and get results
     - De-duplicate the results
     - Optionally verify and rank the results
     - Return results with sources
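The flow above can be mimicked with stub functions. Names and the retrieval logic are illustrative only: a real DAG would call an LLM to rephrase and a vector store to retrieve.

```python
# Runnable sketch of the inference steps: rephrase, fan out, de-duplicate, rank.

def rephrase(question: str) -> list[str]:
    """Stand-in for an LLM call that rewords the question; the DAG uses
    both the original and the rephrased versions."""
    return [question, question.lower(), question.rstrip("?")]

def query(version: str) -> list[str]:
    """Stub retrieval: a real task would search the vector database."""
    return [f"doc-{len(version) % 2}", "doc-common"]

def answer(question: str) -> list[str]:
    results = [hit for v in rephrase(question) for hit in query(v)]
    deduped = list(dict.fromkeys(results))  # de-duplicate, keeping order
    return sorted(deduped)                  # stand-in for verify/rank

print(answer("What is RAG?"))
```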
  11. AI SQL Assistant: Inference
     ▪ Users enter a question in natural language in the AI Assist editor in the UI
     ▪ The original prompt gets reworded 3x using gpt-3.5-turbo
     ▪ The DB schema, incl. table & column names & types, is retrieved
     ▪ The answer is generated by combining the answers from each prompt and making a gpt-4 call
     [Diagram: user asks a question in the web app → original prompt plus three LangChain rewordings, to get more related SQL queries → vector DB search with the prompts → DB schema (table & column names and column types) → combine and make the final LLM call to answer]
  12. Challenges and upcoming enhancements
     - Batch-triggered Dag Runs & experimentation: concurrent runs of the same DAG, i.e. non-data-interval DAGs; eliminate the execution-date constraint
     - Dynamic model choice: common.llm to dynamically change AI provider and model
     - Synchronous DAG run: inference DAGs return results upon completion; Trigger API to support synchronous execution
  13. Solution part 3: Ad-hoc Dag Runs (Batch-triggered Dag Runs)
     - Non-data-interval based: no reliance on execution dates or schedules
     - Ad-hoc invocation via API calls for inference, allowing multiple instances to be triggered by API calls at the same time
     - Enables experimentation: run the same DAG with different parameters simultaneously, independent of the execution date
     - Ideal for AI/ML workflows like:
       - Experimenting with multiple models for embedding
       - Retraining models
       - Experimenting with a new data source for RAG
       - Hyperparameter tuning
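Ad-hoc invocation is already possible today through Airflow's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`); the vision above removes the constraints around concurrent, non-data-interval runs. A minimal sketch, with a hypothetical host and DAG id, and authentication omitted:

```python
import json
from urllib import request

def build_trigger_request(host: str, dag_id: str, conf: dict) -> request.Request:
    """Build a POST to Airflow's stable REST API that starts one DAG run.
    Each call with a different `conf` starts an independent run, which is
    how several experiments can execute at the same time."""
    body = json.dumps({"conf": conf}).encode()
    return request.Request(
        url=f"{host}/api/v1/dags/{dag_id}/dagRuns",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request(
    "http://localhost:8080", "rag_ingestion",
    conf={"embedding_model": "text-embedding-3-small"},
)
# request.urlopen(req)  # uncomment (and add auth headers) to actually trigger
```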
  14. Solution part 4: Experimentation Tracking with Data Assets
     - Dataset renamed to Data Asset to include models, reports, embeddings, etc.
     - Versioned assets: improved experiment tracking & iterative changes
     - Enhanced UI support that allows visualization of "Data Asset Metadata" (example: RMSE value changes due to different parameters)
     - Audit: every version of a data asset can be audited and compared across different experimental runs
  15. Solution part 5: Synchronous DAG run
     - Consumers of inference DAG runs need results:
       - Current model: final task in the DAG stores results in blob storage
       - Ideal to add API support for this
       - Will support long-running DAGs, since timing is unpredictable
     - Example: Laurel (automated timekeeping) does not require "real-time chatbot-style responses"
     - Other examples: evaluation of mortgage applications
  16. How Airflow 3 helps
     - Challenges: Explosion of AI Models; Increased Focus on Data Privacy & Control; GPUs are easily accessible; Cost Optimization; Growing Complexity of AI Workflows; Increasing Need for Experimentation
     - Airflow 3 capabilities: common.llm; Task Execution Interface; Ad-hoc Dag Runs; Data Assets; Synchronous DAG run
  17. In Summary
     - Many organizations are already using Airflow for Gen AI applications
     - We need your feedback as we add these capabilities into Airflow 3
     - Recruiting beta users: building Gen AI platforms and use cases
     - Come speak at the next Airflow Summit about your use case on Airflow 3!