Airflow as an AI Agent's toolkit: Going beyond MCPs and unlocking Airflow's 1000+ Integrations

What if your Airflow tasks could understand natural language AND adapt to schema changes automatically, while maintaining the deterministic, observable workflows we rely on? This talk introduces practical patterns for AI-native orchestration that preserve Airflow’s strengths while adding intelligence where it matters most.

Through a real-world example, we’ll demonstrate AI-powered tasks that detect schema drift across multi-cloud systems and perform context-aware data quality checks that go beyond simple validation—understanding business rules, detecting anomalies, and generating validation queries from prompts like “check data quality across regions.” All within static DAG structures you can test and debug normally.

We’ll show how AI becomes a first-class citizen by combining Airflow’s features (Assets for schema context, Human-in-the-Loop for approvals, and AssetWatchers for automated triggers) with engines such as Apache DataFusion, which provides high-performance query execution and cross-cloud data processing with unified access to multiple storage formats. These patterns apply directly to schema validation and similar cases where natural language can simplify complex operations.

This isn’t about bolting AI onto Airflow. It’s about evolving how we build workflows, from brittle rules to intelligent adaptation, while keeping everything testable, auditable, and production-ready.

Kaxil Naik

October 10, 2025

Transcript

  1. Airflow as an AI Agent's toolkit: Going Beyond MCPs & Unlocking 1000+ Integrations. Kaxil Naik & Pavan Kumar Gopidesu.
  2. Introduction. Kaxil Naik: Sr Eng Director @ Astronomer, Airflow Committer & PMC Member. Pavan Kumar Gopidesu: Technical Lead @ Tavant, Airflow Committer & ASF Member.
  3. Customer use case: relies on a lot of incoming data.
     Data sources: hundreds of clients → thousands of S3 and GCS feeds; formats vary (Parquet/CSV/JSON) and schemas evolve; frequent schema drift; data spread across clouds.
     Consumers: per-consumer tables (Postgres, Iceberg, Glue); upstream drift breaks ingestion.
  4. When things break: debugging, change-order processes, manual fixes, schema comparison across files & DB, backfills, notifying 100+ consumers.
  5. Does this problem resonate? What currently takes 24+ hours:
     ❌ Manual schema comparison
     ❌ Change-order approval
     ❌ Code changes and testing
     ❌ Debugging and backfills
     What this PROTOTYPE demonstrates:
     ✅ Automated detection (using Apache DataFusion & AI)
     ✅ Human oversight at critical points
     ✅ Cross-cloud validation
     ✅ Business-friendly explanations
     Would this solution work for you?
  6. Automatic Context Injection.
     For SQL operations:
     • Database type and version (PostgreSQL 15.2)
     • Full schema from DbApiHook or Asset metadata
     • Sample data (first few rows)
     • Built-in safety rules
     For file operations:
     • File format (Parquet, JSON, CSV)
     • Storage type (S3, GCS, Azure)
     • File size, row count estimates
     • Schema information
     • Partitioning structure
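
     As a rough sketch of what this could look like, the snippet below assembles SQL context from a DbApiHook before any prompt is sent. The helper name build_sql_context and the prompt layout are our own illustration, not the prototype's actual code:

     from airflow.providers.common.sql.hooks.sql import DbApiHook

     def build_sql_context(hook: DbApiHook, table: str, sample_rows: int = 3) -> str:
         """Gather schema and sample data to inject into an LLM prompt (illustrative)."""
         # information_schema introspection works on Postgres/MySQL-style databases.
         columns = hook.get_records(
             "SELECT column_name, data_type FROM information_schema.columns "
             "WHERE table_name = %s ORDER BY ordinal_position",
             parameters=(table,),
         )
         sample = hook.get_records(f"SELECT * FROM {table} LIMIT {sample_rows}")
         schema_text = "\n".join(f"- {name}: {dtype}" for name, dtype in columns)
         return (
             f"Table: {table}\n"
             f"Schema:\n{schema_text}\n"
             f"Sample rows: {sample}\n"
             "Safety rules: read-only; never emit DDL or unbounded DELETE."
         )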
  7. Safety Mechanisms. We are not just sending prompts to LLMs. Safety layers we're exploring:
     ✅ SQL safety: blocks DROP, TRUNCATE, and DELETE without a WHERE clause
     ✅ Human-in-the-Loop: required for sensitive operations
     🚧 Query validation: parse and analyze before execution
     🚧 Asset sensitivity: Assets marked sensitive (e.g. PII) automatically require approval to access
     🚧 Audit logging: all AI decisions tracked separately
     🚧 Read-only by default: write operations need explicit approval
  8. Why Apache DataFusion?
     • Unified query engine across object stores and databases (S3, Postgres)
     • Multiple formats (Parquet, JSON, CSV, Iceberg, Delta Lake)
     • Single-node performance (no Spark overhead)
     • Performance (in our test): 50M records in 14 seconds (with joins, group-by, min, max, etc.)
     ⚠ DataFusion is for READING only; writes go through DbApiHook.
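
     For readers unfamiliar with DataFusion's Python bindings, a minimal local example looks roughly like this; the table name and path are ours, and object-store registration for S3/GCS is additional setup not shown:

     from datafusion import SessionContext

     # Register a Parquet dataset and query it with plain SQL; DataFusion
     # executes this on a single node, with no cluster required.
     ctx = SessionContext()
     ctx.register_parquet("orders", "/data/orders/")  # illustrative path

     df = ctx.sql(
         "SELECT region, COUNT(*) AS n, MIN(amount), MAX(amount) "
         "FROM orders GROUP BY region"
     )
     df.show()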
  9. Current Approach: Specialized Operators. Current implementation:
     • LLMSchemaCompareOperator - for schema drift
     • LLMDataQualityOperator - for validation
     • LLMFileAnalysisOperator - for file analysis
     • … more to come for interacting with API(s), beyond files & DBs
     Why specialized: clear intent, better context for the LLM, type safety, focused documentation.
     Alternative being explored: a unified LLMOperator with resource adapters.
     We're still figuring out the right abstraction. Your feedback will help.
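
     To make this concrete, DAG usage might look like the sketch below. This is our hypothetical reading of the prototype's API; the import path and parameter names (source_uri, target_conn_id, target_table) are guesses, not a released interface:

     # Hypothetical usage sketch; nothing here is a published Airflow API.
     from airflow.sdk import DAG
     from airflow.providers.ai.operators.llm import LLMSchemaCompareOperator  # assumed path

     with DAG(dag_id="schema_drift_check", schedule="@daily") as dag:
         compare = LLMSchemaCompareOperator(
             task_id="compare_feed_schema",
             source_uri="s3://client-feeds/orders/",  # incoming file feed
             target_conn_id="postgres_default",       # consumer table's database
             target_table="orders",
             # Drift findings would be explained in business-friendly language
             # and routed through a Human-in-the-Loop approval step.
         )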
  10. Integration with Assets.
     • Mark the Asset as sensitive
     • Define how to access the Asset: URI, connection
     • Define the Asset type: data format, schema
     • Define metadata (for better AI context): description, example queries
     Future: validations, statistics.
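
     In Airflow 3 terms, that metadata could ride along in the Asset's extra field. The keys below (sensitive, format, example_queries) are illustrative conventions of ours, not an agreed schema:

     from airflow.sdk import Asset

     # Sketch: attaching AI-relevant context to an Asset via its `extra` dict.
     orders_feed = Asset(
         uri="s3://client-feeds/orders/",
         extra={
             "sensitive": False,   # True would gate access behind HITL approval
             "format": "parquet",
             "conn_id": "aws_default",
             "description": "Daily order extracts from client feeds",
             "example_queries": [
                 "SELECT region, COUNT(*) FROM orders GROUP BY region",
             ],
         },
     )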
  11. What Airflow Principles Must Stay. Whatever we build must preserve Airflow's core strengths:
     ✅ Deterministic DAG structure - static, reviewable, testable
     ✅ Observable - lineage, logging, monitoring
     ✅ Reliable - existing retry logic, error handling
     ✅ Safe - no breaking changes to existing workflows
     Leveraging an LLM is just one task in a predictable pipeline. We're NOT building AI that changes DAG structure.
  12. What We're NOT Building ❌ NOT: AI that changes DAG

    structure Dynamic pipeline generation AI that makes architecture decisions Replacement for your data engineers ✅ YES: AI for repetitive, context-dependent tasks Deterministic DAGs with intelligent tasks Human oversight at critical points Audit trails and observability
  13. Implementation Reality Check. If we proceed, the path would be:
     • Phase 1: Experimental provider (apache-airflow-providers-ai)
     • Phase 2: Community feedback and iteration
     • Phase 3: Production-ready provider (if it proves valuable)
     • Phase 4: Core integration (only if the community demands it)
     This could take multiple months to get right. No shortcuts.
  14. What We Need From You. Before we go further, we need community input:
     • Is this solving real problems you face?
     • What safety mechanisms are non-negotiable?
     • How should we handle AI errors and edge cases?
     • What is the right balance between intelligence and predictability?
     • Should this be a provider or a core feature?
  15. Future Possibilities.
     • Could AI-detected issues become deterministic checks in future runs?
     • Should operators propose PRs for DAG code changes (with double approval)?
     • Multi-agent validation (one agent generates, another reviews)?
     • All AI calls could be logged (in task logs and the DB) and surfaced via plugins.
     • Expose 1000s of Hooks as an AI Agent's "tools" (see the sketch below).
     • … any other wild ideas?
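
     As one way to picture that Hooks-as-tools idea, the sketch below wraps an existing S3Hook method as an agent tool. The tool schema follows the common JSON-schema function-calling convention; none of this is a proposed Airflow API:

     from airflow.providers.amazon.aws.hooks.s3 import S3Hook

     # Sketch: exposing an existing Airflow Hook method as an agent "tool".
     def list_s3_keys(bucket: str, prefix: str) -> list[str]:
         """List object keys so an agent can inspect incoming feed files."""
         hook = S3Hook(aws_conn_id="aws_default")
         return hook.list_keys(bucket_name=bucket, prefix=prefix)

     # Declarative description an LLM agent framework could register.
     LIST_S3_KEYS_TOOL = {
         "name": "list_s3_keys",
         "description": "List S3 object keys under a prefix.",
         "parameters": {
             "type": "object",
             "properties": {
                 "bucket": {"type": "string"},
                 "prefix": {"type": "string"},
             },
             "required": ["bucket", "prefix"],
         },
     }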
  16. How to get involved?
     💬 Mailing list: [email protected] (AIP coming after Summit)
     🗣 Slack: #airflow-3-dev channel
     📧 Pavan: [email protected] (he wants your feedback directly)
     📧 Kaxil: [email protected]
  17. The 2025 Apache Airflow® Survey is here! Fill it out for a free Airflow 3 Fundamentals or DAG Authoring in Airflow 3 certification code.