Airflow as an AI Agent's toolkit: Going beyond MCPs and unlocking Airflow's 1000+ Integrations

What if your Airflow tasks could understand natural language AND adapt to schema changes automatically, while maintaining the deterministic, observable workflows we rely on? This talk introduces practical patterns for AI-native orchestration that preserve Airflow’s strengths while adding intelligence where it matters most.

Through a real-world example, we’ll demonstrate AI-powered tasks that detect schema drift across multi-cloud systems and perform context-aware data quality checks that go beyond simple validation—understanding business rules, detecting anomalies, and generating validation queries from prompts like “check data quality across regions.” All within static DAG structures you can test and debug normally.

We’ll show how AI becomes a first-class citizen by combining Airflow’s features (Assets for schema context, Human-in-the-Loop for approvals, and AssetWatchers for automated triggers) with engines such as Apache DataFusion, which provides high-performance query execution and cross-cloud data processing with unified access to multiple storage formats. These patterns apply directly to schema validation and similar cases where natural language can simplify complex operations.

This isn’t about bolting AI onto Airflow. It’s about evolving how we build workflows, from brittle rules to intelligent adaptation, while keeping everything testable, auditable, and production-ready.

Kaxil Naik

October 10, 2025

Transcript

  1. Airflow as an AI Agent's toolkit: Going Beyond MCPs & Unlocking 1000+ Integrations. Kaxil Naik & Pavan Kumar Gopidesu.
  2. Introduction. Kaxil Naik: Sr Eng Director @ Astronomer, Airflow Committer & PMC Member. Pavan Kumar Gopidesu: Technical Lead @ Tavant, Airflow Committer & ASF Member.
  3. Customer use case: relies on a lot of incoming data.
     Data sources: hundreds of clients → thousands of S3 and GCS feeds; formats vary (Parquet/CSV/JSON) and schemas evolve; frequent schema drift; data spread across clouds.
     Consumers: per-consumer tables (Postgres, Iceberg, Glue); upstream drift breaks ingestion.
  4. When things break: debugging, change-order processes, manual fixes, schema comparison across files & DB, backfills, notifying 100+ consumers.
  5. Does this problem resonate? What currently takes 24+ hours:
     ❌ Manual schema comparison
     ❌ Change-order approval
     ❌ Code changes and testing
     ❌ Debugging and backfills
     What this PROTOTYPE demonstrates:
     ✅ Automated detection (using Apache DataFusion & AI)
     ✅ Human oversight at critical points
     ✅ Cross-cloud validation
     ✅ Business-friendly explanations
     Would this solution work for you?
  6. Automatic Context Injection.
     For SQL operations:
     • Database type and version (PostgreSQL 15.2)
     • Full schema from DbApiHook or Asset metadata
     • Sample data (first few rows)
     • Built-in safety rules
     For file operations:
     • File format (Parquet, JSON, CSV)
     • Storage type (S3, GCS, Azure)
     • File size, row count estimates
     • Schema information
     • Partitioning structure
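
     As a rough sketch of what this could look like, the snippet below assembles SQL context from a DbApiHook before any prompt is sent. The helper name build_sql_context and the prompt layout are our own illustration, not the prototype's actual code:

     from airflow.providers.common.sql.hooks.sql import DbApiHook

     def build_sql_context(hook: DbApiHook, table: str, sample_rows: int = 3) -> str:
         """Gather schema and sample data to inject into an LLM prompt (illustrative)."""
         # information_schema introspection works on Postgres/MySQL-style databases.
         columns = hook.get_records(
             "SELECT column_name, data_type FROM information_schema.columns "
             "WHERE table_name = %s ORDER BY ordinal_position",
             parameters=(table,),
         )
         sample = hook.get_records(f"SELECT * FROM {table} LIMIT {sample_rows}")
         schema_text = "\n".join(f"- {name}: {dtype}" for name, dtype in columns)
         return (
             f"Table: {table}\n"
             f"Schema:\n{schema_text}\n"
             f"Sample rows: {sample}\n"
             "Safety rules: read-only; never emit DDL or unbounded DELETE."
         )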
  7. Safety Mechanisms. We are not just sending prompts to LLMs. Safety layers we're exploring:
     ✅ SQL safety: blocks DROP, TRUNCATE, and DELETE without a WHERE clause
     ✅ Human-in-the-Loop: required for sensitive operations
     🚧 Query validation: parse and analyze before execution
     🚧 Asset sensitivity: Assets marked sensitive (e.g. PII) automatically require approval to access
     🚧 Audit logging: all AI decisions tracked separately
     🚧 Read-only by default: write operations need explicit approval
  8. Why Apache DataFusion?
     • Unified query engine across object stores and databases (S3, Postgres)
     • Multiple formats (Parquet, JSON, CSV, Iceberg, Delta Lake)
     • Single-node performance (no Spark overhead)
     • Performance (in our test): 50M records in 14 seconds (with joins, group-by, min, max, etc.)
     ⚠ DataFusion is for READING only; writes go through DbApiHook.
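
     For readers unfamiliar with DataFusion's Python bindings, a minimal local example looks roughly like this; the table name and path are ours, and object-store registration for S3/GCS is additional setup not shown:

     from datafusion import SessionContext

     # Register a Parquet dataset and query it with plain SQL; DataFusion
     # executes this on a single node, with no cluster required.
     ctx = SessionContext()
     ctx.register_parquet("orders", "/data/orders/")  # illustrative path

     df = ctx.sql(
         "SELECT region, COUNT(*) AS n, MIN(amount), MAX(amount) "
         "FROM orders GROUP BY region"
     )
     df.show()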
  9. Current Approach: Specialized Operators. Current implementation:
     • LLMSchemaCompareOperator - for schema drift
     • LLMDataQualityOperator - for validation
     • LLMFileAnalysisOperator - for file analysis
     • … more to come for interacting with API(s), beyond files & DBs
     Why specialized: clear intent, better context for the LLM, type safety, focused documentation.
     Alternative being explored: a unified LLMOperator with resource adapters.
     We're still figuring out the right abstraction. Your feedback will help.
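
     To make this concrete, DAG usage might look like the sketch below. This is our hypothetical reading of the prototype's API; the import path and parameter names (source_uri, target_conn_id, target_table) are guesses, not a released interface:

     # Hypothetical usage sketch; nothing here is a published Airflow API.
     from airflow.sdk import DAG
     from airflow.providers.ai.operators.llm import LLMSchemaCompareOperator  # assumed path

     with DAG(dag_id="schema_drift_check", schedule="@daily") as dag:
         compare = LLMSchemaCompareOperator(
             task_id="compare_feed_schema",
             source_uri="s3://client-feeds/orders/",  # incoming file feed
             target_conn_id="postgres_default",       # consumer table's database
             target_table="orders",
             # Drift findings would be explained in business-friendly language
             # and routed through a Human-in-the-Loop approval step.
         )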
  10. Integration with Assets.
     • Mark the Asset as sensitive
     • Define how to access the Asset: URI, connection
     • Define the Asset type: data format, schema
     • Define metadata (for better AI context): description, example queries
     Future: validations, statistics.
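
     In Airflow 3 terms, that metadata could ride along in the Asset's extra field. The keys below (sensitive, format, example_queries) are illustrative conventions of ours, not an agreed schema:

     from airflow.sdk import Asset

     # Sketch: attaching AI-relevant context to an Asset via its `extra` dict.
     orders_feed = Asset(
         uri="s3://client-feeds/orders/",
         extra={
             "sensitive": False,   # True would gate access behind HITL approval
             "format": "parquet",
             "conn_id": "aws_default",
             "description": "Daily order extracts from client feeds",
             "example_queries": [
                 "SELECT region, COUNT(*) FROM orders GROUP BY region",
             ],
         },
     )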
  11. What Airflow Principles Must Stay. Whatever we build must preserve Airflow's core strengths:
     ✅ Deterministic DAG structure - static, reviewable, testable
     ✅ Observable - lineage, logging, monitoring
     ✅ Reliable - existing retry logic, error handling
     ✅ Safe - no breaking changes to existing workflows
     Leveraging an LLM is just one task in a predictable pipeline. We're NOT building AI that changes DAG structure.
  12. What We're NOT Building ❌ NOT: AI that changes DAG

    structure Dynamic pipeline generation AI that makes architecture decisions Replacement for your data engineers ✅ YES: AI for repetitive, context-dependent tasks Deterministic DAGs with intelligent tasks Human oversight at critical points Audit trails and observability
  13. Implementation Reality Check. If we proceed, the path would be:
     • Phase 1: Experimental provider (apache-airflow-providers-ai)
     • Phase 2: Community feedback and iteration
     • Phase 3: Production-ready provider (if it proves valuable)
     • Phase 4: Core integration (only if the community demands it)
     This could take multiple months to get right. No shortcuts.
  14. What We Need From You. Before we go further, we need community input:
     • Is this solving real problems you face?
     • What safety mechanisms are non-negotiable?
     • How should we handle AI errors and edge cases?
     • What is the right balance between intelligence and predictability?
     • Should this be a provider or a core feature?
  15. Future Possibilities.
     • Could AI-detected issues become deterministic checks in future runs?
     • Should operators propose PRs for DAG code changes (with double approval)?
     • Multi-agent validation (one agent generates, another reviews)?
     • All AI calls could be logged (in task logs and the DB) and surfaced via plugins.
     • Expose 1000s of Hooks as an AI Agent's "tools" (see the sketch below).
     • … any other wild ideas?
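
     As one way to picture that Hooks-as-tools idea, the sketch below wraps an existing S3Hook method as an agent tool. The tool schema follows the common JSON-schema function-calling convention; none of this is a proposed Airflow API:

     from airflow.providers.amazon.aws.hooks.s3 import S3Hook

     # Sketch: exposing an existing Airflow Hook method as an agent "tool".
     def list_s3_keys(bucket: str, prefix: str) -> list[str]:
         """List object keys so an agent can inspect incoming feed files."""
         hook = S3Hook(aws_conn_id="aws_default")
         return hook.list_keys(bucket_name=bucket, prefix=prefix)

     # Declarative description an LLM agent framework could register.
     LIST_S3_KEYS_TOOL = {
         "name": "list_s3_keys",
         "description": "List S3 object keys under a prefix.",
         "parameters": {
             "type": "object",
             "properties": {
                 "bucket": {"type": "string"},
                 "prefix": {"type": "string"},
             },
             "required": ["bucket", "prefix"],
         },
     }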
  16. How to get involved?
     💬 Mailing list: [email protected] (AIP coming after Summit)
     🗣 Slack: #airflow-3-dev channel
     📧 Pavan: [email protected] (he wants your feedback directly)
     📧 Kaxil: [email protected]
  17. The 2025 Apache Airflow® Survey is here! Fill it out for a free Airflow 3 Fundamentals or DAG Authoring in Airflow 3 certification code.