Advanced RAG Pipelines: Engineering Scalable Retrieval Systems for Enterprise AI

Advanced RAG Pipelines Engineering Scalable Retrieval Systems for Enterprise AI
S73015  March 21, 2025

Todayʼs Presenters Bilge Yücel Haystack Developer Relations Engineer at deepset
Meriem Bendris Senior Solution Architect AI at NVIDIA

Agenda 01 deepset & Haystack 02 NVIDIA NIM 03 Advanced
RAG using NVIDIA NIM in deepset Studio 04 Deploying Haystack Pipelines with NVIDIA NIM on Kubernetes 05 Conclusion

deepset: Solving Custom AI challenges since 2018 Early Advancements Foundational
NLP technology development Inflection Point GPT3.5 & ChatGPT Major public awareness Broad Ecosystem of AI Tools Business apps and agents running on compound AI systems • Flexible AI Orchestration • Variety of LLMs • Multimodal, Agentic systems • Variety of deployments 2010s Transformer Models Google BERT “Attention is all you needˮ 2017 Early LLMs GPT2, GPT3, LaMDA 20192021 Nov 2022 2024-beyond Studio 2.0 on-prem Our Community

AI Architectures in the Enterprise Retrieval Augmented Generation RAG Retrieve
relevant documents, use them to inform responses, and generate accurate and contextually rich answers to complex queries. Intelligent AI Agents Automate and streamline tasks, workflows, and insight generation with an compound AI system capable of complex reasoning and decision making. Text-to-SQL / Conversational BI Transform natural language queries into SQL commands, ask questions of complex datasets, and make data analysis more accessible & intuitive. Intelligent Document Processing Process documents at scale, accelerate insight extraction, and boost workflow efficiency with AI-powered automation. Semantic Search Understand and retrieve information based on semantic similarity, deliver highly accurate and relevant search results, and provide an advanced recommendation service. Multimodal Integrate and process multiple forms of data including text, PDFs, images, audio, and video for an enriched user experience.

deepset AI Platform Studio, Enterprise Editions] Orchestration Tools Build Test
Deploy Monitor Framework Components Pipelines Solutions Flexible Architecture Templates (e.g., Agent, RAG, GraphRAG, Multimodal, Search, IDP, Text2SQL Open Ecosystem Any Data, LLMs, and Integrations deepset: Delivering Custom Enterprise-Grade Gen AI Haystack Open Source LLM Orchestration AI Tools VectorDBs Embedding Models LLMs Evaluation Observability

Open-source LLM orchestration framework Provides the tools that Python developers
need to build real world, advanced AI systems Quickly combine models, data, and other tools for custom Gen AI Building blocks = Components & Pipelines Component Component Pipeline pip install haystack-ai

Custom Components and Pipelines Custom Pipeline Custom Component

NVIDIA NIM Microservice

Organizations must choose between ease of use and control Challenges
Experimenting with Generative AI

NVIDIA NIM Microservice

Experience and Run Enterprise Generative AI Models Anywhere Use NVIDIA
API catalog to get access to NVIDIA NIM

Retrieval Augmented Generation RAG

Basic RAG with NVIDIA NIM and Haystack

Advanced RAG Pipeline Join documents Query BM25 Embedding Reranking Answer
Retrieval Hybrid Retrieval Prompt Builder LLM Conditional Router Answer Web Search Prompt Builder LLM Use a fallback branch Fallback to web • Alternative data sources • Error handling

deepset Studio • Drag, drop, and construct Haystack pipelines •
Ready-made pipelines • Bring your own files or connect to your database • Deploy on Studio or export pipelines • Free and open to everyone Development Environment for Haystack

Build the Advanced RAG Pipeline Drag & Drop

Build the Advanced RAG Pipeline Deploy to Development Environment

Evaluating a RAG Pipeline Source: haystack.deepset.ai/blog/optimize-rag-with-nvidia-nemo • Quantitative → Statistical
& Model-based metrics

Evaluating a RAG Pipeline on Studio

Build the Advanced RAG Pipeline Export Pipeline

Deploying Pipelines as REST APIs with Hayhooks • Hayhooks: tool
to deploy and serve Haystack pipelines as REST APIs • Pipeline → Endpoint

Deploying Pipelines as REST APIs with Hayhooks

Haystack RAG Pipelines with Self-Hosted NVIDIA NIM

Haystack RAG Pipelines with self-hosted NVIDIA NIM llama-3.1-8 b-instruct nvidia/llama-3.2-nv
-embedqa-1b-v2 nvidia/llama-3.2-n v-embedqa-1b-v2

Haystack RAG Pipelines Deployment on Kubernetes

Autoscaling NVIDIA NIM using NIM Operator

NVIDIA NIM Metrics Grafana Dashboard

NVIDIA Blueprints Available on build.nvidia.com

Multimodal PDF Data Extraction For Enterprise RAG Unlocks Knowledge from
trillions of PDFs

Reference Architecture for Enterprise AI AI Orchestration Build, Test and
Monitor Your Agents and Applications Vector Database Dependent on Customer) Inference Service NVIDIA NIM Microservice for LLM Models / Embedders / Rerankers Network Infrastructure Compute Infrastructure Storage Infrastructure NVIDIA Accelerated Computing Infrastructure Management NVIDIA NIM Operator NVIDIA GPU Operator Kubernetes Workers Container Runtime Enterprise Linux Any Database

Thank You Visit our booth #2111 👉 Get all resources
Learn more: deepset.ai haystack.deepset.ai

Advanced RAG Pipelines: Engineering Scalable Re...

Advanced RAG Pipelines: Engineering Scalable Retrieval Systems for Enterprise AI

More Decks by Bilge Yücel

Other Decks in Technology

Featured

Transcript