NVIDIA GTC 2025
by Meriem Bendris & Bilge Yucel
Retrieval-augmented generation (RAG) systems combine the reasoning capabilities of large language models with information retrieval: they search a large corpus for relevant content and use it to generate informed, accurate responses. Learn how to build advanced, custom RAG applications. You'll explore the design of an end-to-end RAG system, including data preparation, indexing, retrieval, and response generation. Then, we'll show how you can use Haystack pipelines with multiple NVIDIA NIM microservices, such as the LLM, text embedding, and reranking microservices, self-hosted in a Kubernetes production environment at scale. Finally, we'll discuss evaluating and customizing AI models for specific RAG tasks.
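To make the four stages named above concrete (data preparation, indexing, retrieval, response generation), here is a minimal, standard-library-only sketch. It is not the session's implementation and does not use the Haystack or NVIDIA NIM APIs: the bag-of-words `embed` function stands in for an embedding model, the generation step stops at building the prompt an LLM would receive, and all function names and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Data preparation: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Indexing: store each document next to its vector."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, top_k=2):
    """Retrieval: return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(query, contexts):
    """Response generation (prompt only): a real system would send this to an LLM."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Hypothetical sample corpus, for illustration only.
corpus = [
    "Haystack pipelines connect retrievers, rankers, and generators.",
    "Kubernetes schedules containers across a cluster of nodes.",
    "A reranker reorders retrieved passages by relevance to the query.",
]
question = "What does a reranker do to retrieved passages?"

index = build_index(corpus)
hits = retrieve(index, question, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a production setup like the one the session describes, the toy `embed` and the prompt-only generation step would be replaced by calls to hosted embedding, reranking, and LLM services, with a pipeline framework wiring the stages together.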