There are many moving pieces involved in developing and serving LLM applications. This talk will provide a comprehensive guide to developing retrieval augmented generation (RAG) based LLM applications, with a focus on scale (embedding, indexing, serving, etc.), evaluation (component-wise and overall), and production workflows. We’ll also explore more advanced topics, such as hybrid routing to close the gap between open source (OSS) and closed LLMs.
Takeaways:
1. Evaluating RAG-based LLM applications, both component-wise and end-to-end, is crucial for identifying and productionizing the best configuration (see the evaluation sketch after this list).
2. Scaling your LLM application's workloads (embedding, indexing, serving) requires minimal changes to existing code (see the data pipeline sketch below).
3. Hybrid, mixture-of-experts (MoE) style routing allows you to close the gap between open source and closed LLMs (see the routing sketch below).
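To make the first takeaway concrete, here is a minimal sketch of component-wise evaluation: scoring the retriever in isolation, before any end-to-end quality judgment. The `eval_set` schema and `retriever` interface are illustrative assumptions, not APIs from the talk.

```python
# Component-wise evaluation sketch: measure the retriever alone.
# `eval_set` and `retriever` are hypothetical placeholders.

def retrieval_hit_rate(eval_set, retriever, k=5):
    """Fraction of queries whose gold source appears in the top-k retrieved chunks."""
    hits = 0
    for example in eval_set:  # e.g. {"query": "...", "gold_source": "doc-42"}
        chunks = retriever(example["query"], k=k)
        if any(chunk["source"] == example["gold_source"] for chunk in chunks):
            hits += 1
    return hits / len(eval_set)
```

Scoring the retriever separately from the generator tells you which component to fix when overall answer quality drops.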
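For the second takeaway, a hedged sketch of what "minimal changes" can look like, assuming a Ray Data pipeline (my assumption; the bucket path, model name, and worker counts are placeholders): the per-batch embedding logic is ordinary Python, and distribution comes from wrapping it in `map_batches`.

```python
import ray
from sentence_transformers import SentenceTransformer

# Hypothetical embedding stage: the per-batch logic stays plain Python;
# Ray Data handles parallelism across workers.

class Embedder:
    def __init__(self):
        # One model instance per worker process.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch):
        # `batch` is a dict of numpy arrays; encode returns a 2D array.
        batch["embedding"] = self.model.encode(batch["text"].tolist())
        return batch

ds = ray.data.read_text("s3://my-bucket/docs/")  # placeholder path
ds = ds.map_batches(Embedder, concurrency=4, batch_size=64)
```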
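And for the third takeaway, a minimal sketch of a hybrid router: a lightweight classifier predicts whether the OSS model can handle a query, and low-confidence queries fall back to the closed model. The classifier interface, threshold, and model callables are all illustrative assumptions.

```python
# Hybrid routing sketch: send "easy" queries to the cheaper OSS model,
# fall back to the closed model otherwise. All names are placeholders.

def route(query, classifier, oss_llm, closed_llm, threshold=0.8):
    """Answer with the OSS model when the router is confident it suffices."""
    p_oss = classifier.predict_proba([query])[0][1]  # P(OSS answer is good enough)
    if p_oss >= threshold:
        return oss_llm(query)
    return closed_llm(query)
```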