RAG : Serverless Retrieval-Augmented Generation con S3 Vectors & Bedrock

RAG : Serverless Retrieval- Augmented Generation con S3 Vectors &
Bedrock Lino Espinoza AWS Community Builder @Serverless

Acerca de mi I do tech stuff on the cloud
Dad! Serverless lover ❤️ Tech & cloud content creator 🚀 Follow me on: /in/linoespinoza /linointhecloud Credencials / Certifications

The RAG Problem & Solution Serverless Architecture Deep-dive Live Demo
- Upload, Search, Generate Agenda de hoy

Tu AI no puede responder preguntas como : ¿Qué hay
en las notas de la reunión de la junta del Q3 (tercer trimestre) ? Resume los comentarios de los clientes de hoy ¿Cuál es nuestra política de trabajo remoto actualizada? El problema con los LLMs Porqué? Los LLMs tienen : Fechas límites de entrenamiento No cuentan con acceso a información privada Problemas de halucinación

Resultado: ¡Tu IA ahora conoce acerca de tu información y
se mantiene actualizada! RAG al rescate Query + Find Relevant Docs → Add Context → Enhanced LLM Response RAG = Retrieval Augmented Generation

Setup típico: Managed vector database ($$) Infraestructura compleja Problemas de
escalamiento Operational overhead Traditional RAG Challenges Costo: $200 - $2000 / month para Pinecone/Weaviate

Stack: S3: Document + Vector storage Lambda: Processing engine Bedrock:
AI models (embeddings + generation) DynamoDB: Metadata index API Gateway: REST endpoints Serverless Solution

¿Por qué S3? 0.023/GB vs 0.30+/GB en otras opciones (managed)
99.999999999% durabilidad Escalamiento ilimitado Integraciones nativas con servicios de AWS S3 como Vector Database Patrones de Almacenamiento s3://vectors-bucket/ ├── documents/ │ ├── doc-001/ │ │ ├── chunk-001.json # Vector individual │ │ ├── chunk-002.json │ │ └── metadata.json # Info del documento │ └── doc-002/ │ ├── chunk-001.json │ └── chunk-002.json └── indexes/ ├── by-date/2024-01-15.json # Índice por fecha └── by-type/pdf.json # Índice por tipo

Document Ingestion Flow Upload document (pdf / text-plain) S3 Vector
Storage DynamoDB (metadata) processing, completed Lambda Trigger (documentId extraction, text extraction, text chunking) Bedrock Embeddings

Query Flow User Query API Gateway (/POST) Lambda Query Bedrock
LLM

Gracias por asistir! I do tech stuff on the cloud
Dad! Serverless lover ❤️ Tech & cloud content creator 🚀 Follow me on: /in/linoespinoza /linointhecloud Credencials / Certifications

RAG : Serverless Retrieval-Augmented Generation...

RAG : Serverless Retrieval-Augmented Generation con S3 Vectors & Bedrock

Lino Espinoza

More Decks by Lino Espinoza

Other Decks in Technology

Featured

Transcript

RAG : Serverless Retrieval- Augmented Generation con S3 Vectors &

Acerca de mi I do tech stuff on the cloud

The RAG Problem & Solution Serverless Architecture Deep-dive Live Demo

Tu AI no puede responder preguntas como : ¿Qué hay

Resultado: ¡Tu IA ahora conoce acerca de tu información y

Setup típico: Managed vector database ($$) Infraestructura compleja Problemas de

Stack: S3: Document + Vector storage Lambda: Processing engine Bedrock:

¿Por qué S3? 0.023/GB vs 0.30+/GB en otras opciones (managed)

Document Ingestion Flow Upload document (pdf / text-plain) S3 Vector

Query Flow User Query API Gateway (/POST) Lambda Query Bedrock

Demo

Gracias por asistir! I do tech stuff on the cloud