Upgrade to Pro — share decks privately, control downloads, hide ads and more …

DWX 2025: LLMs & Embeddings in Action: Real-Wor...

DWX 2025: LLMs & Embeddings in Action: Real-World Patterns für Ihre AI-Anwendungen

Generative AI und menschliche Sprache als zentrale Bausteine Ihrer Software – jenseits des Buzzword-Bingos. Begleiten Sie Christian Weyer in dieser Session bei der Erkundung erprobter Patterns und Lösungen für die nahtlose Integration von Language- und Embedding-Modellen in moderne Softwarearchitekturen. Kernthemen wie Semantic Routing, Retrieval-Augmented Generation, Structured Output und Observability werden im Einklang anhand praxisnaher Beispiele in einem End-to-End-System mit mehreren Services und Client-Anwendungen demonstriert. Entwickler und Architekten erhalten wertvolle Einblicke, wie sie natürlichsprachliche Benutzerschnittstellen in ihren Projekten zum Leben erwecken können.

Avatar for Christian Weyer

Christian Weyer

July 03, 2025
Tweet

More Decks by Christian Weyer

Other Decks in Programming

Transcript

  1. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    Our journey Models for our software Lightweight RAG Semantic Routing Observability Structured Output / Tool Calling 2
  2. Language Models understand and generate semantically rich human language, transforming

    it into text or structured data for both humans and machines. ⚠ Non-deterministic: same input can lead to different outputs. Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: same input always results in the same embedding. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen 🫱 🫲 Semantic AI Generative AI 4
  3. § Language & embedding models part of end-to-end architectures §

    E-M enable semantic search & comparison § L-M enable human language understanding via context § System prompt § Conversation history § User query LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen API-based model integrations 5
  4. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    Classical applications & UIs API-based data Document-based data 6
  5. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    C4 system context diagram § Various tech stacks § Docker-based distributed system 8
  6. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    Talking to documents (Retrieval-augmented generation) Cleanup & Split Text Embedding Question Text Embedding Save Query Relevant Results Question Answ er w / sources LLM Embedding Model Embedding Model 💡 Indexing / Embedding Question Answering .md, .docx, .pdf etc. “Lorem ipsum…?” 💡 Vector DB 10
  7. § Frameworks § LangChain § FastEmbed § Lightweight & efficient

    for generating text embeddings § Embedding model § jinaai/jina-embeddings-v2-base-de (local) – 768 dims § Vector store § PostgreSql (pgvector) vector store § LLM/SLM § Llama 3.3 70B on Cerebras (very fast) LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation – Lightweight RAG 11
  8. § Tools integration is being standardized with MCP LLMs &

    Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Talking to APIs (Function / Tool calling) 13 “When is CW available for a two-days workshop?” System Prompt (+ employee data) + Schema (for structured output) Web API Availability business logic
  9. § Frameworks § Pydantic § Instructor § Methodology § Schema

    with JSON Mode (not Function Calling) § SLM/LLM § Llama 3.3 70B on Cerebras (very fast) LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation – Structured Output 14
  10. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    Semantics-based decisions for user interactions Guarding (e.g. prompt injection) Routing (selecting correct target) “Lorem ipsum…?” Target RAG Target API Call Target … something else … Fine-tuned NLP Model Embedding Model 16
  11. Guarding § Frameworks § llm-guard § HuggingFace Transformers § NLP

    model § deepset/ deberta-v3-base-injection (local) Routing § Frameworks § semantic-routing § FastEmbed § Embedding model § intfloat/ multilingual-e5-large (local) – 1024 dims § Vector store § PostgreSql (pgvector) LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation – Semantic Guarding & Routing 17
  12. § Methodology § Open Telemetry (OTel) § Frameworks § OTel

    Python packages § LogFire SDK § Tools § LogFire, LangFuse § Any OTel-enabled system LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen Technical implementation - Observability 20
  13. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    Typical Semantic AI patterns & solutions – in end-to-end software engineering Lightweight RAG Structured Output Semantic Guarding & Routing Insightful Observability 21
  14. LLMs & Embeddings in Action Real-World Patterns für Ihre AI-Anwendungen

    22 AI solutions are ≅10% AI and 100% software engineering. 22