JAX 2025: Semantic AI in Action: Architektur-Patterns für LLMs & Embeddings

Semantic AI as the key to integrating AI into your own solutions. In this talk, Christian Weyer presents hands-on architecture patterns and approaches for using large and small language models such as GPT or Llama, as well as embedding models, in modern software architectures. Key concepts such as Semantic Routing, Semantic Search & RAG, Structured Output, and Observability are demonstrated using an end-to-end system with multiple services and client applications. Developers and architects get a pragmatic overview of how these ideas could be implemented in their own projects.

Christian Weyer

May 08, 2025

Transcript

  1. Semantic AI in Action: Architektur-Patterns für LLMs & Embeddings

    Christian Weyer, Co-Founder & CTO @ Thinktecture AG (https://www.thinktecture.com)
    § Technology catalyst
    § AI-powered solutions
    § Pragmatic end-to-end architectures
    § Microsoft Regional Director
    § Microsoft MVP for AI
    § Google GDE for Web AI
  2. Our journey

    § Models for our software
    § Lightweight RAG
    § Semantic Routing
    § Observability
    § LLM all-the-things?
    § Structured Output / Tool Calling
  3. Semantic AI 🫱 🫲 Generative AI

    Language Models understand and generate semantically rich human language, transforming it into text or structured data for both humans and machines.
    ⚠ Non-deterministic: the same input can lead to different outputs.

    Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines.
    ✅ Deterministic: the same input always results in the same embedding.
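
The comparison of embeddings mentioned on this slide is commonly done with cosine similarity. The following is a minimal sketch under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (it only hashes words into a small vector), and the dimensionality `DIM` is arbitrary. It illustrates the two properties named above, determinism and comparability, and is not the code shown in the talk.

```python
import hashlib
import math

DIM = 64  # arbitrary toy dimensionality (assumption)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # hashlib is stable across runs, unlike Python's salted built-in hash()
        idx = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `toy_embed` twice with the same text returns the same vector, which is the deterministic behaviour the slide attributes to embedding models. A real model would additionally place texts with similar meaning close together even when they share no words.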
  4. API-based model integrations

    § Language & embedding models are part of end-to-end architectures
    § Embedding models (E-M) enable semantic search & comparison
    § Language models (L-M) enable human language understanding via context
      § System prompt
      § Conversation history
      § User query
  5. PATTERN: LIGHTWEIGHT RAG [RETRIEVAL-AUGMENTED GENERATION]
  6. “Talk to your Data”

    💡 Indexing / Embedding: .md, .docx, .pdf etc. → Cleanup & Split → Text → Embedding Model → Embedding → Save → Vector DB
    💡 Question Answering: Question (“Lorem ipsum…?”) → Embedding Model → Embedding → Query → Vector DB → Relevant Results + Question → LLM → Answer w/ sources
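
The two flows on this slide (indexing, then question answering) can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation from the talk: `toy_embed` replaces the embedding model with plain word counts, `InMemoryVectorStore` replaces the vector DB, and `build_prompt` only assembles the text that would be sent to an LLM. All names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_text(text: str) -> list[str]:
    """Cleanup & split: normalize whitespace, one chunk per non-empty paragraph."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


class InMemoryVectorStore:
    """Assumed stand-in for a vector DB; keeps (source, chunk, embedding) rows in a list."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, Counter]] = []

    def index(self, source: str, text: str) -> None:
        """Indexing flow: split the document, embed each chunk, save it."""
        for chunk in split_text(text):
            self.rows.append((source, chunk, toy_embed(chunk)))

    def query(self, question: str, k: int = 2) -> list[tuple[str, str]]:
        """Question flow: embed the question and return the k most similar chunks."""
        q = toy_embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(source, chunk) for source, chunk, _ in ranked[:k]]


def build_prompt(question: str, results: list[tuple[str, str]]) -> str:
    """Combine the relevant results and the question into a prompt for the LLM."""
    context = "\n".join(f"[{source}] {chunk}" for source, chunk in results)
    return f"Answer using only these sources and cite them:\n{context}\n\nQuestion: {question}"
```

Keeping the source name next to each chunk is what makes an answer "with sources" possible: the LLM sees the bracketed source labels in its context and can cite them.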
  7. Technical implementation – Lightweight RAG

    § Frameworks
      § LangChain
      § FastEmbed
        § Lightweight & efficient for generating text embeddings
    § Embedding model
      § jinaai/jina-embeddings-v2-base-de (local)
    § Vector store
      § PostgreSQL (pgvector)
    § LLM/SLM
      § Llama 3.3 70B on Cerebras (very fast)
  8. Structured data from unstructured input: for calling APIs / tools

    § Integration is being standardized with MCP
    User query: “OK, when is my colleague CW available for a two-days workshop?”
    Flow: System Prompt (with employee data) + Schema / Function Calling (for structured output) → (Internal) Web API → Availability business logic
  9. Technical implementation – Structured Output

    § Frameworks
      § Pydantic
      § Instructor
    § Methodology
      § Schema with JSON Mode (not Function Calling)
    § SLM/LLM
      § Llama 3.3 70B on Cerebras (very fast)
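
The idea of "schema with JSON mode" is that the model returns JSON text, which the application validates against a schema before calling any API. The slide names Pydantic and Instructor for this. The sketch below deliberately uses only the standard library to stay dependency-free, so it is not that stack. The schema `AvailabilityRequest` and its field names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class AvailabilityRequest:
    """Hypothetical schema for the data extracted from the user's question."""
    expert: str
    start: date
    days: int


def parse_availability_request(raw_json: str) -> AvailabilityRequest:
    """Validate a model's JSON output; raise ValueError if it does not fit the schema."""
    data = json.loads(raw_json)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"expert", "start", "days"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    days = data["days"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(days, int) or isinstance(days, bool) or days < 1:
        raise ValueError("days must be a positive integer")
    return AvailabilityRequest(
        expert=str(data["expert"]),
        start=date.fromisoformat(str(data["start"])),
        days=days,
    )
```

When validation fails, a typical reaction is to send the error message back to the model and ask for a corrected answer. Only a validated object should reach the internal Web API.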
  10. Semantics-based decisions for user interactions

    § Guarding (e.g. prompt injection): Fine-tuned Language Model
    § Routing (selecting correct target): Embedding Model
    User input (“Lorem ipsum…?”) → Guarding → Routing → one of:
    § Target RAG
    § Target API Call
    § Target … something else …
  11. Technical implementation – Semantic Guarding & Routing

    Guarding
    § Frameworks
      § llm-guard
      § HuggingFace Transformers
    § Model
      § deepset/deberta-v3-base-injection
    Routing
    § Frameworks
      § semantic-routing
      § FastEmbed
    § Embedding model
      § intfloat/multilingual-e5-large
    § Vector store
      § PostgreSQL (pgvector)
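
Embedding-based routing can be pictured as follows: each target has a few example utterances, the user query is embedded, and the target with the most similar example wins, provided the similarity clears a threshold. The sketch below rests on assumptions. `toy_embed` is a word-count stand-in for a real embedding model, and the route names, example utterances, and threshold value are invented for illustration. It does not use the libraries listed on the slide and does not cover guarding.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Example utterances per target (assumed values for illustration)
ROUTES = {
    "rag": [
        "what does the handbook say about workshops",
        "search the documentation",
    ],
    "availability_api": [
        "when is my colleague available",
        "book an expert for a workshop",
    ],
}


def route(query: str, threshold: float = 0.3) -> str:
    """Return the best-matching target, or "fallback" if nothing is similar enough."""
    q = toy_embed(query)
    best_target, best_score = "fallback", 0.0
    for target, examples in ROUTES.items():
        score = max(cosine(q, toy_embed(example)) for example in examples)
        if score > best_score:
            best_target, best_score = target, score
    return best_target if best_score >= threshold else "fallback"
```

The threshold matters as much as the ranking: without it, every query, however unrelated, would be forced onto one of the known targets.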
  12. Technical implementation – Observability

    § Methodology
      § OpenTelemetry (OTel)
    § Frameworks
      § OTel Python packages
      § Logfire SDK
    § Tools
      § Logfire, Langfuse
      § Any OTel-enabled system
  13. "Talk to your systems" - for Availability info (Semantic routing)

    Components: Web App / Watch App, Speech-to-Text, Internal Gateway (Python FastAPI), LLM / SLM, Text-to-Speech, Internal Business API (node.js – veeeery old)
    Flow:
    § Web App / Watch App → Speech-to-Text: transcribe spoken text (“When is CL…?”) → transcribed text
    § Web App / Watch App → Internal Gateway: check for experts availability with text
    § Internal Gateway → LLM / SLM: extract { experts, booking times } from text → structured JSON data (Function calling)
    § Internal Gateway → Internal Business API: query Availability API → availability
    § Internal Gateway → LLM / SLM: generate response with availability → response
    § Internal Gateway → Web App / Watch App: response with experts availability
    § 🔉 Web App / Watch App → Text-to-Speech: convert the response to speech → response audio (“CL will be…”)
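
The request flow on this slide is a fixed sequence of calls. A minimal sketch of that orchestration is shown below, assuming each external system is passed in as a plain callable. The function name, parameter names, and data shapes are hypothetical, and the real gateway in the talk is a FastAPI service, which this sketch does not reproduce. On the slide the speech steps are triggered from the client app; they are folded into one function here for brevity.

```python
from typing import Callable


def handle_voice_request(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    extract_request: Callable[[str], dict],
    query_availability: Callable[[dict], dict],
    generate_response: Callable[[str, dict], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run the voice-to-voice availability flow with injected service callables."""
    text = speech_to_text(audio)                      # transcribe spoken text
    request = extract_request(text)                   # LLM/SLM: structured JSON data
    availability = query_availability(request)        # internal business API
    answer = generate_response(text, availability)    # LLM/SLM: natural-language reply
    return text_to_speech(answer)                     # response audio
```

Passing the services in as callables keeps the flow testable with simple stubs, and it leaves the old business API untouched behind its own function boundary.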
  14. Recap: Top Semantic AI patterns & solutions – in end-to-end software engineering

    § Lightweight RAG
    § Structured Output
    § Semantic Guarding & Routing
    § Insightful Observability
    💡 Fun Fact: Large parts have been built with AI-assisted Coding / Vibe Coding
  15. Language & Embedding Models everywhere

    § OpenAI-related (cloud): OpenAI, Azure OpenAI Service
    § Big cloud providers: Google Model Garden on Vertex AI, Amazon Bedrock
    § Open-source: Edge, IoT, Server, Desktop, Mobile, Web
    § Other providers: Anthropic, Google DeepMind, Mistral AI, Hugging Face, Open-source
  16. Open-source models thrive

    § SLM families, e.g.
      § Llama
      § Mistral
      § Phi
      § Qwen
    § Success factors
      § Use case
      § Parameter size
      § Quantization
    § Local inference runtimes with APIs
      § E.g. llama.cpp, Ollama, vLLM, ONNX Runtime
    § Local UIs
      § E.g. Open WebUI
    § Processing power needed
      § CPU optimization on its way
      § Embedding models often run great on CPU