Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SDD 2025: Pragmatic Gen AI: LLM-based applicati...

SDD 2025: Pragmatic Gen AI: LLM-based applications - top patterns & solutions for successful adoption

Generative AI beyond buzzword bingo. In this session, Christian presents concrete patterns and solutions for integrating Large Language Models (LLMs) such as GPT, Mistral, Claude or Llama into your own software architectures. Important topics such as semantic routing, semantic caching, guarding, or observability are illustrated with code examples. Developers and architects can expect a pragmatic insight into possible implementations for their own projects.

Avatar for Christian Weyer

Christian Weyer

May 13, 2025
Tweet

More Decks by Christian Weyer

Other Decks in Programming

Transcript

  1. Pragmatic Gen AI LLM-based applications – top patterns & solutions

    for successful adoption Christian Weyer | Co-Founder & CTO | Thinktecture AG | [email protected]
  2. § Technology catalyst § AI-powered solutions § Pragmatic end-to-end architectures

    § Microsoft Regional Director § Microsoft MVP for AI § Google GDE for Web AI [email protected] https://www.thinktecture.com Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption Christian Weyer Co-Founder & CTO @ Thinktecture AG 2
  3. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Our journey 3 Models for our software Lightweight RAG Semantic Guarding & Routing Observability LLM all-the-things? Structured Output / Tool Calling
  4. Language Models understand and generate semantically rich human language, transforming

    it into text or structured data for both humans and machines. ⚠ Non-deterministic: same input can lead to different outputs. Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: same input always results in the same embedding. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption 5 🫱 🫲 Semantic AI Generative AI
  5. § We rely on Language Model’s Language understanding NOT on

    its world knowledge Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption ⚠ Important shoutout 6
  6. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption MODELS FOR OUR SOFTWARE 7
  7. § Language & embedding models part of end-to-end architectures §

    E-M enable semantic search & comparison § L-M enable human language understanding via context § System prompt § Conversation history § User query Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption API-based model integrations 8
  8. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Classical applications & UIs 9
  9. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Language-enabled “UIs” – Talk-to-TT sample 10
  10. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption 11 C4 system context diagram § Docker-based distributed system
  11. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Sample solution - Technology stack 12 Services § Python as the go-to-platform for genuine ML/AI/Gen-AI § Esp. for local model execution § But: Most of the logic could be implemented in any language/platform Clients
  12. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption PATTERN LIGHTWEIGHT RAG [RETRIEVAL-AUGMENTED GENERATION] 13
  13. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption “Talk to your Data” Cleanup & Split Text Embedding Question Text Embedding Save Query Relevant Results Question Answ er w / sources LLM Embedding Model Embedding Model 💡 Indexing / Embedding Question Answering .md, .docx, .pdf etc. “Lorem ipsum…?” 💡 Vector DB 14
  14. § Frameworks § LangChain § FastEmbed § Lightweight & efficient

    for generating text embeddings § Embedding model § jinaai/jina-embeddings-v2-base-de (local) § Vector store § PostgreSql (pgvector) vector store § LLM/SLM § Llama 3.3 70B on Cerebras (very fast) Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption Technical implementation – Lightweight RAG 15
  15. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption PATTERN STRUCTURED OUTPUT 16
  16. § Integration is being standardized with MCP Pragmatic Gen AI:

    LLM-based applications Top patterns & solutions for successful adoption Structured data from unstructured input For calling APIs / tools 17 “OK, when is my colleague CW available for a two- days workshop?” System Prompt (with employee data) + Schema / Function Calling (for structured output) (Internal) Web API Availability business logic
  17. § Frameworks § Pydantic § Instructor § Methodology § Schema

    with JSON Mode (not Function Calling) § SLM/LLM § Llama 3.3 70B on Cerebras (very fast) Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption Technical implementation – Structured Output 18
  18. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption PATTERN SEMANTIC GUARDING & ROUTING 19
  19. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Semantics-based decisions for user interactions Guarding (e.g. prompt injection) Routing (selecting correct target) “Lorem ipsum…?” Target RAG Target API Call Target … something else … Fine-tuned Language Model Embedding Model 20
  20. Guarding § Frameworks § llm-guard § HuggingFace Transformers § Model

    § protectai/ deberta-v3-base-prompt- injection-v2 (local) Routing § Frameworks § semantic-routing § FastEmbed § Embedding model § intfloat/ multilingual-e5-large (local) § Vector store § PostgreSql (pgvector) Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption Technical implementation – Semantic Guarding & Routing 21
  21. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption PATTERN / SOLUTION OBSERVABILITY 22
  22. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Things can get… overwhelming 23
  23. § Methodology § Open Telemetry (OTel) § Frameworks § OTel

    Python § LogFire SDK § Tools § LogFire § Any OTel-enabled system Pragmatic Gen AI: LLM-based applications Top patterns & solutions for successful adoption Technical implementation - Observability 25
  24. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption END-TO-END SOLUTION ILLUSTRATED 26
  25. Semantic routing Pragmatic Gen AI: LLM-based applications Top patterns &

    solutions for successful adoption "Talk to your systems" - for Availability queries 27 Web App / Watch App Speech-to-Text Internal Gateway (Python FastAPI) LLM / SLM Text-to-Speech Transcribe spoken text Transcribed text Check for experts availability with text Extract { experts, booking times } from text Structured JSON data (Function calling) Generate response with availability Response Response with experts availability 🔉 Speech-to-text for response Response audio Internal Business API (node.js – veeeery old) Query Availability API Availability When is CL…? CL will be…
  26. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Top Semantic AI patterns & solutions – in end-to-end software engineering 29 Lightweight RAG Structured Output Semantic Guarding & Routing Insightful Observability 💡 Fun Fact: Large parts been built with AI-assisted Coding / Vibe Coding
  27. Pragmatic Gen AI: LLM-based applications Top patterns & solutions for

    successful adoption Top Semantic AI patterns & solutions – in end-to-end software engineering 30 Lightweight RAG Structured Output Semantic Guarding & Routing Insightful Observability AI solutions are 10% AI. And 100% software engineering.