$30 off During Our Annual Pro Sale. View Details »

Software Architecture Gathering 2025: Architect...

Software Architecture Gathering 2025: Architecting with Semantic AI: A Language Model and an Embedding Model Walk Into a Bar...

Semantic AI, as an evolution of Generative AI, offers a powerful path to integrating meaningful intelligence into modern software systems.
In this talk, Christian Weyer presents practical architecture patterns and approaches for leveraging both large and small language models—such as GPT or LLaMA—alongside embedding models, within contemporary software architectures.
Key concepts like Semantic Routing, Semantic Search & Lightweight RAG, as well as Structured Output generation, are demonstrated using an end-to-end system composed of multiple services and client applications.
Developers and architects will leave with a pragmatic, real-world overview of how to bring Semantic AI into their own solutions—efficiently, modularly, and meaningfully.

Avatar for Christian Weyer

Christian Weyer PRO

November 26, 2025
Tweet

More Decks by Christian Weyer

Other Decks in Programming

Transcript

  1. Architecting with Semantic AI: A Language Model & an Embedding

    Model walk into a bar... Christian Weyer | Co-Founder & CTO | Thinktecture AG | [email protected]
  2. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Our Architectural Journey with AI Models 2 Model Foundation Retrieval Flow Control Semantic Observability Contracts
  3. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... MODELS IN ARCHITECTURE LANDSCAPE 3 Architecture Building Block: Model Foundation Layer
  4. Language Models understand and generate semantically rich human language, transforming

    it into text or structured data for both humans and machines. ⚠ Non-deterministic: same input can lead to different outputs. Embedding Models capture semantic meaning by encoding human language into numerical vector representations, facilitating understanding, comparison, and retrieval for both humans and machines. ✅ Deterministic: same input always results in the same embedding. Architecting with Semantic AI A Language Model & an Embedding Model Walk Into a Bar... 🫱 🫲 Semantic AI Generative AI 4
  5. § Language & embedding models part of end-to-end architectures §

    Accessed via an API § Embedding models can be run locally § Optimized for CPU § Language models (still) hard to run locally § High GPU power § High VRAM § High memory bandwidth Architecting with Semantic AI A Language Model & an Embedding Model Walk Into a Bar... Integration Architecture: Models as Services 5
  6. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Traditional Client Interfaces API-based data Document-based data 6 Architecture Building Block UI Layer
  7. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Language-enabled “UIs” 7 – e.g. Talk-to-TT
  8. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... C4 System Context Diagram § Container-based distributed system § Various tech stacks 8
  9. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... PATTERN LIGHTWEIGHT RAG 9 Architecture Building Block Retrieval Layer
  10. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Talking to Documents (Retrieval-Augmented Generation) Cleanup & Split Text Embedding Question Text Embedding Save Query Relevant Results Question Answ er w / sources LLM Embedding Model Embedding Model 💡 Indexing / Embedding Question Answering .md, .docx, .pdf etc. “Lorem ipsum…?” 💡 Vector DB 10
  11. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... PATTERN STRUCTURED OUTPUT 11 Architecture Building Block Contract Layer
  12. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Talking to Systems (Function / Tool calling) “When is CW available for a two-days workshop?” System Prompt (+ employee data) + Schema (for structured output) Web API Availability business logic 12
  13. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... PATTERN SEMANTIC GUARDING & ROUTING 13 Architecture Building Block Flow Control Layer
  14. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Semantic Decision-Making for Interaction Flows Guarding (e.g. prompt injection) Routing (selecting correct target) “Lorem ipsum…?” Target RAG Target API Call Target … something else … Fine-tuned NLP Model Embedding Model 14
  15. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... PATTERN TELEMETRY 15 Architecture Building Block Semantic Observability Layer
  16. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... Things can get… Overwhelming 16
  17. Semantic Observability Layer Cross-cutting end-to-end telemetry insights Flow Control Layer

    Semantic guarding & routing decisions Contract Layer Structured output + schema enforcement Retrieval Layer Embedding-based document & data retrieval Model Foundation Layer LLM + Embedding models as core capabilities Recap: Semantic AI Architecture A Language Model & an Embedding Model Walk Into a Bar... Architecting with Semantic AI 17
  18. Architecting with Semantic AI A Language Model & an Embedding

    Model Walk Into a Bar... AI-based solutions are ≅10% AI and 100% software engineering. 18
  19. § Frameworks § Pydantic § Instructor § Methodology § Schema

    with JSON Mode (not Function Calling) § SLM/LLM § Llama 3.3 70B on Cerebras (very fast) Architecting with Semantic AI A Language Model & an Embedding Model Walk Into a Bar... Technical implementation – Structured Output 21
  20. § Frameworks § LangChain § FastEmbed § Lightweight & efficient

    for generating text embeddings § Embedding model § jinaai/jina-embeddings-v2-base-de (local) – 768 dims § Vector store § PostgreSql (pgvector) vector store § LLM/SLM § Llama 3.3 70B on Cerebras (very fast) Architecting with Semantic AI A Language Model & an Embedding Model Walk Into a Bar... Technical implementation – Lightweight RAG 22
  21. Guarding § Frameworks § llm-guard § HuggingFace Transformers § NLP

    model § deepset/ deberta-v3-base-injection (local) Routing § Frameworks § semantic-routing § FastEmbed § Embedding model § intfloat/ multilingual-e5-large (local) – 1024 dims § Vector store § PostgreSql (pgvector) Architecting with Semantic AI A Language Model & an Embedding Model Walk Into a Bar... Technical implementation – Semantic Guarding & Routing 23
  22. § Methodology § Open Telemetry (OTel) § Frameworks § OTel

    Python packages § Tools § LangFuse § Any OTel-enabled system Architecting with Semantic AI A Language Model & an Embedding Model Walk Into a Bar... Technical implementation - Observability 24