DWX 2025 - Talk to your data

”Talk to your Data”: Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps
Sebastian Gingter [email protected] Developer Consultant

▪ Was Sie ERWARTET ▪ Hintergrundwissen und Theorie zu RAG
▪ Überblick über Semantische Suche ▪ Probleme die auftreten können ▪ Pragmatische Methoden für die Verwendung eigener Daten im RAG ▪ Demos (Python) ▪ Was Sie NICHT erwartet ▪ ChatGPT, CoPilot(s) ▪ Grundlagen von ML ▪ Deep Dives in LLMs, Vektor-Datenbanken, LangChain Talk to your data: Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

”Talk to your Data”: Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps
Sebastian Gingter [email protected] Developer Consultant

4 ▪ Generative AI in business settings ▪ Flexible and
scalable backends ▪ All things .NET ▪ Pragmatic end-to-end architectures ▪ Developer productivity ▪ Software quality [email protected] @phoenixhawk https://www.thinktecture.com Sebastian Gingter Developer Consultant @ Thinktecture AG Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

6 ▪ Short introduction to RAG ▪ Embeddings (and a
bit of theory ) ▪ Indexing ▪ Retrieval ▪ Not good enough? – Indexing II ▪ HyDE & alternative indexing methods ▪ Conclusion Agenda Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

7 Introduction Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to
your data":

8 Use case: Retrieval-augmented generation (RAG) Cleanup & Split Text
Embedding Question Text Embedding Save Query Relevant Text Question LLM Vector DB Embedding model Embedding model Indexing / Embedding QA Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

9 ▪ Search is provided as a tool to the
LLM ▪ LLM then can decide to call the tool to search on its own ▪ LLM also decides the search term (could be problematic) Alternative: Agentic RAG Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

10 ▪ Classic search: lexical ▪ Compares words, parts of
words and variants ▪ Classic SQL: WHERE ‘content’ LIKE ‘%searchterm%’ ▪ We can search only for things where we know that it is somewhere in the text ▪ New: Semantic search ▪ Compares for the same contextual meaning ▪ “Das Rudel rollt das runde Gerät auf dem Rasen herum” ▪ “The pack enjoys rolling a round thing on the green grass” ▪ “Die Hunde spielen auf der Wiese mit dem Ball” ▪ “The dogs play with the ball on the meadow” Semantic Search Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

11 ▪ How to grasp “semantics”? ▪ Computers only calculate
on numbers ▪ Computing is “applied mathematics” ▪ AI also only calculates on numbers Semantic Search Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

12 ▪ We need a numeric representation of text ▪
Tokens ▪ We need a numeric representation of meaning ▪ Embeddings Semantic Search Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

13 ▪ Similar to char tables (e.g. ASCII), just with
larger elements ▪ Tokens are parts of text ▪ Words ▪ Syllables ▪ Punctuation ▪ … ▪ Tokens are translated to token IDs ▪ Example: https://platform.openai.com/tokenizer Tokens Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

14 Embeddings Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to
your data":

15 Embedding (math.) ▪ Topologic: Value of a high dimensional
space is “embedded” into a lower dimensional space ▪ Natural / human language is very complex (high dimensional) ▪ Task: Map high complexity to lower complexity / dimensions ▪ Injective function ▪ Similar to hash, or a lossy compression Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

16 ▪ Embedding model (specialized ML model) converting text into
a numeric representation of its meaning ▪ Representation is a Vector in an n-dimensional space ▪ n floating point values ▪ OpenAI ▪ “text-embedding-ada-002” uses 1536 dimensions ▪ “text-embedding-3-small” 512 and 1536 ▪ “text-embedding-3-large” 256, 1024 and 3072 ▪ Huggingface models have a very wide range of dimensions Embeddings https://huggingface.co/spaces/mteb/leaderboard & https://openai.com/blog/new-embedding-models-and-api-updates Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

17 ▪ Embedding models are unique ▪ Each dimension has
a different meaning, individual to the model ▪ Vectors from different models are incompatible with each other ▪ they live in different vector spaces ▪ Some embedding models are multi-language, but not all ▪ In an LLM, also the first step is to embed the input into a lower dimensional space Embeddings Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

18 ▪ Mathematical quantity with a direction and length ▪
Ԧ 𝑎 = 𝑎𝑥 𝑎𝑦 What is a vector? https://mathinsight.org/vector_introduction Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

19 Vectors in 2D Ԧ 𝑎 = 𝑎𝑥 𝑎𝑦 Signifikant
bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

20 Ԧ 𝑎 = 𝑎𝑥 𝑎𝑦 𝑎𝑧 Vectors in 3D
Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

21 Ԧ 𝑎 = 𝑎𝑢 𝑎𝑣 𝑎𝑤 𝑎𝑥 𝑎𝑦 𝑎𝑧
Vectors in multidimensional space Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

22 Calculation with vectors Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps
"Talk to your data":

23 𝐵𝑟𝑜𝑡ℎ𝑒𝑟 − 𝑀𝑎𝑛 + 𝑊𝑜𝑚𝑎𝑛 ≈ 𝑆𝑖𝑠𝑡𝑒𝑟 Word2Vec Mikolov
et al., Google, 2013 Man Woman Brother Sister https://arxiv.org/abs/1301.3781 Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

24 Embedding-Model ▪ Task: Create a vector from an input
▪ Extract meaning / semantics ▪ Embedding models usually are very shallow & fast Word2Vec is only two layers ▪ Similar to the first step of an LLM ▪ Convert text to values for input layer ▪ This comparison is very simplified, but one could say: ▪ The embedding model ‘maps’ the meaning into the model’s ‘brain’ Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

25 Vectors from your Embedding-Model 0 Signifikant bessere LLM-RAG-Lösungen durch
Real-World Tipps "Talk to your data":

26 [ 0.50451 , 0.68607 , -0.59517 , -0.022801, 0.60046
, -0.13498 , -0.08813 , 0.47377 , -0.61798 , -0.31012 , -0.076666, 1.493 , -0.034189, -0.98173 , 0.68229 , 0.81722 , -0.51874 , -0.31503 , -0.55809 , 0.66421 , 0.1961 , -0.13495 , -0.11476 , -0.30344 , 0.41177 , -2.223 , -1.0756 , -1.0783 , -0.34354 , 0.33505 , 1.9927 , -0.04234 , -0.64319 , 0.71125 , 0.49159 , 0.16754 , 0.34344 , -0.25663 , -0.8523 , 0.1661 , 0.40102 , 1.1685 , -1.0137 , -0.21585 , -0.15155 , 0.78321 , -0.91241 , -1.6106 , -0.64426 , -0.51042 ] Embedding-Model http://jalammar.github.io/illustrated-word2vec/ Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

27 Embedding-Model http://jalammar.github.io/illustrated-word2vec/ Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk
to your data":

28 ▪ Select your Embedding Model carefully for your use
case ▪ e.g. ▪ intfloat/multilingual-e5-large-instruct ~ 50 % ▪ T-Systems-onsite/german-roberta-sentence-transformer-v2 < 70 % ▪ danielheinz/e5-base-sts-en-de > 80 % ▪ BAAI/bge-m3 > 95 % ▪ Maybe fine-tuning of the embedding model might be an option ▪ As of now: Treat embedding models as exchangeable commodities! Important Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

29 ▪ Embedding model: “Analog to digital converter for text”
▪ Embeds the high-dimensional natural language meaning into a lower dimensional-space (the model’s ‘brain’) ▪ No magic, just applied mathematics ▪ Math. representation: Vector of n dimensions ▪ Technical representation: array of floating point numbers Recap Embeddings Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

Embeddings Sentence Transformers, local embedding model DEMO Signifikant bessere LLM-RAG-Lösungen
durch Real-World Tipps "Talk to your data":

31 ▪ Similarity determination ▪ Semantic search ▪ Semantic routing
Semantically determine the knowledge base / source for a query ▪ Semantic caching Can be used to cache answers for similar search queries ▪ Categorization ▪ Keyword determination by contextual similarity ▪ etc. Other use-cases: Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

32 Improvement Process Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk
to your data":

33 ▪ We have a LOT of variables ▪ Chunk
size ▪ Chunking strategy ▪ Embedding model ▪ Retrieval methods ▪ How many documents are retrieved (n) ▪ Embedding-only ▪ Hybrid search ▪ Reranking (yes / no, model, amount…) ▪ Plain RAG vs. Agentic RAG ▪ Potential transformations ▪ Potential knowledge graphs Improvement Process Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

34 ▪ This is (computer) science and software engineering ▪
We need to perfectly know what works, and what does not work ▪ We need reproducible experiments ▪ We need to measure our stuff Improvement Process Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

35 ▪ Create, maintain and compare metrics ▪ One possibilty
for Python: https://www.ragas.io/ Improvement Process Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

36 Indexing Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to
your data":

37 Consists of ▪ Loading ▪ Clean-up ▪ Splitting ▪
Embedding ▪ Storing Indexing Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

38 ▪ Import documents from different sources, in different formats
▪ LangChain has very strong support for loading data ▪ Support for cleanup ▪ Support for splitting Loading https://python.langchain.com/docs/integrations/document_loaders Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

39 ▪ HTML Tags ▪ Formatting information ▪ Normalization ▪
lowercasing ▪ stemming, lemmatization ▪ remove punctuation & stop words ▪ Enrichment ▪ tagging ▪ keywords, categories ▪ metadata Clean-up Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

40 ▪ Document is too large / too much content
/ not concise enough Splitting (Text Segmentation) ▪ by size (text length) ▪ by character (\n\n) ▪ by paragraph, sentence, words (until small enough) ▪ by size (tokens) ▪ overlapping chunks (token-wise) Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

41 ▪ Document is too large / too much content
/ not concise enough Splitting / Chunking (Text Segmentation) ▪ by size (text length) ▪ by character (\n\n) ▪ by paragraph, sentence, words (until small enough) ▪ by size (tokens) ▪ overlapping chunks (token-wise) Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

42 ▪ Every sentence gets an embedding ▪ Embeddings for
each sentence are compared with each other ▪ When deviation is too large, we assume a meaning (topic) change ▪ At this border chunks are separated ▪ Needs a lot of vectors and comparisons ▪ Indexing gets slow & expensive Semantic Chunking Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

43 Retrieval (Search) Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk
to your data":

44 Retrieval Embedding- Model Embedding 𝑎 𝑏 𝑐 … Vector-
Database “What is the name of the teacher?” Query Doc. 1: 0.86 Doc. 2: 0.84 Doc. 3: 0.79 Weighted result … (Answer generation) Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

46 Indexing II Not good enough? Signifikant bessere LLM-RAG-Lösungen durch
Real-World Tipps "Talk to your data":

47 Not good enough? ? Signifikant bessere LLM-RAG-Lösungen durch Real-World
Tipps "Talk to your data":

48 ▪ Semantic search still only uses your data ▪
It’s just as good as your embeddings ▪ All chunks need to be sized correctly and distinguishable enough ▪ Garbage in, garbage out Not good enough? Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

49 ▪ Search for a hypothetical Document HyDE (Hypothetical Document
Embedddings) LLM, e.g. GPT-3.5-turbo Embedding 𝑎 𝑏 𝑐 … Vector- Database Doc. 3: 0.86 Doc. 2: 0.81 Doc. 1: 0.81 Weighted result Hypothetical Document Embedding- Model Write a company policy that contains all information which will answer the given question: {QUERY} “What should I do, if I missed the last train?” Query https://arxiv.org/abs/2212.10496 Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

50 ▪ Downside of HyDE: ▪ Each request needs to
be transformed through an LLM (slow & expensive) ▪ A lot of requests will probably be very similar to each other ▪ Each time a different hypothetical document is generated, even for an extremely similar request ▪ Leads to very different results each time ▪ Idea: Alternative indexing ▪ Transform the document, not the query What else? Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

51 Alternative Indexing HyQE: Hypothetical Question Embedding LLM, e.g. GPT-3.5-turbo
Transformed document Write 3 questions, which are answered by the following document. Chunk of Document Embedding- Model Embedding 𝑎 𝑏 𝑐 … Vector- Database Metadata: content of original chunk Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

52 ▪ Retrieval Alternative Indexing Embedding- Model Embedding 𝑎 𝑏
𝑐 … Vector- Database Doc. 3: 0.89 Doc. 1: 0.86 Doc. 2: 0.76 Weighted result Original document from metadata “What should I do, if I missed the last train?” Query Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

Compare embeddings LangChain, Qdrant, OpenAI GPT DEMO Signifikant bessere LLM-RAG-Lösungen
durch Real-World Tipps "Talk to your data":

54 Additional strategies Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk
to your data":

55 ▪ Reranking (after retrieval): Retrieve with much higher n,
rerank, then pick new top n ▪ Agentic RAG: Provide search as tool to LLM and let LLM determine what to search for, potentially refining search terms ▪ Add additional data sources, e.g. knowledge graphs Additional strategies Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

56 ▪ Index different document depths (via LLM calls) in
addition to detailed chunks ▪ Create a summary of the complete document ▪ Create a summary of each chapter ▪ Create a summary of each paragraph ▪ Allows for more general queries instead of nitty gritty detail questions only Additional strategies Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

57 Conclusion Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to
your data":

58 Cleanup & Split Text Embedding Question Text Embedding Save
Query Relevant Text Question LLM Vector DB Embedding model Embedding model Indexing / Embedding QA Retrieval-augmented generation (RAG) Indexing & (Semantic) search Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

59 ▪ Tune text cleanup, segmentation, splitting ▪ HyDE or
HyQE or alternative indexing ▪ How many questions? ▪ With or without summary ▪ Other approaches ▪ Only generate summary ▪ Extract “Intent” from user input and search by that ▪ Transform document and query to a common search embedding ▪ HyKSS: Hybrid Keyword and Semantic Search https://www.deg.byu.edu/papers/HyKSS.pdf ▪ Always evaluate approaches with your own data & queries ▪ The actual / final approach is more involved as it seems on the first glance Recap: Not good enough? Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

60 ▪ Semantic search is a first and fast Generative
AI business use-case ▪ Quality of results depend heavily on data quality and preparation pipeline ▪ RAG pattern can produce breathtaking good results without the need for user training Conclusion Signifikant bessere LLM-RAG-Lösungen durch Real-World Tipps "Talk to your data":

Thank you! Sebastian Gingter https://thinktecture.com/sebastian-gingter Demos: https://github.com/thinktecture-labs/dwx-2025-talk-to-your-data

“Talk to your Data”: Signigfikant bessere LLM-RAG-Lösungen durch Real-World Tipps
Sebastian Gingter [email protected] Developer Consultant Slides & Code https://www.thinktecture.com/de/sebastian-gingter

DWX 2025 - Talk to your data

DWX 2025 - Talk to your data

More Decks by Sebastian Gingter

Other Decks in Programming

Featured

Transcript