
Krakow - Elastic Red Hat Meetup

Shaaf Syed
May 15, 2025


The talk explores how to build and deploy a Retrieval Augmented Generation (RAG) application using OpenShift AI and Podman AI Lab. Shaaf introduces the Podman AI Lab extension, demonstrating how it enables local model testing before deployment to production environments. The session covers vector search fundamentals, showing how Elasticsearch handles vector embeddings for efficient information retrieval. He also discusses RAG use cases, showcasing how this approach enhances chatbot responses by combining LLMs with real-time data. Finally, he walks through a live demo of deploying a chatbot on OpenShift AI, integrating Elasticsearch for vector search using LangChain. Attendees will gain a practical understanding of vector-based applications and how to bring them to production.
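The retrieval half of RAG boils down to ranking stored embeddings by similarity to a query embedding. The sketch below illustrates that idea with cosine similarity over toy vectors; all names and vectors here are illustrative assumptions, not code from the demo, and in the actual setup the embeddings would come from a model and the search would run inside Elasticsearch.

```python
import math

# Toy document embeddings. In the real application these would be
# produced by an embedding model and stored in Elasticsearch
# dense_vector fields; here they are hand-picked for illustration.
DOCS = {
    "podman": [0.9, 0.1, 0.0],
    "elasticsearch": [0.1, 0.9, 0.2],
    "openshift": [0.2, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Return the k document keys most similar to the query vector."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

# A query vector close to the "elasticsearch" embedding ranks it first:
print(retrieve([0.0, 1.0, 0.1]))  # -> ['elasticsearch', 'openshift']
```

The retrieved documents would then be stuffed into the LLM prompt as context, which is what lets the chatbot answer from real-time data rather than from its training set alone.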


Transcript

  1. Building and deploying AI-infused apps with Elasticsearch using Podman

    and OpenShift AI. Syed M Shaaf, Sr. Principal Developer Advocate, Red Hat
  2. Building and deploying AI-infused applications • Java developer,

    advocate, architect, engineer… • Open source enthusiast, contributor • InfoQ Java Technical Editor • Ask me about #Java, backends, architecture, containers… • Trainer, coach fosstodon.org/@shaaf sshaaf https://www.linkedin.com/in/shaaf/ shaaf.dev https://bsky.app/profile/shaaf.dev
  3. @shaaf.dev • Systems do not speak natural language, cannot translate,

    and lack context outside of system boundaries (e.g. sentiment). • Generating content is costly and sometimes hard. • Rapid data growth. • Rising expectations: customers demand instant, personalized solutions. • Inefficiency: manual processes increase costs and slow operations. • Skill gaps: limited expertise in AI adoption. Systems, Data, Networks and a Solution?
  4. @shaaf.dev Foundation models Learning without labels, adapt, tune, massive data

    appetite • Tasks ◦ Translation, summarization, writing, Q&A • “Attention Is All You Need”, Transformer architecture • Recognize, predict, and generate text • Trained on billions of words • Can also be tuned further An LLM predicts the next token based on its training data and statistical deduction Large Language Models
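The "predicts the next token from training data and statistics" idea can be sketched with a tiny bigram model. This is a deliberate simplification with made-up example text: a real LLM learns a transformer over billions of tokens, but in both cases the next token comes from a probability distribution estimated from training data.

```python
from collections import Counter, defaultdict

# Count which token follows which in a toy training text.
corpus = "the cat sat on the mat the cat ran".split()

successors = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    successors[cur][nxt] += 1

def predict_next(token):
    """Return the most frequent successor of `token` in training."""
    return successors[token].most_common(1)[0][0]

# "cat" follows "the" twice in the corpus, "mat" only once:
print(predict_next("the"))  # -> cat
```

An LLM does the same kind of statistical deduction, except the distribution is conditioned on the whole preceding context via attention rather than on a single previous token.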
  5. @shaaf.dev Tokens Tokenization: breaking down text into tokens, e.g. Byte

    Pair Encoding (BPE) or WordPiece; these handle diverse languages and manage vocabulary size efficiently. [12488, 6391, 4014, 316, 1001, 6602, 11, 889, 1236, 4128, 25, 3862, 181386, 364, 61064, 9862, 1299, 166700, 1340, 413, 12648, 1511, 1991, 20290, 15683, 290, 27899, 11643, 25, 93643, 248, 52622, 122, 279, 168191, 328, 9862, 22378, 2491, 2613, 316, 2454, 1273, 1340, 413, 73263, 4717, 25, 220, 7633, 19354, 29338, 15] https://platform.openai.com/tokenizer “Running”, “unpredictability” (word-based tokenization). Or: “run” “ning”; “un” “predict” “ability” (subword-based tokenization, used by many LLMs). “Building Large Language Models from scratch” - Sebastian Raschka
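The subword splits on the slide ("run" "ning"; "un" "predict" "ability") can be reproduced with a greedy longest-match tokenizer in the spirit of WordPiece. The vocabulary below is hand-picked for illustration; real vocabularies are learned from data (e.g. via BPE merges) and map each piece to an integer ID like the ones shown above.

```python
# Hand-picked subword vocabulary, purely for illustration.
VOCAB = {"un", "predict", "ability", "run", "ning", "ing", "a"}

def tokenize(word):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest substring first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: emit as-is
            i += 1
    return pieces

print(tokenize("unpredictability"))  # -> ['un', 'predict', 'ability']
print(tokenize("running"))           # -> ['run', 'ning']
```

Subword splitting is what keeps vocabulary size bounded while still covering rare and novel words: any string can be expressed as known pieces, falling back to single characters in the worst case.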
  6. Thank you! Source for the demo https://github.com/sshaaf/gpt-java-chatbot Syed

    M Shaaf Developer Advocate Red Hat fosstodon.org/@shaaf sshaaf https://www.linkedin.com/in/shaaf/ shaaf.dev https://bsky.app/profile/shaaf.dev