
Supercharge your Agentic Application with OpenShift + Redis

AI projects often stall when models can’t deliver results at the speed and scale users expect. In this webinar, discover how Red Hat OpenShift AI and Redis combine to supercharge performance and unlock new possibilities for real-time AI.

Our experts will show how the two platforms work together to enable retrieval-augmented generation (RAG), semantic caching, and LLM context management—helping reduce latency, cut costs, and deliver production-ready AI applications.

You’ll also learn how to deploy Redis on OpenShift and integrate leading AI frameworks to accelerate your path from experimentation to enterprise-grade solutions.

Join us to see how you can take your AI from promising prototypes to fast, scalable, production-ready systems.

Raphael De Lio

November 18, 2025
Transcript

  1. Agenda ▸ Open Source LLMs ▸ Challenges in AI Adoption ▸ OpenShift + Redis ▸ Cost Effective & Reliable Agents ▸ Technical Deep Dive
  2. Introduction: The World Changed in November 2022. ChatGPT woke the world up to the power of generative AI.
  3. The power of open: There has been an explosion of capability from open source over the last 2 years. Fast, flexible and scalable inference. [Timeline of open-source model releases, Jan 2023 through Apr 2025: from no OSS models (Jan 2023) through Llama, RedPajama, MPT, Falcon, Llama 2, Mistral, Zephyr, Mixtral, Phi-2, Granite 2, DBRX, Phi-3, Llama 3, Qwen 2, Arctic, Gemma 2, Nemotron, Granite 3, Qwen2-VL, DeepSeek-R1 (Jan 2025), and Llama 4 (Apr 2025).]
  4. Connecting models to data. The value of open source and smaller language models: smaller models are more efficient & customizable. ▸ Open source AI models are catching up to proprietary models. ▸ Smaller language models, like IBM Granite, are orders of magnitude smaller than frontier models. ▪ Models with fewer than 10 billion parameters are cheaper and faster to run, and consume less energy. ▸ These models can be tuned and customized with private enterprise data for domain-specific tasks. ▸ Customers own their models and can create multiple instances for different use cases and deployment environments.
  5. Generative AI customer adoption challenges. ▸ Complexity: Tuning models with private enterprise data for customer use cases is too complex for non-data scientists. ▸ Flexibility: Enterprise AI use cases span data center, cloud & edge and can't be constrained to a single public cloud service. ▸ Cost: Generative AI frontier model services are cost prohibitive at scale for most enterprise customer use cases.
  6. Scaling AI across the hybrid cloud. Red Hat AI provides a platform for consistently building, deploying and running AI models, AI-enabled applications, and AI agents across the hybrid cloud at scale. [Diagram: gather and prepare data → tune the model → integrate models in application development → model monitoring and management, on an AI platform spanning edge, private cloud, physical, virtual, and public cloud with hardware acceleration.] It provides: ▸ An efficient inference runtime (vLLM) ▸ Validated and optimized third-party models ▸ InstructLab and RAG for customization ▸ MLOps and LLMOps capabilities ▸ Monitoring, bias detection and guardrails
  7. Red Hat AI overview: a trusted, consistent and comprehensive foundation across edge, private cloud, physical, virtual, and public cloud with hardware acceleration. * NVIDIA, AMD, Intel, and Google TPU are supported in Red Hat AI. AWS Inferentia/Neuron and IBM AIU are on our roadmap.
  8. Red Hat OpenShift AI: an integrated AI platform to create and deliver gen AI and predictive models at scale across hybrid cloud environments. Available as a fully managed cloud service or as a traditional software product, on-site or in the cloud. ▸ Model development: Bring your own models or customize Granite models to your use case with your data. Supports integration of multiple AI/ML libraries, frameworks, and runtimes. ▸ Lifecycle management: Expand DevOps practices to MLOps to manage the entire AI/ML lifecycle. ▸ Model serving and monitoring: Deploy models across any OpenShift footprint and centrally monitor their performance. ▸ Resource optimization and management: Scale to meet the workload demands of gen AI and predictive models. Share resources, projects, and models across environments.
  9. Overview of Red Hat OpenShift AI: a detailed look at integrating our partner ecosystem. [Diagram: the model lifecycle (gather and prepare data → develop models → deploy models in an application → model monitoring and management) layered over the Red Hat cloud platform, accelerators, and infrastructure, combining Red Hat software and cloud services, ISV software and services, customer-managed applications, and 3rd-party models and repositories such as NVIDIA NIM and Granite models, across physical, virtual, and edge footprints.]
  10. Redis Operator: Automating Redis on OpenShift. ✔ Runs on OpenShift ✔ Certified operators ✔ Fully containerized ✔ Vendor supported ✔ Vulnerability scans. A certified operator extends and orchestrates Kubernetes to streamline and automate installation, updates, back-ups, and maintenance of container-based services, providing self-service access to application workloads with a managed-service-like experience and consistent packaging, deployment and life cycle management across OpenShift footprints. Redis provides product support, and when Red Hat publishes a security advisory, Red Hat scans partner container images for important vulnerabilities. Operator capability levels: ▸ Level I (Basic Install): automated application provisioning and configuration management ▸ Level II (Seamless Upgrades): patch and minor version upgrades supported ▸ Level III (Full Lifecycle): application lifecycle, storage lifecycle (backup, failure, recovery) ▸ Level IV (Deep Insights): metrics, alerts, log processing and workload analysis ▸ Level V (Auto Pilot): horizontal/vertical scaling, auto-config tuning, abnormal detection, schedule tuning
  11. Container native for K8s and OpenShift. Multi-tenancy model for cluster & namespace isolation.
  12. The new stack for AI agents: Redis leads as the most-used tool for agent data and vector search. Check out the full survey at https://survey.stackoverflow.co/2025/
  13. Not all context is good context: bigger input sizes hurt performance and your budget. GPT-5 API price: $1.25 / 1M input tokens, $10 / 1M output tokens. Source: https://openai.com/index/introducing-gpt-5-for-developers/
  14. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails.
  15. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails. Reliability: Rate Limiting / Resumable Workflows. Availability: Multi Cluster Active-Active.
  16. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails. Reliability: Rate Limiting / Resumable Workflows. Availability: Multi Cluster Active-Active. Long-Term Memory: Context Aware (RAG). Short-term Memory: Agent Awareness.
  17. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails.
  18. Semantic routing: Semantic routing is the process of directing user requests to the most appropriate model, service, or function based on the meaning (semantics) of the input rather than just its keywords or structure. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. [Diagram: a router sends the user query "What's the policy for PTO?" to HR documentation, customer support tickets, or product specifications.]
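
As a minimal sketch of this pattern, RedisVL (Redis's Python library for vector search) ships a SemanticRouter; the route names, reference phrases, and thresholds below are illustrative, not from the deck:

    # pip install redisvl sentence-transformers
    from redisvl.extensions.router import Route, SemanticRouter
    from redisvl.utils.vectorize import HFTextVectorizer

    # Each route carries reference phrases that are embedded once, up front.
    hr = Route(
        name="hr_documentation",
        references=["What's the policy for PTO?", "How many vacation days do I get?"],
        distance_threshold=0.7,  # max vector distance that still counts as a match
    )
    support = Route(
        name="customer_support",
        references=["My order never arrived", "How do I reset my password?"],
        distance_threshold=0.7,
    )

    router = SemanticRouter(
        name="topic-router",
        vectorizer=HFTextVectorizer(),  # local embedding model
        routes=[hr, support],
        redis_url="redis://localhost:6379",
        overwrite=True,
    )

    match = router("What is the vacation policy?")
    print(match.name)  # -> "hr_documentation", decided by vector search, no LLM call

Because the route references are embedded once up front, each incoming request costs one embedding plus one vector search in Redis instead of a full LLM round trip.
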
  19. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With LLMs: [Diagram: a social media post ("Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde...") is sent as a prompt to an LLM asking "Is this about Redis?", which returns a true/false response.]
  20. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): [Diagram: an LLM is prompted to "Generate 150 social media posts about Redis [...]", yielding reference posts such as "Pro tip: Use SCAN instead of KEYS in production. KEYS blocks the entire server while SCAN is non-blocking.", "Remember when everyone said Redis is just a cache? Now it powers real-time leaderboards, pub/sub systems, full applications. Evolution in action.", "PostgreSQL vs Redis for caching debate misses the point. Use Redis as L1 cache, PG as source of truth. Why choose when you can have both?", and "Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime".]
  21. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): [Diagram: the generated reference posts (e.g. "Redis is the fastest tool for performing semantic caching", "Remember when everyone said Redis is just a cache? Now it powers real-time leaderboards, pub/sub systems, full applications. Evolution in action.", "PostgreSQL vs Redis for caching debate misses the point. Use Redis as L1 cache, PG as source of truth. Why choose when you can have both?", "Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime" [...]) are passed through an embedding model ("Embed references") and the embeddings are stored in a vector database ("Store embeddings").]
  22. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Operation): [Diagram: a new real post ("Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde...") is embedded with the embedding model, a similarity search runs against the vector database, the nearest reference ("Redis is the fastest tool for performing semantic caching") returns with a similarity score of 0.2843, and a threshold check decides: is it similar enough?]
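
A compact sketch of the preparation and operation steps above, using in-process cosine similarity; in production the reference embeddings would live in a Redis vector index as the slides show, and the model name and threshold here are illustrative:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

    # Preparation: embed reference posts about the target class once.
    references = [
        "Pro tip: Use SCAN instead of KEYS in production.",
        "Redis is the fastest tool for performing semantic caching",
        "Use Redis as L1 cache, PG as source of truth.",
    ]
    ref_vecs = model.encode(references, normalize_embeddings=True)

    # Operation: classify by nearest-neighbor similarity instead of an LLM call.
    def is_about_redis(post: str, threshold: float = 0.6) -> bool:
        vec = model.encode([post], normalize_embeddings=True)[0]
        best = float(np.max(ref_vecs @ vec))  # cosine similarity (vectors normalized)
        return best >= threshold

    print(is_about_redis("Every repeated LLM call is money on fire."))
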
  23. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With LLMs: [Diagram: the agent sends the user input and "These are the available tools" to the LLM, which answers "Call tool X".]
  24. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): example phrasings are mapped to tools: "What's the weather like?" / "Will it rain today?" → Tool: get_weather_default_city; "Hello! What can you do?" / "Hello! How can you help me?" → Tool: greeting_and_help; "Do I have any notifications?" / "Read my notifications" → Tool: new_notifications; "Turn on the lights" / "Make the lights light" → Tool: turn_on_the_lights_room.
  25. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): [Diagram: the reference phrasings and their tools ("What's the weather like?" / "Will it rain today?" → Tool: get_weather_default_city; "Hello! What can you do?" → Tool: greeting_and_help; [...] Tool: [...]) are passed through the embedding model ("Embed references") and stored in the vector database ("Store embeddings").]
  26. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Operation): [Diagram: the user prompt "Hey!! What are you capable of??" is embedded with the embedding model, a similarity search runs against the vector database, and the nearest reference "Hello! What can you do?" (similarity score: 0.0459) selects Tool: greeting_and_help.]
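
The same nearest-neighbor machinery can select tools. The sketch below reuses the example phrasings and tool names from the slides; the threshold and the LLM fallback for ambiguous prompts are illustrative choices:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Preparation: embed example phrasings for each tool once.
    examples = {
        "get_weather_default_city": ["What's the weather like?", "Will it rain today?"],
        "greeting_and_help": ["Hello! What can you do?", "Hello! How can you help me?"],
        "new_notifications": ["Do I have any notifications?", "Read my notifications"],
    }
    phrases, tools = [], []
    for tool, refs in examples.items():
        phrases.extend(refs)
        tools.extend([tool] * len(refs))
    ref_vecs = model.encode(phrases, normalize_embeddings=True)

    # Operation: pick the tool whose reference phrase is closest to the prompt.
    def pick_tool(prompt: str, threshold: float = 0.6):
        vec = model.encode([prompt], normalize_embeddings=True)[0]
        sims = ref_vecs @ vec
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            return tools[best]  # confident match: skip the LLM entirely
        return None             # ambiguous: let the LLM choose instead

    print(pick_tool("Hey!! What are you capable of??"))  # -> "greeting_and_help"
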
  27. Semantic guardrails: Semantic guardrails are an application of semantic routing that use meaning-based understanding to detect and prevent undesired, unsafe, or off-topic model outputs before they occur. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency → More secure because it is not prone to prompt injection or LLM jailbreaking. With Vector Search (Operation): [Diagram: the prompt "Tell me a joke about extraterrestrials" is embedded with the embedding model, a similarity search runs against the vector database, the blocked topic "jokes about aliens" matches with a similarity score of 0.2843, and a threshold check decides: is it similar enough?]
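
A guardrail is the same similarity check pointed at a blocklist. In this sketch, the blocked-topic entries beyond "jokes about aliens" (which comes from the slide) and the threshold are invented for illustration:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Preparation: embed descriptions of topics the agent must refuse.
    blocked = ["jokes about aliens", "medical advice", "competitor pricing"]
    blocked_vecs = model.encode(blocked, normalize_embeddings=True)

    # Operation: block the request before it ever reaches the LLM.
    def violates_guardrail(prompt: str, threshold: float = 0.5) -> bool:
        vec = model.encode([prompt], normalize_embeddings=True)[0]
        return float(np.max(blocked_vecs @ vec)) >= threshold

    if violates_guardrail("Tell me a joke about extraterrestrials"):
        print("Sorry, I can't help with that.")  # refused without an LLM call

Since the check runs outside the model, a jailbreak prompt cannot talk its way past it.
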
  28. Semantic caching: Semantic caching uses semantic search to retrieve and reuse past responses that are meaningfully similar to a new query. → 15x faster than calling the LLM API → Up to 90% less cost from calls to LLMs. [Diagram: the app checks the cache for similar queries; on a cache hit ✅ it responds from the cache, and on a cache miss ❌ it invokes the LLM API and saves the LLM response in the cache.]
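
A minimal sketch using RedisVL's SemanticCache; the distance threshold is illustrative, call_llm is a hypothetical stand-in for your LLM client, and the import path can vary between RedisVL versions:

    # pip install redisvl
    from redisvl.extensions.llmcache import SemanticCache

    cache = SemanticCache(
        name="llmcache",
        redis_url="redis://localhost:6379",
        distance_threshold=0.1,  # how close a new query must be to count as a hit
    )

    def answer(query: str) -> str:
        hits = cache.check(prompt=query)
        if hits:
            return hits[0]["response"]  # cache hit: skip the LLM entirely
        response = call_llm(query)      # cache miss: hypothetical LLM client
        cache.store(prompt=query, response=response)
        return response
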
  29. Pre-generated semantic caching: Pre-generated semantic caching uses semantic search to serve responses that were generated in advance for anticipated user queries, enabling instant retrieval with minimal processing. → 15x faster than calling the LLM API → Up to 90% less cost from calls to LLMs. [Diagram: same flow as on-demand caching: check the cache for similar queries, respond from the cache on a hit ✅, invoke the LLM API on a miss ❌ and save the response in the cache.]
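
Continuing the sketch above, pre-generation just means filling the same cache ahead of time with answers to anticipated questions (the FAQ entries below are invented for illustration):

    # Offline, before traffic arrives: seed the cache with curated answers.
    faq = {
        "What are your support hours?": "Support is available 24/7.",
        "How do I reset my password?": "Use the 'Forgot password' link on the sign-in page.",
    }
    for prompt, response in faq.items():
        cache.store(prompt=prompt, response=response)

At runtime, a question like "What time can I reach support?" can then hit the pre-generated entry via semantic similarity, with no LLM call at all.
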
  30. Reliable & Cost Effective Agents with Redis. Reliability: Rate Limiting / Resumable Workflows. Availability: Multi Cluster Active-Active.
  31. ▸ Multi Cluster Active-Active (GEO): A globally distributed Redis architecture that keeps data synchronized across regions for agentic systems. → Low latency by serving requests from the nearest region for faster retrieval and response times → Seamless recovery through automatic synchronization and failover across global clusters → Continuous availability with always-on data access, even during regional disruptions. Ensures agents stay responsive, even under failure. ▸ Rate limiting: A technique to control the rate at which requests are sent or processed in order to maintain system stability and reduce LLM costs. → Balance loads across LLMs → Prevent abuse from bad actors or rogue apps. ▸ Resumable workflows: A mechanism that uses Redis Streams to checkpoint progress between agents or tasks, allowing execution to resume from the last successful step instead of restarting the entire process. → Recover gracefully from partial failures without losing progress → Replay or resume tasks from saved checkpoints to ensure reliability and efficiency
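
A sketch of the two reliability patterns on plain redis-py; the key naming, limits, and stream fields are illustrative choices, not from the deck:

    # pip install redis
    import time
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def allow_request(user_id: str, limit: int = 20, window_s: int = 60) -> bool:
        """Fixed-window rate limiter: at most `limit` LLM calls per window."""
        key = f"ratelimit:{user_id}:{int(time.time()) // window_s}"
        count = r.incr(key)          # atomic increment
        if count == 1:
            r.expire(key, window_s)  # the window cleans itself up
        return count <= limit

    def checkpoint(workflow_id: str, step: str, payload: str) -> None:
        """Resumable workflows: append each completed step to a Redis Stream."""
        r.xadd(f"workflow:{workflow_id}", {"step": step, "payload": payload})

    if allow_request("user-42"):
        checkpoint("wf-1", "retrieved_context", "...")

On restart, a worker can XRANGE the stream to find the last completed step and resume from there instead of replaying the whole workflow.
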
  32. Reliable & Cost Effective Agents with Redis. Long-Term Memory: Context Aware (RAG). Short-term Memory: Agent Awareness.
  33. Redis Agent Memory Server: Our Agent Memory Server makes LLM responses more relevant & useful by managing short-term and long-term memory. Short-term memory: → Automatic summarization → Configurable window sizes for recent messages. Long-term memory: → Search for relevant memories → Topic extraction & named entity recognition → Namespace support for proper isolation. [Diagram: user input flows through the AI app to the Agent Memory Server, which handles short-term and long-term memory management and sits between the app and the LLMs.]
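
The Agent Memory Server is a separate product with its own API; purely as a generic illustration of the short-term side of this pattern, here is a rolling conversation window kept in a Redis list (the key names and window size are invented):

    # pip install redis
    import json
    import redis

    r = redis.Redis(decode_responses=True)
    WINDOW = 20  # keep only the most recent messages in short-term memory

    def remember(session_id: str, role: str, content: str) -> None:
        key = f"memory:short:{session_id}"
        r.rpush(key, json.dumps({"role": role, "content": content}))
        r.ltrim(key, -WINDOW, -1)  # older turns fall out (or get summarized)

    def recall(session_id: str) -> list[dict]:
        """Recent context to prepend to the next LLM call."""
        return [json.loads(m) for m in r.lrange(f"memory:short:{session_id}", 0, -1)]

    remember("sess-1", "user", "My name is Ada.")
    print(recall("sess-1"))
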
  34. Retrieval-augmented generation (RAG): A pattern where related content is retrieved from a trusted data source, combined with the user request, and sent to an LLM to generate a response. → Reduce hallucinations by inserting relevant info into the LLM context → Stay fresh by adding up-to-date details and proprietary info into LLM responses. [Diagram: a question triggers retrieval of the relevant document from the document store, and the augmented prompt is sent to the LLM to generate the answer.]
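
A minimal RAG sketch with RedisVL; it assumes an existing Redis index named "docs" with "text" and "embedding" fields (created elsewhere), and call_llm is the same hypothetical client as in the caching sketch:

    # pip install redisvl sentence-transformers
    from redisvl.index import SearchIndex
    from redisvl.query import VectorQuery
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    index = SearchIndex.from_existing("docs", redis_url="redis://localhost:6379")

    def rag_answer(question: str) -> str:
        vec = model.encode([question])[0].tolist()
        query = VectorQuery(
            vector=vec,
            vector_field_name="embedding",
            return_fields=["text"],
            num_results=3,  # retrieve the top 3 most relevant chunks
        )
        docs = index.query(query)
        context = "\n".join(d["text"] for d in docs)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return call_llm(prompt)  # hypothetical LLM client
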
  35. Speakers: Camille Nigon, Solutions Architect, Red Hat; Raphael De Lio, Developer Advocate, Redis. Resources: • Blog: Supercharge your AI with OpenShift AI and Redis • Redis for AI • Redis: Upcoming Webinars • Red Hat AI • Red Hat Summit Connect