
Supercharge your Agentic Application with OpenShift + Redis

AI projects often stall when models can’t deliver results at the speed and scale users expect. In this webinar, discover how Red Hat OpenShift AI and Redis combine to supercharge performance and unlock new possibilities for real-time AI.

Our experts will show how the two platforms work together to enable retrieval-augmented generation (RAG), semantic caching, and LLM context management—helping reduce latency, cut costs, and deliver production-ready AI applications.

You’ll also learn how to deploy Redis on OpenShift and integrate leading AI frameworks to accelerate your path from experimentation to enterprise-grade solutions.

Join us to see how you can take your AI from promising prototypes to fast, scalable, production-ready systems.

Raphael De Lio

November 18, 2025
Transcript

  1. Agenda ▸ Open Source LLMs ▸ Challenges in AI Adoption ▸ OpenShift + Redis ▸ Cost Effective & Reliable Agents ▸ Technical Deep Dive
  2. Introduction: The World Changed in November 2022. ChatGPT woke the world up to the power of generative AI.
  3. The power of open: There has been an explosion of capability from open source over the last 2 years. Fast, flexible and scalable inference. [Timeline of open-source model releases, Jan 2023 through Apr 2025: from no OSS models (Jan 2023) through Llama, RedPajama, MPT, Falcon, Llama 2, Mistral, Zephyr, Mixtral, Phi-2, Granite 2, DBRX, Phi-3, Llama 3, Qwen 2, Arctic, Gemma 2, Nemotron, Granite 3, Qwen2-VL, DeepSeek-R1 (Jan 2025), and Llama 4 (Apr 2025).]
  4. Connecting models to data. The value of open source and smaller language models: smaller models are more efficient & customizable. ▸ Open source AI models are catching up to proprietary models. ▸ Smaller language models, like IBM Granite, are orders of magnitude smaller than frontier models. ▪ Models with fewer than 10 billion parameters are cheaper and faster to run, and consume less energy. ▸ These models can be tuned and customized with private enterprise data for domain-specific tasks. ▸ Customers own their models and can create multiple instances for different use cases and deployment environments.
  5. Generative AI customer adoption challenges. ▸ Complexity: Tuning models with private enterprise data for customer use cases is too complex for non-data scientists. ▸ Flexibility: Enterprise AI use cases span data center, cloud & edge and can't be constrained to a single public cloud service. ▸ Cost: Generative AI frontier model services are cost prohibitive at scale for most enterprise customer use cases.
  6. Scaling AI across the hybrid cloud. Red Hat AI provides a platform for consistently building, deploying and running AI models, AI-enabled applications, and AI agents across the hybrid cloud at scale. [Diagram: gather and prepare data → tune the model → integrate models in application development → model monitoring and management, on an AI platform spanning edge, private cloud, physical, virtual, and public cloud with hardware acceleration.] It provides: ▸ An efficient inference runtime (vLLM) ▸ Validated and optimized third-party models ▸ InstructLab and RAG for customization ▸ MLOps and LLMOps capabilities ▸ Monitoring, bias detection and guardrails
  7. Red Hat AI overview: a trusted, consistent and comprehensive foundation across edge, private cloud, physical, virtual, and public cloud with hardware acceleration. * NVIDIA, AMD, Intel, and Google TPU are supported in Red Hat AI. AWS Inferentia/Neuron and IBM AIU are on our roadmap.
  8. Red Hat OpenShift AI: an integrated AI platform to create and deliver gen AI and predictive models at scale across hybrid cloud environments. Available as a fully managed cloud service or as a traditional software product, on-site or in the cloud. ▸ Model development: Bring your own models or customize Granite models to your use case with your data. Supports integration of multiple AI/ML libraries, frameworks, and runtimes. ▸ Lifecycle management: Expand DevOps practices to MLOps to manage the entire AI/ML lifecycle. ▸ Model serving and monitoring: Deploy models across any OpenShift footprint and centrally monitor their performance. ▸ Resource optimization and management: Scale to meet the workload demands of gen AI and predictive models. Share resources, projects, and models across environments.
  9. Overview of Red Hat OpenShift AI: a detailed look at integrating our partner ecosystem. [Diagram: the model lifecycle (gather and prepare data → develop models → deploy models in an application → model monitoring and management) layered over the Red Hat cloud platform, accelerators, and infrastructure, combining Red Hat software and cloud services, ISV software and services, customer-managed applications, and 3rd-party models and repositories such as NVIDIA NIM and Granite models, across physical, virtual, and edge footprints.]
  10. Redis Operator: Automating Redis on OpenShift. ✔ Runs on OpenShift ✔ Certified operators ✔ Fully containerized ✔ Vendor supported ✔ Vulnerability scans. A certified operator extends and orchestrates Kubernetes to streamline and automate installation, updates, back-ups, and maintenance of container-based services, providing self-service access to application workloads with a managed-service-like experience and consistent packaging, deployment and life cycle management across OpenShift footprints. Redis provides product support, and when Red Hat publishes a security advisory, Red Hat scans partner container images for important vulnerabilities. Operator capability levels: ▸ Level I (Basic Install): automated application provisioning and configuration management ▸ Level II (Seamless Upgrades): patch and minor version upgrades supported ▸ Level III (Full Lifecycle): application lifecycle, storage lifecycle (backup, failure, recovery) ▸ Level IV (Deep Insights): metrics, alerts, log processing and workload analysis ▸ Level V (Auto Pilot): horizontal/vertical scaling, auto-config tuning, abnormal detection, schedule tuning
  11. Container native for K8s and OpenShift. Multi-tenancy model for cluster & namespace isolation.
  12. The new stack for AI agents: Redis leads as the most-used tool for agent data and vector search. Check out the full survey at https://survey.stackoverflow.co/2025/
  13. Not all context is good context: bigger input sizes hurt performance and your budget. GPT-5 API price: $1.25 / 1M input tokens, $10 / 1M output tokens. Source: https://openai.com/index/introducing-gpt-5-for-developers/
  14. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails.
  15. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails. Reliability: Rate Limiting / Resumable Workflows. Availability: Multi Cluster Active-Active.
  16. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails. Reliability: Rate Limiting / Resumable Workflows. Availability: Multi Cluster Active-Active. Long-Term Memory: Context Aware (RAG). Short-term Memory: Agent Awareness.
  17. Reliable & Cost Effective Agents with Redis. Semantic caching: Pre Generated / Generated On Demand. Semantic routing (Pre Generated): Semantic Classification / Semantic Tool Calling / Semantic Guardrails.
  18. Semantic routing: Semantic routing is the process of directing user requests to the most appropriate model, service, or function based on the meaning (semantics) of the input rather than just its keywords or structure. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. [Diagram: a router sends the user query "What's the policy for PTO?" to HR documentation, customer support tickets, or product specifications.]
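
As a minimal sketch of this pattern, RedisVL (Redis's Python library for vector search) ships a SemanticRouter; the route names, reference phrases, and thresholds below are illustrative, not from the deck:

    # pip install redisvl sentence-transformers
    from redisvl.extensions.router import Route, SemanticRouter
    from redisvl.utils.vectorize import HFTextVectorizer

    # Each route carries reference phrases that are embedded once, up front.
    hr = Route(
        name="hr_documentation",
        references=["What's the policy for PTO?", "How many vacation days do I get?"],
        distance_threshold=0.7,  # max vector distance that still counts as a match
    )
    support = Route(
        name="customer_support",
        references=["My order never arrived", "How do I reset my password?"],
        distance_threshold=0.7,
    )

    router = SemanticRouter(
        name="topic-router",
        vectorizer=HFTextVectorizer(),  # local embedding model
        routes=[hr, support],
        redis_url="redis://localhost:6379",
        overwrite=True,
    )

    match = router("What is the vacation policy?")
    print(match.name)  # -> "hr_documentation", decided by vector search, no LLM call

Because the route references are embedded once up front, each incoming request costs one embedding plus one vector search in Redis instead of a full LLM round trip.
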
  19. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With LLMs: [Diagram: a social media post ("Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde...") is sent as a prompt to an LLM asking "Is this about Redis?", which returns a true/false response.]
  20. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): [Diagram: an LLM is prompted to "Generate 150 social media posts about Redis [...]", yielding reference posts such as "Pro tip: Use SCAN instead of KEYS in production. KEYS blocks the entire server while SCAN is non-blocking.", "Remember when everyone said Redis is just a cache? Now it powers real-time leaderboards, pub/sub systems, full applications. Evolution in action.", "PostgreSQL vs Redis for caching debate misses the point. Use Redis as L1 cache, PG as source of truth. Why choose when you can have both?", and "Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime".]
  21. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): [Diagram: the generated reference posts (e.g. "Redis is the fastest tool for performing semantic caching", "Remember when everyone said Redis is just a cache? Now it powers real-time leaderboards, pub/sub systems, full applications. Evolution in action.", "PostgreSQL vs Redis for caching debate misses the point. Use Redis as L1 cache, PG as source of truth. Why choose when you can have both?", "Our Redis instance has been running 847 days without restart. Rock solid stability 💪 #redis #uptime" [...]) are passed through an embedding model ("Embed references") and the embeddings are stored in a vector database ("Store embeddings").]
  22. Semantic classification: Semantic classification is an application of semantic routing that categorizes inputs into predefined classes or labels based on their meaning rather than surface-level keywords. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Operation): [Diagram: a new real post ("Every repeated LLM call is money on fire. Redis 8 semantic caching understands meaning, not just keys. open.substack.com/pub/systemde...") is embedded with the embedding model, a similarity search runs against the vector database, the nearest reference ("Redis is the fastest tool for performing semantic caching") returns with a similarity score of 0.2843, and a threshold check decides: is it similar enough?]
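
A compact sketch of the preparation and operation steps above, using in-process cosine similarity; in production the reference embeddings would live in a Redis vector index as the slides show, and the model name and threshold here are illustrative:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

    # Preparation: embed reference posts about the target class once.
    references = [
        "Pro tip: Use SCAN instead of KEYS in production.",
        "Redis is the fastest tool for performing semantic caching",
        "Use Redis as L1 cache, PG as source of truth.",
    ]
    ref_vecs = model.encode(references, normalize_embeddings=True)

    # Operation: classify by nearest-neighbor similarity instead of an LLM call.
    def is_about_redis(post: str, threshold: float = 0.6) -> bool:
        vec = model.encode([post], normalize_embeddings=True)[0]
        best = float(np.max(ref_vecs @ vec))  # cosine similarity (vectors normalized)
        return best >= threshold

    print(is_about_redis("Every repeated LLM call is money on fire."))
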
  23. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With LLMs: [Diagram: the agent sends the user input and "These are the available tools" to the LLM, which answers "Call tool X".]
  24. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): example phrasings are mapped to tools: "What's the weather like?" / "Will it rain today?" → Tool: get_weather_default_city; "Hello! What can you do?" / "Hello! How can you help me?" → Tool: greeting_and_help; "Do I have any notifications?" / "Read my notifications" → Tool: new_notifications; "Turn on the lights" / "Make the lights light" → Tool: turn_on_the_lights_room.
  25. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Preparation): [Diagram: the reference phrasings and their tools ("What's the weather like?" / "Will it rain today?" → Tool: get_weather_default_city; "Hello! What can you do?" → Tool: greeting_and_help; [...] Tool: [...]) are passed through the embedding model ("Embed references") and stored in the vector database ("Store embeddings").]
  26. Semantic tool calling: Semantic tool calling is an application of semantic routing where a system interprets the meaning of a user's input to intelligently select and execute the most appropriate tool or API for the task. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency. With Vector Search (Operation): [Diagram: the user prompt "Hey!! What are you capable of??" is embedded with the embedding model, a similarity search runs against the vector database, and the nearest reference "Hello! What can you do?" (similarity score: 0.0459) selects Tool: greeting_and_help.]
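
The same nearest-neighbor machinery can select tools. The sketch below reuses the example phrasings and tool names from the slides; the threshold and the LLM fallback for ambiguous prompts are illustrative choices:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Preparation: embed example phrasings for each tool once.
    examples = {
        "get_weather_default_city": ["What's the weather like?", "Will it rain today?"],
        "greeting_and_help": ["Hello! What can you do?", "Hello! How can you help me?"],
        "new_notifications": ["Do I have any notifications?", "Read my notifications"],
    }
    phrases, tools = [], []
    for tool, refs in examples.items():
        phrases.extend(refs)
        tools.extend([tool] * len(refs))
    ref_vecs = model.encode(phrases, normalize_embeddings=True)

    # Operation: pick the tool whose reference phrase is closest to the prompt.
    def pick_tool(prompt: str, threshold: float = 0.6):
        vec = model.encode([prompt], normalize_embeddings=True)[0]
        sims = ref_vecs @ vec
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            return tools[best]  # confident match: skip the LLM entirely
        return None             # ambiguous: let the LLM choose instead

    print(pick_tool("Hey!! What are you capable of??"))  # -> "greeting_and_help"
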
  27. Semantic guardrails: Semantic guardrails are an application of semantic routing that use meaning-based understanding to detect and prevent undesired, unsafe, or off-topic model outputs before they occur. → Reduce tokens by leveraging vector search instead of LLM calls → Speeds up your agent because vector search latency is much lower than LLM latency → More secure because it is not prone to prompt injection or LLM jailbreaking. With Vector Search (Operation): [Diagram: the prompt "Tell me a joke about extraterrestrials" is embedded with the embedding model, a similarity search runs against the vector database, the blocked topic "jokes about aliens" matches with a similarity score of 0.2843, and a threshold check decides: is it similar enough?]
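
A guardrail is the same similarity check pointed at a blocklist. In this sketch, the blocked-topic entries beyond "jokes about aliens" (which comes from the slide) and the threshold are invented for illustration:

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Preparation: embed descriptions of topics the agent must refuse.
    blocked = ["jokes about aliens", "medical advice", "competitor pricing"]
    blocked_vecs = model.encode(blocked, normalize_embeddings=True)

    # Operation: block the request before it ever reaches the LLM.
    def violates_guardrail(prompt: str, threshold: float = 0.5) -> bool:
        vec = model.encode([prompt], normalize_embeddings=True)[0]
        return float(np.max(blocked_vecs @ vec)) >= threshold

    if violates_guardrail("Tell me a joke about extraterrestrials"):
        print("Sorry, I can't help with that.")  # refused without an LLM call

Since the check runs outside the model, a jailbreak prompt cannot talk its way past it.
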
  28. Semantic caching: Semantic caching uses semantic search to retrieve and reuse past responses that are meaningfully similar to a new query. → 15x faster than calling the LLM API → Up to 90% less cost from calls to LLMs. [Diagram: the app checks the cache for similar queries; on a cache hit ✅ it responds from the cache, and on a cache miss ❌ it invokes the LLM API and saves the LLM response in the cache.]
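
A minimal sketch using RedisVL's SemanticCache; the distance threshold is illustrative, call_llm is a hypothetical stand-in for your LLM client, and the import path can vary between RedisVL versions:

    # pip install redisvl
    from redisvl.extensions.llmcache import SemanticCache

    cache = SemanticCache(
        name="llmcache",
        redis_url="redis://localhost:6379",
        distance_threshold=0.1,  # how close a new query must be to count as a hit
    )

    def answer(query: str) -> str:
        hits = cache.check(prompt=query)
        if hits:
            return hits[0]["response"]  # cache hit: skip the LLM entirely
        response = call_llm(query)      # cache miss: hypothetical LLM client
        cache.store(prompt=query, response=response)
        return response
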
  29. Pre-generated semantic caching: Pre-generated semantic caching uses semantic search to serve responses that were generated in advance for anticipated user queries, enabling instant retrieval with minimal processing. → 15x faster than calling the LLM API → Up to 90% less cost from calls to LLMs. [Diagram: same flow as on-demand caching: check the cache for similar queries, respond from the cache on a hit ✅, invoke the LLM API on a miss ❌ and save the response in the cache.]
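
Continuing the sketch above, pre-generation just means filling the same cache ahead of time with answers to anticipated questions (the FAQ entries below are invented for illustration):

    # Offline, before traffic arrives: seed the cache with curated answers.
    faq = {
        "What are your support hours?": "Support is available 24/7.",
        "How do I reset my password?": "Use the 'Forgot password' link on the sign-in page.",
    }
    for prompt, response in faq.items():
        cache.store(prompt=prompt, response=response)

At runtime, a question like "What time can I reach support?" can then hit the pre-generated entry via semantic similarity, with no LLM call at all.
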
  30. Reliable & Cost Effective Agents with Redis. Reliability: Rate Limiting / Resumable Workflows. Availability: Multi Cluster Active-Active.
  31. ▸ Multi Cluster Active-Active (GEO): A globally distributed Redis architecture that keeps data synchronized across regions for agentic systems. → Low latency by serving requests from the nearest region for faster retrieval and response times → Seamless recovery through automatic synchronization and failover across global clusters → Continuous availability with always-on data access, even during regional disruptions. Ensures agents stay responsive, even under failure. ▸ Rate limiting: A technique to control the rate at which requests are sent or processed in order to maintain system stability and reduce LLM costs. → Balance loads across LLMs → Prevent abuse from bad actors or rogue apps. ▸ Resumable workflows: A mechanism that uses Redis Streams to checkpoint progress between agents or tasks, allowing execution to resume from the last successful step instead of restarting the entire process. → Recover gracefully from partial failures without losing progress → Replay or resume tasks from saved checkpoints to ensure reliability and efficiency
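
A sketch of the two reliability patterns on plain redis-py; the key naming, limits, and stream fields are illustrative choices, not from the deck:

    # pip install redis
    import time
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def allow_request(user_id: str, limit: int = 20, window_s: int = 60) -> bool:
        """Fixed-window rate limiter: at most `limit` LLM calls per window."""
        key = f"ratelimit:{user_id}:{int(time.time()) // window_s}"
        count = r.incr(key)          # atomic increment
        if count == 1:
            r.expire(key, window_s)  # the window cleans itself up
        return count <= limit

    def checkpoint(workflow_id: str, step: str, payload: str) -> None:
        """Resumable workflows: append each completed step to a Redis Stream."""
        r.xadd(f"workflow:{workflow_id}", {"step": step, "payload": payload})

    if allow_request("user-42"):
        checkpoint("wf-1", "retrieved_context", "...")

On restart, a worker can XRANGE the stream to find the last completed step and resume from there instead of replaying the whole workflow.
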
  32. Reliable & Cost Effective Agents with Redis. Long-Term Memory: Context Aware (RAG). Short-term Memory: Agent Awareness.
  33. Redis Agent Memory Server: Our Agent Memory Server makes LLM responses more relevant & useful by managing short-term and long-term memory. Short-term memory: → Automatic summarization → Configurable window sizes for recent messages. Long-term memory: → Search for relevant memories → Topic extraction & named entity recognition → Namespace support for proper isolation. [Diagram: user input flows through the AI app to the Agent Memory Server, which handles short-term and long-term memory management and sits between the app and the LLMs.]
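
The Agent Memory Server is a separate product with its own API; purely as a generic illustration of the short-term side of this pattern, here is a rolling conversation window kept in a Redis list (the key names and window size are invented):

    # pip install redis
    import json
    import redis

    r = redis.Redis(decode_responses=True)
    WINDOW = 20  # keep only the most recent messages in short-term memory

    def remember(session_id: str, role: str, content: str) -> None:
        key = f"memory:short:{session_id}"
        r.rpush(key, json.dumps({"role": role, "content": content}))
        r.ltrim(key, -WINDOW, -1)  # older turns fall out (or get summarized)

    def recall(session_id: str) -> list[dict]:
        """Recent context to prepend to the next LLM call."""
        return [json.loads(m) for m in r.lrange(f"memory:short:{session_id}", 0, -1)]

    remember("sess-1", "user", "My name is Ada.")
    print(recall("sess-1"))
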
  34. Retrieval-augmented generation (RAG): A pattern where related content is retrieved from a trusted data source, combined with the user request, and sent to an LLM to generate a response. → Reduce hallucinations by inserting relevant info into the LLM context → Stay fresh by adding up-to-date details and proprietary info into LLM responses. [Diagram: a question triggers retrieval of the relevant document from the document store, and the augmented prompt is sent to the LLM to generate the answer.]
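
A minimal RAG sketch with RedisVL; it assumes an existing Redis index named "docs" with "text" and "embedding" fields (created elsewhere), and call_llm is the same hypothetical client as in the caching sketch:

    # pip install redisvl sentence-transformers
    from redisvl.index import SearchIndex
    from redisvl.query import VectorQuery
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    index = SearchIndex.from_existing("docs", redis_url="redis://localhost:6379")

    def rag_answer(question: str) -> str:
        vec = model.encode([question])[0].tolist()
        query = VectorQuery(
            vector=vec,
            vector_field_name="embedding",
            return_fields=["text"],
            num_results=3,  # retrieve the top 3 most relevant chunks
        )
        docs = index.query(query)
        context = "\n".join(d["text"] for d in docs)
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        return call_llm(prompt)  # hypothetical LLM client
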
  35. Speakers: Camille Nigon, Solutions Architect, Red Hat; Raphael De Lio, Developer Advocate, Redis. Resources: • Blog: Supercharge your AI with OpenShift AI and Redis • Redis for AI • Redis: Upcoming Webinars • Red Hat AI • Red Hat Summit Connect