
Beyond Microservices: Running VMs, WASM and AI Workloads on Kubernetes

Kubernetes has evolved far beyond its roots as a container orchestration platform. Today, it is a powerful and flexible engine capable of managing a wide variety of workloads, from virtual machines to WebAssembly applications to AI inference models. In this talk, we'll explore how Kubernetes can run non-traditional workloads, tackling common challenges and showcasing cutting-edge solutions.

We'll start by discussing KubeVirt, a technology that enables seamless migration of on-premises VMs into a Kubernetes environment, making it easier to consolidate infrastructure and modernize legacy systems. Next, we’ll dive into WebAssembly (WASM) workloads using SpinKube, highlighting WASM’s security, sandboxing, and near-instant startup times that make it ideal for lightweight cloud-native applications. Lastly, we’ll cover running AI models and inference on Kubernetes, with an emphasis on optimizing performance through proper cluster topology, hardware considerations such as GPUs, and scalable configurations.

Throughout the talk, I will show live demos of each workload type (VMs, WASM apps, and AI models) running on Kubernetes, with practical examples and real-world use cases. Whether you're looking to expand Kubernetes usage in your projects or exploring new workloads to orchestrate, this session offers practical insights and hands-on solutions for diverse compute needs.

M.-Leander Reimer

July 02, 2025

Transcript

  1. Beyond Microservices: Running VMs, WASM, and AI Workloads on Kubernetes

    Mario-Leander Reimer | qaware.de | @qaware | #CloudNativeNerd
  2. A Quick History Lesson on Kubernetes

    ▪ 04 Aug 2002: Linux 2.4.19 (namespaces)
    ▪ 24 Jan 2008: Linux 2.6.24 (cgroups)
    ▪ 20 Mar 2013: Docker 0.1
    ▪ 06 Jun 2014: Kubernetes 0.0.1
    ▪ 09 Jun 2014: Docker 1.0
    ▪ 21 Jul 2015: Kubernetes 1.0
    ▪ 26 Sep 2016: Kubernetes 1.4 "Making things easy …"
    ▪ 13 Dec 2016: Kubernetes 1.5 "Alpha of CRI"
    ▪ 26 Mar 2018: Kubernetes 1.10 "containerd CRI as default"
    ▪ 09 Dec 2020: Kubernetes 1.20 "The Raddest Release"
    ▪ 03 May 2022: Kubernetes 1.24 "CRI only"
    ▪ 23 Aug 2022: Kubernetes 1.25 "Combiner"
    ▪ 17 Apr 2024: Kubernetes 1.30 "Uwubernetes"
    ▪ 23 Apr 2025: Kubernetes 1.33
  3. Most Often Used Kubernetes Concepts and Resources

    ▪ Pods are the smallest deployable compute unit in K8s
    ▪ Deployments declare pods (including their volumes) and manage them through ReplicaSets
    ▪ ReplicaSets ensure that the required number of pod replicas is running
    ▪ Labels are key/value pairs used to identify and select resources
    ▪ Services are an abstraction for a logical set of pods, selected by labels
    ▪ Ingress handles incoming HTTP(S) traffic from outside the cluster
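To make the relationship between these resources concrete, here is a minimal sketch (an illustration, not taken from the talk) that builds a Deployment and a matching Service as plain Python dicts and prints them as JSON, which `kubectl apply -f` accepts as well as YAML. The name `hello-web`, the image `registry.example.com/hello-web:1.0` and container port 8080 are hypothetical. The key point is that labels tie everything together: the Deployment's selector and pod template share `app: hello-web`, and the Service finds its pods through that same label.

```python
import json


def make_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Build a minimal apps/v1 Deployment (hypothetical helper)."""
    labels = {"app": name}  # labels identify the pods this Deployment owns
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # the ReplicaSet created by the Deployment keeps this many pods
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        # assumption: the hypothetical image listens on port 8080
                        {"name": name, "image": image, "ports": [{"containerPort": 8080}]}
                    ]
                },
            },
        },
    }


def make_service(name: str, port: int = 80, target_port: int = 8080) -> dict:
    """Build a ClusterIP Service that selects the pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


deployment = make_deployment("hello-web", "registry.example.com/hello-web:1.0")
service = make_service("hello-web")

# A v1 List bundles both objects into one document for kubectl apply -f
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [deployment, service]}, indent=2))
```

An Ingress would sit in front of the Service in the same way, routing external HTTP(S) traffic to it by Service name.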
  4. KubeVirt in a Nutshell

    ▪ Unified Workload Management: KubeVirt enables running and managing virtual machines (VMs) alongside containerized applications within Kubernetes clusters
    ▪ Kubernetes API Extension: It extends Kubernetes with Custom Resource Definitions (CRDs) such as VirtualMachineInstance, allowing VMs to be managed using standard K8s tools
    ▪ Container-Native Virtualization: Using Kernel-based Virtual Machine (KVM) technology, KubeVirt runs VMs inside Kubernetes pods
    ▪ Legacy Application Support: KubeVirt allows organizations to run traditional VM-based applications within Kubernetes, aiding the gradual modernization and migration of legacy systems. This eases the transition for organizations moving away from virtualization platforms such as VMware
    ▪ Open-Source and Community-Driven: Initiated by Red Hat in 2016, KubeVirt is an open-source project under the Cloud Native Computing Foundation (CNCF), with contributions from major industry players

    For more information, visit the official KubeVirt website: https://kubevirt.io/
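As a rough illustration of "managing VMs with standard K8s tools", the sketch below builds a KubeVirt `VirtualMachine` object as a Python dict. A `VirtualMachine` wraps a VirtualMachineInstance template; with `runStrategy: Always`, KubeVirt keeps a VMI (and the pod that hosts it) running. The field layout follows the `kubevirt.io/v1` API as I understand it; the VM name is hypothetical, and the disk image is KubeVirt's public CirrOS demo container disk. A real migrated VM would more likely boot from a PersistentVolumeClaim than from a container disk.

```python
import json


def make_vm(name: str, disk_image: str, memory: str = "128Mi") -> dict:
    """Minimal KubeVirt VirtualMachine manifest (sketch, kubevirt.io/v1 layout assumed)."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name},
        "spec": {
            # Always: KubeVirt keeps a VirtualMachineInstance running for this VM
            "runStrategy": "Always",
            "template": {
                "metadata": {"labels": {"kubevirt.io/domain": name}},
                "spec": {
                    "domain": {
                        # each disk attached to the guest must reference a volume by name
                        "devices": {"disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]},
                        "resources": {"requests": {"memory": memory}},
                    },
                    # the root disk is backed by a container image holding the VM disk
                    "volumes": [{"name": "rootdisk", "containerDisk": {"image": disk_image}}],
                },
            },
        },
    }


vm = make_vm("legacy-app", "quay.io/kubevirt/cirros-container-disk-demo")
print(json.dumps(vm, indent=2))
```

The resulting JSON can be applied with `kubectl apply -f` like any other resource, which is what lets VMs share GitOps pipelines, RBAC and monitoring with container workloads.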
  5. WebAssembly (WASM) – In a Nutshell

    ▪ A binary instruction format for a stack-based virtual machine, enabling high-performance applications to run on the web, outside browsers, and across platforms
    ▪ Delivers near-native execution speed in a secure, portable, and efficient format

    WebAssembly System Interface (WASI) – In a Nutshell

    ▪ A standard API that gives WebAssembly programs a secure, minimal set of operating-system-like capabilities (e.g., file access, networking) without depending on a specific OS
    ▪ Makes WebAssembly a viable platform for running applications outside the browser, safely and portably
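"Stack-based" means instructions take their operands from, and push results onto, an implicit operand stack rather than naming registers. The toy interpreter below is a simplified sketch of that idea for three `i32` instructions (real Wasm engines validate and compile modules rather than interpreting tuples like this); the instruction names mirror Wasm's text format and the values wrap modulo 2^32 as Wasm `i32` arithmetic does. It also checks the 8-byte header every Wasm binary starts with: the magic bytes `\0asm` followed by version 1.

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number "\0asm" + version 1 (little-endian)


def looks_like_wasm(data: bytes) -> bool:
    """Return True if the bytes start with the Wasm binary header."""
    return data[:8] == WASM_HEADER


def run(instructions):
    """Evaluate a tiny subset of Wasm-style i32 instructions on an operand stack (toy)."""
    stack = []
    for op, *args in instructions:
        if op == "i32.const":
            stack.append(args[0] & MASK32)
        elif op in ("i32.add", "i32.sub", "i32.mul"):
            b = stack.pop()  # right operand is on top of the stack
            a = stack.pop()
            if op == "i32.add":
                result = a + b
            elif op == "i32.sub":
                result = a - b
            else:
                result = a * b
            stack.append(result & MASK32)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


# (2 + 3) * 4, pushed and consumed the way a Wasm function body would do it
program = [("i32.const", 2), ("i32.const", 3), ("i32.add",),
           ("i32.const", 4), ("i32.mul",)]
print(run(program))  # [20]
```

Because the format never names registers, the encoding stays compact, which is one reason Wasm modules are small and quick to load.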
  6. Pros and Cons

    Pros:
    ▪ Language Flexibility: Supports multiple languages, e.g., Rust, Go, C, JavaScript
    ▪ Portability: Write once, run anywhere; the same binaries run across browsers, servers, and edge devices
    ▪ High Performance: Fast startup, small size, and near-native execution speed
    ▪ Security: A strong sandboxing model greatly reduces the attack surface
    ▪ Expanding Ecosystem (WASI): Enables server-side and standalone use cases beyond browsers
    ▪ Modularity and Embedding: WASM engines are easy to embed into applications and platforms

    Cons:
    ▪ Maturity Challenges: Ecosystem tools (debuggers, profilers, compilers) are improving but still young
    ▪ Complexity: Building and optimizing for WASM sometimes requires a non-trivial toolchain setup
    ▪ WASI Still Growing: Full POSIX-like feature parity (threads, sockets, etc.) is still in progress
    ▪ Limited Multithreading: WebAssembly threads and shared-memory support are evolving but complex
    ▪ Security Tradeoffs: Sandboxing adds safety but also restricts certain low-level capabilities
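The sandboxing tradeoff shows up clearly in WASI's capability model: a module can only reach files beneath directories the host explicitly grants it (for example with Wasmtime's `--dir` option). The function below is a simplified, lexical-only sketch of that check, not how any real runtime implements it (runtimes work with preopened directory handles and must also guard against symlink escapes). The `preopens` mapping and the paths in the usage line are hypothetical.

```python
import posixpath
from typing import Dict, Optional


def resolve(preopens: Dict[str, str], guest_path: str) -> Optional[str]:
    """Map a guest path to a host path, or return None if no preopened directory covers it."""
    # Lexical normalisation collapses "." and ".." so "/data/../etc" becomes "/etc"
    normalized = posixpath.normpath(posixpath.join("/", guest_path))
    for guest_dir, host_dir in preopens.items():
        if normalized == guest_dir:
            return host_dir
        prefix = guest_dir.rstrip("/") + "/"
        if normalized.startswith(prefix):
            return posixpath.join(host_dir, normalized[len(prefix):])
    return None  # no capability grants access to this path: deny


# Hypothetical grant: the guest sees /data, backed by /srv/app-data on the host
preopens = {"/data": "/srv/app-data"}
print(resolve(preopens, "/data/report.txt"))
print(resolve(preopens, "/data/../etc/passwd"))
```

Anything outside the granted directories, such as `/etc/passwd`, is simply unreachable, which is the safety benefit; the same rule is why low-level code expecting unrestricted file system access needs changes.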
  7. SpinKube in a Nutshell

    ▪ Kubernetes-Native WebAssembly Platform: Spin and SpinKube are open-source projects that enable seamless development, deployment, and operation of WebAssembly (Wasm) workloads within Kubernetes
    ▪ Integrated Components for Wasm Workloads: SpinKube combines several key components:
      – the Spin Operator for managing Spin applications
      – the containerd-shim-spin for executing Wasm modules
      – the Runtime Class Manager (formerly KWasm) for Wasm runtime handling
    ▪ Seamless Integration with the Kubernetes Ecosystem: SpinKube integrates with Kubernetes primitives such as DNS, probes, autoscaling, and metrics, allowing developers to use existing tools and workflows
    ▪ Collaborative Development and CNCF Sandbox Project: Developed with contributions from Microsoft, SUSE, Liquid Reply, and Fermyon, SpinKube is a CNCF Sandbox project

    For more information, visit the official SpinKube website: https://spinkube.dev/
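With the Spin Operator installed, a Wasm app is deployed as a `SpinApp` custom resource, and pods are routed to the containerd Spin shim through a Kubernetes `RuntimeClass`. The sketch below builds both objects as dicts. The `core.spinkube.dev/v1alpha1` API version, the `containerd-shim-spin` executor name and the `wasmtime-spin-v2` / `spin` RuntimeClass values follow the SpinKube quickstart as I recall it, so check them against the CRDs installed in your cluster; the app name and image `ghcr.io/example/hello-spin:0.1.0` are hypothetical.

```python
import json


def make_runtime_class() -> dict:
    """RuntimeClass mapping pods to the containerd Spin shim (names assumed from the quickstart)."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": "wasmtime-spin-v2"},
        "handler": "spin",  # containerd runtime handler provided by containerd-shim-spin
    }


def make_spin_app(name: str, image: str, replicas: int = 2) -> dict:
    """SpinApp resource reconciled by the Spin Operator into a Deployment and Service."""
    return {
        "apiVersion": "core.spinkube.dev/v1alpha1",
        "kind": "SpinApp",
        "metadata": {"name": name},
        "spec": {
            "image": image,  # OCI artifact containing the Spin app's Wasm components
            "replicas": replicas,
            "executor": "containerd-shim-spin",  # SpinAppExecutor installed with SpinKube
        },
    }


runtime_class = make_runtime_class()
app = make_spin_app("hello-spin", "ghcr.io/example/hello-spin:0.1.0")
print(json.dumps([runtime_class, app], indent=2))
```

Because the operator turns a `SpinApp` into ordinary Deployments and Services, the DNS, probe and autoscaling integration mentioned above comes from standard Kubernetes machinery.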
  8. AI is everywhere, but …

    • dependency on LLMs hosted by large US companies
    • closed-source models; the training data is unknown
    • concerns about data privacy and regulations

    Sovereignty and privacy in the EU
    • self-hosted AI applications and models running on private or air-gapped servers
    • current open-source models are good enough for many use cases
    • no sharing of organizational (local) knowledge

    Everybody Else Is Doing It, So Why Can't We?

    Photo by BoliviaInteligente on Unsplash
  9. Ollama in a Nutshell

    ▪ Ollama is a platform designed to run and manage large language models (LLMs) locally on personal devices
    ▪ It offers a simple developer experience, using a single command-line tool to download, run, and interact with models
    ▪ Ollama focuses heavily on performance optimization and minimal setup, making LLMs accessible without heavy cloud infrastructure
    ▪ It supports model customization via Modelfiles (similar to Dockerfiles), enabling tailored AI applications
    ▪ Designed for privacy-first and offline capabilities, Ollama is ideal for edge computing and secure environments
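Once Ollama runs inside the cluster, applications reach it over its HTTP API instead of the CLI. The sketch below assumes a Service named `ollama` in a hypothetical namespace `ai`, exposing Ollama's default port 11434, and a model already pulled (`llama3.2` is only an illustrative choice). It uses the standard library to prepare a non-streaming `POST /api/generate` call; `generate` actually sends it, so it needs network access to the cluster.

```python
import json
import urllib.request

# Hypothetical in-cluster address: Service "ollama" in namespace "ai", default port 11434
OLLAMA_URL = "http://ollama.ai.svc.cluster.local:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming request to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request; with "stream": false the reply is one JSON object."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # the generated text is returned in the "response" field
        return json.loads(response.read())["response"]


req = build_generate_request("llama3.2", "Summarize KubeVirt in one sentence.")
print(req.get_method(), req.full_url)
```

Keeping the endpoint inside the cluster (and the model weights on cluster storage) is what makes the air-gapped, no-data-sharing setup from the previous slide possible.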
  10. The Kubernetes cluster topology requires precise planning. Otherwise, the costs will go through the roof!

    ▪ Several GPU machine types are available
    ▪ Not all types are available in all regions
    ▪ Prices vary drastically, so careful research is recommended
    ▪ Additional local SSDs are recommended
    ▪ To be decided:
      – all nodes with GPUs, or
      – different nodes optimised for normal and for GPU workloads

    GPU pricing: https://cloud.google.com/compute/gpus-pricing?hl=de#other-gpu-models
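The second option usually means a dedicated GPU node pool that ordinary pods cannot land on, so expensive GPU nodes only run GPU work. A minimal sketch, assuming GKE since the slide links Google Cloud pricing: GKE labels GPU nodes with `cloud.google.com/gke-accelerator` and, as far as I know, taints GPU node pools with `nvidia.com/gpu` so only pods that tolerate it are scheduled there. The accelerator value `nvidia-l4` is a hypothetical choice; `ollama/ollama` is Ollama's public container image.

```python
import json


def make_gpu_pod(name: str, image: str, gpus: int = 1, accelerator: str = "nvidia-l4") -> dict:
    """Pod pinned to a GPU node pool (GKE node label and taint assumed)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # pin the pod to nodes carrying the intended GPU model
            "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator},
            # tolerate the GPU taint that keeps ordinary pods off these nodes
            "tolerations": [
                {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
            ],
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are extended resources: set limits; requests default to the same value
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


pod = make_gpu_pod("ollama", "ollama/ollama")
print(json.dumps(pod, indent=2))
```

CPU-only workloads carry no such toleration, so they stay on cheaper general-purpose nodes, which is the cost lever this slide is about.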
  11. [Diagram: platform architecture planes. Labels: Compliance Plane; Integration & Delivery Plane; Service Plane; Platform Plane; Operability; Resource Plane (Compute, Data: Local SSD); Integration; Security; Delivery; FinOps; Quality Plane; Data Plane; Model Plane; User Serving Plane; Access Plane; Data Modelling Plane]
  12. Thank you! The next step? Let's talk!

    Mario-Leander Reimer, Managing Director, CTO | +49 151 61314748

    QAware GmbH | Aschauer Straße 30 | 81549 München | +49 89 232315-0
    Managing Directors: Dr. Josef Adersberger, Michael Stehnken, Michael Rohleder, Mario-Leander Reimer
    Offices in München, Mainz, Rosenheim, Darmstadt