Beyond Microservices: Running VMs, WASM and AI Workloads on Kubernetes

qaware.de | Mario-Leander Reimer | @qaware #CloudNativeNerd

Mario-Leander Reimer, Managing Director | CTO | @LeanderReimer #cloudnativenerd #qaware #gernperDude

A Quick History Lesson on Kubernetes
▪ 04 Aug 2002: namespaces (Linux 2.4.19)
▪ 24 Jan 2008: cgroups (Linux 2.6.24)
▪ 20 Mar 2013: Docker 0.1
▪ 06 Jun 2014: Kubernetes 0.0.1
▪ 09 Jun 2014: Docker 1.0
▪ 21 Jul 2015: Kubernetes 1.0
▪ 26 Sep 2016: Kubernetes 1.4 "Making things easy …"
▪ 13 Dec 2016: Kubernetes 1.5 "Alpha of CRI"
▪ 26 Mar 2018: Kubernetes 1.10 "containerd CRI as default"
▪ 09 Dec 2020: Kubernetes 1.20 "The Raddest Release"
▪ 03 May 2022: Kubernetes 1.24 "CRI only"
▪ 23 Aug 2022: Kubernetes 1.25 "Combiner"
▪ 17 Apr 2024: Kubernetes 1.30 "Uwubernetes"
▪ 23 Apr 2025: Kubernetes 1.33

High Level Overview of Kubernetes Components

Most often used Kubernetes Concepts and Resources
▪ Pods are the smallest deployable compute unit in K8s
▪ Deployments declare the desired state of pods, their volumes, and the ReplicaSets that manage them
▪ ReplicaSets ensure that the required number of pod replicas is running
▪ Labels are key/value pairs used to identify and select resources
▪ Services are an abstraction for a logical collection of pods
▪ Ingress handles incoming traffic
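How these resources reference each other through labels can be sketched with plain manifest dictionaries in Python. This is a minimal illustration and not taken from the talk: the application name `hello-web`, the image `nginx:1.27`, and the helper `selects` are assumptions made for the example.

```python
import json

# Hypothetical labels shared by all objects of one application.
LABELS = {"app": "hello-web"}

# Deployment: declares the pod template. Its controller creates a ReplicaSet
# that keeps the requested number of replicas running.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "hello-web", "labels": LABELS},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.27",  # assumed example image
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}

# Service: selects pods by label and gives them one stable address.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "hello-web"},
    "spec": {"selector": LABELS, "ports": [{"port": 80, "targetPort": 80}]},
}


def selects(selector, labels):
    """Equality-based label matching: every selector pair must be present."""
    return all(labels.get(key) == value for key, value in selector.items())


pod_labels = deployment["spec"]["template"]["metadata"]["labels"]
print("service selects deployment pods:", selects(service["spec"]["selector"], pod_labels))

# kubectl also accepts JSON, so the two objects can be emitted as one List.
manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}
manifest_json = json.dumps(manifest, indent=2)
```

Writing `manifest_json` to a file and passing it to `kubectl apply -f` would be one way to try this on a test cluster.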

lreimer/vms-wasm-ai-on-k8s

AND NOW FOR SOMETHING COMPLETELY DIFFERENT

Hardware Virtualization vs. Operating System Virtualization

KubeVirt in a Nutshell
▪ Unified Workload Management: KubeVirt enables running and managing virtual machines (VMs) alongside containerized applications within Kubernetes clusters
▪ Kubernetes API Extension: It extends Kubernetes by introducing Custom Resource Definitions (CRDs) like VirtualMachineInstance, allowing VMs to be managed using standard K8s tools
▪ Container-Native Virtualization: Utilizing Kernel-based Virtual Machine (KVM) technology, KubeVirt runs VMs inside Kubernetes pods
▪ Legacy Application Support: KubeVirt allows organizations to run traditional VM-based applications within Kubernetes, aiding the gradual modernization and migration of legacy systems. It also supports organizations transitioning away from virtualization platforms (e.g., VMware)
▪ Open-Source and Community-Driven: Initiated by Red Hat in 2016, KubeVirt is an open-source project under the Cloud Native Computing Foundation (CNCF), with contributions from major industry players
For more information, visit the official KubeVirt website: https://kubevirt.io/
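What "managing VMs with standard K8s tools" looks like can be sketched as a manifest built in Python. This is a hedged example rather than the talk's demo: it uses the `VirtualMachine` resource, a KubeVirt CRD that owns a VirtualMachineInstance, and the VM name, memory size, and the choice of the CirrOS demo container disk are assumptions for illustration.

```python
import json


def virtual_machine(name, image, memory="128Mi"):
    """Build a KubeVirt VirtualMachine manifest as a plain dict."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {"name": name},
        "spec": {
            # Keep a VirtualMachineInstance running for this VM.
            "runStrategy": "Always",
            "template": {
                "metadata": {"labels": {"kubevirt.io/vm": name}},
                "spec": {
                    "domain": {
                        "devices": {
                            "disks": [
                                {"name": "rootdisk", "disk": {"bus": "virtio"}}
                            ]
                        },
                        "resources": {"requests": {"memory": memory}},
                    },
                    # A containerDisk ships the VM image inside a container image.
                    "volumes": [
                        {"name": "rootdisk", "containerDisk": {"image": image}}
                    ],
                },
            },
        },
    }


# Assumed example values; any bootable container disk image would do.
vm = virtual_machine("demo-vm", "quay.io/kubevirt/cirros-container-disk-demo")
vm_json = json.dumps(vm, indent=2)
```

Assuming KubeVirt is installed in the cluster, the JSON in `vm_json` can be applied and listed with `kubectl` like any other resource.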

KubeVirt Architecture Overview (diagram of a node running a virtual machine; source: https://portworx.com/knowledge-hub/understanding-kubevirt/)

WebAssembly (WASM) – In a Nutshell
▪ A binary instruction format for a stack-based virtual machine, enabling high-performance applications to run on the web, outside browsers, and across platforms.
▪ Deliver near-native speed execution in a secure, portable, and efficient format.
WebAssembly System Interface (WASI) – In a Nutshell
▪ A standard API designed to provide WebAssembly programs with a secure, minimal set of operating system-like capabilities (e.g., file access, networking) without depending on a specific OS.
▪ Make WebAssembly a viable platform for running applications outside the browser, safely and portably.
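To make "binary instruction format" concrete: every WebAssembly binary module starts with the four magic bytes `\0asm` followed by a little-endian 32-bit version number, currently 1. The following minimal Python sketch checks that header; the helper name `read_wasm_header` is an assumption for illustration, not part of any WebAssembly toolkit.

```python
import struct

WASM_MAGIC = b"\x00asm"  # the bytes 0x00 0x61 0x73 0x6D
WASM_VERSION = 1

# Smallest valid module: only the 8-byte header, no sections.
EMPTY_MODULE = WASM_MAGIC + struct.pack("<I", WASM_VERSION)


def read_wasm_header(data: bytes) -> int:
    """Return the binary format version, or raise ValueError if not WASM."""
    if len(data) < 8 or data[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary module")
    (version,) = struct.unpack("<I", data[4:8])
    return version


print(read_wasm_header(EMPTY_MODULE))  # prints 1
```

The same check works on the first eight bytes of any `.wasm` file produced by a Rust, Go, or C toolchain.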

Pros and Cons
Pros
▪ Language Flexibility: Supports multiple languages, e.g., Rust, Go, C, JavaScript.
▪ Portability: Write once, run anywhere, with the same binaries across browsers, servers, and edge devices.
▪ High Performance: Fast startup, small size, and near-native execution speed.
▪ Security: Strong sandboxing model greatly reduces the attack surface.
▪ Expanding Ecosystem (WASI): Enables server-side and standalone use cases beyond browsers.
▪ Modularity and Embedding: Easy to embed WASM engines into applications and platforms.
Cons
▪ Maturity Challenges: Ecosystem tools (debuggers, profilers, compilers) are improving but still young.
▪ Complexity: Building and optimizing for WASM sometimes requires non-trivial toolchain setup.
▪ WASI Still Growing: Full POSIX-like feature parity (threads, sockets, etc.) is still in progress.
▪ Limited Multithreading: WebAssembly threads and shared memory support are evolving, but complex.
▪ Security Tradeoffs: Sandboxing adds safety but also restricts certain low-level capabilities.

SpinKube

SpinKube in a Nutshell
▪ Kubernetes-Native WebAssembly Platform: Spin and SpinKube are open-source projects that enable seamless development, deployment, and operation of WebAssembly (Wasm) workloads within Kubernetes
▪ Integrated Components for Wasm Workloads: SpinKube combines several key components:
– the Spin Operator for managing Spin applications,
– the containerd-shim-spin for executing Wasm modules,
– the Runtime Class Manager (formerly KWasm) for Wasm runtime handling
▪ Seamless Integration with Kubernetes Ecosystem: SpinKube integrates with Kubernetes primitives such as DNS, probes, autoscaling, and metrics, allowing developers to utilize existing tools and workflows
▪ Collaborative Development and CNCF Sandbox Project: Developed with contributions from Microsoft, SUSE, Liquid Reply, and Fermyon, SpinKube is a CNCF sandbox project
For more information, visit the official SpinKube website: https://spinkube.dev/
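How the Spin Operator and the containerd-shim-spin come together can be sketched as the custom resource a user would submit. The sketch below builds a `SpinApp` manifest in Python; the API group `core.spinkube.dev/v1alpha1` and the field names are assumed to match the SpinKube quickstart and should be verified against the CRD installed in the cluster, and the application name and image reference are hypothetical.

```python
import json


def spin_app(name, image, replicas=2):
    """Build a SpinApp manifest (field names assumed from the SpinKube docs)."""
    return {
        "apiVersion": "core.spinkube.dev/v1alpha1",
        "kind": "SpinApp",
        "metadata": {"name": name},
        "spec": {
            # OCI reference of the packaged Spin application (hypothetical).
            "image": image,
            "replicas": replicas,
            # Executor backed by the containerd-shim-spin on the nodes.
            "executor": "containerd-shim-spin",
        },
    }


app = spin_app("hello-spin", "registry.example.com/hello-spin:0.1.0")
app_json = json.dumps(app, indent=2)
```

The operator would reconcile such a resource into regular Kubernetes objects, which is why existing tooling for probes, autoscaling, and metrics keeps working.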

Supported Kubernetes Distributions: https://kwasm.sh/

Everybody Else Is Doing It, So Why Can’t We?
AI is everywhere, but …
• dependency on LLMs hosted by large US companies
• closed-source models, training data is unknown
• concerns about data privacy and regulations
Sovereignty and privacy in the EU
• self-hosted AI applications and models running on private or air-gapped servers
• current open-source models are good enough for many use cases
• no sharing of organizational (local) knowledge
Photo by BoliviaInteligente on Unsplash

Ollama in a Nutshell
▪ Ollama is a platform designed to run and manage large language models (LLMs) locally on personal devices.
▪ It offers a simple developer experience, using a single command-line tool to download, run, and interact with models.
▪ Ollama focuses heavily on performance optimization and minimal setup, making LLMs accessible without heavy cloud infrastructure.
▪ It supports model customization via Modelfiles (similar to Dockerfiles), enabling tailored AI applications.
▪ Designed for privacy-first and offline capabilities, Ollama is ideal for edge computing and secure environments.
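Besides the command-line tool, a running Ollama server exposes an HTTP API, by default on port 11434. The following Python sketch calls its `/api/generate` endpoint using only the standard library. The model name `llama3.2` is an example and is assumed to have been pulled already; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Create a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL, timeout=120):
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example call, assuming a local server and a pulled model:
# print(generate("llama3.2", "Explain Kubernetes in one sentence."))
```

Inside a cluster, `base_url` would point at the Service in front of the Ollama pods, for example a hypothetical `http://ollama.default.svc.cluster.local:11434`.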

The Kubernetes cluster topology requires precise planning. Otherwise the costs will go through the roof!
▪ There are different GPU machines
▪ Not all types are available in all regions
▪ Prices vary drastically, accurate research is recommended
▪ Additional local SSDs are recommended
▪ To be decided:
– all nodes with GPU
– different nodes optimised for normal as well as GPU workloads
https://cloud.google.com/compute/gpus-pricing?hl=de#other-gpu-models
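With the second option, separate node pools for normal and GPU workloads, pods have to be steered onto the GPU nodes explicitly. A hedged Python sketch of such a pod manifest follows. It assumes the NVIDIA device plugin is installed (it provides the `nvidia.com/gpu` resource) and, for the example call, a GKE node pool with L4 GPUs; the pod name and image are illustrative choices.

```python
import json


def gpu_pod(name, image, gpus=1, node_selector=None):
    """Build a Pod manifest that requests GPUs and tolerates GPU node taints."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pin the pod to the GPU node pool (labels depend on the provider).
            "nodeSelector": node_selector or {},
            # GPU nodes are commonly tainted so ordinary pods stay off them.
            "tolerations": [
                {
                    "key": "nvidia.com/gpu",
                    "operator": "Exists",
                    "effect": "NoSchedule",
                }
            ],
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resource exposed by the NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Assumed example: Ollama on a GKE node pool with NVIDIA L4 accelerators.
pod = gpu_pod(
    "ollama",
    "ollama/ollama",
    node_selector={"cloud.google.com/gke-accelerator": "nvidia-l4"},
)
pod_json = json.dumps(pod, indent=2)
```

Everything without a GPU request then lands on the cheaper general-purpose nodes, which is the main cost lever in this topology.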

AI platform architecture (diagram). Labels shown: Compliance Plane, Integration & Delivery Plane, Service Plane, Platform Plane, Resource Plane (Compute; Data: Local SSD), Quality Plane, Data Plane, Model Plane, Serving Plane, Access Plane, Data Modelling Pl., Operability, Integration, Security, Delivery, FinOps, User

lreimer/k8s-native-ai-platform lreimer/k3s-ai-platform

Thank you! The next step? Let's talk!
Mario-Leander Reimer, Managing Director, CTO | +49 151 61314748
QAware GmbH | Aschauer Straße 30 | 81549 München | +49 89 232315-0
Managing Directors: Dr. Josef Adersberger, Michael Stehnken, Michael Rohleder, Mario-Leander Reimer
Offices in München, Mainz, Rosenheim, Darmstadt
