Beyond Portability: Live Migration for Evolving WebAssembly Workloads

Japan Community Day at KubeCon + CloudNativeCon Japan 2025

WebAssembly (Wasm) is known for its platform-neutral execution, letting the same program run across devices, edge systems, and the cloud without platform-specific changes. Thanks to an active and growing community, we now have a rich set of runtimes optimized for various environments—making Wasm more versatile and efficient than ever. But what if Wasm could go beyond static portability and become truly dynamic? In this talk, we explore live migration between different Wasm runtimes, allowing workloads to move at runtime—for example, following a user across networks or shifting to another platform under high load. This isn't easy. Runtimes differ in how they represent execution state, and optimizations like JIT or AOT make migration tricky. To tackle this, we're experimenting with two research approaches: (1) converting runtime-internal execution states, and (2) adding migration-aware features through a self-hosted Wasm runtime. We'll share what we've learned and the challenges we've faced, as we explore how we can help advance the Wasm ecosystem.

Yuki Nakata chikuwait

June 15, 2025

Transcript

  1. Beyond Portability: Live Migration for Evolving WebAssembly Workloads
    Japan Community Day at KubeCon + CloudNativeCon Japan 2025
    Yuki Nakata (SAKURA internet Inc. / Future University Hakodate)
    Daigo Fujii (Future University Hakodate)
  2. Self Introduction
    Yuki Nakata
    • Researcher at SAKURA internet Inc.
    • Ph.D. student at Future University Hakodate
    • X: @chiku_wait
    Daigo Fujii
    • Master's student at Future University Hakodate
    • X: @fun_7776
    Our Interests: OS, Virtualization, and Wasm
    Current Research Topic: Live Migration for Wasm
  3. Cross-platform Portability of Wasm
    • Write once, run anywhere …
    [Figure: Compile to Wasm, then run the same Wasm app on any computing platform]
  4. Diversity of Wasm Runtimes
    • 100+ runtimes with different characteristics [1]
    • Wasmtime
      ◦ High performance with JIT compilation
    • WasmEdge
      ◦ Rich extensions for AI/LLM workloads
    • WAMR
      ◦ Low memory usage for embedded systems
    [1] Yixuan Zhang, Mugeng Liu, Haoyu Wang, Yun Ma, Gang Huang, and Xuanzhe Liu. 2025. Research on WebAssembly Runtimes: A Survey. ACM Trans. Softw. Eng. Methodol. Just Accepted (January 2025). https://doi.org/10.1145/3714465
  5. Wasm Meets Edge Computing
    • Distributed heterogeneous computing
      ◦ Different CPU architectures, OSs, and platform characteristics
    • Wasm enables easy deployment of apps to various platforms
      ◦ Run apps efficiently with the most suitable runtime for each platform
    [Figure: Wasm apps running on Wasmtime and on WAMR]
  6. Edge Computing with Live Migration
    • Move apps between machines while maintaining their execution state
    [Figure: a running function f(x) moves to another machine. Use case: Offload Heavy Tasks]
  7. Edge Computing with Live Migration
    • Move apps between machines while maintaining their execution state
    [Figure: a running function f(x) moves between machines. Use cases: Offload Heavy Tasks; Task Handoff That Follows User Mobility]
  8. Goal: Live Migration Among Heterogeneous Runtimes
    • Wasm gains mobility as well as portability
      ◦ Deploy running apps/tasks closest to users
    • Switch runtimes and platforms according to tasks and requirements
      ◦ Change the platform to suit the app's processing context
    [Figure: a Wasm app moving between Wasmtime and WAMR]
  9. Challenges: Depends on Runtime Implementations
    Differences in Execution State Implementation
    [Figure: Wasm VM of Runtime A (frame stack, 32-bit value stack holding 0xFFFFFFFF, linear memory, program counter) ≠ Wasm VM of Runtime B (frame stack, 64-bit value stack holding 0xFFFFFFFFFFFFFFFF, linear memory, program counter). Frame contents shown: Locals:…, module:…, Internal:…]
  10. Challenges: Depends on Runtime Implementations
    Differences in Execution State Implementation
    [Figure: Wasm VM of Runtime A (frame stack, 32-bit value stack holding 0xFFFFFFFF, linear memory, program counter) ≠ Wasm VM of Runtime B (frame stack, 64-bit value stack holding 0xFFFFFFFFFFFFFFFF, linear memory, program counter). Frame contents shown: Locals:…, module:…, Internal:…]
    Moving Execution State to Outside VM by JIT/AOT compilation
    [Figure: the stack and program counter sit at the OS level, outside the Wasm VM]
  11. Two Different Approaches
    1. Convert execution states between runtimes
      ◦ Designed for major interpreter runtimes
      ◦ Between WAMR, WasmEdge, and Wasm3
    2. Self-hosted runtime for runtime-neutral checkpointing/restoring (C/R)
      ◦ Designed for runtimes with JIT/AOT compilation enabled
  12. Overview
    Convert execution state across different runtimes
    [Figure: each runtime holds Wasm bytecode, linear memory, and a C/R mechanism. WAMR's state (32-bit value stack with 0xFFFFFFFF and 0x11111111, PC 0xdeadbe) is converted into the state of other runtimes, including Wasm3 (64-bit value stack with 0x00000000FFFFFFFF … 0x11111111, PC 0xff1100)]
  13. The Execution State within Wasm VM
    • Defined by the Wasm spec
    • Memory Instance, Program Counter, and Value Stack change during execution
      ◦ Need to checkpoint and restore
    1. Module Instance
      ◦ Memory Instance (010101 111101 011011)
      ◦ Global Instance ($g1 = 100, $g2 = 3.14, …)
    2. Program Counter
      ◦ Instrs at 1001, 1002, 1003 …; PC = 1002
    3. Value Stack (128, 100100, 3.14, …)
      • functions’ local values
      • immediate values
  14. Technical Challenges
    • Differences in implementation of execution state between runtimes ❌
    1. Program Counter (PC) Counting
      [Figure: WAMR/Wasm3 keep a flat byte sequence at positions 1 to 7: 0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x6a (i32.add), 0x21 (local.set) 0x00. WasmEdge keeps one OpCode object per instruction at positions 1 to 4: (opcode 0x41, operand 0x0a), (opcode 0x41, operand 0x14), (opcode 0x6a, operand null), (opcode 0x21, operand 0x00)]
    2. Memory Layout of the Value Stack
      [Figure: the same values in WasmEdge's value stack (zero-padded slots, e.g. 00000000 12345678 …) and in WAMR/Wasm3's value stack (packed cells: 0010 1234 0101 5678 …)]
    3. Optimized Custom Instructions
      [Figure: Wasm3 optimization of $func_add. Before (5 instrs): i32.const 10, i32.const 20, i32.const 30, i32.add, i32.add. After (2 instrs): i32.add 20 30, i32.add 10 top]
  15. 3-A: Optimization Makes PC Different between Runtimes
    • Wasm3 optimization: Convert Stack Push Instrs to Immediate Value
    • Cannot Restore Omitted Instrs
    [Figure: $func_add before optimization (5 instrs): i32.const 10, i32.const 20, i32.const 30, i32.add, i32.add. After Wasm3 optimization (2 instrs): i32.add 20 30, i32.add 10 top]
    [Figure: WAMR bytecode, byte positions 1 to 8: 0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x41 (i32.const) 0x22, 0x6a (i32.add), 0x6a (i32.add). Wasm3 code, positions 1 to 6: 0x6a (i32.add) 0x14 0x22, 0x6a (i32.add) 0x0a [top]. The WAMR positions of the omitted i32.const instrs have no Wasm3 counterpart (?)]
  16. 3-B: Optimization Makes Stacks Different Among Runtimes
    • Optimization reversed the evaluation order of immediates.
    • As a result, stack contents may differ between unoptimized runtimes (WAMR/WasmEdge) and Wasm3 at a checkpoint.
    [Figure: $func_add before optimization (5 instrs): i32.const 10, i32.const 20, i32.const 30, i32.add, i32.add. After Wasm3 optimization (2 instrs): i32.add 20 30, i32.add 10 top]
    [Figure: the WAMR/WasmEdge stack holds 10 and 20+30; the Wasm3 stack holds only 20+30]
  17. Solution 1: Resolving Differences in PC Counting
    • Link the PC counting based on WAMR to each opcode in WasmEdge.
    • Calculate it at Wasm code load time, enabling computation without additional runtime cost.
    [Figure: WAMR byte sequence at positions 1 to 7: 0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x6a (i32.add), 0x21 (local.set) 0x00. Each WasmEdge OpCode object carries the matching WAMR position: (opcode 0x41, operand 0x0a, pc counting 1), (opcode 0x41, operand 0x14, pc counting 3), (opcode 0x6a, operand null, pc counting 5), (opcode 0x21, operand 0x00, pc counting 6)]
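
The load-time PC linking described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `DecodedOp`, `decode_with_offsets`, and `index_for_byte_pc` are hypothetical names, only the three opcodes from the slide are decoded, and operands are assumed to fit in one byte (real Wasm immediates are LEB128-encoded and may be longer). Positions are 1-based to match the slide.

```rust
/// One decoded instruction, stored as a per-instruction object.
/// Hypothetical type for illustration; not WasmEdge's actual structure.
#[derive(Debug, PartialEq)]
struct DecodedOp {
    opcode: u8,
    operand: Option<u8>,
    /// 1-based byte position of the opcode in the original bytecode.
    byte_pc: usize,
}

/// Decode a flat byte sequence once, at load time, and record the byte
/// position of every opcode so that no work is needed during execution.
fn decode_with_offsets(code: &[u8]) -> Result<Vec<DecodedOp>, String> {
    let mut ops = Vec::new();
    let mut i = 0;
    while i < code.len() {
        let opcode = code[i];
        let has_operand = match opcode {
            0x41 | 0x21 => true, // i32.const, local.set
            0x6a => false,       // i32.add
            other => return Err(format!("unsupported opcode 0x{other:02x}")),
        };
        let operand = if has_operand {
            Some(*code.get(i + 1).ok_or("truncated operand")?)
        } else {
            None
        };
        ops.push(DecodedOp { opcode, operand, byte_pc: i + 1 });
        i += if has_operand { 2 } else { 1 };
    }
    Ok(ops)
}

/// Translate a byte-based PC into an index into the decoded list.
/// Returns None when the position falls inside an operand.
fn index_for_byte_pc(ops: &[DecodedOp], byte_pc: usize) -> Option<usize> {
    ops.iter().position(|op| op.byte_pc == byte_pc)
}
```

For the slide's byte sequence, the recorded positions come out as 1, 3, 5, and 6, which matches the "pc counting" values in the figure.
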
  18. Solution 2: Using a Type Stack to Resolve Value Stack Layout Differences
    • Introduce a type stack
      ◦ To discern boundaries
      ◦ Removal of zero padding
    [Figure: conversion between the WasmEdge value stack (0010, 12345678, 0101, …) and the WAMR value stack (0010 1234 0101 5678 …), each paired with the type stack I32, I64, I32, …]
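
A type stack makes the layout conversion mechanical. The sketch below assumes one 64-bit slot per value on the padded side and 32-bit cells on the packed side, with the low half of an i64 stored first; that cell order, the function names `pack` and `unpack`, and the restriction to I32/I64 are assumptions made for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
}

/// Padded layout (one 64-bit slot per value) to packed layout (32-bit cells).
fn pack(slots: &[u64], types: &[ValType]) -> Vec<u32> {
    let mut cells = Vec::new();
    for (&slot, &ty) in slots.iter().zip(types) {
        match ty {
            // An i32 only needs one cell: the zero padding is dropped.
            ValType::I32 => cells.push(slot as u32),
            // An i64 is split into two cells (low half first, by assumption).
            ValType::I64 => {
                cells.push(slot as u32);
                cells.push((slot >> 32) as u32);
            }
        }
    }
    cells
}

/// Packed layout back to padded layout. The type stack tells where each
/// value ends, which the raw cells alone cannot reveal.
fn unpack(cells: &[u32], types: &[ValType]) -> Option<Vec<u64>> {
    let mut slots = Vec::with_capacity(types.len());
    let mut i = 0;
    for &ty in types {
        match ty {
            ValType::I32 => {
                slots.push(u64::from(*cells.get(i)?));
                i += 1;
            }
            ValType::I64 => {
                let lo = u64::from(*cells.get(i)?);
                let hi = u64::from(*cells.get(i + 1)?);
                slots.push(lo | (hi << 32));
                i += 2;
            }
        }
    }
    // Leftover cells mean the type stack and value stack disagree.
    if i == cells.len() { Some(slots) } else { None }
}
```

With the type stack I32, I64, I32, three padded slots become four packed cells, and `unpack` restores the original three slots.
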
  19. Solution 3-A: Mapping Omitted Instructions to Restore PC Correspondence
    • Map omitted instructions to the next instructions that consume their values
    • Their values are already embedded in those instructions.
    • This allows semantically correct restoration even from skipped instructions.
    [Figure: WAMR bytecode, byte positions 1 to 8: 0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x41 (i32.const) 0x22, 0x6a (i32.add), 0x6a (i32.add). Wasm3 code, positions 1 to 6: 0x6a (i32.add) 0x14 0x22, 0x6a (i32.add) 0x0a [top]]
    i32.const 0x14 and i32.const 0x22 are skipped as execution resumes at i32.add,
  20. Solution 3-A: Mapping Omitted Instructions to Restore PC Correspondence
    • Map omitted instructions to the next instructions that consume their values
    • Their values are already embedded in those instructions.
    • This allows semantically correct restoration even from skipped instructions.
    [Figure: WAMR bytecode, byte positions 1 to 8: 0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x41 (i32.const) 0x22, 0x6a (i32.add), 0x6a (i32.add). Wasm3 code, positions 1 to 6: 0x6a (i32.add) 0x14 0x22, 0x6a (i32.add) 0x0a [top]]
    but their values appear as arguments, so there’s no issue.
  21. Solution 3-B: Using Instruction Stack Mapping to Restore Skipped Instructions
    • Introduce an instruction stack
    • Instruction Stack: a stack that tracks which instruction produced each value on the value stack.
    • Wasm3 uses this to identify which omitted instruction generated each value and reconstructs the stack accordingly.
    [Figure: the WAMR/WasmEdge stack holds 10 and 20+30, while the Wasm3 stack holds only 20+30. The instruction stack (i32.const, i32.add, …) is used to fill in the value 10 produced by i32.const]
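
The reconstruction step can be sketched as below. This is an illustration of the idea, not the authors' implementation: the `Producer` type and both function names are hypothetical, and the sketch assumes the instruction stack for a given checkpoint is already available and that each omitted constant's value is known from the bytecode.

```rust
/// Which instruction produced a value on the unoptimized value stack.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Producer {
    /// A constant push that the optimizer folded into its consumer.
    /// Its value is known statically, so it is absent from the optimized stack.
    OmittedConst(i32),
    /// Any instruction whose result is really present on the optimized stack.
    Executed,
}

/// Optimized stack to unoptimized stack: re-insert the folded constants.
fn rebuild_stack(producers: &[Producer], optimized: &[i32]) -> Option<Vec<i32>> {
    let mut remaining = optimized.iter();
    let mut full = Vec::with_capacity(producers.len());
    for producer in producers {
        match producer {
            Producer::OmittedConst(value) => full.push(*value),
            Producer::Executed => full.push(*remaining.next()?),
        }
    }
    // Every optimized value must have been claimed by some producer.
    if remaining.next().is_none() { Some(full) } else { None }
}

/// Unoptimized stack to optimized stack: drop the values of folded constants.
fn strip_omitted(producers: &[Producer], full: &[i32]) -> Option<Vec<i32>> {
    if producers.len() != full.len() {
        return None;
    }
    Some(
        producers
            .iter()
            .zip(full)
            .filter(|(producer, _)| matches!(producer, Producer::Executed))
            .map(|(_, value)| *value)
            .collect(),
    )
}
```

For the slide's example, the instruction stack is [i32.const 10, i32.add]: rebuilding from the optimized stack [50] yields [10, 50], and stripping [10, 50] yields [50] again.
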
  22. Overview
    • Wasm-ized Wasm runtime with C/R mechanism
      ◦ Execute app bytecode via self-hosted runtime
      ◦ No need to modify host Wasm runtimes
    • Neutral to host runtime JIT/AOT optimization
      ◦ Self-hosted runtime manages execution state for C/R
    [Figure: Wasm bytecode runs on the self-hosted runtime, which runs on any host runtime. The self-hosted runtime's C/R mechanism manages the execution state: stacks, linear memory (0x00 to 0xff), and program counter]
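
The reason a self-hosted runtime is neutral to host JIT/AOT is that the guest's execution state is ordinary data owned by the interpreter. A toy sketch of that idea, assuming a two-instruction subset and hypothetical names (`Instr`, `ExecState`, `run`); it is unrelated to the actual chiwawa code:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    I32Const(i32),
    I32Add,
}

/// The whole guest execution state is plain data owned by the interpreter,
/// so a checkpoint is just a copy of this struct.
#[derive(Debug, Clone, PartialEq, Default)]
struct ExecState {
    pc: usize,
    stack: Vec<i32>,
}

/// Run at most `max_steps` instructions. Returns Ok(true) once the program
/// has finished and Ok(false) if it stopped early (a possible checkpoint).
fn run(code: &[Instr], state: &mut ExecState, max_steps: usize) -> Result<bool, String> {
    for _ in 0..max_steps {
        let instr = match code.get(state.pc) {
            Some(instr) => instr,
            None => return Ok(true),
        };
        match instr {
            Instr::I32Const(value) => state.stack.push(*value),
            Instr::I32Add => {
                let b = state.stack.pop().ok_or("stack underflow")?;
                let a = state.stack.pop().ok_or("stack underflow")?;
                state.stack.push(a.wrapping_add(b));
            }
        }
        state.pc += 1;
    }
    Ok(state.pc >= code.len())
}
```

Stopping after three steps, cloning `ExecState`, and calling `run` on the clone resumes exactly where the original left off, regardless of how the host runtime compiled the interpreter itself.
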
  23. Technical Challenge: Overhead by Self-hosted Runtime [2]
    • Wasm3: Interpreter-based OSS runtime with self-hosting support
    Explosion in the Number of Instructions
      Runtimes                      Total Instructions in Benchmark
      Wasm3                         400,849,582,525
      Self-hosted Wasm3 on Wasm3    318,276,978,517,504 (794x)
    Duplicate Sandbox Mechanism
    • Protect the execution environment from malicious programs
    • Known runtime performance overhead
    [Figure: the app runs inside the self-hosted runtime sandbox, which runs inside the host runtime sandbox, so validation and boundary checks (0x00 to 0xff) are duplicated]
    [2] Y. Nakata and K. Matsubara, “Poster: Feasibility of Runtime-Neutral Wasm Instrumentation for Edge-Cloud Workload Handover”, pp. 528–530, Dec. 2024, doi: https://doi.org/10.1109/sec62691.2024.00068.
  24. Strategies
    • Implement an original runtime designed for self-hosting
    • Optimization for a self-hosted runtime
      1. Reduce sandbox mechanism
      2. Reduce Wasm instructions
      3. Offload hotspots to host runtime
  25. Reduce Sandbox Mechanism
    • Remove sandboxing in the self-hosted runtime
    • Maintain isolation in the host runtime sandbox
      ◦ Execute self-hosted runtime instructions within the host sandbox
    [Figure: the app runs in the self-hosted runtime inside the host runtime sandbox; the self-hosted runtime sandbox is removed. Use Only Host Sandbox (0x00 to 0xff)]
  26. Reduce Wasm Instructions
    • A normal self-hosted interpreter converts a single instruction into multiple instructions
    • Instruction handlers using inline Wasm
      ◦ Pass the instruction through to the host runtime's instruction processing
    Normal handler:
        fn f64_nearest(…) -> … {
            let x = value_stack.pop();
            let y = x.fract();
            let result = if y == 0.5 {
                x.floor()
            } else if y == -0.5 {
                x.ceil()
            } else {
                x.round()
            };
            …
    Handler using inline Wasm:
        fn f64_nearest(…) -> … {
            let x = value_stack.pop();
            asm!(
                "local.get {0}",
                "f64.nearest",
                "local.set {1}",
                in(local) x,
                out(local) result,
            );
            …
        }
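
To make the cost of a software handler concrete, here is a standalone sketch of `f64.nearest` written without host pass-through. It assumes the Wasm spec's definition of the instruction (round to the nearest integer, with exact ties going to the even neighbor, so 1.5 and 2.5 both become 2.0); the function name `nearest_ties_even` is ours and the handler signature is simplified to a plain `f64 -> f64`.

```rust
/// Software version of `f64.nearest`: round to the nearest integer,
/// resolving exact ties toward the even neighbor.
fn nearest_ties_even(x: f64) -> f64 {
    if !x.is_finite() {
        return x; // NaN and infinities pass through unchanged
    }
    let is_tie = (x - x.trunc()).abs() == 0.5;
    let rounded = if is_tie {
        // Halve, round, and double: the result is always an even integer.
        2.0 * (x / 2.0).round()
    } else {
        x.round()
    };
    // Keep the sign of the input so that zero results stay signed.
    rounded.copysign(x)
}
```

Each branch, comparison, and library call here is compiled to several Wasm instructions inside the self-hosted runtime, whereas the inline-Wasm handler hands the whole operation to a single host instruction.
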
  27. Offload Hotspots to Host Runtime
    • Detect hotspots using a tracing mechanism in the self-hosted runtime
      ◦ Find frequently executed functions and blocks
    • Prohibit C/R while offloading to the host
      ◦ The execution state of offloaded tasks exists in the host runtime
    [Figure: the tracing mechanism in the self-hosted Wasm runtime watches the Wasm bytecode and offloads hot functions f(x) to the host runtime]
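
A call-counting tracer is one simple way to realize the two rules on this slide. The sketch below is an assumption-laden illustration: the `Tracer` type, the fixed call-count threshold, and the in-flight counter used to block C/R are all hypothetical choices, not details taken from the talk.

```rust
use std::collections::HashMap;

/// Counts calls per function and tracks offloaded work (hypothetical tracer).
struct Tracer {
    threshold: u32,
    counts: HashMap<u32, u32>,
    /// Number of offloaded calls currently running in the host runtime.
    offloaded_in_flight: u32,
}

impl Tracer {
    fn new(threshold: u32) -> Self {
        Tracer { threshold, counts: HashMap::new(), offloaded_in_flight: 0 }
    }

    /// Record one call; returns true once the function counts as a hotspot.
    fn record_call(&mut self, func_index: u32) -> bool {
        let count = self.counts.entry(func_index).or_insert(0);
        *count += 1;
        *count >= self.threshold
    }

    /// Mark the start of an offloaded call.
    fn enter_offload(&mut self) {
        self.offloaded_in_flight += 1;
    }

    /// Mark the end of an offloaded call.
    fn leave_offload(&mut self) {
        self.offloaded_in_flight = self.offloaded_in_flight.saturating_sub(1);
    }

    /// C/R is only allowed while no execution state lives in the host runtime.
    fn can_checkpoint(&self) -> bool {
        self.offloaded_in_flight == 0
    }
}
```

A runtime using this would call `record_call` on every function entry, wrap offloaded calls in `enter_offload` and `leave_offload`, and defer any checkpoint request until `can_checkpoint` returns true.
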
  28. Combine Advantages of Multiple Runtimes to Run an App
    • Wasm module initialization performance varies depending on runtimes
    • Switch the runtime used for initialization and execution of instructions
      ◦ Faster than running apps with a single runtime
  29. Handle Resource Exhaustion with Runtime Switching
    • Switch runtime based on the load of the node hosting Wasm apps
      ◦ Low load: High throughput runtime
      ◦ High load: Low memory consumption runtime
    [Figure: Differences in Memory Usage between Runtimes. Apps on Wasmtime hit a high memory load, so they are moved to WAMR]
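
The switching policy on this slide reduces to a threshold decision. A minimal sketch, assuming memory usage ratio as the load signal and a single caller-chosen threshold; both the signal and the names `RuntimeKind`, `choose_runtime`, and `needs_migration` are illustrative assumptions:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RuntimeKind {
    /// For example a JIT-based runtime such as Wasmtime.
    HighThroughput,
    /// For example a small-footprint runtime such as WAMR.
    LowMemory,
}

/// Pick a runtime from the node's memory usage ratio (0.0 to 1.0).
fn choose_runtime(memory_used_ratio: f64, high_load_threshold: f64) -> RuntimeKind {
    if memory_used_ratio >= high_load_threshold {
        RuntimeKind::LowMemory
    } else {
        RuntimeKind::HighThroughput
    }
}

/// A live migration is needed only when the preferred runtime changes.
fn needs_migration(current: RuntimeKind, memory_used_ratio: f64, threshold: f64) -> bool {
    choose_runtime(memory_used_ratio, threshold) != current
}
```

A production policy would likely add hysteresis so that load hovering around the threshold does not trigger repeated migrations.
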
  30. “Hot” Healing and Scaling for Stateful Applications
    • Restore the app from a checkpoint when rebooting/scaling the app
      ◦ Maintain volatile information (e.g., cache and memory states)
    [Figure: Execution State, Restore App]
  31. How Much the Migration Improves App Response Perf.
    [Figure: app response performance around a restart. Annotations: "Restarted"; "Pain caused by C/R"; "Restore the ideal perf. JUST after the HOT restart"]
  32. Current Status
    • Convert Execution States Between Runtimes
      ◦ ✅ C/R between WAMR and WasmEdge
      ◦ 🚧 Wasm3 support
    • Self-hosted Runtime for Runtime Neutral C/R (PoC: https://github.com/oss-fun/chiwawa)
      ◦ ✅ C/R for Wasm MVP on any runtime (e.g., WAMR, Wasmtime, and WasmEdge)
      ◦ 🚧 WASI preview1 implementation (only fd_write is supported)
      ◦ 🚧 Offload hotspots
    • We want to release our code and contribute to the OSS Wasm community
      ◦ C/R between same runtimes
      ◦ Looking for more practical use cases
    We'll keep doing our best! 😇