Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum

The 3rd International Workshop on Middleware for the Computing Continuum (Mid4CC) 2025
https://mid4cc.netsons.org/2025/index.html

Paper: https://dl.acm.org/doi/abs/10.1145/3774898.3778040

Abstract:
Checkpoint/restore (C/R) in the Edge-Cloud Continuum, where edge servers and cloud servers are integrated to combine their respective computational characteristics, enables application live migration for load balancing and service continuity that adapts to user mobility. WebAssembly (Wasm) provides architecture neutrality, allowing bytecode execution across diverse platforms. Furthermore, by leveraging runtime implementations specialized for particular environments and performance optimizations such as Just-In-Time (JIT) and Ahead-Of-Time (AOT) compilation, Wasm bytecode can be executed efficiently by using the runtime best suited to each environment. However, the diversity of runtimes and performance optimizations results in heterogeneous representations of application execution states, making it difficult to apply C/R across multiple runtimes. To address this heterogeneity, we propose a C/R method that employs a self-hosted Wasm runtime compiled into Wasm itself. By checkpointing and restoring application execution states within the self-hosted Wasm runtime, the representation of execution states can be unified. In addition, to mitigate the overhead of duplicated runtime executions, we also investigate optimization techniques for Wasm runtimes designed to be self-hosted. Our evaluation results demonstrate that C/R with a self-hosted Wasm runtime can eliminate runtime differences and enable efficient live migration with a minimal execution state.

Yuki Nakata chikuwait

December 15, 2025

Transcript

  1. Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum

    Yuki Nakata (SAKURA internet Inc. / Future University Hakodate), Katsuya Matsubara (Future University Hakodate)
    December 15, 2025
    3rd International Workshop on Middleware for the Computing Continuum (Mid4CC), co-located with the 26th ACM/IFIP International Middleware Conference
  2. Edge–Cloud Continuum: Distributed Platform by Hetero. Servers

    Task offloading and handoff are key factors for the continuum
    Major collaboration patterns:
    • Offloading tasks
    • Edge-to-Edge task handoff
    Figure labels: Low latency, High performance
  3. Requirements of Platform for Edge–Cloud Continuum

    1. CPU architecture neutrality
      • Different CPUs may be used at the edge and cloud
    2. Efficient execution
      • Optimized application execution for different server characteristics
      • E.g., hardware performance and constraints
    3. Portability
      • Offloading and handoffs require live migration
      • Transfers an application and a snapshot by checkpoint/restore (C/R)
  4. WebAssembly (Wasm): Best Fit Edge–Cloud Continuum Platform

    • Virtual Instruction Set Architecture
    • Compile to Wasm bytecode
    • Run apps efficiently and anywhere using suitable runtime for each platform
  5. WebAssembly (Wasm): Best Fit Edge–Cloud Continuum Platform

    • Virtual Instruction Set Architecture
    • Run apps efficiently and anywhere using suitable runtime for each platform
    • 100+ Runtimes Tailored for Edge/Cloud*
    Figure labels: WAMR (WebAssembly Micro Runtime), Wasmtime, Bytecode, Low memory footprint
    *Yixuan Zhang, Mugeng Liu, Haoyu Wang, Yun Ma, Gang Huang, and Xuanzhe Liu. 2025. Research on WebAssembly Runtimes: A Survey. ACM Trans. Softw. Eng. Methodol. Just Accepted (January 2025). https://doi.org/10.1145/3714465
  6. WebAssembly (Wasm): Best Fit Edge–Cloud Continuum Platform

    • Virtual Instruction Set Architecture
    • Run apps efficiently and anywhere using suitable runtime for each platform
    • 100+ Runtimes Tailored for Edge/Cloud*
    Figure labels: WAMR (WebAssembly Micro Runtime), Wasmtime, Bytecode, High performance with JIT compilation
    *Yixuan Zhang, Mugeng Liu, Haoyu Wang, Yun Ma, Gang Huang, and Xuanzhe Liu. 2025. Research on WebAssembly Runtimes: A Survey. ACM Trans. Softw. Eng. Methodol. Just Accepted (January 2025). https://doi.org/10.1145/3714465
  7. How to Implement C/R for Wasm

    Snapshot needs VM states created by runtimes
    • Stacks
      • Value Stack (figure example: holds 1 and 2 for int a = 1 + 2;)
      • Frame Stack (Args, RetAddr, Locals)
  8. How to Implement C/R for Wasm

    Snapshot needs VM states created by runtimes
    • Stacks
      • Value Stack (figure example: holds 1 and 2 for int a = 1 + 2;)
      • Frame Stack (Args, RetAddr, Locals)
    • Linear Memory
      • Store non-num values (e.g., array and string)
      • Figure example: the bytes H, E, L, L, O, \n at addresses 0x00 to 0x05
  9. How to Implement C/R for Wasm

    Snapshot needs VM states created by runtimes
    • Stacks
      • Value Stack (figure example: holds 1 and 2 for int a = 1 + 2;)
      • Frame Stack (Args, RetAddr, Locals)
    • Linear Memory
      • Store non-num values (e.g., array and string)
      • Figure example: the bytes H, E, L, L, O, \n at addresses 0x00 to 0x05
    • Program Counter
      • Next instruction address
      • Figure example: local.get, local.get, i32.add at addresses 0xaaa, 0xabb, 0xbbb
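The four pieces of VM state named on these slides (value stack, frame stack, linear memory, program counter) can be pictured as a single snapshot record. The following is a minimal sketch in Rust under stated assumptions, not Chiwawa's actual data layout: the names `VmSnapshot`, `Frame`, `checkpoint`, and `restore`, the use of `i64` for every value, and the little-endian byte format are all hypothetical and chosen only for illustration.

```rust
/// Hypothetical frame record (the slide lists args, return address, locals).
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub return_addr: u32,
    pub locals: Vec<i64>, // assumption: arguments are stored as the first locals
}

/// Hypothetical snapshot of one Wasm VM (illustrative layout only).
#[derive(Debug, Clone, PartialEq)]
pub struct VmSnapshot {
    pub value_stack: Vec<i64>,
    pub frames: Vec<Frame>,
    pub linear_memory: Vec<u8>,
    pub program_counter: u32,
}

fn put_u32(out: &mut Vec<u8>, v: u32) {
    out.extend_from_slice(&v.to_le_bytes());
}

fn put_i64s(out: &mut Vec<u8>, vals: &[i64]) {
    put_u32(out, vals.len() as u32);
    for v in vals {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Checkpoint: serialize the snapshot into a flat byte buffer.
pub fn checkpoint(s: &VmSnapshot) -> Vec<u8> {
    let mut out = Vec::new();
    put_u32(&mut out, s.program_counter);
    put_i64s(&mut out, &s.value_stack);
    put_u32(&mut out, s.frames.len() as u32);
    for f in &s.frames {
        put_u32(&mut out, f.return_addr);
        put_i64s(&mut out, &f.locals);
    }
    put_u32(&mut out, s.linear_memory.len() as u32);
    out.extend_from_slice(&s.linear_memory);
    out
}

/// Cursor over serialized bytes; every read returns None on truncated input.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let buf: &'a [u8] = self.buf;
        let end = self.pos.checked_add(n)?;
        let s = buf.get(self.pos..end)?;
        self.pos = end;
        Some(s)
    }
    fn read_u32(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
    fn read_i64s(&mut self) -> Option<Vec<i64>> {
        let n = self.read_u32()?;
        let mut v = Vec::new();
        for _ in 0..n {
            let mut a = [0u8; 8];
            a.copy_from_slice(self.take(8)?);
            v.push(i64::from_le_bytes(a));
        }
        Some(v)
    }
}

/// Restore: rebuild the snapshot from bytes produced by `checkpoint`.
pub fn restore(bytes: &[u8]) -> Option<VmSnapshot> {
    let mut r = Reader { buf: bytes, pos: 0 };
    let program_counter = r.read_u32()?;
    let value_stack = r.read_i64s()?;
    let n_frames = r.read_u32()?;
    let mut frames = Vec::new();
    for _ in 0..n_frames {
        let return_addr = r.read_u32()?;
        let locals = r.read_i64s()?;
        frames.push(Frame { return_addr, locals });
    }
    let mem_len = r.read_u32()? as usize;
    let linear_memory = r.take(mem_len)?.to_vec();
    Some(VmSnapshot { value_stack, frames, linear_memory, program_counter })
}
```

Calling `restore(&checkpoint(&snap))` gives back a value equal to `snap`. The point of the sketch is that a snapshot is only portable if both sides agree on every field and width in such a record, which is the property the following slides show is missing across runtimes.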
  10. Wasm C/R Among Heterogeneous Platforms (1/2)

    Challenge: depends on runtime and optimization
    • Diverse spec implementations between runtimes
    Figure: VM of runtime A ≠ VM of runtime B. Both hold a Value Stack, Frame Stack, Program Counter, and Linear Memory, but they differ in value stack width (32bit vs 64bit), program counter width (0xFFFFFFFF vs 0xFFFFFFFFFFFFFFFF), and frame stack contents (locals and module, with an additional Internal field in one runtime)
  11. Wasm C/R Among Heterogeneous Platforms (2/2)

    Challenge: depends on runtime and optimization
    • Ahead-Of-Time (AOT) or Just-In-Time (JIT) compilation
    • Execution states in snapshot exist outside the VM itself
    Figure labels: OS Stack, Program Counter (0xFFFFFFFF), locals, module
  12. Our Approach: Self-Hosted Wasm Runtime-based C/R

    • Chiwawa*: minimal Wasm-ised runtime with C/R
    • Execute app via self-hosted runtime
      • No need to modify host runtimes
      • Eliminates C/R mechanisms for each optimization strategy
    • The snapshot of an application uses only Chiwawa’s execution state
    Figure: the application (Stacks, Program counter, Linear Memory) runs on Chiwawa (Stacks, Program counter, Linear Memory, C/R mechanism), which runs on any host runtime and produces the dumped snapshot
    *chiwawa (CHeckpoint/restore and Instrumentation-specific WAsm runtime on WAsm runtime), https://github.com/oss-fun/chiwawa
  13. Technical Issues for Self-Hosted Runtime Overhead (1/3)

    1. Increased bytecode instructions executed in host runtime
    • An instruction in an application is transformed into multiple instructions
    • The instruction handler in a self-hosted runtime compiles into multiple bytecode instructions
    Example: the handler for local.get
        fn handle_local_get(ctx: Context){
            …
            let val = ctx.frame.locals[i].clone();
            ctx.value_stack.push(val);
            …
        }
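To make the cost concrete, the handler fragment from this slide can be filled out into a compilable form. This is a sketch only: the `Val`, `Frame`, and `Context` types, the explicit index parameter `i`, and the `Result` return type are assumptions added here, since the slide elides everything around the two statements it shows.

```rust
/// Hypothetical Wasm value type (two variants only, for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    I32(i32),
    I64(i64),
}

/// Hypothetical call frame owning the locals of the running function.
pub struct Frame {
    pub locals: Vec<Val>,
}

/// Hypothetical interpreter context, mirroring `ctx` on the slide.
pub struct Context {
    pub frame: Frame,
    pub value_stack: Vec<Val>,
}

/// Handler for `local.get i`: bounds-checked read, clone, then push.
/// Returns Err if the local index is out of range.
pub fn handle_local_get(ctx: &mut Context, i: usize) -> Result<(), String> {
    let val = ctx
        .frame
        .locals
        .get(i)
        .ok_or_else(|| format!("local index {} out of range", i))?
        .clone();
    ctx.value_stack.push(val);
    Ok(())
}
```

Even this small handler performs an index check, a clone, and a push that may grow the stack. Once the self-hosted runtime is itself compiled to Wasm, each of those steps becomes several host-level bytecode instructions, which is the multiplication the slide describes.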
  14. Technical Issues for Self-Hosted Runtime Overhead (2/3)

    2. Duplicated computationally expensive processing
    2-A. Sandbox mechanism
      • Protect platform from bugs or malicious programs
      • Major cause of performance degradation in Wasm*
    Figure: the app runs in a sandbox created by the self-hosted runtime, which itself runs in a sandbox created by the host runtime; validation and boundary checks (0x00 to 0xff) are duplicated
    *Matthew Kolosick, Shravan Narayan, Evan Johnson, Conrad Watt, Michael LeMay, Deepak Garg, Ranjit Jhala, and Deian Stefan. 2022. Isolation without taxation: near-zero-cost transitions for WebAssembly and SFI. Proc. ACM Program. Lang. 6, POPL, Article 27 (January 2022), 30 pages. https://doi.org/10.1145/3498688
  15. Technical Issues for Self-Hosted Runtime Overhead (3/3)

    2. Duplicated computationally expensive processing
    2-A. Sandbox mechanism
      • Protect platform from bugs or malicious programs
      • Major cause of performance degradation in Wasm
    2-B. WebAssembly System Interface (WASI)
      • To access host resources (e.g., files and network sockets)
      • Many stack and memory operations by a single WASI function call
    Figure: the VM by the self-hosted runtime calls the WASI API of the self-hosted runtime, which in turn calls the WASI API of the VM by the host runtime
  16. Approach for Technical Issue 1: Merging Instructions

    Reduce stack operations executed frequently
    • Almost all instructions use the value stack
      • Push/Pop operands (e.g., variables) and instruction results
    • Parser identifies dependencies between instructions
    • Convert stack instructions to Immediate Value
    Example:
        i32.const 10
        local.get 0
        i32.add
        local.set 1
    is merged into:
        i32.add{10, local@0} => local@1
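The merge shown on this slide can be sketched as a small peephole pass. This is a simplified illustration, not Chiwawa's parser: the `Instr` and `Operand` types, the `FusedAdd` form, and the `merge` and `run` functions are hypothetical, only the one four-instruction pattern from the slide is recognized, and the pass assumes straight-line code with no branch targeting the middle of the pattern.

```rust
/// Hypothetical source operand of a fused instruction.
#[derive(Debug, Clone, PartialEq)]
pub enum Operand {
    Const(i32),
    Local(usize),
}

/// Hypothetical instruction set: four stack instructions plus one fused form.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    I32Const(i32),
    LocalGet(usize),
    I32Add,
    LocalSet(usize),
    /// Fused `i32.add{lhs, rhs} => local@dst`; does not touch the value stack.
    FusedAdd { lhs: Operand, rhs: Operand, dst: usize },
}

fn as_operand(i: &Instr) -> Option<Operand> {
    match i {
        Instr::I32Const(c) => Some(Operand::Const(*c)),
        Instr::LocalGet(l) => Some(Operand::Local(*l)),
        _ => None,
    }
}

/// Peephole pass: rewrite `<operand> <operand> i32.add local.set d`
/// into one FusedAdd; every other instruction is copied unchanged.
pub fn merge(code: &[Instr]) -> Vec<Instr> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < code.len() {
        if i + 3 < code.len() {
            if let (Some(lhs), Some(rhs)) = (as_operand(&code[i]), as_operand(&code[i + 1])) {
                if let (Instr::I32Add, Instr::LocalSet(dst)) = (&code[i + 2], &code[i + 3]) {
                    out.push(Instr::FusedAdd { lhs, rhs, dst: *dst });
                    i += 4;
                    continue;
                }
            }
        }
        out.push(code[i].clone());
        i += 1;
    }
    out
}

/// Tiny interpreter; returns the number of value-stack pushes and pops.
pub fn run(code: &[Instr], locals: &mut [i32]) -> usize {
    let mut stack: Vec<i32> = Vec::new();
    let mut ops = 0;
    for ins in code {
        match ins {
            Instr::I32Const(c) => {
                stack.push(*c);
                ops += 1;
            }
            Instr::LocalGet(l) => {
                stack.push(locals[*l]);
                ops += 1;
            }
            Instr::I32Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a.wrapping_add(b));
                ops += 3;
            }
            Instr::LocalSet(l) => {
                locals[*l] = stack.pop().expect("stack underflow");
                ops += 1;
            }
            Instr::FusedAdd { lhs, rhs, dst } => {
                // Operands are read directly; no push or pop is needed.
                let get = |o: &Operand| match o {
                    Operand::Const(c) => *c,
                    Operand::Local(l) => locals[*l],
                };
                let sum = get(lhs).wrapping_add(get(rhs));
                locals[*dst] = sum;
            }
        }
    }
    ops
}
```

In this toy model the original four instructions cost six value-stack operations, while the fused form costs none and leaves the same value in the destination local. A production pass would also need to confirm that no other instruction depends on the intermediate stack values before merging.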
  17. Approach for Technical Issue 2-A: Reducing Sandbox

    Use only the sandbox created by a host runtime
    • Chiwawa does not create a sandbox for each VM
    • Maintain isolation by host runtime sandboxing
      • Instruction handlers of Chiwawa run within a host sandbox
    Figure: of the two nested sandboxes (sandbox by self-hosted runtime around the app, sandbox by host runtime around the self-hosted runtime), use only the host's sandbox
  18. Approach for Technical Issue 2-B: Pass-through WASI

    Minimal WASI implementation designed to be Wasm-ized
    • Chiwawa can invoke host’s WASI functions
      • Compiled into Wasm itself
    • Chiwawa’s WASI function only invokes the host's WASI function
      • Use arguments in the value stack or linear memory directly
      • Reduce data copy
    Figure: the VM by the self-hosted runtime calls the Thin-WASI API of the self-hosted runtime, which calls the WASI API of the VM by the host runtime
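The "use arguments in linear memory directly" idea can be illustrated with a thin `fd_write`-style wrapper. This is a sketch under assumptions, not Chiwawa's implementation: the function name `thin_fd_write` is hypothetical, the host side is modeled as any `std::io::Write` rather than a real WASI import, and the iovec layout (a 32-bit little-endian buffer offset followed by a 32-bit length, 8 bytes per entry) follows the WASI preview1 convention on wasm32.

```rust
use std::io::{IoSlice, Write};

/// Read a little-endian u32 at `off` from guest linear memory.
fn read_u32(mem: &[u8], off: usize) -> Option<u32> {
    let end = off.checked_add(4)?;
    let b = mem.get(off..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical thin `fd_write`: reads `iovs_len` (offset, length) pairs
/// starting at `iovs_ptr` in guest memory and hands borrowed slices to `host`.
/// Returns the number of bytes written, or None on an out-of-range access
/// or a host I/O error.
pub fn thin_fd_write<W: Write>(
    host: &mut W,
    guest_mem: &[u8],
    iovs_ptr: u32,
    iovs_len: u32,
) -> Option<usize> {
    let mut slices = Vec::new();
    for k in 0..iovs_len as usize {
        let entry = (iovs_ptr as usize).checked_add(k.checked_mul(8)?)?;
        let buf = read_u32(guest_mem, entry)? as usize;
        let len = read_u32(guest_mem, entry.checked_add(4)?)? as usize;
        // Borrow the payload in place; the bytes are not copied here.
        let payload = guest_mem.get(buf..buf.checked_add(len)?)?;
        slices.push(IoSlice::new(payload));
    }
    host.write_vectored(&slices).ok()
}
```

The wrapper only decodes the small iovec descriptors and passes the payload to the host as borrowed slices. Note that `write_vectored` may perform a partial write on some writers; a fuller version would report that count back to the guest the way `fd_write` does.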
  19. Evaluation from Three Aspects

    1. Performance benefits of the optimized self-hosted runtime
    2. Performance overhead introduced by Chiwawa
    3. Benefits of self-hosted runtime-based C/R

    Benchmark environment
    OS       Ubuntu 22.04.4 LTS (Linux 5.15.0)
    CPU      Intel Xeon Silver 4208, 8 cores, 2.10 GHz
    Memory   32 GB
    Storage  SSD 480 GB x2 (RAID1)
  20. 1. Performance Benefits of the Optimized Self-hosted Runtime (1/2)

    • Objective: Can Chiwawa outperform self-hostable runtimes?
    • Compared with Wizard*
      • Fast interpreter and self-hostable Wasm runtime
      • Also used as a host runtime
    • Used computationally intensive benchmarks
      • Leibniz formula for PI (n = 1,000,000)
      • N-body (particle = 10,000)
    *Ben L. Titzer. 2022. A fast in-place interpreter for WebAssembly. Proc. ACM Program. Lang. 6, OOPSLA2, Article 148 (October 2022), 27 pages. https://doi.org/10.1145/3563311
  21. 1. Performance Benefits of the Optimized Self-hosted Runtime (2/2)

    Execution time (sec.)
    Benchmarks   Wizard on Wizard   Chiwawa on Wizard
    pi-Leibniz   26.791             19.620
    N-body       14.554             9.932
    1.3 – 1.4x faster by optimization techniques

    Memory usage: peak RSS (Resident Set Size) (MB)
    Benchmarks   Wizard on Wizard   Chiwawa on Wizard
    pi-Leibniz   106.98             15.03
    N-body       53.46              15.15
    Reduce 71 - 85% by minimal runtime implementation
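The summary figures on this slide follow from the two tables. As a worked check, dividing the Wizard-on-Wizard time by the Chiwawa-on-Wizard time gives the speedup, and one minus the ratio of peak RSS values gives the memory reduction (values rounded to two decimals):

```latex
\begin{align*}
\text{speedup}_{\text{pi-Leibniz}} &= \frac{26.791}{19.620} \approx 1.37, &
\text{speedup}_{\text{N-body}} &= \frac{14.554}{9.932} \approx 1.47, \\
\text{reduction}_{\text{pi-Leibniz}} &= 1 - \frac{15.03}{106.98} \approx 85.95\%, &
\text{reduction}_{\text{N-body}} &= 1 - \frac{15.15}{53.46} \approx 71.66\%.
\end{align*}
```

These agree with the slide's 1.3 to 1.4x and 71 to 85% if the slide's figures are read as truncated rather than rounded.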
  22. 2. Performance Overhead Introduced by Chiwawa (1/2)

    • Objective: How much performance overhead is introduced by duplicated runtime execution?
    • Compared Chiwawa on Wasmtime with Wasmtime directly
    • Used SQLite benchmark*
      • SQLite is a lightweight DB widely used in edge
      • Invokes many WASI functions
    *SQLite3 Benchmark, https://github.com/ukontainer/sqlite-bench
  23. 2. Performance Overhead Introduced by Chiwawa (2/2)

    • Execution time increased in all benchmarks
      • 13.64x to 426.95x
    • Runtime duplication overhead still exists in complex processing
    Figure: Overhead (normalized to wasmtime)
  24. 2. Performance Overhead Introduced by Chiwawa (2/2)

    • Execution time increased in all benchmarks
      • 13.64x to 426.95x
    • Runtime duplication overhead still exists in complex processing
    • Complex benchmarks cause worse overhead
    Figure: Overhead (normalized to wasmtime)
  25. 3. Benefits of Self-Hosted Runtime-Based C/R (1/4)

    • Objective: Can Chiwawa perform C/R more efficiently than existing methods?
    • Compared snapshot size with Checkpoint/Restore In Userspace (CRIU)
      • A C/R mechanism for Linux processes
      • Achieves runtime neutrality by C/R on a Wasm runtime process
    • Host runtime: Wasmtime, interpreter/AOT WAMR, and interpreter WasmEdge
  26. 3. Benefits of Self-Hosted Runtime-Based C/R (2/4)

    • Objective: Can Chiwawa perform C/R more efficiently than existing methods?
    • Compared snapshot size with Checkpoint/Restore In Userspace (CRIU)
      • A C/R mechanism for Linux processes
      • Achieves runtime neutrality by C/R on a Wasm runtime process
    • Host runtime: Wasmtime, interpreter/AOT WAMR, and interpreter WasmEdge

    Checkpointed Wasm-bytecode state size (KB)
    Hosts            Chiwawa   CRIU
    Wasmtime         1076      9232
    WAMR (Interp.)   1076      654484
    WAMR (AOT)       1076      14360
    WasmEdge         1076      89784
    • Significantly reduce the state size
    • Keep the size across different runtimes
  27. 3. Benefits of Self-Hosted Runtime-Based C/R (3/4)

    • Objective: How much application performance overhead do C/R mechanisms impose?
    • Compared application performance with and without the C/R mechanism, for Chiwawa and for runtime-based C/R
    • Comparison target: Wizard
      • Can monitor execution states required for C/R*
      • Emulated runtime-based C/R by tracing these while running the application
    • Used SQLite benchmark
    • Host runtime: Wasmtime
    *Ben L. Titzer, Elizabeth Gilbert, Bradley Wei Jie Teo, Yash Anand, Kazuyuki Takayama, and Heather Miller. 2024. Flexible Non-intrusive Dynamic Instrumentation for WebAssembly. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS '24), Vol. 3. Association for Computing Machinery, New York, NY, USA, 398–415. https://doi.org/10.1145/3620666.3651338
  28. 3. Benefits of Self-Hosted Runtime-Based C/R (4/4)

    Figure: Overhead of emulated runtime-based C/R (normalized to wizard without C/R)
    Figure: Overhead of Chiwawa’s C/R (normalized to chiwawa without C/R)
  29. 3. Benefits of Self-Hosted Runtime-Based C/R (4/4)

    Figure: Overhead of emulated runtime-based C/R (normalized to wizard without C/R)
    Figure: Overhead of Chiwawa’s C/R (normalized to chiwawa without C/R)
    • Runtime-based C/R may introduce significant overhead
      • Monitoring execution state is a costly operation
  30. 3. Benefits of Self-Hosted Runtime-Based C/R (4/4)

    Figure: Overhead of emulated runtime-based C/R (normalized to wizard without C/R)
    Figure: Overhead of Chiwawa’s C/R (normalized to chiwawa without C/R)
    • Chiwawa’s C/R does not significantly degrade performance
      • Self-hosted runtime-based C/R is efficient
  31. Conclusion

    • Live migration by C/R between edge and cloud is a key factor for application offloading and handoff
    • Chiwawa: a self-hosted runtime-based C/R with optimizations tailored for self-hosting
      • Eliminate runtime and optimization differences in C/R
      • Enable efficient C/R with a minimal snapshot and overhead
    • Current performance optimization is insufficient and must be improved
    • Future work:
      • Further performance optimization for the self-hosted runtime
      • C/R containing WASI execution state existing outside the Wasm runtime