Wazero vs Chicory: An In-Depth Comparison Between Two Language-Native Wasm Runtimes

FOSDEM 2025

Edoardo Vacchi

August 13, 2025

Transcript

  1. who the hell are you anyway

    Edoardo Vacchi @evacchi
    • DSL/PL Research @ UniMi
    • R&D @ UniCredit
    • Drools, Kogito codegen @ Red Hat
    • wazero WebAssembly Run-Time @ Tetrate
    • chicory, wazero, and more @
  3. Extism: brief introduction

    1. easiest way to embed wasm into your app in 17 different languages
    2. designed as the first off-the-shelf plugin system
    3. universal ABI to do guest/host interaction from 9 wasm-targeting languages

    Implemented identically over many popular runtimes.
    Lots of developer love and contribution on GitHub: ~4,700 stars, 30+ contribs.
    @evacchi.dev
  4. Extism is a universal Wasm abstraction!

    output = plugin.call("func", ...)

    alloc / dealloc, _start, HTTP requests, host imports
  5. wasm lets your users write software extensions using their favorite language and run them in a sandboxed environment
  6. “language-native runtimes” (Wasm I/O 2024)

    a language-native Wasm runtime is a runtime that is written in the host language (well, isn’t that true for all run-times?)

    managed languages
    - a Wasm runtime for the Go runtime, written in Go
    - a Wasm runtime for the JVM, written in Java
  7. what is wazero

    wazero is a Go runtime for WebAssembly
    - wazero implements an interpreter for WebAssembly, supporting all* the architectures and operating systems where Go is supported
    - wazero implements an ahead-of-time, load-time, multi-pass, optimizing compiler for WebAssembly, supporting amd64, arm64 and all* major operating systems

    (*) OSs: macOS/Linux/Windows and in some cases FreeBSD; archs: arm64, amd64; RISC-V for some workloads
  8. chicory

    Chicory is a Java runtime for WebAssembly
    - Chicory implements an interpreter for WebAssembly, supporting all the architectures and operating systems where Java is supported
    - Chicory implements an ahead-of-time, load-time, Java bytecode translator for WebAssembly
  9. summary

    - go vs java
    - language-native runtimes
    - function compilation and evaluation in wazero
    - function compilation and evaluation in chicory
  10. go vs java (code style)

    go:
    - different form of inheritance
    - shallow type hierarchies
    - libraries

    java:
    - OOP
    - deep type hierarchies
    - “opinionated” frameworks
  11. go vs java (runtime)

    go:
    - native executables
    - cross-compilation
    - static linking
    - libraries shared as source
    - aggressive compile-time caching

    java:
    - bytecode target
    - multi-platform runtime
    - dynamic linking
    - libraries shared as binary artifacts
  13. java: dynamic loading and linking are essential

    [figure: timeline from Build Time to Run Time]
    - static void Main: 3 Classloaders, ~500 Classes, ~160 Static Init
    - Framework Initialization: 100+ Classloaders, 1000+ Classes, 1000+ Static Init
    - Application Initialization: 100++ Classloaders, 1000++ Classes, 1000++ Static Init

    Source: Dan Heidinga - “Starting Fast” (QCon Plus 2021)
  14. go vs java: similarities

    go:
    - “large” runtime (cross-compilation)
    - garbage collection
    - goroutines

    java:
    - “large” multi-platform runtime
    - garbage collection
    - virtual threads (before: tasks+thread pools)
    - …native executables with native-image
  15. the problem with ffi

    • most language runtimes have limitations when they need to interoperate with native code
    • “foreign function interface”
    • e.g. Python, Java with JNI, libFFI, or Go with cgo
    • if you interoperate with C/native, you have to follow its rules
  16. the problem with cgo and jni

    • portability issues
      ◦ you can no longer compile and run everywhere
      ◦ more difficult to cross-compile, a C compiler must be installed
    • tooling issues
      ◦ native code is opaque to language-specific tooling (debuggers, profilers, coverage, fuzzing etc.)
    • runtime issues
      ◦ C code takes over native threads, goroutines/virtual threads can no longer cooperate with them, garbage collection issues
  17. the problem with cgo and jni

    • performance issues
      ◦ crossing the FFI boundary has a cost
    • safety issues
      ◦ memory space shared with host code
      ◦ no sandboxing
  18. the go environment

    however:
    - static linking prevents run-time extensibility (dynamic code loading)
    - cross-compilation depends on the go toolchain
    - goroutines are an abstraction over OS threads
  19. Software Extensions in Go

    - Go plugin (limited)
    - A scripting language implemented in Go
    - Native extensions
  20. func main() {
          a := rustFn(x)
          b := zigFn(y)
      }

      pub extern "C" fn rustFn(v: i32) -> bool
      export fn zigFn(v: i32) bool { ... }
  21. software extensions in java

    - Class loading
    - A scripting language implemented in Java
    - Native extensions
  22. java: dynamic loading is essential

    [figure: timeline from Build Time to Run Time]
    - static void Main: 3 Classloaders, ~500 Classes, ~160 Static Init
    - Framework Initialization: 100+ Classloaders, 1000+ Classes, 1000+ Static Init
    - Application Initialization: 100++ Classloaders, 1000++ Classes, 1000++ Static Init

    Source: Dan Heidinga - “Starting Fast” (QCon Plus 2021)
  23. Scripting Languages (JSR 223)

    import javax.script.*;

    public class InvokeScriptFunction {
        public static void main(String[] args) throws Exception {
            ScriptEngineManager manager = new ScriptEngineManager();
            ScriptEngine engine = manager.getEngineByName("nashorn");

            // evaluate JavaScript code that defines a function with one parameter
            engine.eval("function hello(name) { print('Hello, ' + name) }");

            // create an Invocable object by casting the script engine object
            Invocable inv = (Invocable) engine;

            // invoke the function named "hello" with "Scripting!" as the argument
            inv.invokeFunction("hello", "Scripting!");
        }
    }
  24. Scripting Languages (GraalVM Polyglot)

    import org.graalvm.polyglot.*;
    import org.graalvm.polyglot.proxy.*;

    public class HelloPolyglot {
        static String JS_CODE = "(function myFun(param){console.log('hello '+param);})";

        public static void main(String[] args) {
            System.out.println("Hello Java!");
            try (Context context = Context.create()) {
                Value value = context.eval("js", JS_CODE);
                value.execute(args[0]);
            }
        }
    }
  25. public static void main(String[] args) {
          var a = rustFn(x);
          var b = zigFn(y);
      }

      pub extern "C" fn rustFn(v: i32) -> bool
      export fn zigFn(v: i32) bool { ... }
  26. language-native runtime

    a language-native Wasm runtime is a runtime that is written in the host language (well, isn’t that true for all run-times?)

    managed languages
    - a Wasm runtime for the Go runtime, written in Go
    - a Wasm runtime for the JVM, written in Java
  27. why use a language-native runtime

    - no FFI
    - maximum portability
    - safe interaction with platform
    - perf might not be state-of-the-art
    - however: depending on the workload, FFI cost might be higher!
  28. runtime.CompileModule()

    - binaryformat.DecodeModule(binary, runtime.enabledFeatures, …): decodes the binary wasm module and validates the contents
    - runtime.store.Engine.CompileModule(...): interpreter/engine, compiler/engine
    - → CompiledModule
  29. wazeroir.Compiler

    • Common to interpreter and compiler
    • Translates the Wasm bytecode to an Intermediate Representation
    • A lowered representation of Wasm that is easier to translate
      ◦ to native code in compiler mode
      ◦ It is also interpreted straight-away in interpreter mode
      ◦ fewer opcodes
      ◦ unstructured control flow

    // Translate the current Wasm instruction to wazeroir's operations,
    // and emit the results into c.results.
    func (c *Compiler) handleInstruction() error {
        op := c.body[c.pc]
        ...
        switch op {
        ...
        case wasm.OpcodeI32Sub:
            c.emit(NewOperationSub(UnsignedTypeI32))
        ...
        case wasm.OpcodeI64Sub:
            c.emit(NewOperationSub(UnsignedTypeI64))
        ...
        case wasm.OpcodeF32Sub:
            c.emit(NewOperationSub(UnsignedTypeF32))
        ...
        case wasm.OpcodeF64Sub:
            c.emit(NewOperationSub(UnsignedTypeF64))
        ...
        }
        default:
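
The "fewer opcodes" point can be illustrated with a small, self-contained sketch. This is not wazero's code: `Opcode`, `Operation` and `lower` are hypothetical names, and the opcode values are placeholders rather than real Wasm encodings. It only shows the idea of folding four type-specific subtraction opcodes into one operation that carries its operand type.

```go
package main

import "fmt"

// Opcode is a hypothetical stand-in for a Wasm opcode. The values are
// illustrative and are not the real binary encodings.
type Opcode int

const (
	OpI32Sub Opcode = iota
	OpI64Sub
	OpF32Sub
	OpF64Sub
)

// Operation is a lowered IR operation: one kind, parameterized by operand type.
type Operation struct {
	Kind string
	Type string
}

// lower maps four type-specific opcodes onto a single "sub" operation,
// which is how a lowered IR ends up with fewer opcodes than Wasm.
func lower(op Opcode) (Operation, error) {
	switch op {
	case OpI32Sub:
		return Operation{Kind: "sub", Type: "i32"}, nil
	case OpI64Sub:
		return Operation{Kind: "sub", Type: "i64"}, nil
	case OpF32Sub:
		return Operation{Kind: "sub", Type: "f32"}, nil
	case OpF64Sub:
		return Operation{Kind: "sub", Type: "f64"}, nil
	default:
		return Operation{}, fmt.Errorf("unsupported opcode %d", op)
	}
}

func main() {
	for _, op := range []Opcode{OpI32Sub, OpI64Sub, OpF32Sub, OpF64Sub} {
		ir, err := lower(op)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s.%s\n", ir.Type, ir.Kind)
	}
}
```

A consumer of this IR (interpreter or code generator) then needs one case for subtraction instead of four.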
  30. new compiler architecture

    Multi-stage
    - the new compiler draws from several compiler architectures
    - e.g. V8, LLVM and Go’s own compiler

    DecodeModule → CompileModule
    - Front-End: ssa, opt
    - Back-End: instruction selection, regalloc, encoding
  31. front-end: from wasm to internal representation (again)

    (func (param i32 i32)
      local.get 0
      local.get 1
      i32.add
      local.get 0
      i32.sub
    )

    blk0: (exec_ctx:i64, module_ctx:i64, v2:i32, v3:i32)
      v4:i32 = Iadd v2, v3
      v5:i32 = Isub v4, v2
      Jump blk_ret, v5
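
One way to get from the stack code to the SSA form on this slide is to walk the instructions with a virtual stack that holds value names instead of values. The sketch below is a toy under stated assumptions, not wazero's front-end: `Instr` and `toSSA` are hypothetical, only three instructions are handled, and the input is assumed to be already validated. Values v0 and v1 are reserved for the two context parameters shown on the slide, so function parameters start at v2.

```go
package main

import "fmt"

// Instr is a hypothetical, simplified Wasm instruction.
type Instr struct {
	Op  string // "local.get", "i32.add" or "i32.sub"
	Arg int    // local index, used by local.get only
}

// toSSA translates stack code into SSA-style text. The virtual stack holds
// SSA value names; each arithmetic instruction defines a fresh value.
func toSSA(nparams int, body []Instr) []string {
	const firstParam = 2 // v0 and v1 are exec_ctx and module_ctx
	next := firstParam + nparams
	var stack, out []string
	for _, in := range body {
		switch in.Op {
		case "local.get":
			stack = append(stack, fmt.Sprintf("v%d", firstParam+in.Arg))
		case "i32.add", "i32.sub":
			y, x := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			name := fmt.Sprintf("v%d", next)
			next++
			mnemonic := "Iadd"
			if in.Op == "i32.sub" {
				mnemonic = "Isub"
			}
			out = append(out, fmt.Sprintf("%s:i32 = %s %s, %s", name, mnemonic, x, y))
			stack = append(stack, name)
		}
	}
	// Whatever is left on top of the stack is the function result.
	if len(stack) > 0 {
		out = append(out, "Jump blk_ret, "+stack[len(stack)-1])
	}
	return out
}

func main() {
	body := []Instr{
		{"local.get", 0}, {"local.get", 1}, {"i32.add", 0},
		{"local.get", 0}, {"i32.sub", 0},
	}
	for _, line := range toSSA(2, body) {
		fmt.Println(line)
	}
}
```

For the function on the slide this prints the same three lines as the slide's block body.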
  33. what’s SSA?

    - “basic blocks” are sequences of instructions
    - instructions are in static single-assignment form
    - control flow as edges between blocks
  34. front-end (2): optimization passes

    #include <stdio.h>

    int main(void) {
        int a = 5;
        int b = 6;
        int c;
        c = a * (b / 2);
        if (0) { /* DEBUG */
            printf("%d\n", c);
        }
        return c;
    }
  35. front-end (2): optimization passes

    dead code elimination

    int main(void) {
        int a = 5;
        int b = 6;
        int c;
        c = a * (b / 2);
        if (0) { /* DEBUG */
            printf("%d\n", c);
        }
        return c;
    }
  36. front-end (2): optimization passes

    constant propagation / constant folding

    before: c = a * (b / 2);
    after:  c = 5 * (6 / 2);
  37. front-end (2): optimization passes

    before: c = 5 * (6 / 2); ... return c;
    after:  c = 5 * (6 / 2); ... return 15;
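
The constant-folding step in these slides can be sketched on a tiny expression tree. This is an illustration only: `Expr`, `constant`, `binary` and `fold` are hypothetical names, the tree stands in for a real SSA graph, and dead code elimination is not covered.

```go
package main

import "fmt"

// Expr is a tiny expression tree: a constant (Op == 0) or a binary operation.
type Expr struct {
	Op          byte // 0 for a constant, otherwise '*' or '/'
	Value       int  // meaningful only when Op == 0
	Left, Right *Expr
}

func constant(v int) *Expr { return &Expr{Value: v} }

func binary(op byte, l, r *Expr) *Expr { return &Expr{Op: op, Left: l, Right: r} }

// fold evaluates, at compile time, every operation whose operands are
// both constants. Division by zero is left in place for run time.
func fold(e *Expr) *Expr {
	if e.Op == 0 {
		return e
	}
	l, r := fold(e.Left), fold(e.Right)
	if l.Op == 0 && r.Op == 0 {
		switch e.Op {
		case '*':
			return constant(l.Value * r.Value)
		case '/':
			if r.Value != 0 {
				return constant(l.Value / r.Value)
			}
		}
	}
	return binary(e.Op, l, r)
}

func main() {
	// c = a * (b / 2), after propagating a = 5 and b = 6.
	c := binary('*', constant(5), binary('/', constant(6), constant(2)))
	fmt.Println(fold(c).Value) // prints 15
}
```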
  38. back-end: from ssa to asm

    blk0: (exec_ctx:i64, module_ctx:i64, v2:i32, v3:i32)
      v4:i32 = Iadd v2, v3
      v5:i32 = Isub v4, v2
      Jump blk_ret, v5

    L1 (SSA Block: blk0):
      mov x130?, x2
      mov x131?, x3
      add w132?, w130?, w131?
      sub w133?, w132?, w130?
      mov x0, x133?
      ret
  39. register allocation

    [figure: register overview]
    - plus, zero, stack pointer, program counter, pstate
    - 32 floating-point/SIMD registers (128 bits), numbered from 0 to 31
  40. back-end (2): register allocation

    L1 (SSA Block: blk0):
      mov x130?, x2
      mov x131?, x3
      add w132?, w130?, w131?
      sub w133?, w132?, w130?
      mov x0, x133?
      ret

    (After regalloc)
    L1 (SSA Block: blk0):
      mov x2, x2
      mov x3, x3
      add w8, w2, w3
      sub w8, w8, w2
      mov x0, x8
      ret
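
The effect shown on this slide, where `sub w8, w8, w2` reuses the register of an operand that is no longer needed, can be sketched with a very small allocator. This is a toy under stated assumptions, not wazero's allocator: `Instr` and `allocate` are hypothetical, the code is a single straight-line block, there is no coalescing with argument registers, and running out of registers is reported as an error instead of spilling.

```go
package main

import (
	"errors"
	"fmt"
)

// Instr is a hypothetical instruction over virtual registers.
type Instr struct {
	Def  int   // virtual register written by the instruction
	Uses []int // virtual registers read by the instruction
}

// allocate assigns a physical register from pool to every virtual register.
func allocate(code []Instr, pool []string) (map[int]string, error) {
	// Record the index of the last instruction that reads each virtual register.
	lastUse := map[int]int{}
	for i, in := range code {
		for _, u := range in.Uses {
			lastUse[u] = i
		}
	}
	free := append([]string(nil), pool...)
	live := map[int]string{}   // registers currently holding a live value
	result := map[int]string{} // final virtual-to-physical assignment
	for i, in := range code {
		// Operands that die here release their register before the result
		// is placed, so the destination may reuse a source register.
		for _, u := range in.Uses {
			if r, ok := live[u]; ok && lastUse[u] == i {
				free = append(free, r)
				delete(live, u)
			}
		}
		if len(free) == 0 {
			return nil, errors.New("out of registers: a real allocator would spill")
		}
		r := free[0]
		free = free[1:]
		live[in.Def] = r
		result[in.Def] = r
	}
	return result, nil
}

func main() {
	// Same shape as the slide: two inputs, an add, then a sub.
	code := []Instr{
		{Def: 130},
		{Def: 131},
		{Def: 132, Uses: []int{130, 131}},
		{Def: 133, Uses: []int{132, 130}},
	}
	regs, err := allocate(code, []string{"x8", "x9"})
	if err != nil {
		panic(err)
	}
	fmt.Println(regs[130], regs[131], regs[132], regs[133])
}
```

Four virtual registers fit in two physical ones here because the add and sub results take over registers whose previous values are dead.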
  41. back-end (3) finalization and encoding

    [[[after finalize for [0/0]f]]]
    L1 (SSA Block: blk0):
      stp x30, xzr, [sp, #-0x10]!
      sub x27, sp, #0x20
      ldr x11, [x0, #0x28]
      subs xzr, x27, x11
      b.ge #0x14
      orr x27, xzr, #0x20
      str x27, [x0, #0x40]
      ldr x27, [x0, #0x50]
      bl x27
      str xzr, [sp, #-0x10]!
      add w8, w2, w3
      sub w8, w8, w2
      mov x0, x8
      add sp, sp, #0x10
      ldr x30, [sp], #0x10
      ret

    fe7fbfa9fb8300d10b1440f97f030bebaa000054fb037bb21b2000f91b2840f960033fd6ff0f1ff84800030b0801024be00308aaff430091fe0741f8c0035fd6
  42. GO CODE / EXECUTABLE CODE

    [figure labels: WASM FN(), Return Code (Host Fn ID#N), Resume, WASM FN(), HOST FN()]
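
One plausible reading of this diagram: compiled Wasm code cannot call a Go host function directly, so it returns to Go with a code that names the host function, Go runs it, and then execution resumes in the compiled code. The sketch below simulates that loop in plain Go. Everything in it is hypothetical (`exitCode`, `execContext`, `compiled`, `call`); it shows the control transfer only and is not wazero's implementation.

```go
package main

import "fmt"

// exitCode is what the simulated compiled code hands back to Go.
type exitCode int

const (
	exitDone     exitCode = iota // the Wasm function has finished
	exitCallHost                 // Go must run a host function, then resume
)

// execContext carries state across the Go / compiled-code boundary.
type execContext struct {
	hostFnID int // which host function to run
	arg, ret int // argument to and result from the host function
	resumeAt int // where the compiled code continues after the host call
	result   int // final result of the Wasm function
}

// compiled simulates native code for a Wasm function that calls
// host function 0 with the argument 20 and then adds 1 to its result.
func compiled(ctx *execContext) exitCode {
	switch ctx.resumeAt {
	case 0:
		ctx.hostFnID, ctx.arg = 0, 20
		ctx.resumeAt = 1
		return exitCallHost
	default:
		ctx.result = ctx.ret + 1
		return exitDone
	}
}

// call drives the compiled function, servicing host calls until it is done.
func call(hostFns []func(int) int) int {
	ctx := &execContext{}
	for {
		switch compiled(ctx) {
		case exitCallHost:
			ctx.ret = hostFns[ctx.hostFnID](ctx.arg)
		case exitDone:
			return ctx.result
		}
	}
}

func main() {
	double := func(v int) int { return v * 2 }
	fmt.Println(call([]func(int) int{double})) // prints 41
}
```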
  43. JVM Bytecode

    ( x + 2 ) * 3

    JVM bytecode:
      0: iload_1
      1: iconst_2
      2: iadd
      3: iconst_3
      4: imul

    Wasm:
      (local.get $x)
      (i32.const 2)
      i32.add
      (i32.const 3)
      i32.mul
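
Both listings on this slide are programs for a stack machine, which is why the translation between them is close to one-to-one. A minimal evaluator makes the shared model concrete. This is an illustrative sketch, not Chicory's interpreter: `Instr` and `eval` are hypothetical, only four instructions are supported, and the local `$x` is assumed to live at index 0.

```go
package main

import "fmt"

// Instr is a hypothetical stack-machine instruction.
type Instr struct {
	Op  string
	Arg int32 // local index for local.get, literal for i32.const
}

// eval runs the instructions on an operand stack and returns the top value.
// The input is assumed to be valid: no stack underflow checks are made.
func eval(code []Instr, locals []int32) int32 {
	var stack []int32
	pop := func() int32 {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for _, in := range code {
		switch in.Op {
		case "local.get":
			stack = append(stack, locals[in.Arg])
		case "i32.const":
			stack = append(stack, in.Arg)
		case "i32.add":
			b, a := pop(), pop()
			stack = append(stack, a+b)
		case "i32.mul":
			b, a := pop(), pop()
			stack = append(stack, a*b)
		}
	}
	return pop()
}

func main() {
	// (x + 2) * 3
	code := []Instr{
		{"local.get", 0},
		{"i32.const", 2},
		{"i32.add", 0},
		{"i32.const", 3},
		{"i32.mul", 0},
	}
	fmt.Println(eval(code, []int32{5})) // prints 21
}
```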
  49. Unstructured Control Flow

    void print(boolean x) {
        if (x) {
            System.out.println(1);
        } else {
            System.out.println(0);
        }
    }

    void print(boolean);
      Code:
         0: iload_1
         1: ifeq          14
         4: getstatic     #7   // Field java/lang/System.out:Ljava/io/PrintStream;
         7: iconst_1
         8: invokevirtual #13  // Method java/io/PrintStream.println:(I)V
        11: goto          21
        14: getstatic     #7   // Field java/lang/System.out:Ljava/io/PrintStream;
        17: iconst_0
        18: invokevirtual #13  // Method java/io/PrintStream.println:(I)V
        21: return
  52. Structured Control Flow

    (func (param i32)
      local.get 0
      (if
        (then
          i32.const 1
          call $log ;; should log '1'
        )
        (else
          i32.const 0
          call $log ;; should log '0'
        )
      )
    )
  53. interpreter

    - relatively simple
    - fairly 1:1 representation with Wasm bytecode
    - very easy to follow and hack
    - portable to any JVM (well, duh)
  54. aot compiler

    - translates Wasm bytecode to Java bytecode
    - “on-line”: run-time bytecode generation + class loading
    - “off-line”: same backend, but generates “plain” class files on disk at build-time
  55. Machines

    var s = new Store();
    // register other modules
    ...
    var importValues = s.toImportValues();
    var inst = Instance.builder(module)
        .withImportValues(importValues)
        .withMachineFactory(InterpreterMachineFactory::create)
        .build();

    MACHINES!
  56. Interpreter

    var wasmModule = Parser.parse(wasmBytes)
        decodes the binary wasm module

    InterpreterMachineFactory:
        AotCompiler#compileModule(WasmModule module, String className)

    Instance.builder(wasmModule)
        .withMachineFactory(machineFactory)
        .build()
    → Instance
  57. Machines (AoT)

    var s = new Store();
    // register other modules
    ...
    var importValues = s.toImportValues();
    var inst = Instance.builder(module)
        .withImportValues(importValues)
        .withMachineFactory(AotMachineFactory::create)
        .build();

    MACHINES!
  58. Bytecode Translator

    var wasmModule = Parser.parse(wasmBytes)
        decodes the binary wasm module

    AotMachineFactory:
        AotCompiler#compileModule(WasmModule module, String className)

    Instance.builder(wasmModule)
        .withMachineFactory(machineFactory)
        .build()
    → Instance
  59. generated classes

    public final class com/dylibso/chicory/$gen/CompiledMachine implements com/dylibso/chicory/runtime/Machine {
      NESTMEMBER com/dylibso/chicory/$gen/CompiledMachine$AotMethods
      NESTMEMBER com/dylibso/chicory/$gen/CompiledMachine$MachineCall

      private final Lcom/dylibso/chicory/runtime/Instance; instance

      public call(I[J)[J // implements the interface

      // exported function
      public static func_0(ILcom/dylibso/chicory/runtime/Memory;Lcom/dylibso/chicory/runtime/Instance;)I
  61. control flow analysis

    switch (ins.opcode()) {
        ...
        case IF:
            stack.pop(ValueType.I32);
            stack.enterScope(ins.scope(), blockType(ins));
            // use the same starting stack sizes for both sides of the branch
            if (body.instructions().get(ins.labelFalse() - 1).opcode() == OpCode.ELSE) {
                stack.pushTypes();
            }
            result.add(new AotInstruction(AotOpCode.IFEQ, ins.labelFalse()));
            break;
        case ELSE:
            stack.popTypes();
            result.add(new AotInstruction(AotOpCode.GOTO, ins.labelTrue()));
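
The translation this slide hints at, from Wasm's structured `if`/`else` to the `ifeq`/`goto` pair used by JVM bytecode, can be sketched as a flattening pass with back-patched jump targets. This is not Chicory's code: `Node`, `Flat` and `flatten` are hypothetical, instructions are plain strings, and jump targets are instruction indices rather than bytecode offsets.

```go
package main

import "fmt"

// Node is a hypothetical structured instruction: either a plain
// instruction or an "if" with nested then/else bodies.
type Node struct {
	Op   string
	Then []Node
	Else []Node
}

// Flat is one instruction of the lowered, jump-based form.
// Target is the index of the jump destination, or -1 for non-jumps.
type Flat struct {
	Op     string
	Target int
}

// flatten lowers structured control flow to conditional and unconditional jumps.
func flatten(nodes []Node) []Flat {
	var out []Flat
	var emit func(ns []Node)
	emit = func(ns []Node) {
		for _, n := range ns {
			if n.Op != "if" {
				out = append(out, Flat{Op: n.Op, Target: -1})
				continue
			}
			ifeq := len(out)
			out = append(out, Flat{Op: "ifeq"}) // taken when the condition is 0
			emit(n.Then)
			jump := len(out)
			out = append(out, Flat{Op: "goto"}) // skips the else body
			out[ifeq].Target = len(out)         // the else body starts here
			emit(n.Else)
			out[jump].Target = len(out) // first instruction after the if
		}
	}
	emit(nodes)
	return out
}

func main() {
	body := []Node{
		{Op: "local.get 0"},
		{
			Op:   "if",
			Then: []Node{{Op: "i32.const 1"}, {Op: "call $log"}},
			Else: []Node{{Op: "i32.const 0"}, {Op: "call $log"}},
		},
		{Op: "return"},
	}
	for i, in := range flatten(body) {
		if in.Target >= 0 {
			fmt.Printf("%d: %s %d\n", i, in.Op, in.Target)
		} else {
			fmt.Printf("%d: %s\n", i, in.Op)
		}
	}
}
```

The output has the same shape as the `javap` listing for `print(boolean)`: a conditional jump to the else branch, the then branch, a jump over the else branch, the else branch, and the return.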
  62. opcode implementations

    // ====== I32 ======
    .intrinsic(AotOpCode.I32_ADD, AotEmitters::I32_ADD)
    .intrinsic(AotOpCode.I32_AND, AotEmitters::I32_AND)
    .shared(AotOpCode.I32_CLZ, OpcodeImpl.class)
    .intrinsic(AotOpCode.I32_CONST, AotEmitters::I32_CONST)
    .shared(AotOpCode.I32_CTZ, OpcodeImpl.class)
    .shared(AotOpCode.I32_DIV_S, OpcodeImpl.class)
    .shared(AotOpCode.I32_DIV_U, OpcodeImpl.class)
    .shared(AotOpCode.I32_EQ, OpcodeImpl.class)
    .shared(AotOpCode.I32_EQZ, OpcodeImpl.class)
    .shared(AotOpCode.I32_EXTEND_8_S, OpcodeImpl.class)
    .shared(AotOpCode.I32_EXTEND_16_S, OpcodeImpl.class)
    .shared(AotOpCode.I32_GE_S, OpcodeImpl.class)
    ...
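
Going only by the names in this table, a reasonable guess is that an "intrinsic" opcode is emitted inline as bytecode, while a "shared" opcode is emitted as a static call into a helper class that can also serve the interpreter. The sketch below illustrates that split and nothing more: `emitter`, `intrinsic` and `shared` are hypothetical Go stand-ins, the output is text rather than real bytecode, and the actual Chicory behavior may differ.

```go
package main

import "fmt"

// emitter appends textual, JVM-like instructions for one Wasm opcode.
type emitter func(out []string) []string

// intrinsic emits a single instruction inline.
func intrinsic(jvmOp string) emitter {
	return func(out []string) []string {
		return append(out, jvmOp)
	}
}

// shared emits a static call to a helper method that holds the logic.
func shared(class, method string) emitter {
	return func(out []string) []string {
		return append(out, "invokestatic "+class+"."+method)
	}
}

func main() {
	// The table mirrors the shape of the slide: one entry per opcode.
	table := map[string]emitter{
		"I32_ADD": intrinsic("iadd"),
		"I32_AND": intrinsic("iand"),
		"I32_CLZ": shared("OpcodeImpl", "I32_CLZ"),
	}
	var out []string
	for _, op := range []string{"I32_ADD", "I32_CLZ"} {
		out = table[op](out)
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```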
  63. “off-line” (build-time) aot

    …it trivially works(*)!
    - The Android toolchain translates JVM bytecode to Dalvik/ART bytecode at build-time
    - great for libraries and use cases where you do not need to load NEW modules at run-time

    (*) in our early experiments
  64. “on-line” run-time aot

    - Dalvik VM / ART
    - register machine
    - run-time class-loading kind of an “advanced” use case
    - very recent API level

    MACHINES!
  66. Machines (Android AoT)

    var s = new Store();
    // register other modules
    ...
    var importValues = s.toImportValues();
    var inst = Instance.builder(module)
        .withImportValues(importValues)
        .withMachineFactory(AotAndroidMachineFactory::create)
        .build();

    MACHINES!
  67. resources

    - WebAssembly for the Java Geek (Java Advent 2022)
    - A Return to WebAssembly for the Java Geek (Java Advent 2023)
    - Wasm 4 the Java Geek 3: Electric Boogaloo (Java Advent 2024)
    - mcp.run
    - extism.org
    - wazero.io
    - chicory.dev

    @evacchi (.dev, @mastodon.social)