Upgrade to Pro — share decks privately, control downloads, hide ads and more …

quasigo

 quasigo

Iskander (Alex) Sharipov

December 06, 2024
Tweet

More Decks by Iskander (Alex) Sharipov

Other Decks in Programming

Transcript

  1. •Go compiler (Intel, Huawei) •KPHP compiler (VK) •Static analyzers (open

    source) •Developer tools like phpgrep About me 2
  2. 3 Why does this talk exist? I’ll prove that it’s

    possible to create efficient interpreter in Go that can compete with interpreters written in C.
  3. An engine to execute dynamic rules for Go. It loads

    the rules written in Go DSL and then executes them over a given project. What is ruleguard? 5
  4. ruleguard overview 6 style.go ruleguard src/ file1.go file1.go file1.go file1.go

    file1.go src/file.go -rules arg analysis target $ ruleguard -rules style.go src/...
  5. Matcher pipeline 11 m.Match(`string($x)`). Where( m["x"].Filter(f), ). Report("message") 1. Find

    an AST (syntax) match (if AST matched) 2. Apply filters to the match
  6. Matcher pipeline 12 m.Match(`string($x)`). Where( m["x"].Filter(f), ). Report("message") 1. Find

    an AST (syntax) match (if AST matched) 2. Apply filters to the match (if filters accepted the match) 3. Perform an action
  7. f could be a custom function Matcher pipeline 13 m.Match(`string($x)`).

    Where( m["x"].Filter(f), ). Report("message")
  8. A custom filter example 14 func implementsStringer(ctx *dsl.VarFilterContext) bool {

    stringer := ctx.GetInterface(`fmt.Stringer`) // pointer to the captured, type, T -> *T ptr := types.NewPointer(ctx.Type) return types.Implements(ctx.Type, stringer) || types.Implements(ptr, stringer) }
  9. A custom filter example 15 func implementsStringer(ctx *dsl.VarFilterContext) bool {

    stringer := ctx.GetInterface(`fmt.Stringer`) // pointer to the captured, type, T -> *T ptr := types.NewPointer(ctx.Type) return types.Implements(ctx.Type, stringer) || types.Implements(ptr, stringer) } How to execute?
  10. Ruleguard resources Ruleguard by example Introduction article (RU, EN) Ruleguard

    workshop videos (RU) Ruleguard vs SemGrep vs CodeQL (EN) 16
  11. Ruleguard use case • A lot of calls to small

    functions Go -> interpreter • Filters call native bindings Interpreter -> Go calls Need performance in both directions 18
  12. 21

  13. Yaegi performance issues reflection.Value as value type 23 Tons of

    heap allocations Direct AST interpretation
  14. Scriggo multi stacks type Registers struct { Int []int64 Float

    []float64 String []string General []reflect.Value } 28
  15. Scriggo multi stacks type Registers struct { Int []int64 //

    efficient! Float []float64 // efficient! String []string // efficient! General []reflect.Value } 29
  16. Scriggo multi stacks type Registers struct { Int []int64 //

    also handles int8/16/32… Float []float64 String []string General []reflect.Value } 30
  17. Scriggo multi stacks type Registers struct { Int []int64 Float

    []float64 String []string General []reflect.Value // slow } 31
  18. Scriggo performance issues Some types have bad performance (e.g. [

    ]byte) 32 Expensive Go->interpreter call costs
  19. 43 Interpreter Count Bytes Yaegi 1189064 11006432 Scriggo 1016 201016

    Quasigo 3 147416 3 allocs 🔥 Mandelbrot benchmark (allocs)
  20. Mandelbrot benchmark (allocs) 44 Interpreter Count Bytes Yaegi 1189064 11006432

    Scriggo 1016 201016 Quasigo 3 147416 What is wrong with Scriggo here?
  21. Why does Scriggo allocate a lot in Mandelbrot? 45 Mandelbrot

    size for benchmark is 1000, so numRows=1000
  22. Why does Scriggo allocate a lot in Mandelbrot? 46 Scriggo

    stores [ ]byte in reflect.Value Slicing creates a new 24-byte allocation
  23. Why quasigo is faster than Scriggo? Faster interpreter core (instructions

    dispatch) 47 Quasigo doesn’t shy away from unsafe package: Almost free native calls Efficient frames layout and slots representation Reflection-free access to arbitrary data 1 2 3 4
  24. Is “unsafe” package usage justified here? Go runtime itself is

    written with a help of “unsafe”. It’s OK to use unsafe package in runtimes and low-level libraries that need all performance they can get. 48
  25. Interpreters comparison 49 Interpreter Eval performance Eval entry overhead Yaegi

    Very low High Scriggo High Very High Quasigo Very high Very low
  26. Interpreters comparison (more) 50 Interpreter Interpretation type Relies on Yaegi

    AST traversal reflection Scriggo Bytecode, reg VM reflection Quasigo Bytecode, reg VM unsafe
  27. VM stack frame 52 fn1 frame fn2 frame fn3 frame

    free space The stack grows this way
  28. VM stack frame 53 fn1 frame fn2 frame fn3 frame

    free space Interpreter memory
  29. VM stack frame 54 fn1 frame fn2 frame fn3 frame

    free space x y local0 local1 … func fn2(x, y int) int
  30. VM stack frame 55 x y local0 local1 … func

    fn2(x, y int) int arg0 arg1 The call args are placed on the next func frame fn1 frame fn2 frame fn3 frame free space
  31. VM stack frame 56 x y local0 local1 … arg0

    arg1 These cells are called “slots” (or “virtual registers”) fn1 frame fn2 frame fn3 frame free space
  32. VM stack frame 57 func fn2(x, y int) int {

    return x + y * 3 } LoadScalarConst local2 = 3 IntMul64 local1 = y local2 IntAdd64 local0 = x local1 ReturnScalar local0 fn2 frame x y local0 local1 local2
  33. VM stack slots type Slot struct { Ptr unsafe.Pointer Scalar

    uint64 Scalar2 uint64 } 59 sizeof(slot) == 24 bytes (on 64-bit platforms)
  34. type Slot struct { Ptr unsafe.Pointer Scalar uint64 Scalar2 uint64

    } Pointer types are stored in Pointer field VM stack slots: pointer types 60
  35. type Slot struct { Ptr unsafe.Pointer Scalar uint64 Scalar2 uint64

    } Simple numeric types are stored in Scalar field VM stack slots: scalar types 61
  36. VM stack slots: strings type Slot struct { Ptr unsafe.Pointer

    Scalar uint64 Scalar2 uint64 } 62 Strings are stored in Ptr+Scalar. This matches the Go runtime string layout!
  37. VM stack slots: slices type Slot struct { Ptr unsafe.Pointer

    Scalar uint64 Scalar2 uint64 } 63 Slices occupy all the slots. This matches the Go runtime slices layout!
  38. VM stack slots: interfaces type Slot struct { Ptr unsafe.Pointer

    Scalar uint64 Scalar2 uint64 } 64 Interfaces are stored in Ptr+Scalar. Data and typeinfo pointers are swapped.
  39. VM stack slots: structs type Slot struct { Ptr unsafe.Pointer

    Scalar uint64 Scalar2 uint64 } 65 Small structs are stored directly inside the slot, if possible. Otherwise they’re heap allocated and we store a pointer to it.
  40. Compiler architecture 67 Go code The input Go sources AST+types

    go/ast and go/types data IR Low-level intermediate representation bytecode The final compiler output
  41. Go code AST+types IR Good for optimizations and transformations bytecode

    Good for the execution (it’s also compact) Compiler architecture 68
  42. Bytecode instructions encoding Simple variadic-length scheme: • 1-byte opcode •

    1 or 2 bytes per instruction argument • Constants are loaded from external slice using the index 3-address instruction format: dst + src1 + src2
  43. Bytecode instructions encoding dst = x + y => IntAdd64

    dst = x y • Opcode=IntAdd64 (1 byte) • Arg0 dst (1 byte) • Arg1 x (1 byte) • Arg2 y (1 byte) Frame slot index
  44. LoadScalarConst local0.v0 = 1 LoadScalarConst local2.v0 = 1 IntAdd64 local1.v0

    = local0.v0 local2.v0 LoadScalarConst local3.v0 = 1 IntAdd64 local2.v1 = local1.v0 local3.v0 LoadScalarConst local4.v0 = 1 IntAdd64 local3.v1 = local2.v1 local4.v0 ReturnScalar local3.v1 # function IR func f() int { x1 := 1 x2 := x1 + 1 x3 := x2 + 1 return x3 + 1 } Constant propagation 71
  45. LoadScalarConst local0.v0 = 1 LoadScalarConst local2.v0 = 1 IntAdd64 local1.v0

    = local0.v0 local2.v0 LoadScalarConst local3.v0 = 1 IntAdd64 local2.v1 = local1.v0 local3.v0 LoadScalarConst local4.v0 = 1 IntAdd64 local3.v1 = local2.v1 local4.v0 ReturnScalar local3.v1 # Slots have unique “versions” func f() int { x1 := 1 x2 := x1 + 1 x3 := x2 + 1 return x3 + 1 } Constant propagation 72
  46. LoadScalarConst local0.v0 = 1 LoadScalarConst local2.v0 = 1 IntAdd64 local1.v0

    = local0.v0 local2.v0 LoadScalarConst local3.v0 = 1 IntAdd64 local2.v1 = local1.v0 local3.v0 LoadScalarConst local4.v0 = 1 IntAdd64 local3.v1 = local2.v1 local4.v0 ReturnScalar local3.v1 # Same slots can have different # versions in a single block func f() int { x1 := 1 x2 := x1 + 1 x3 := x2 + 1 return x3 + 1 } Constant propagation 73
  47. # Result bytecode LoadScalarConst local0 = 4 ReturnScalar local0 func

    f() int { x1 := 1 x2 := x1 + 1 x3 := x2 + 1 return x3 + 1 } Constant propagation 74
  48. Len local2.v0 = s Zero local3.v0 ScalarEq local1.v0 = local2.v0

    local3.v0 Not local0.v0 = local1.v0 JumpZero Label0 local0.v0 Jump condition optimizations 75 if !(len(s) == 0) { … }
  49. Len local2.v0 = s Zero local3.v0 ScalarEq local1.v0 = local2.v0

    local3.v0 Not local0.v0 = local1.v0 JumpZero Label0 local0.v0 # Can inverse the jump cond if !(len(s) == 0) { … } Jump condition optimizations 76
  50. Len local2.v0 = s Zero local3.v0 ScalarEq local1.v0 = local2.v0

    local3.v0 Not local0.v0 = local1.v0 JumpNotZero Label0 local1.v0 Jump condition optimizations 77 if !(len(s) == 0) { … }
  51. Len local2.v0 = s Zero local3.v0 ScalarEq local1.v0 = local2.v0

    local3.v0 JumpNotZero Label0 local1.v0 # Can inject the zero comparison # into the jump cond 78 if !(len(s) == 0) { … } Jump condition optimizations
  52. Len local2.v0 = s Zero local3.v0 ScalarEq local1.v0 = local2.v0

    local3.v0 JumpZero Label0 local2.v0 Jump condition optimizations 79 if !(len(s) == 0) { … }
  53. # Result bytecode Len local2 = s JumpZero Label0 local2

    Jump condition optimizations 80 if !(len(s) == 0) { … }
  54. Other optimizations • Inlining (with post-inlining optimizations) • Some idioms

    and corner cases recognition • Frames trimming • Unused constants removal 81 Implemented:
  55. Quasigo for game development Write a game in Go (using

    ebiten or other game library). Allow the users to write plugins/scripts for your game in Go, using quasigo as embedded interpreter. 84
  56. Quasigo for query languages For example, a DB like tarantool,

    but written in Go and with Go as a query language instead of Lua. It’s also possible to use Go scripts as custom filtering lambdas for your internal services. 85
  57. Quasigo for template engines Since Scriggo is used as a

    template engine core, I could imagine quasigo used in the same context. It will probably be at least as efficient as Scriggo in that domain. 86
  58. Quasigo state Good enough for ruleguard Not ready for general

    purpose production use yet Alpha release can be expected in 4-6 months 87
  59. Creating interpreter in Go • Higher bytecode instruction dispatch cost

    • Harder to fine-tune runtime-related code without asm • Paying extra price to be Go GC friendly Overall, the raw performance can be ~20% slower for identically optimal interpreters of the same target language. 92 Cons:
  60. Creating interpreter in Go • No need to use CGo

    to embed the interpreter • Getting a GC for free • Cheap interop with Go (in both directions) • Can use Go stdlib in the target language stdlib • Great benchmarking/testing/ profiling support 93 Pros:
  61. But why is quasigo sometimes faster? • Statically typed values

    (therefore, instructions) • Go has true integer type (and unboxed scalars in general) • For array-like data, slices are better than Lua tables • Structs are better than Lua tables The raw performance is lower, but Go is “faster” than Lua. 94
  62. Conclusions 1 2 Interpreters written in Go can be quite

    fast if done right Interpreters benefit from “unsafe” a lot
  63. Conclusions 1 2 3 Go is a good interpretation target

    language Interpreters written in Go can be quite fast if done right Interpreters benefit from “unsafe” a lot
  64. Conclusions 1 2 3 4 You can embed Go instead

    of Lua in your Go apps Go is a good interpretation target language Interpreters written in Go can be quite fast if done right Interpreters benefit from “unsafe” a lot