Register-based calling convention for Go functions

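As of Go 1.18, the register-based calling convention is implemented on the mainstream architectures (64-bit ARM and x86), an improvement that boosts Go performance by more than 10%. This talk covers how Go moved from its original stack-based calling convention to a register-based one, and how the two conventions differ.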

Cherie Hsieh

August 02, 2022
Transcript

  1. Outline

    1. Introduction to calling conventions
    2. Register-based vs. stack-based calling conventions
    3. Switch to a register-based calling convention
    4. Performance benchmark
  2. Introduction to calling conventions

    A calling convention is part of the Application Binary Interface (ABI). It defines how subroutines receive parameters from their caller and how they return a result. https://en.wikipedia.org/wiki/Calling_convention
  3. Introduction to calling conventions

        // 0x30 (code address)
        func main() {
            price := calcPrice(10, 1) // 1. send parameters
        }

        // 0x20 (code address)
        func calcPrice(price int, tax int) int {
            res := price + tax
            return res // 2. return the result
        }
  4. Introduction to calling conventions

    - CPU provider: publishes the calling convention guide
    - Operating system: implements the calling convention
    - Compiler: extends the calling convention for specific languages
  5. Register-based calling convention

        func add(a int, b int) int {
            c := a + b
            return c
        }

        func main() {
            number1 := 2
            number2 := 3
            result := add(number1, number2)
        }

    Assembly (R: register):

        # main
        MOVD $2, R0
        MOVD $3, R1
        CALL "".add(SB)

        # add
        ADD R1, R0, R0
        RET (R30)
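
To reproduce a listing like the one on this slide, the `add` call has to survive inlining. The sketch below is a complete program that marks `add` with the `//go:noinline` directive and prints the result so the call is not discarded. Building it with `go build -gcflags=-S` prints the compiler's assembly; setting `GOARCH=arm64` selects the architecture used on the slides. The exact output (symbol names, extra `PCDATA` lines) depends on the Go version, so treat the slide as a simplified view.

```go
package main

import "fmt"

// add is marked noinline so the compiler keeps a real CALL instruction
// instead of folding the addition into main.
//
//go:noinline
func add(a int, b int) int {
	c := a + b
	return c
}

func main() {
	number1 := 2
	number2 := 3
	result := add(number1, number2)
	fmt.Println(result) // use the result so the call is not dead code
}
```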
  6. Stack-based calling convention

        func add(a int, b int) int {
            c := a + b
            return c
        }

        func main() {
            number1 := 2
            number2 := 3
            result := add(number1, number2)
        }

    Assembly:

        # main
        MOVD $2, R0
        MOVD R0, 8(RSP)
        MOVD $3, R0
        MOVD R0, 16(RSP)
        CALL "".add(SB)
        MOVD 24(RSP), R0

        # add
        MOVD "".a(FP), R0
        MOVD "".b+8(FP), R1
        ADD R1, R0, R0
        MOVD R0, "".~r2+16(FP)
        RET (R30)
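
The difference between the two listings comes down to how many times the stack is touched. The following is a toy model, not how the Go compiler or runtime is implemented: the `machine` type, its slot numbers, and the `memOps` counter are all hypothetical names introduced for illustration. It replays the `add(2, 3)` call under both conventions and counts stack loads and stores.

```go
package main

import "fmt"

// machine is a toy model: a small register file plus a stack in memory.
// memOps counts every load from or store to the stack.
type machine struct {
	regs   [4]int
	stack  [8]int
	memOps int
}

func (m *machine) store(slot, v int) {
	m.stack[slot] = v
	m.memOps++
}

func (m *machine) load(slot int) int {
	m.memOps++
	return m.stack[slot]
}

// addStackBased mirrors the stack-based listing: the caller stores both
// arguments to stack slots, the callee loads them, stores the result to
// a third slot, and the caller loads the result back.
func addStackBased(m *machine, a, b int) int {
	m.store(1, a) // caller: MOVD R0, 8(RSP)
	m.store(2, b) // caller: MOVD R0, 16(RSP)

	m.regs[0] = m.load(1) // callee: load a
	m.regs[1] = m.load(2) // callee: load b
	m.regs[0] = m.regs[0] + m.regs[1]
	m.store(3, m.regs[0]) // callee: store the result

	return m.load(3) // caller: MOVD 24(RSP), R0
}

// addRegisterBased mirrors the register-based listing: arguments and
// result stay in registers, so the stack is never touched.
func addRegisterBased(m *machine, a, b int) int {
	m.regs[0] = a
	m.regs[1] = b
	m.regs[0] = m.regs[0] + m.regs[1]
	return m.regs[0]
}

func main() {
	var s, r machine
	fmt.Println(addStackBased(&s, 2, 3), s.memOps)    // 5 6
	fmt.Println(addRegisterBased(&r, 2, 3), r.memOps) // 5 0
}
```

The six stack accesses in the model correspond to the six `MOVD` instructions on the stack-based slide that read or write `RSP`- or `FP`-relative addresses.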
  7. Calling conventions of different languages

    Register-based calling conventions
    1. C / C++ (GNU or LLVM compiler)
    2. Rust (LLVM-based compiler)
    3. Java (JIT-compiled)

    Stack-based calling conventions
    1. Python
    2. Java (interpreter)
  8. Switch to a register-based calling convention

    Discussion started on Aug 12, 2020 (Go 1.15).

    Why Go used a stack-based calling convention before Go 1.17
    1. All platforms can use essentially the same conventions
    2. It simplifies the implementation of local variable allocation
    3. It simplifies stack tracing for garbage collection and stack growth

    Drawbacks
    It leaves a lot of performance on the table.
  9. Switch to a register-based calling convention

    Advantages of a register-based calling convention
    Accessing arguments in registers is still roughly 40% faster than accessing arguments on the stack, which lives in memory.

    Drawbacks
    1. It adds compile time for register allocation.
    2. It increases the design complexity of the compiler.
  10. Switch to a register-based calling convention

    Supported architectures
    - Go v1.17: 64-bit x86
    - Go v1.18: 64-bit ARM and 64-bit PowerPC
    - Go v1.19: riscv64
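
The rollout above can be written down as a small lookup, which is handy when deciding whether a given toolchain and target will use the new convention. This is a sketch under stated assumptions: `registerABISince` and `usesRegisterABI` are hypothetical names, the table covers only the releases listed on the slide, "64-bit PowerPC" is taken to mean both `ppc64` and `ppc64le`, and the operating system is ignored even though Go 1.17 enabled the convention only on some amd64 ports.

```go
package main

import (
	"fmt"
	"runtime"
)

// registerABISince maps a GOARCH value to the first Go 1.x minor version
// that passes arguments in registers on that architecture, following the
// list on the slide. Architectures not listed are treated as stack-based.
var registerABISince = map[string]int{
	"amd64":   17,
	"arm64":   18,
	"ppc64":   18,
	"ppc64le": 18,
	"riscv64": 19,
}

// usesRegisterABI reports whether Go 1.<minor> uses the register-based
// calling convention on goarch, according to the table above.
func usesRegisterABI(goarch string, minor int) bool {
	since, ok := registerABISince[goarch]
	return ok && minor >= since
}

func main() {
	// Check the architecture this program was built for against Go 1.18.
	fmt.Println(runtime.GOARCH, usesRegisterABI(runtime.GOARCH, 18))
}
```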
  11. Performance benchmark (Go v1.17)

        func fib(n int) int {
            if n > 1 {
                return fib(n - 1) + fib(n - 2)
            }
            return n
        }

        func main() {
            n := 50
            _ = fib(n)
        }

    Assembly:

        # main
        MOVD $50, R0
        MOVD R0, 8(RSP)
        PCDATA $1, ZR
        CALL "".fib(SB)

        # if n > 1
        MOVD "".n(FP), R0
        CMP $1, R0
        BLE fib_pc104

        # fib(n - 1)
        SUB $1, R0, R1
        MOVD R1, 8(RSP)
        PCDATA $1, ZR
        CALL "".fib(SB)
        MOVD 16(RSP), R0
        MOVD R0, ""..autotmp_4-8(SP)

        # fib(n - 2)
        MOVD "".n(FP), R1
        SUB $2, R1, R1
        MOVD R1, 8(RSP)
        CALL "".fib(SB)
        MOVD 16(RSP), R0
        MOVD ""..autotmp_4-8(SP), R1

        # fib(n - 1) + fib(n - 2)
        ADD R0, R1, R0
        MOVD R0, "".~r1+8(FP)
        MOVD -8(RSP), R29
        MOVD.P 48(RSP), R30
        RET (R30)
  12. Performance benchmark (Go v1.18)

        func fib(n int) int {
            if n > 1 {
                return fib(n - 1) + fib(n - 2)
            }
            return n
        }

        func main() {
            n := 50
            _ = fib(n)
        }

    Assembly:

        # main
        MOVD $50, R0
        PCDATA $1, ZR
        CALL "".fib(SB)

        # if n > 1
        CMP $1, R0
        BLE fib_pc92

        # fib(n - 1)
        SUB $1, R0, R1
        MOVD R1, R0
        PCDATA $1, ZR
        CALL "".fib(SB)
        MOVD R0, ""..autotmp_4-8(SP)

        # fib(n - 2)
        MOVD "".n(FP), R1
        SUB $2, R1, R1
        MOVD R1, R0
        CALL "".fib(SB)
        MOVD ""..autotmp_4-8(SP), R1

        # fib(n - 1) + fib(n - 2)
        ADD R0, R1, R0
        MOVD -8(RSP), R29
        MOVD.P 32(RSP), R30
        RET (R30)
  13. Performance benchmark

    Benchmarks for a representative set of Go packages and programs show performance improvements of about 5%, and a typical reduction in binary size of about 2%.
  14. Performance benchmark

    A variety of applications can benefit from the 64-bit Arm CPU performance improvements released in Go 1.18. Programs with an object-oriented design, recursion, or that have many function calls in their implementation will likely benefit more from the new register ABI calling convention.

    Source: Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton
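
A call-heavy workload such as the recursive `fib` from the earlier slides is a simple way to observe this effect. The sketch below times it with `testing.Benchmark`, which works outside `go test`. It uses `fib(25)` rather than the slides' `fib(50)` so each iteration finishes quickly; that argument and the `sink` variable are choices made for this example. To compare conventions, the same file would be built and run with two toolchains (for example Go 1.17 and Go 1.18 on arm64); the program makes no claim about what numbers to expect.

```go
package main

import (
	"fmt"
	"testing"
)

// fib is the recursive function from the slides. Nearly all of its work
// is function calls, so argument passing cost dominates.
//
//go:noinline
func fib(n int) int {
	if n > 1 {
		return fib(n-1) + fib(n-2)
	}
	return n
}

// sink keeps the compiler from discarding the benchmarked call.
var sink int

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = fib(25)
		}
	})
	// Prints the iteration count and the time per iteration.
	fmt.Println(res)
}
```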
  15. References

    1. Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton
    2. Proposal: Register-based Go calling convention
    3. Stack frame layout on x86-64