Register-based calling convention for Go functions

Register-based calling convention for Go functions Cherie Hsieh @ TSMC

Outline
1. Introduction to calling conventions
2. Register-based vs. stack-based calling conventions
3. Switch to a register-based calling convention
4. Performance benchmark

Introduction to calling conventions

A calling convention is part of the Application Binary Interface (ABI). It defines how subroutines receive parameters from their caller and how they return a result.
Source: https://en.wikipedia.org/wiki/Calling_convention

Introduction to calling conventions

0x30 (code address):

    func main() {
        price := calcPrice(10, 1)
        _ = price // use the result so the program compiles
    }

0x20:

    func calcPrice(price int, tax int) int {
        res := price + tax
        return res
    }

1. main sends the parameters to calcPrice.
2. calcPrice returns the result to main.

Introduction to calling conventions

- CPU provider: publishes the calling convention guide
- Operating system: implements the calling convention
- Compiler: extends the calling convention for specific languages

Introduction to calling conventions: RISC-V

Register-based vs. stack-based calling conventions

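Register-based calling convention

    func add(a int, b int) int {
        c := a + b
        return c
    }

    func main() {
        number1 := 2
        number2 := 3
        result := add(number1, number2)
        _ = result // use the result so the program compiles
    }

Generated assembly (R: register).

In main, the arguments are placed in registers before the call:

    MOVD $2, R0
    MOVD $3, R1
    CALL "".add(SB)

In add, the arguments are read from registers and the result is returned in R0:

    ADD R1, R0, R0
    RET (R30)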

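Stack-based calling convention

    func add(a int, b int) int {
        c := a + b
        return c
    }

    func main() {
        number1 := 2
        number2 := 3
        result := add(number1, number2)
        _ = result // use the result so the program compiles
    }

Generated assembly.

In main, the arguments are stored to the stack before the call and the result is loaded from the stack afterwards:

    MOVD $2, R0
    MOVD R0, 8(RSP)
    MOVD $3, R0
    MOVD R0, 16(RSP)
    CALL "".add(SB)
    MOVD 24(RSP), R0

In add, the arguments are loaded from the stack and the result is stored back to the stack:

    MOVD "".a(FP), R0
    MOVD "".b+8(FP), R1
    ADD R1, R0, R0
    MOVD R0, "".~r2+16(FP)
    RET (R30)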
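To reproduce listings like the two above, one option is to ask the Go compiler to print its assembly output, for example with `GOARCH=arm64 go build -gcflags=-S main.go`. The sketch below is the slide's add example made runnable. The file name main.go, the `//go:noinline` directive, and the fmt.Println call are assumptions added here and are not in the slides. Without the directive the compiler would most likely inline such a small function and no CALL would appear in the output. Which convention you see depends on the toolchain version and target architecture.

```go
package main

import "fmt"

// add is kept out of line so that the generated code contains a real call.
// The noinline directive is an addition for illustration, not from the slides.
//
//go:noinline
func add(a int, b int) int {
	c := a + b
	return c
}

func main() {
	number1 := 2
	number2 := 3
	result := add(number1, number2)
	fmt.Println(result) // prints 5
}
```

Building the same file with two different Go releases and comparing the output for main and add should show the stores to the stack around the CALL disappearing once the register-based convention is in effect for the target.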

Calling conventions of different languages

Register-based calling conventions
1. C / C++ (GNU or LLVM compiler)
2. Rust (LLVM-based compiler)
3. Java (JIT-compiled)

Stack-based calling conventions
1. Python
2. Java (interpreter)

Switch to a register-based calling convention

Switch to a register-based calling convention

Discussion started on Aug 12, 2020 (Go 1.15).

Why Go used a stack-based calling convention before Go 1.17:
1. All platforms can use essentially the same conventions.
2. It simplifies the implementation of local variable allocation.
3. It simplifies stack tracing for garbage collection and stack growth.

Drawback: it leaves a lot of performance on the table.

Switch to a register-based calling convention

Advantages of a register-based calling convention:
Accessing arguments in registers is still roughly 40% faster than accessing arguments on the stack (in memory).

Drawbacks:
1. It adds compile time for register allocation.
2. It increases the design complexity of the compiler.

Switch to a register-based calling convention

Supported architectures
- Go 1.17: 64-bit x86
- Go 1.18: 64-bit ARM and 64-bit PowerPC
- Go 1.19: riscv64
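As a small illustration of the list above, the following sketch reports the first release with the register-based convention for the architecture the program was built for. The map and the helper firstRegisterABIRelease are hypothetical names introduced here, and the mapping from the slide's architecture names to GOARCH values (amd64, arm64, ppc64, ppc64le, riscv64) is an assumption about how the list translates.

```go
package main

import (
	"fmt"
	"runtime"
)

// registerABISince maps a GOARCH value to the first Go release that uses the
// register-based calling convention, following the list on the slide.
// The GOARCH spellings are an assumption made for this sketch.
var registerABISince = map[string]string{
	"amd64":   "1.17",
	"arm64":   "1.18",
	"ppc64":   "1.18",
	"ppc64le": "1.18",
	"riscv64": "1.19",
}

// firstRegisterABIRelease is a hypothetical helper: it returns the release
// for goarch and false if the architecture is not in the list.
func firstRegisterABIRelease(goarch string) (string, bool) {
	v, ok := registerABISince[goarch]
	return v, ok
}

func main() {
	arch := runtime.GOARCH
	if v, ok := firstRegisterABIRelease(arch); ok {
		fmt.Printf("%s: register-based since Go %s (built with %s)\n", arch, v, runtime.Version())
	} else {
		fmt.Printf("%s: not in the list above (built with %s)\n", arch, runtime.Version())
	}
}
```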

Performance benchmark

Performance benchmark

    func fib(n int) int {
        if n > 1 {
            return fib(n - 1) + fib(n - 2)
        }
        return n
    }

    func main() {
        n := 50
        _ = fib(n)
    }

Generated assembly, Go v1.17 (arguments and results passed on the stack).

In main:

    MOVD $50, R0
    MOVD R0, 8(RSP)
    PCDATA $1, ZR
    CALL "".fib(SB)

In fib:

    // if n > 1
    MOVD "".n(FP), R0
    CMP $1, R0
    BLE fib_pc104
    // fib(n - 1)
    SUB $1, R0, R1
    MOVD R1, 8(RSP)
    PCDATA $1, ZR
    CALL "".fib(SB)
    MOVD 16(RSP), R0
    MOVD R0, ""..autotmp_4-8(SP)
    // fib(n - 2)
    MOVD "".n(FP), R1
    SUB $2, R1, R1
    MOVD R1, 8(RSP)
    CALL "".fib(SB)
    MOVD 16(RSP), R0
    MOVD ""..autotmp_4-8(SP), R1
    // fib(n - 1) + fib(n - 2)
    ADD R0, R1, R0
    MOVD R0, "".~r1+8(FP)
    MOVD -8(RSP), R29
    MOVD.P 48(RSP), R30
    RET (R30)

Performance benchmark

    func fib(n int) int {
        if n > 1 {
            return fib(n - 1) + fib(n - 2)
        }
        return n
    }

    func main() {
        n := 50
        _ = fib(n)
    }

Generated assembly, Go v1.18 (arguments and results passed in registers).

In main:

    MOVD $50, R0
    PCDATA $1, ZR
    CALL "".fib(SB)

In fib:

    // if n > 1
    CMP $1, R0
    BLE fib_pc92
    // fib(n - 1)
    SUB $1, R0, R1
    MOVD R1, R0
    PCDATA $1, ZR
    CALL "".fib(SB)
    MOVD R0, ""..autotmp_4-8(SP)
    // fib(n - 2)
    MOVD "".n(FP), R1
    SUB $2, R1, R1
    MOVD R1, R0
    CALL "".fib(SB)
    MOVD ""..autotmp_4-8(SP), R1
    // fib(n - 1) + fib(n - 2)
    ADD R0, R1, R0
    MOVD -8(RSP), R29
    MOVD.P 32(RSP), R30
    RET (R30)
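One rough way to compare the two listings in practice is to build the same recursive program with two Go toolchains on the same 64-bit Arm machine and compare wall-clock times. The sketch below is a minimal timing harness under that assumption. The helper timeFib is hypothetical, and n is set to 30 rather than the slide's 50 so that a run finishes quickly. It is not the benchmark used for the figures quoted in this deck.

```go
package main

import (
	"fmt"
	"time"
)

// fib is the recursive function from the slide; almost all of its work is
// function-call overhead, which is what the calling convention affects.
func fib(n int) int {
	if n > 1 {
		return fib(n-1) + fib(n-2)
	}
	return n
}

// timeFib is a hypothetical helper that returns fib(n) together with the
// wall-clock time the call took.
func timeFib(n int) (int, time.Duration) {
	start := time.Now()
	res := fib(n)
	return res, time.Since(start)
}

func main() {
	res, elapsed := timeFib(30)
	fmt.Printf("fib(30) = %d in %v\n", res, elapsed)
}
```

A single timing like this is noisy; repeating the run several times, or moving fib into a standard `testing` benchmark, would give steadier numbers.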

Performance benchmark

Performance benchmark

"Benchmarks for a representative set of Go packages and programs show performance improvements of about 5%, and a typical reduction in binary size of about 2%." (Go 1.17 release notes)

Performance benchmark

"A variety of applications can benefit from the 64-bit Arm CPU performance improvements released in Go 1.18. Programs with an object-oriented design, recursion, or that have many function calls in their implementation will likely benefit more from the new register ABI calling convention."
Source: Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton

References
1. Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton
2. Proposal: Register-based Go calling convention
3. Stack frame layout on x86-64

Thank You for Your Time. Cherie Hsieh @ TSMC
