Understanding Go 1.17’s performance boost

Go 1.17 was released on 16 August 2021. The highlight of this release was a performance boost of about 5% with zero code changes for existing code bases.

This release brings additional improvements to the compiler, namely a new way of passing function arguments and results. This change has shown about a 5% performance improvement in Go programs and reduction in binary sizes of around 2% for amd64 platforms.

https://go.dev/blog/go1.17

I was very curious to understand how this happens. How can the same code produce smaller binaries and run faster just by being compiled with a newer version of Go? Let’s dive in!

Before we look at the improvements offered by Go 1.17, we must first understand how Go worked before 1.17 and why it was slower. Let’s consider the following program:

package main

import "fmt"

func Add(x, y int) int {
	return x + y
}

func main() {
	result := Add(5, 6)
	fmt.Println("result: ", result)
}

This program is very simple. It has a function Add which adds two numbers and returns their sum. This sum is then printed out on the console.

Calling a Function is Slow

In any language, calling a function has a cost attached to it. There is the overhead of putting the parameters into registers or onto the stack, and then reversing the process when the function returns.

Inlining to the rescue

Let’s write a simple benchmark to try to understand what inlining is and how much of a difference it makes:

package main

import "testing"

var Result int

func BenchmarkAdd(b *testing.B) {
	var r int

	for i := 0; i < b.N; i++ {
		r = Add(1, i)
	}

	Result = r
}

To disable inlining for a function, we can add a //go:noinline pragma above the Add function like below:

//go:noinline
func Add(x, y int) int {
	return x + y
}

Running a benchmark with inlining disabled gives us the following result:

Running the benchmark again without the //go:noinline pragma produces the following result:

Going from 1.592 ns/op to 0.25 ns/op is a whopping improvement of roughly 84%!
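The same comparison can be sketched as a single standalone program instead of two separate benchmark runs. The names AddInline, AddNoInline, Sink and nsPerOp below are made up for this illustration. The sketch relies on testing.Benchmark, which the standard library provides for running benchmarks outside of go test. The numbers it prints will vary with the machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// AddInline is small enough that the compiler will normally inline it.
func AddInline(x, y int) int {
	return x + y
}

// AddNoInline has the same body, but the pragma forbids inlining.
//
//go:noinline
func AddNoInline(x, y int) int {
	return x + y
}

// Sink stores the final value so the loops are not optimised away.
var Sink int

// nsPerOp runs a benchmark function and returns fractional nanoseconds
// per iteration (BenchmarkResult.NsPerOp would round down to an integer).
func nsPerOp(bench func(b *testing.B)) float64 {
	res := testing.Benchmark(bench)
	return float64(res.T.Nanoseconds()) / float64(res.N)
}

func main() {
	inlined := nsPerOp(func(b *testing.B) {
		var r int
		for i := 0; i < b.N; i++ {
			r = AddInline(1, i) // direct call, eligible for inlining
		}
		Sink = r
	})

	notInlined := nsPerOp(func(b *testing.B) {
		var r int
		for i := 0; i < b.N; i++ {
			r = AddNoInline(1, i) // direct call, never inlined
		}
		Sink = r
	})

	fmt.Printf("inlined:     %.3f ns/op\n", inlined)
	fmt.Printf("not inlined: %.3f ns/op\n", notInlined)
}
```

Each benchmark closure calls its function directly. Passing Add around as a function value would turn the call into an indirect one, which the compiler generally cannot inline, and that would hide the difference we are trying to measure.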

How exactly did we get such major improvements?

Inlining copies the body of a function directly into the code of the calling function. This eliminates the function call and, with it, all of the call overhead. Putting the body of Add into its caller reduced the number of instructions executed by the processor, which made our code run faster.
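To make this concrete, here is a rough sketch of what inlining amounts to, written out by hand. The functions sumWithCalls and sumInlinedByHand are hypothetical names used only for this illustration; the second one is an approximation of what the compiler effectively produces from the first, not its literal output.

```go
package main

import "fmt"

// Add is the small function from the article.
func Add(x, y int) int {
	return x + y
}

// sumWithCalls is the code as we write it: one call to Add per iteration.
func sumWithCalls(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total = Add(total, i)
	}
	return total
}

// sumInlinedByHand shows roughly what the loop looks like after inlining:
// the body of Add replaces the call, so no call overhead remains.
func sumInlinedByHand(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total = total + i
	}
	return total
}

func main() {
	fmt.Println(sumWithCalls(10), sumInlinedByHand(10)) // prints "45 45"
}
```

Building with go build -gcflags=-m asks the compiler to print its optimisation decisions, which should include lines saying that Add can be inlined and where calls to it were inlined.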

What does Go 1.17 bring to the table?

To understand the improvements brought about by Go 1.17, let us compile our example program using Go 1.16 and Go 1.17 and analyse the assembly generated for the Add function.

arg_0= qword ptr  8
arg_8= qword ptr  10h
arg_10= qword ptr  18h

mov     rax, [rsp+arg_8]
mov     rcx, [rsp+arg_0]
add     rax, rcx
mov     [rsp+arg_10], rax
retn

Go 1.16.7

add     rax, rbx
retn

Go 1.17

Go 1.17 produces just 2 instructions for Add, compared to 5 for Go 1.16. This is due to a change in the way function parameters are passed in Go 1.17.

Go 1.16

Go versions 1.16 and below keep the function parameters on the stack. These parameters must be moved into the registers before they can be processed.

The program first moves the parameters into the registers RAX and RCX before it is able to execute the add operation.

Once the parameters are added, the program must move the result from the RAX register back onto the stack. All of this needs 3 extra mov instructions on top of the one add instruction needed to actually perform the addition.

Go 1.17

Go 1.17, however, passes the function parameters directly in the registers RAX and RBX, as seen above. The program now needs to execute just a single instruction to perform the addition, and then it returns!

As we are executing fewer instructions, the program runs faster and the resulting binary is smaller. Apart from skipping the extra instructions, time is also saved by not reading the parameters from the stack (memory) and not writing the result back to it.
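If you would like to look at the generated code yourself, a small program along these lines is one way to do it. The //go:noinline pragma keeps the functions as real calls so they show up in the disassembly. DivMod is a hypothetical helper added here only to show a function with two results, which the release notes say are passed in registers as well.

```go
package main

import "fmt"

// Add is kept out of line so that it exists as a separate
// function in the compiled binary.
//
//go:noinline
func Add(x, y int) int {
	return x + y
}

// DivMod is a hypothetical helper with two results, useful for
// checking how multiple return values are handed back.
//
//go:noinline
func DivMod(a, b int) (q, r int) {
	return a / b, a % b
}

func main() {
	fmt.Println(Add(5, 6)) // prints "11"

	q, r := DivMod(17, 5)
	fmt.Println(q, r) // prints "3 2"
}
```

After building with go build -o demo . using each Go version, running go tool objdump -s 'main\.Add' demo prints the instructions for Add. Note that objdump uses Go's own assembler syntax, so the output will look different from the Intel-style listings shown above, even though the instructions are equivalent.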

Benchmarking the Performance

Let’s update our benchmark to add some complexity.

package main

import "testing"

var Result int

func BenchmarkAdd(b *testing.B) {
	var r int

	for i := 0; i < b.N; i++ {
		r = Add(Add(1, i), Add(i, -1))
	}

	Result = r
}

Running the benchmark with Go 1.16.7 produces the following result:

Running the same benchmark with Go 1.17 returns the following result:

Going from 0.50 ns/op to 0.32 ns/op, we see an improvement of around 36%. For a real-world application, I think a performance increase of about 5-8% can be expected.
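The percentage is simply the time saved divided by the original time. As a quick sanity check, the arithmetic can be written out like this; improvement is a helper name made up for this sketch.

```go
package main

import "fmt"

// improvement returns the relative reduction in time per operation,
// as a percentage of the "before" value.
func improvement(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// ns/op figures quoted above for Go 1.16.7 and Go 1.17.
	fmt.Printf("%.1f%%\n", improvement(0.50, 0.32)) // prints "36.0%"
}
```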

Binary Size

The size difference between the two binaries is around 5%. For stripped binaries, the difference is around 13%. These figures will vary with the number of dependencies used in the project.

Epilogue

Go 1.17 is just amazing! It’s incredible to see such a boost in performance without making any changes to the code. I’d like to urge everyone to update to 1.17 and make the most of the performance gains for your applications! Please feel free to comment if you have any queries or feedback 🙂
