Go 1.17 was released to the public on 16th August 2021. The highlight of this release is a performance boost of about 5% with zero changes to existing code bases.
This release brings additional improvements to the compiler, namely a new way of passing function arguments and results. This change has shown about a 5% performance improvement in Go programs and reduction in binary sizes of around 2% for amd64 platforms.
https://go.dev/blog/go1.17
I am very curious to understand how this happens. How can the same code produce smaller binaries and run faster just by being compiled with a newer version of Go? Let’s dive in! Before we try to understand the improvements offered by Go 1.17, we must first understand how Go worked before 1.17 and why it was slower. Let’s consider the following program:
package main

import "fmt"

func Add(x, y int) int {
	return x + y
}

func main() {
	result := Add(5, 6)
	fmt.Println("result: ", result)
}
This program is very simple. It has a function Add which adds two numbers and returns their sum. This sum is then printed out on the console.
Calling a Function is Slow
In any language, calling a function has a cost attached to it. There are overheads of putting the parameters into registers or onto the stack and then reversing the process when the function returns.
Inlining to the rescue
Let’s write a simple benchmark to try to understand what inlining is and how much of a difference it makes:
package main

import "testing"

var Result int

func BenchmarkAdd(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = Add(1, i)
	}
	Result = r
}
To disable inlining for a function, we can add a //go:noinline pragma directly above the Add function like below:
//go:noinline
func Add(x, y int) int {
	return x + y
}
Running a benchmark with inlining disabled gives us the following result:

Running the benchmark again without the //go:noinline pragma produces the following result:

From 1.592 ns/op to 0.25 ns/op, we have seen a whopping improvement of roughly 84%!
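If you would like to reproduce both numbers in a single run, here is a minimal sketch. It assumes two hypothetical copies of Add, which I have named addNoInline and addInline for illustration, and it uses testing.Benchmark from the standard library so it can run as an ordinary program. The numbers you get will depend on your machine and Go version, so treat them as indicative only.

```go
package main

import (
	"fmt"
	"testing"
)

// addNoInline is a hypothetical copy of Add that the compiler must call.
//
//go:noinline
func addNoInline(x, y int) int {
	return x + y
}

// addInline is a hypothetical copy of Add that the compiler is free to inline.
func addInline(x, y int) int {
	return x + y
}

// sink keeps the results alive so the loops are not optimised away.
var sink int

// nsPerOp converts a benchmark result into fractional nanoseconds per
// iteration (BenchmarkResult.NsPerOp only returns whole nanoseconds).
func nsPerOp(r testing.BenchmarkResult) float64 {
	if r.N == 0 {
		return 0
	}
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

func main() {
	called := testing.Benchmark(func(b *testing.B) {
		var r int
		for i := 0; i < b.N; i++ {
			r = addNoInline(1, i)
		}
		sink = r
	})
	inlined := testing.Benchmark(func(b *testing.B) {
		var r int
		for i := 0; i < b.N; i++ {
			r = addInline(1, i)
		}
		sink = r
	})
	fmt.Printf("call:   %.3f ns/op\n", nsPerOp(called))
	fmt.Printf("inline: %.3f ns/op\n", nsPerOp(inlined))
}
```

Keeping both variants in one file means they are compiled with the same toolchain and flags, so the only difference being measured is the function call itself.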

How exactly did we get such major improvements?
Inlining copies the body of a function directly into the code of the calling function. This eliminates the function call and, with it, the overhead of setting up the call and returning from it. Putting the body of Add into its caller reduced the number of instructions executed by the processor, which made our code run faster.
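To make this concrete, here is a rough sketch of what inlining amounts to, written out by hand. The names loopWithCall and loopInlined are my own and only there for illustration; the compiler performs this substitution internally rather than rewriting your source. If you want to see what the compiler actually decides, building with go build -gcflags=-m prints its inlining decisions.

```go
package main

import "fmt"

// Add is the function from the example program.
func Add(x, y int) int {
	return x + y
}

// loopWithCall mirrors the benchmark loop as written: every iteration
// goes through Add.
func loopWithCall(n int) int {
	var r int
	for i := 0; i < n; i++ {
		r = Add(1, i)
	}
	return r
}

// loopInlined is roughly what the same loop looks like once the body of
// Add has been substituted for the call.
func loopInlined(n int) int {
	var r int
	for i := 0; i < n; i++ {
		r = 1 + i
	}
	return r
}

func main() {
	// Both loops compute the same value; only the call has disappeared.
	fmt.Println(loopWithCall(10), loopInlined(10)) // prints "10 10"
}
```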
What does Go 1.17 bring to the table?
To understand the improvements brought about by Go 1.17, let us compile our example program using Go 1.16 and Go 1.17 and analyse the generated assembly.
arg_0= qword ptr 8
arg_8= qword ptr 10h
arg_10= qword ptr 18h
mov rax, [rsp+arg_8]
mov rcx, [rsp+arg_0]
add rax, rcx
mov [rsp+arg_10], rax
retn
Go 1.16.7
add rax, rbx
retn
Go 1.17
Go 1.17 produces just 2 instructions, compared with 5 instructions (plus 3 stack-offset definitions) for Go 1.16. This happens due to a change in the way function parameters are passed in Go 1.17.
Go 1.16

Go versions 1.16 and below keep the function parameters on the stack. These parameters must be moved into the registers before they can be processed.

The program first moves the parameters into the registers RAX and RCX before it is able to execute the add operation.

Once the parameters are added, the program must move the result back from the RAX register onto the stack. All of this needs 3 extra mov instructions on top of the one add instruction needed to actually perform the addition.
Go 1.17

Go 1.17, however, passes the function parameters directly in the registers RAX and RBX as seen above. The program now just needs to execute a single instruction to perform the addition and return!
As we are executing fewer instructions, the program runs faster and the resulting binary is smaller. Apart from skipping the extra instructions, time is also saved by not reading the parameters from the stack (memory) and not writing the result back to it.
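The difference between the two listings can be counted with a small toy model. This is only a sketch: the stats type and the functions addStackABI and addRegisterABI are hypothetical names of mine, Go variables stand in for registers, and a slice stands in for the stack. It ignores the work the caller does to fill the stack slots and the return address that retn reads, so it only counts what happens inside Add.

```go
package main

import "fmt"

// stats counts what a toy version of Add had to do.
type stats struct {
	instructions int // instructions executed inside Add
	memAccesses  int // loads from and stores to the simulated stack
}

// addStackABI models the Go 1.16.7 listing: the arguments and the result
// live in stack slots (arg_0, arg_8 and arg_10).
func addStackABI(x, y int64) (int64, stats) {
	stack := []int64{x, y, 0} // slots assumed to be filled in by the caller
	var s stats

	rax := stack[1] // mov rax, [rsp+arg_8]
	s.instructions++
	s.memAccesses++

	rcx := stack[0] // mov rcx, [rsp+arg_0]
	s.instructions++
	s.memAccesses++

	rax += rcx // add rax, rcx
	s.instructions++

	stack[2] = rax // mov [rsp+arg_10], rax
	s.instructions++
	s.memAccesses++

	s.instructions++ // retn
	return stack[2], s
}

// addRegisterABI models the Go 1.17 listing: the arguments arrive in RAX
// and RBX, and the result is left in RAX.
func addRegisterABI(rax, rbx int64) (int64, stats) {
	var s stats

	rax += rbx // add rax, rbx
	s.instructions++

	s.instructions++ // retn
	return rax, s
}

func main() {
	sum16, s16 := addStackABI(5, 6)
	sum17, s17 := addRegisterABI(5, 6)
	fmt.Printf("stack:    sum=%d instructions=%d memory accesses=%d\n",
		sum16, s16.instructions, s16.memAccesses)
	fmt.Printf("register: sum=%d instructions=%d memory accesses=%d\n",
		sum17, s17.instructions, s17.memAccesses)
}
```

In this model the stack version executes 5 instructions with 3 memory accesses, while the register version executes 2 instructions with none.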
Benchmarking the Performance
Let’s update our benchmark to add some complexity.
package main

import "testing"

var Result int

func BenchmarkAdd(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = Add(Add(1, i), Add(i, -1))
	}
	Result = r
}
Running the benchmark with Go 1.16.7 produces the following result:

Running the same benchmark with Go 1.17 returns the following result:

From 0.50 ns/op to 0.32 ns/op, we see an improvement of around 35%. For a real-world application, I think a performance increase of about 5-8% can be expected.
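For anyone who wants to check the arithmetic, the percentage is simply the time saved divided by the original time. The helper name improvement below is my own, purely for illustration:

```go
package main

import "fmt"

// improvement returns the percentage reduction in time per operation
// when going from before to after (both in ns/op).
func improvement(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// The two results quoted above: 0.50 ns/op and 0.32 ns/op.
	fmt.Printf("%.0f%%\n", improvement(0.50, 0.32)) // prints "36%"
}
```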

Binary Size

The size difference between the two binaries is around 5%. For stripped binaries, this difference is around 13%. This could change depending on the number of dependencies used in the project.
Epilogue
Go 1.17 is just amazing! It’s incredible to see such a boost in performance without making any changes to the code. I’d like to urge everyone to update to 1.17 and make the most of the performance gains for your applications! Please feel free to comment if you have any queries or feedback 🙂