Anton Zhiyanov — Everything about Go, SQL, and software in general.
https://antonz.org/

Porting Go's strings package to C
https://antonz.org/porting-go-strings/
Fri, 03 Apr 2026 13:00:00 +0000

With allocators, benchmarks, and some optimizations.

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C.

Of course, this isn't something I could do all at once. I started with the io package, which provides core abstractions like Reader and Writer, as well as general-purpose functions like Copy. But io isn't very interesting on its own, since it doesn't include specific reader or writer implementations. So my next choices were naturally bytes and strings — the workhorses of almost every Go program. This post is about how the porting process went.

  • Bits and UTF-8
  • Bytes
  • Allocators
  • Buffers and builders
  • Benchmarks
  • Optimizing search
  • Optimizing builder
  • Wrapping up

Bits and UTF-8

Before I could start porting bytes, I had to deal with its dependencies first:

  • math/bits implements bit counting and manipulation functions.
  • unicode/utf8 implements functions for UTF-8 encoded text.

Both of these packages are made up of pure functions, so they were pretty easy to port. The only minor challenge was the difference in operator precedence between Go and C — specifically, bit shifts (<<, >>). In Go, bit shifts have higher precedence than addition and subtraction. In C, they have lower precedence:

// Go: shift has HIGHER precedence than +
var x uint32 = 1<<2 + 3  // (1 << 2) + 3 == 7
// C: shift has LOWER precedence than +
uint32_t x = 1 << 2 + 3; // 1 << (2 + 3) == 32

The simplest solution was to just use parentheses everywhere shifts are involved:

// Go: Mul64 returns the 128-bit product of x and y: (hi, lo) = x * y
func Mul64(x, y uint64) (hi, lo uint64) {
    const mask32 = 1<<32 - 1
    x0 := x & mask32
    x1 := x >> 32
    y0 := y & mask32
    y1 := y >> 32
    w0 := x0 * y0
    t := x1*y0 + w0>>32
    // ...
}
// C: Mul64 returns the 128-bit product of x and y: (hi, lo) = x * y
so_Result bits_Mul64(uint64_t x, uint64_t y) {
    const so_int mask32 = ((so_int)1 << 32) - 1;
    uint64_t x0 = (x & mask32);
    uint64_t x1 = (x >> 32);
    uint64_t y0 = (y & mask32);
    uint64_t y1 = (y >> 32);
    uint64_t w0 = x0 * y0;
    uint64_t t = x1 * y0 + (w0 >> 32);
    // ...
}

With bits and utf8 done, I moved on to bytes.

Bytes

The bytes package provides functions for working with byte slices:

// Count counts the number of non-overlapping instances of sep in s.
func Count(s, sep []byte) int

// Equal reports whether a and b are the
// same length and contain the same bytes.
func Equal(a, b []byte) bool

// Index returns the index of the first instance
// of sep in s, or -1 if sep is not present in s.
func Index(s, sep []byte) int

// Repeat returns a new byte slice consisting of count copies of b.
func Repeat(b []byte, count int) []byte

// and others

Some of them were easy to port, like Equal. Here's how it looks in Go:

// Equal reports whether a and b are the
// same length and contain the same bytes.
func Equal(a, b []byte) bool {
    // Neither cmd/compile nor gccgo allocates for these string conversions.
    return string(a) == string(b)
}

And here's the C version:

// bytes_string reinterprets a byte slice as a string (zero-copy).
#define so_bytes_string(bs) ({                  \
    so_Slice _bs = (bs);                        \
    (so_String){(const char*)_bs.ptr, _bs.len}; \
})

// string_eq returns true if two strings are equal.
static inline bool so_string_eq(so_String s1, so_String s2) {
    return s1.len == s2.len &&
        (s1.len == 0 || memcmp(s1.ptr, s2.ptr, s1.len) == 0);
}

// Equal reports whether a and b are the
// same length and contain the same bytes.
bool bytes_Equal(so_Slice a, so_Slice b) {
    return so_string_eq(so_bytes_string(a), so_bytes_string(b));
}

Just like in Go, the so_bytes_string macro (the []byte → string conversion) doesn't allocate memory; it just reinterprets the byte slice's underlying storage as a string. The so_string_eq function (which works like == in Go) is easy to implement using memcmp from libc.

Another example is the IndexByte function, which looks for a specific byte in a slice. Here's the pure-Go implementation:

// IndexByte returns the index of the first instance
// of c in b, or -1 if c is not present in b.
func IndexByte(b []byte, c byte) int {
    for i, x := range b {
        if x == c {
            return i
        }
    }
    return -1
}

And here's the C version:

// IndexByte returns the index of the first instance
// of c in b, or -1 if c is not present in b.
so_int bytes_IndexByte(so_Slice b, so_byte c) {
    for (so_int i = 0; i < so_len(b); i++) {
        so_byte x = so_at(so_byte, b, i);
        if (x == c) {
            return i;
        }
    }
    return -1;
}

I used a regular C for loop to mimic Go's for-range:

  • Loop over the slice indexes with for (so_len is a macro that returns b.len, similar to Go's len built-in).
  • Access the i-th byte with so_at (a bounds-checking macro that returns *((so_byte*)b.ptr + i)).

But Equal and IndexByte don't allocate memory. What should I do with Repeat, since it clearly does? I had a decision to make.

Allocators

The Go runtime handles memory allocation and deallocation automatically. In C, I had a few options:

  • Use a reliable garbage collector like Boehm GC to closely match Go's behavior.
  • Allocate memory with libc's malloc and have the caller free it later with free.
  • Introduce allocators.

An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. See Allocators from C to Zig if you want to learn more about them.

For me, the winner was clear. Modern systems programming languages like Zig and Odin clearly showed the value of allocators:

  • It's obvious whether a function allocates memory or not: if it has an allocator as a parameter, it allocates.
  • It's easy to use different allocation methods: you can use malloc for one function, an arena for another, and a stack allocator for a third.
  • It helps with testing and debugging: you can use a tracking allocator to find memory leaks, or a failing allocator to test error handling.

An Allocator is an interface with three methods: Alloc, Realloc, and Free. In C, it translates to a struct with function pointers:

// Allocator defines the interface for memory allocators.
typedef struct {
    void* self;
    so_Result (*Alloc)(void* self, so_int size, so_int align);
    so_Result (*Realloc)(void* self, void* ptr,
        so_int oldSize, so_int newSize, so_int align);
    void (*Free)(void* self, void* ptr, so_int size, so_int align);
} mem_Allocator;

As I mentioned in the post about porting the io package, this interface representation isn't as efficient as using a static method table, but it's simpler. If you're interested in other options, check out the post on interfaces.

By convention, if a function allocates memory, it takes an allocator as its first parameter. So Go's Repeat:

// Repeat returns a new byte slice consisting of count copies of b.
func Repeat(b []byte, count int) []byte

Translates to this C code:

// Repeat returns a new byte slice consisting of count copies of b.
//
// If the allocator is nil, uses the system allocator.
// The returned slice is allocated; the caller owns it.
so_Slice bytes_Repeat(mem_Allocator a, so_Slice b, so_int count)

If the caller doesn't care about using a specific allocator, they can just pass an empty allocator, and the implementation will use the system allocator — calloc, realloc, and free from libc.

Here's a simplified version of the system allocator (I removed safety checks to make it easier to read):

// SystemAllocator uses the system's malloc, realloc, and free functions.
// It zeros out new memory on allocation and reallocation.
typedef struct {} mem_SystemAllocator;

so_Result mem_SystemAllocator_Alloc(void* self, so_int size, so_int align) {
    void* ptr = calloc(1, (size_t)(size));
    if (ptr == NULL) {
        return (so_Result){.val.as_ptr = NULL, .err = mem_ErrOutOfMemory};
    }
    return (so_Result){ .val.as_ptr = ptr, .err = NULL};
}

so_Result mem_SystemAllocator_Realloc(void* self, void* ptr, so_int oldSize,
    so_int newSize, so_int align) {
    void* newPtr = realloc(ptr, (size_t)(newSize));
    if (newPtr == NULL) {
        return (so_Result){.val.as_ptr = NULL, .err = mem_ErrOutOfMemory};
    }
    if (newSize > oldSize) {
        // Zero new memory beyond the old size.
        memset((char*)newPtr + oldSize, 0, (size_t)(newSize - oldSize));
    }
    return (so_Result){.val.as_ptr = newPtr, .err = NULL};
}

void mem_SystemAllocator_Free(void* self, void* ptr, so_int size, so_int align) {
    free(ptr);
}

The system allocator is stateless, so it's safe to have a global instance:

// System is an instance of a memory allocator that uses
// the system's malloc, realloc, and free functions.
mem_Allocator mem_System = {
    .self = &(mem_SystemAllocator){},
    .Alloc = mem_SystemAllocator_Alloc,
    .Free = mem_SystemAllocator_Free,
    .Realloc = mem_SystemAllocator_Realloc};

Here's an example of how to call Repeat with an allocator:

so_Slice src = so_string_bytes(so_str("abc"));
so_Slice got = bytes_Repeat(mem_System, src, 3);
so_String gotStr = so_bytes_string(got);
if (so_string_ne(gotStr, so_str("abcabcabc"))) {
    so_panic("want Repeat(abc) == abcabcabc");
}
mem_FreeSlice(so_byte, mem_System, got);

Way better than hidden allocations!

Buffers and builders

Besides pure functions, bytes and strings also provide types like bytes.Buffer, bytes.Reader, and strings.Builder. I ported them using the same approach as with functions.

For types that allocate memory, like Buffer, the allocator becomes a struct field:

// A Buffer is a variable-sized buffer of bytes
// with Read and Write methods.
typedef struct {
    mem_Allocator a;
    so_Slice buf;
    so_int off;
} bytes_Buffer;
// Usage example.
bytes_Buffer buf = bytes_NewBuffer(mem_System, (so_Slice){0});
bytes_Buffer_WriteString(&buf, so_str("hello"));
bytes_Buffer_WriteString(&buf, so_str(" world"));
so_String str = bytes_Buffer_String(&buf);
if (so_string_ne(str, so_str("hello world"))) {
    so_panic("Buffer.WriteString failed");
}
bytes_Buffer_Free(&buf);

The code is pretty wordy — most C developers would dislike using bytes_Buffer_WriteString instead of something shorter like buf_writestr. My solution to this problem is to automatically translate Go code to C (which is actually what I do when porting Go's stdlib). If you're interested, check out the post about this approach — Solod: Go can be a better C.

Types that don't allocate, like bytes.Reader, need no special treatment — they translate directly to C structs without an allocator field.

The strings package is the twin of bytes, so porting it was uneventful. Here's a strings.Builder usage example in Go and C, side by side:

// go
var sb strings.Builder
sb.WriteString("Hello")
sb.WriteByte(',')
sb.WriteRune(' ')
sb.WriteString("world")
s := sb.String()
if s != "Hello, world" {
    panic("want sb.String() == 'Hello, world'")
}
// c
strings_Builder sb = {.a = mem_System};
strings_Builder_WriteString(&sb, so_str("Hello"));
strings_Builder_WriteByte(&sb, ',');
strings_Builder_WriteRune(&sb, U' ');
strings_Builder_WriteString(&sb, so_str("world"));
so_String s = strings_Builder_String(&sb);
if (so_string_ne(s, so_str("Hello, world"))) {
    so_panic("want sb.String() == 'Hello, world'");
}
strings_Builder_Free(&sb);

Again, the C code is just a more verbose version of Go's implementation, plus explicit memory allocation.

Benchmarks

What's the point of writing C code if it's slow, right? I decided it was time to benchmark the ported C types and functions against their Go versions.

To do that, I ported the benchmarking part of Go's testing package. Surprisingly, the simplified version was only 300 lines long and included everything I needed:

  • Figuring out how many iterations to run.
  • Running the benchmark function in a loop.
  • Recording metrics (ns/op, MB/s, B/op, allocs/op).
  • Reporting the results.

Here's a sample benchmark for the strings.Builder type:

static so_String someStr = so_str("some string sdljlk jsklj3lkjlk djlkjw");
static const so_int numWrite = 16;
volatile so_String sink = {0};

void main_WriteString_AutoGrow(testing_B* b) {
    mem_Allocator a = testing_B_Allocator(b);
    for (; testing_B_Loop(b);) {
        strings_Builder sb = strings_NewBuilder(a);
        for (so_int i = 0; i < numWrite; i++) {
            strings_Builder_WriteString(&sb, someStr);
        }
        sink = strings_Builder_String(&sb);
        strings_Builder_Free(&sb);
    }
}

// more benchmarks...

Reads almost like Go's benchmarks.

To monitor memory usage, I created Tracker — a memory allocator that wraps another allocator and keeps track of allocations:

// A Stats records statistics about the memory allocator.
typedef struct {
    uint64_t Alloc;
    uint64_t TotalAlloc;
    uint64_t Mallocs;
    uint64_t Frees;
} mem_Stats;

// A Tracker wraps an Allocator and tracks all
// allocations and deallocations made through it.
typedef struct {
    mem_Allocator Allocator;
    mem_Stats Stats;
} mem_Tracker;

so_Result mem_Tracker_Alloc(void* self, so_int size, so_int align) {
    mem_Tracker* t = self;
    so_Result res = t->Allocator.Alloc(t->Allocator.self, size, align);
    // ...
    t->Stats.Alloc += (uint64_t)(size);
    t->Stats.TotalAlloc += (uint64_t)(size);
    t->Stats.Mallocs++;
    return (so_Result){.val.as_ptr = res.val.as_ptr, .err = NULL};
}

void mem_Tracker_Free(void* self, void* ptr, so_int size, so_int align) {
    mem_Tracker* t = self;
    t->Allocator.Free(t->Allocator.self, ptr, size, align);
    t->Stats.Alloc -= (uint64_t)(size);
    t->Stats.Frees++;
}

The benchmark gets an allocator through the testing_RunBenchmarks function and wraps it in a Tracker to keep track of allocations:

int main(void) {
    so_Slice benchs = {(testing_Benchmark[4]){
        {.Name = so_str("WriteS_AutoGrow"), .F = main_WriteString_AutoGrow},
        {.Name = so_str("WriteS_PreGrow"), .F = main_WriteString_PreGrow},
        {.Name = so_str("WriteB_AutoGrow"), .F = main_Write_AutoGrow},
        {.Name = so_str("WriteB_PreGrow"), .F = main_Write_PreGrow}},
        4, 4};
    testing_RunBenchmarks(mem_System, benchs);
}

There's no auto-discovery, but the manual setup is quite straightforward.

With the benchmarking setup ready, I ran benchmarks on the strings package. Some functions did well — about 1.5-2x faster than their Go equivalents:

go
Benchmark_Clone-8      12143073      98.50 ns/op    1024 B/op    1 allocs/op
Benchmark_Fields-8       791077    1524 ns/op        288 B/op    1 allocs/op
Benchmark_Repeat-8      9197040     127.3 ns/op     1024 B/op    1 allocs/op

c
Benchmark_Clone        27935466      41.84 ns/op    1024 B/op    1 allocs/op
Benchmark_Fields        1319384     907.7 ns/op      272 B/op    1 allocs/op
Benchmark_Repeat       18445929      64.11 ns/op    1024 B/op    1 allocs/op

Optimizing search

But Index (searching for a substring in a string) was a total disaster — it was nearly 20 times slower than in Go:

go
Benchmark_Index-8      47874408      25.14 ns/op       0 B/op    0 allocs/op

c
Benchmark_Index          483787     483.1 ns/op        0 B/op    0 allocs/op

The problem was caused by the IndexByte function we looked at earlier:

// IndexByte returns the index of the first instance
// of c in b, or -1 if c is not present in b.
func IndexByte(b []byte, c byte) int {
    for i, x := range b {
        if x == c {
            return i
        }
    }
    return -1
}

This "pure" Go implementation is just a fallback. On most platforms, Go uses a specialized version of IndexByte written in assembly.

For the C version, the easiest solution was to use memchr, which is also optimized for most platforms:

static inline so_int bytealg_IndexByte(so_Slice b, so_byte c) {
    void* at = memchr(b.ptr, (int)c, b.len);
    if (at == NULL) return -1;
    return (so_int)((char*)at - (char*)b.ptr);
}

With this fix, the benchmark results changed drastically:

go
Benchmark_Index-8        47874408    25.14 ns/op    0 B/op    0 allocs/op
Benchmark_IndexByte-8    54982188    21.98 ns/op    0 B/op    0 allocs/op

c
Benchmark_Index          33552540    35.21 ns/op    0 B/op    0 allocs/op
Benchmark_IndexByte      36868624    32.81 ns/op    0 B/op    0 allocs/op

Still not quite as fast as Go, but it's close. Honestly, I don't know why the memchr-based implementation is still slower than Go's assembly here, but I decided not to pursue it any further.

After running the rest of the strings function benchmarks, the ported versions won all of them except for two:

Benchmark      Go        C (mimalloc)   C (arena)   Winner
Clone          99ns      42ns           34ns        C - 2.4x
Compare        47ns      36ns           36ns        C - 1.3x
Fields         1524ns    908ns          912ns       C - 1.7x
Index          25ns      35ns           34ns        Go - 0.7x
IndexByte      22ns      33ns           33ns        Go - 0.7x
Repeat         127ns     64ns           67ns        C - 1.9x
ReplaceAll     243ns     200ns          203ns       C - 1.2x
Split          1899ns    1399ns         1423ns      C - 1.3x
ToUpper        2066ns    1602ns         1622ns      C - 1.3x
Trim           501ns     373ns          375ns       C - 1.3x


Optimizing builder

strings.Builder is a common way to compose strings from parts in Go, so I tested its performance too. The results were worse than I expected:

go
Benchmark_WriteS_AutoGrow-8   5385492   224.0 ns/op   1424 B/op   5 allocs/op
Benchmark_WriteS_PreGrow-8   10692721   112.9 ns/op    640 B/op   1 allocs/op

c
Benchmark_WriteS_AutoGrow     5659255   212.9 ns/op   1147 B/op   5 allocs/op
Benchmark_WriteS_PreGrow      9811054   122.1 ns/op    592 B/op   1 allocs/op

Here, the C version performed about the same as Go, but I expected it to be faster. Unlike Index, Builder is written entirely in Go, so there's no reason the ported version should lose in this benchmark.

The WriteString method looked almost identical in Go and C:

// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.buf = append(b.buf, s...)
    return len(s), nil
}
static so_Result strings_Builder_WriteString(void* self, so_String s) {
    strings_Builder* b = self;
    strings_Builder_grow(b, so_len(s));
    b->buf = so_extend(so_byte, b->buf, so_string_bytes(s));
    return (so_Result){.val.as_int = so_len(s), .err = NULL};
}

Go's append automatically grows the backing slice, while strings_Builder_grow does it manually (so_extend, on the contrary, doesn't grow the slice — it's merely a memcpy wrapper). So, there shouldn't be any difference. I had to investigate.

Looking at the compiled binary, I noticed a difference in how the functions returned results. Go returns multiple values in separate registers, so (int, error) uses three registers: one for 8-byte int, two for the error interface (implemented as two 8-byte pointers). But in C, so_Result was a single struct made up of two so_Value unions and a so_Error pointer:

typedef union {
    bool as_bool;        // 1 byte
    so_int as_int;       // 8 bytes
    int64_t as_i64;      // 8 bytes
    so_String as_string; // 16 bytes (ptr + len)
    so_Slice as_slice;   // 24 bytes (ptr + len + cap)
    void* as_ptr;        // 8 bytes
    // ... other types
} so_Value;

typedef struct {
    so_Value val;        // 24 bytes
    so_Value val2;       // 24 bytes
    so_Error err;        // 8 bytes
} so_Result;

Of course, this 56-byte monster can't be returned in registers — the C calling convention passes it through memory instead. Since WriteString is on the hot path in the benchmark, I figured this had to be the issue. So I switched from a single monolithic so_Result type to signature-specific types for multi-return pairs:

  • so_R_bool_err for (bool, error);
  • so_R_int_err for (so_int, error);
  • so_R_str_err for (so_String, error);
  • etc.

Now, the Builder.WriteString implementation in C looked like this:

typedef struct {
    so_int val;
    so_Error err;
} so_R_int_err;

static so_R_int_err strings_Builder_WriteString(void* self, so_String s) {
    // ...
}

so_R_int_err is only 16 bytes — small enough to be returned in two registers. Problem solved! But it wasn't — the benchmark only showed a slight improvement.

After looking into it more, I finally found the real issue: unlike Go, the C compiler wasn't inlining WriteString calls. Adding inline and moving strings_Builder_WriteString to the header file made all the difference:

go
Benchmark_WriteS_AutoGrow-8   5385492   224.0 ns/op   1424 B/op   5 allocs/op
Benchmark_WriteS_PreGrow-8   10692721   112.9 ns/op    640 B/op   1 allocs/op

c
Benchmark_WriteS_AutoGrow    10344024   115.9 ns/op   1147 B/op   5 allocs/op
Benchmark_WriteS_PreGrow     41045286    28.74 ns/op   592 B/op   1 allocs/op

2-4x faster. That's what I was hoping for!

Wrapping up

Porting bytes and strings was a mix of easy parts and interesting challenges. The pure functions were straightforward — just translate the syntax and pay attention to operator precedence. The real design challenge was memory management. Using allocators turned out to be a good solution, making memory allocation clear and explicit without being too difficult to use.

The benchmarks showed that the C versions outperformed Go in most cases, sometimes by 2-4x. The only exceptions were Index and IndexByte, where Go relies on hand-written assembly. The strings.Builder optimization was an interesting challenge: what seemed like a return-type issue was actually an inlining problem, and fixing it gave a nice speed boost.

There's a lot more of Go's stdlib to port. In the next post, we'll cover time — a very unique Go package. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod. The bytes and strings packages are included, of course.

Porting Go's io package to C
https://antonz.org/porting-go-io/
Wed, 25 Mar 2026 14:00:00 +0000

Interfaces, slices, multi-returns and alloca.

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C.

Of course, this isn't something I could do all at once. So I started with the standard library packages that had the fewest dependencies, and one of them was the io package. This post is about how that went.

  • The io package
  • Slices
  • Multiple returns
  • Errors
  • Interfaces
  • Type assertion
  • Specialized readers
  • Copy
  • Wrapping up

The io package

io is one of the core Go packages. It introduces the concepts of readers and writers, which are also common in other programming languages.

In Go, a reader is anything that can read some raw data (bytes) from a source into a slice:

type Reader interface {
    Read(p []byte) (n int, err error)
}

A writer is anything that can take some raw data from a slice and write it to a destination:

type Writer interface {
    Write(p []byte) (n int, err error)
}

The io package defines many other interfaces, like Seeker and Closer, as well as combinations like ReadWriter and WriteCloser. It also provides several functions, the most well-known being Copy, which copies all data from a source (represented by a reader) to a destination (represented by a writer):

func Copy(dst Writer, src Reader) (written int64, err error)

C, of course, doesn't have interfaces. But before I get into that, I had to make several other design decisions.

Slices

In general, a slice is a linear container that holds N elements of type T. Typically, a slice is a view of some underlying data. In Go, a slice consists of a pointer to a block of allocated memory, a length (the number of elements in the slice), and a capacity (the total number of elements that can fit in the backing memory before the runtime needs to re-allocate):

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

Interfaces in the io package work with fixed-length slices (readers and writers should never append to a slice), and they only use byte slices. So, the simplest way to represent this in C could be:

typedef struct {
    uint8_t* ptr;
    size_t len;
} Bytes;

But since I needed a general-purpose slice type, I decided to do it the Go way instead:

typedef struct {
    void* ptr;
    size_t len;
    size_t cap;
} so_Slice;

Plus a bound-checking helper to access slice elements:

#define so_at(T, s, i) (*so_at_ptr(T, s, i))
#define so_at_ptr(T, s, i) ({            \
    so_Slice _s_at = (s);                \
    size_t _i = (size_t)(i);             \
    if (_i >= _s_at.len)                 \
        so_panic("index out of bounds"); \
    (T*)_s_at.ptr + _i;                  \
})

Usage example:

// go
nums := make([]int, 3)
nums[0] = 11
nums[1] = 22
nums[2] = 33
n1 := nums[1]
// c
so_Slice nums = so_make_slice(int, 3, 3);
so_at(int, nums, 0) = 11;
so_at(int, nums, 1) = 22;
so_at(int, nums, 2) = 33;
so_int n1 = so_at(int, nums, 1);

So far, so good.

Multiple returns

Let's look at the Read method again:

Read(p []byte) (n int, err error)

It returns two values: an int and an error. C functions can only return one value, so I needed to figure out how to handle this.

The classic approach would be to pass output parameters by pointer, like read(p, &n, &err) or n = read(p, &err). But that doesn't compose well and looks nothing like Go. Instead, I went with a result struct:

typedef union {
    bool as_bool;
    so_int as_int;
    int64_t as_i64;
    so_String as_string;
    so_Slice as_slice;
    void* as_ptr;
    // ... other types
} so_Value;

typedef struct {
    so_Value val;
    so_Error err;
} so_Result;

The so_Value union can store any primitive type, as well as strings, slices, and pointers. The so_Result type combines a value with an error. So, our Read method (let's assume it's just a regular function for now):

func Read(p []byte) (n int, err error)

Translates to:

so_Result Read(so_Slice p);

And the caller can access the result like this:

so_Result res = Read(p);
if (res.err) {
    so_panic(res.err->msg);
}
so_println("read", res.val.as_int, "bytes");

Errors

For the error type itself, I went with a simple pointer to an immutable string:

struct so_Error_ {
    const char* msg;
};
typedef struct so_Error_* so_Error;

Plus a constructor macro:

#define errors_New(s) (&(struct so_Error_){s})

I wanted to avoid heap allocations as much as possible, so I decided not to support dynamic errors. Only sentinel errors are used, and they're defined at the file level like this:

so_Error io_EOF = errors_New("EOF");
so_Error io_ErrOffset = errors_New("io: invalid offset");

Errors are compared by pointer identity (==), not by string content — just like sentinel errors in Go. A nil error is a NULL pointer. This keeps error handling cheap and straightforward.

Interfaces

This was the big one. In Go, an interface is a type that specifies a set of methods. Any concrete type that implements those methods satisfies the interface — no explicit declaration needed. In C, there's no such mechanism.

For interfaces, I decided to use "fat" structs with function pointers. That way, Go's io.Reader:

type Reader interface {
    Read(p []byte) (n int, err error)
}

Becomes an io_Reader struct in C:

typedef struct {
    void* self;
    so_Result (*Read)(void* self, so_Slice p);
} io_Reader;

The self pointer holds the concrete value, and each method becomes a function pointer that takes self as its first argument. This is less efficient than using a static method table, especially if the interface has a lot of methods, but it's simpler. So I decided it was good enough for the first version.

Now functions can work with interfaces without knowing the specific implementation:

// ReadFull reads exactly len(buf) bytes from r into buf.
so_Result io_ReadFull(io_Reader r, so_Slice buf) {
    so_int n = 0;
    so_Error err = NULL;
    for (; n < so_len(buf) && err == NULL;) {
        so_Slice curBuf = so_slice(so_byte, buf, n, buf.len);
        so_Result res = r.Read(r.self, curBuf);
        err = res.err;
        n += res.val.as_int;
    }
    // ...
}

// A custom reader.
typedef struct {
    so_Slice b;
} reader;

static so_Result reader_Read(void* self, so_Slice p) {
    // ...
}

int main(void) {
    // We'll read from a string literal.
    so_String str = so_str("hello world");
    reader rdr = (reader){.b = so_string_bytes(str)};

    // Wrap the specific reader into an interface.
    io_Reader r = (io_Reader){
        .self = &rdr,
        .Read = reader_Read,
    };

    // Read the first 4 bytes from the string into a buffer.
    so_Slice buf = so_make_slice(so_byte, 4, 4);
    // ReadFull doesn't care about the specific reader implementation -
    // it could read from a file, the network, or anything else.
    so_Result res = io_ReadFull(r, buf);
}

Calling a method on the interface just goes through the function pointer:

// r.Read(buf) becomes:
r.Read(r.self, buf);

Type assertion

Go's interface is more than just a value wrapper with a method table. It also stores type information about the value it holds:

type iface struct {
    tab  *itab
    data unsafe.Pointer  // specific value
}

type itab struct {
    Inter *InterfaceType // method table
    Type  *Type          // type information
    // ...
}

Since the runtime knows the exact type inside the interface, it can try to "upgrade" the interface (for example, a regular Reader) to another interface (like WriterTo) using a type assertion:

// copyBuffer copies from src to dst using the provided buffer
// until either EOF is reached on src or an error occurs.
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
    // If the reader has a WriteTo method, use it to do the copy.
    if wt, ok := src.(WriterTo); ok {  // try "upgrading" to WriterTo
        return wt.WriteTo(dst)
    }
    // src is not a WriterTo, proceed with the default copy implementation.
    // ...
}

The last thing I wanted to do was reinvent Go's dynamic type system in C, so dropping this feature was an easy decision.

There's another kind of type assertion, though — when we unwrap the interface to get the value of a specific type:

// Does r (a Reader) hold a pointer to a value of concrete type LimitedReader?
// If true, lr will get the unwrapped pointer.
lr, ok := r.(*LimitedReader)

And this kind of assertion is quite possible in C. All we have to do is compare function pointers:

// Are r.Read and LimitedReader_Read the same function?
bool ok = (r.Read == LimitedReader_Read);
if (ok) {
    io_LimitedReader* lr = r.self;
}

If two different types happened to share the same method implementation, this would break. In practice, each concrete type has its own methods, so the function pointer serves as a reliable type tag.

Specialized readers

After I decided on the interface approach, porting the actual io types was pretty easy. For example, LimitedReader wraps a reader and stops with EOF after reading N bytes:

type LimitedReader struct {
    R Reader
    N int64
}

func (l *LimitedReader) Read(p []byte) (int, error) {
    if l.N <= 0 {
        return 0, EOF
    }
    if int64(len(p)) > l.N {
        p = p[0:l.N]
    }
    n, err := l.R.Read(p)
    l.N -= int64(n)
    return n, err
}

The logic is straightforward: if there are no bytes left, return EOF. Otherwise, if the buffer is bigger than the remaining size, shorten it. Then, call the underlying reader, and decrease the remaining size.

Here's what the ported C code looks like:

typedef struct {
    io_Reader R;
    int64_t N;
} io_LimitedReader;

so_Result io_LimitedReader_Read(void* self, so_Slice p) {
    io_LimitedReader* l = self;
    if (l->N <= 0) {
        return (so_Result){.val.as_int = 0, .err = io_EOF};
    }
    if ((int64_t)(so_len(p)) > l->N) {
        p = so_slice(so_byte, p, 0, l->N);
    }
    so_Result res = l->R.Read(l->R.self, p);
    so_int n = res.val.as_int;
    l->N -= (int64_t)(n);
    return (so_Result){.val.as_int = n, .err = res.err};
}

A bit more verbose, but nothing special. The multiple return values, the interface call with l.R.Read, and the slice handling are all implemented as described in previous sections.

Copy

Copy is where everything comes together. Here's the simplified Go version:

// Copy copies from src to dst until either
// EOF is reached on src or an error occurs.
func Copy(dst Writer, src Reader) (written int64, err error) {
    // Allocate a temporary buffer for copying.
    size := 32 * 1024
    buf := make([]byte, size)
    // Copy from src to dst using the buffer.
    for {
        nr, er := src.Read(buf)
        if nr > 0 {
            nw, ew := dst.Write(buf[0:nr])
            written += int64(nw)
            if ew != nil {
                err = ew
                break
            }
        }
        if er != nil {
            if er != EOF {
                err = er
            }
            break
        }
    }
    return written, err
}

In Go, Copy allocates its buffer on the heap with make([]byte, size). I could take a similar approach in C — make Copy take an allocator and use it to create the buffer like this:

so_Result io_Copy(mem_Allocator a, io_Writer dst, io_Reader src) {
    so_int size = 32 * 1024;
    so_Slice buf = mem_AllocSlice(so_byte, a, size, size);
    // ...
}

But since this is just a temporary buffer that only exists during the function call, I decided stack allocation was a better choice:

so_Result io_Copy(io_Writer dst, io_Reader src) {
    so_int size = 8 * 1024;
    so_Slice buf = so_make_slice(so_byte, size, size);
    // ...
}

so_make_slice allocates memory on the stack using a bounds-checking macro that wraps C's alloca. It moves the stack pointer and gives you a chunk of memory that's automatically freed when the function returns.

People often avoid alloca because it can silently overflow the stack, but a bounds-checking wrapper addresses this by failing fast instead of corrupting memory. Another common concern with alloca is that it's not block-scoped — the memory stays allocated until the function exits. Since Copy only allocates once, this isn't a problem here.
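For illustration, here's a minimal sketch of what such a bounds-checking wrapper might look like. The names and the 64 KiB limit are mine, not Solod's actual implementation; the real so_make_slice macro differs:

```c
#include <alloca.h>
#include <stdlib.h>

// Hypothetical bounds-checked alloca wrapper. The limit and names
// are illustrative; Solod's real so_make_slice works differently.
#define SO_STACK_LIMIT (64 * 1024)

// Aborts (a stand-in for a panic) if the request is too large,
// otherwise returns the requested size unchanged.
static inline size_t so_stack_check(size_t n) {
    if (n > SO_STACK_LIMIT) abort();
    return n;
}

// Must stay a macro: memory from alloca lives until the function
// that *called* alloca returns, so it can't be hidden in a helper.
#define so_stack_alloc(n) alloca(so_stack_check(n))

// Demo: allocate a small buffer on the stack and use it.
static int demo(void) {
    char* p = so_stack_alloc(16);
    p[0] = 'x';
    return p[0];
}
```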

Here's the simplified C version of Copy:

so_Result io_Copy(io_Writer dst, io_Reader src) {
    so_int size = 8 * 1024; // smaller buffer, 8 KiB
    so_Slice buf = so_make_slice(so_byte, size, size);
    int64_t written = 0;
    so_Error err = NULL;
    for (;;) {
        so_Result resr = src.Read(src.self, buf);
        so_int nr = resr.val.as_int;
        if (nr > 0) {
            so_Result resw = dst.Write(dst.self, so_slice(so_byte, buf, 0, nr));
            so_int nw = resw.val.as_int;
            written += (int64_t)(nw);
            if (resw.err != NULL) {
                err = resw.err;
                break;
            }
        }
        if (resr.err != NULL) {
            if (resr.err != io_EOF) {
                err = resr.err;
            }
            break;
        }
    }
    return (so_Result){.val.as_i64 = written, .err = err};
}

Here, you can see all the parts from this post working together: a function accepting interfaces, slices passed to interface methods, a result type wrapping multiple return values, error sentinels compared by identity, and a stack-allocated buffer used for the copy.

Wrapping up

Porting Go's io package to C meant solving a few problems: representing slices, handling multiple return values, modeling errors, and implementing interfaces using function pointers. None of this needed anything fancy — just structs, unions, functions, and some macros. The resulting C code is more verbose than Go, but it's structurally similar, easy enough to read, and this approach should work well for other Go packages too.

The io package isn't very useful on its own — it mainly defines interfaces and doesn't provide concrete implementations. So, the next two packages to port were naturally bytes and strings — I'll talk about those in the next post.

In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod. The io package is included, of course.

Solod: Go can be a better C
https://antonz.org/solod/
Sat, 21 Mar 2026 14:00:00 +0000
A subset of Go that transpiles to regular C, with zero runtime.

I'm working on a new programming language named Solod (So). It's a strict subset of Go that translates to C, without hidden memory allocations and with source-level interop.

Highlights:

  • Go in, C out. You write regular Go code and get readable C11 as output.
  • Zero runtime. No garbage collection, no reference counting, no hidden allocations.
  • Everything is stack-allocated by default. Heap is opt-in through the standard library.
  • Native C interop. Call C from So and So from C — no CGO, no overhead.
  • Go tooling works out of the box — syntax highlighting, LSP, linting and "go test".

So supports structs, methods, interfaces, slices, multiple returns, and defer. To keep things simple, there are no channels, goroutines, closures, or generics.

So is for systems programming in C, but with Go's syntax, type safety, and tooling.

Hello world · Language tour · Compatibility · Design decisions · FAQ · Final thoughts

'Hello world' example

This Go code in a file main.go:

package main

type Person struct {
    Name string
    Age  int
    Nums [3]int
}

func (p *Person) Sleep() int {
    p.Age += 1
    return p.Age
}

func main() {
    p := Person{Name: "Alice", Age: 30}
    p.Sleep()
    println(p.Name, "is now", p.Age, "years old.")

    p.Nums[0] = 42
    println("1st lucky number is", p.Nums[0])
}

Translates to a header file main.h:

#pragma once
#include "so/builtin/builtin.h"

typedef struct main_Person {
    so_String Name;
    so_int Age;
    so_int Nums[3];
} main_Person;

so_int main_Person_Sleep(void* self);

Plus an implementation file main.c:

#include "main.h"

so_int main_Person_Sleep(void* self) {
    main_Person* p = (main_Person*)self;
    p->Age += 1;
    return p->Age;
}

int main(void) {
    main_Person p = (main_Person){.Name = so_str("Alice"), .Age = 30};
    main_Person_Sleep(&p);
    so_println("%.*s %s %" PRId64 " %s", p.Name.len, p.Name.ptr, "is now", p.Age, "years old.");
    p.Nums[0] = 42;
    so_println("%s %" PRId64, "1st lucky number is", p.Nums[0]);
}

Language tour

In terms of features, So is an intersection between Go and C, making it one of the simplest C-like languages out there — on par with Hare.

And since So is a strict subset of Go, you already know it if you know Go. It's pretty handy if you don't want to learn another syntax.

Let's briefly go over the language features and see how they translate to C.

Variables · Strings · Arrays · Slices · Maps · If/else and for · Functions · Multiple returns · Structs · Methods · Interfaces · Enums · Errors · Defer · C interop · Packages

Values and variables

So supports basic Go types and variable declarations:

// so
const n = 100_000
f := 3.14
var r = '本'
var v any = 42
// c
const so_int n = 100000;
double f = 3.14;
so_rune r = U'本';
void* v = &(so_int){42};

byte is translated to so_byte (uint8_t), rune to so_rune (int32_t), and int to so_int (int64_t).

any is not treated as an interface. Instead, it's translated to void*. This makes handling pointers much easier and removes the need for unsafe.Pointer.

nil is translated to NULL (for pointer types).

Strings

Strings are represented as so_String type in C:

// c
typedef struct {
    const char* ptr;
    size_t len;
} so_String;
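Given that layout, the so_str constructor seen in the generated code can be a one-line macro. Here's a plausible sketch (Solod's actual definition may differ): for a string literal, sizeof counts the trailing NUL, so the length is sizeof(s) - 1, computed at compile time.

```c
#include <stddef.h>

typedef struct {
    const char* ptr;
    size_t len;
} so_String;

// Plausible sketch of so_str: wrap a string literal in a so_String.
// sizeof(s) includes the trailing NUL, hence the -1.
#define so_str(s) ((so_String){(s), sizeof(s) - 1})
```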

All standard string operations are supported, including indexing, slicing, and iterating with a for-range loop.

// so
str := "Hi 世界!"
println("str[1] =", str[1])
for i, r := range str {
    println("i =", i, "r =", r)
}
// c
so_String str = so_str("Hi 世界!");
so_println("%s %u", "str[1] =", so_at(so_byte, str, 1));
for (so_int i = 0, _iw = 0; i < so_len(str); i += _iw) {
    _iw = 0;
    so_rune r = so_utf8_decode(str, i, &_iw);
    so_println("%s %" PRId64 " %s %d", "i =", i, "r =", r);
}

Converting a string to a byte slice and back is a zero-copy operation:

// so
s := "1世3"
bs := []byte(s)
s1 := string(bs)
// c
so_String s = so_str("1世3");
so_Slice bs = so_string_bytes(s);   // wraps s.ptr
so_String s1 = so_bytes_string(bs); // wraps bs.ptr
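Given the so_String and so_Slice layouts shown in this post, the two conversion helpers can be sketched as simple rewraps of the same pointer, with no copying involved. The bodies below are my reconstruction, not Solod's source:

```c
#include <stddef.h>

typedef struct { const char* ptr; size_t len; } so_String;
typedef struct { void* ptr; size_t len; size_t cap; } so_Slice;

// Zero-copy both ways: only the struct headers change; the casts
// drop/restore const on the shared byte data.
static so_Slice so_string_bytes(so_String s) {
    return (so_Slice){(void*)s.ptr, s.len, s.len};
}

static so_String so_bytes_string(so_Slice b) {
    return (so_String){(const char*)b.ptr, b.len};
}
```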

Converting a string to a rune slice and back allocates on the stack with alloca:

// so
s := "1世3"
rs := []rune(s)
s1 := string(rs)
// c
so_String s = so_str("1世3");
so_Slice rs = so_string_runes(s);   // allocates
so_String s1 = so_runes_string(rs); // allocates

There's a so/strings stdlib package for heap-allocated strings and various string operations.

Arrays

Arrays are represented as plain C arrays (T name[N]):

// so
var a [5]int                       // zero-initialized
b := [5]int{1, 2, 3, 4, 5}         // explicit values
c := [...]int{1, 2, 3, 4, 5}       // inferred size
d := [...]int{100, 3: 400, 500}    // designated initializers
// c
so_int a[5] = {0};
so_int b[5] = {1, 2, 3, 4, 5};
so_int c[5] = {1, 2, 3, 4, 5};
so_int d[5] = {100, [3] = 400, 500};

len() on arrays is emitted as a compile-time constant.
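In plain C, the same compile-time computation is the classic sizeof idiom; a sketch of what len() amounts to for arrays (illustrative — the compiler may simply inline the literal):

```c
#include <stdint.h>
#include <stddef.h>

typedef int64_t so_int;

// Classic compile-time array-length idiom: total size divided by
// element size, both known to the compiler.
#define countof(a) (sizeof(a) / sizeof((a)[0]))

// Sample array with designated initializers, as in the example above.
static so_int d[5] = {100, [3] = 400, 500};
```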

Slicing an array produces a so_Slice.

Slices

Slices are represented as so_Slice type in C:

// c
typedef struct {
    void* ptr;
    size_t len;
    size_t cap;
} so_Slice;

All standard slice operations are supported, including indexing, slicing, and iterating with a for-range loop.

// so
s1 := []string{"a", "b", "c", "d", "e"}
s2 := s1[1 : len(s1)-1]
for i, v := range s2 {
    println(i, v)
}
// c
so_Slice s1 = (so_Slice){(so_String[5]){
    so_str("a"), so_str("b"), so_str("c"),
    so_str("d"), so_str("e")}, 5, 5};
so_Slice s2 = so_slice(so_String, s1, 1, so_len(s1) - 1);
for (so_int i = 0; i < so_len(s2); i++) {
    so_String v = so_at(so_String, s2, i);
    so_println("%" PRId64 " %.*s", i, v.len, v.ptr);
}

As in Go, a slice is a value type. Unlike in Go, a nil slice and an empty slice are the same thing:

// so
var nils []int = nil
var empty []int = []int{}
// c
so_Slice nils = (so_Slice){0};
so_Slice empty = (so_Slice){0};

make() allocates a fixed amount of memory on the stack (sizeof(T)*cap). append() only works up to the initial capacity and panics if it's exceeded. There's no automatic reallocation; use the so/slices stdlib package for heap allocation and dynamic arrays.
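The capacity-checked append semantics can be sketched in a few lines of C (illustrative names, with an abort standing in for a real panic):

```c
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { void* ptr; size_t len; size_t cap; } so_Slice;

// Append one int64 element; no reallocation ever happens, so going
// past the initial capacity is a hard error (stand-in for so_panic).
static so_Slice append_i64(so_Slice s, int64_t v) {
    if (s.len >= s.cap) abort();
    ((int64_t*)s.ptr)[s.len] = v;
    s.len++;
    return s;
}

// Demo: two appends into a capacity-3 backing array.
// Encodes (len, last element) as len*100 + element for checking.
static int64_t demo(void) {
    int64_t buf[3] = {0};
    so_Slice s = {buf, 0, 3};
    s = append_i64(s, 10);
    s = append_i64(s, 20);
    return (int64_t)s.len * 100 + ((int64_t*)s.ptr)[1];
}
```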

Maps

Maps are fixed-size and stack-allocated, backed by "mask-step-index" hashtables. They are pointer-based reference types, represented as so_Map* in C. No delete, no resize.

// c
typedef struct {
    void* keys;
    void* vals;
    size_t len;
    size_t cap;
} so_Map;

Only use maps when you have a small, fixed number of items (<1024). For anything else, use heap-allocated maps from the so/maps package.

Most of the standard map operations are supported, including getting/setting values and iterating with a for-range loop:

// so
m := map[string]int{"a": 11, "b": 22}
for k, v := range m {
    println(k, v)
}
// c
so_Map* m = &(so_Map){(so_String[2]){
    so_str("a"), so_str("b")},
    (so_int[2]){11, 22}, 2, 2};
for (so_int _i = 0; _i < (so_int)m->len; _i++) {
    so_String k = ((so_String*)m->keys)[_i];
    so_int v = ((so_int*)m->vals)[_i];
    so_println("%.*s %" PRId64, k.len, k.ptr, v);
}

As in Go, a map is a pointer type. A nil map is emitted as NULL in C.
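Given the parallel-array layout, a lookup helper for an int-keyed map could be sketched like this. Note this is illustrative only: it does a linear scan, whereas Solod's actual lookup uses the mask-step-index scheme.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { void* keys; void* vals; size_t len; size_t cap; } so_Map;

// Illustrative lookup over the parallel keys/vals arrays.
// The real mask-step-index hashtable probes instead of scanning.
static bool map_get_int(const so_Map* m, int64_t key, int64_t* out) {
    const int64_t* keys = m->keys;
    const int64_t* vals = m->vals;
    for (size_t i = 0; i < m->len; i++) {
        if (keys[i] == key) { *out = vals[i]; return true; }
    }
    return false;
}

static int64_t demo(void) {
    int64_t keys[2] = {1, 2}, vals[2] = {11, 22};
    so_Map m = {keys, vals, 2, 2};
    int64_t v = 0;
    if (map_get_int(&m, 99, &v)) return -1; // missing key must fail
    if (!map_get_int(&m, 2, &v)) return -2; // present key must succeed
    return v;
}
```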

If/else and for

If-else and for come in all shapes and sizes, just like in Go.

Standard if-else with chaining:

// so
if x > 0 {
    println("positive")
} else if x < 0 {
    println("negative")
} else {
    println("zero")
}
// c
if (x > 0) {
    so_println("%s", "positive");
} else if (x < 0) {
    so_println("%s", "negative");
} else {
    so_println("%s", "zero");
}

Init statement (scoped to the if block):

// so
if num := 9; num < 10 {
    println(num, "has 1 digit")
}
// c
{
    so_int num = 9;
    if (num < 10) {
        so_println("%" PRId64 " %s", num, "has 1 digit");
    }
}

Traditional for loop:

// so
for j := 0; j < 3; j++ {
    println(j)
}
// c
for (so_int j = 0; j < 3; j++) {
    so_println("%" PRId64, j);
}

While-style loop:

// so
i := 1
for i <= 3 {
    println(i)
    i = i + 1
}
// c
so_int i = 1;
for (; i <= 3;) {
    so_println("%" PRId64, i);
    i = i + 1;
}

Range over an integer:

// so
for k := range 3 {
    println(k)
}
// c
for (so_int k = 0; k < 3; k++) {
    so_println("%" PRId64, k);
}

Functions

Regular functions translate to C naturally:

// so
func sumABC(a, b, c int) int {
    return a + b + c
}
// c
static so_int sumABC(so_int a, so_int b, so_int c) {
    return a + b + c;
}

Named function types become typedefs:

// so
type SumFn func(int, int, int) int

fn1 := sumABC           // infer type
var fn2 SumFn = sumABC  // explicit type
s := fn2(7, 8, 9)
// main.h
typedef so_int (*main_SumFn)(so_int, so_int, so_int);

// main.c
main_SumFn fn1 = sumABC;
main_SumFn fn2 = sumABC;
so_int s = fn2(7, 8, 9);

Exported functions (capitalized) become public C symbols prefixed with the package name (package_Func). Unexported functions are static.

Variadic functions use the standard ... syntax and translate to passing a slice:

// so
func sum(nums ...int) int {
    total := 0
    for _, num := range nums {
        total += num
    }
    return total
}

func main() {
    sum(1, 2, 3, 4, 5)
}
// c
static so_int sum(so_Slice nums) {
    so_int total = 0;
    for (so_int _ = 0; _ < so_len(nums); _++) {
        so_int num = so_at(so_int, nums, _);
        total += num;
    }
    return total;
}

int main(void) {
    sum((so_Slice){(so_int[5]){1, 2, 3, 4, 5}, 5, 5});
}

Function literals (anonymous functions and closures) are not supported.

Multiple returns

So supports two-value multiple returns in two patterns: (T, error) and (T1, T2). Both cases translate to signature-specific C types:

// so
func divide(a, b int) (int, error) {
    return a / b, nil
}

func divmod(a, b int) (int, int) {
    return a / b, a % b
}
// c
typedef struct { so_int val; so_Error err; } so_R_int_err;
typedef struct { so_int val; so_int val2; } so_R_int_int;
// c
static so_R_int_err divide(so_int a, so_int b) {
    return (so_R_int_err){.val = a / b, .err = NULL};
}

static so_R_int_int divmod(so_int a, so_int b) {
    return (so_R_int_int){.val = a / b, .val2 = a % b};
}

Named return values are not supported.

Structs

Structs translate to C naturally:

// so
type person struct {
    name string
    age  int
}

bob := person{"Bob", 20}
alice := person{name: "Alice", age: 30}
fred := person{name: "Fred"}
// c
typedef struct person {
    so_String name;
    so_int age;
} person;

person bob = (person){so_str("Bob"), 20};
person alice = (person){.name = so_str("Alice"), .age = 30};
person fred = (person){.name = so_str("Fred")};

new() works with types and values:

// so
n := new(int)                    // *int, zero-initialized
p := new(person)                 // *person, zero-initialized
n2 := new(42)                    // *int with value 42
p2 := new(person{name: "Alice"}) // *person with values
// c
so_int* n = &(so_int){0};
person* p = &(person){0};
so_int* n2 = &(so_int){42};
person* p2 = &(person){.name = so_str("Alice")};

Methods

Methods are defined on struct types with pointer or value receivers:

// so
type Rect struct {
    width, height int
}

func (r *Rect) Area() int {
    return r.width * r.height
}

func (r Rect) resize(x int) Rect {
    r.height *= x
    r.width *= x
    return r
}

Pointer receivers pass void* self in C and cast to the struct pointer. Value receivers pass the struct by value, so modifications operate on a copy:

// c
typedef struct main_Rect {
    so_int width;
    so_int height;
} main_Rect;

so_int main_Rect_Area(void* self) {
    main_Rect* r = (main_Rect*)self;
    return r->width * r->height;
}

static main_Rect main_Rect_resize(main_Rect r, so_int x) {
    r.height *= x;
    r.width *= x;
    return r;
}

Calling methods on values and pointers emits pointers or values as necessary:

// so
r := Rect{width: 10, height: 5}
r.Area()      // called on value (address taken automatically)
r.resize(2)   // called on value (passed by value)

rp := &r
rp.Area()     // called on pointer
rp.resize(2)  // called on pointer (dereferenced automatically)
// c
main_Rect r = (main_Rect){.width = 10, .height = 5};
main_Rect_Area(&r);
main_Rect_resize(r, 2);

main_Rect* rp = &r;
main_Rect_Area(rp);
main_Rect_resize(*rp, 2);

Methods on named primitive types are also supported.
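For example, a hypothetical `type Celsius int` with a value-receiver method would translate along the same lines (the type and method names here are mine, not from Solod):

```c
#include <stdint.h>

typedef int64_t so_int;

// Hypothetical: `type Celsius int` with `func (c Celsius) Twice() Celsius`.
// A value receiver on a named primitive is simply passed by value.
typedef so_int main_Celsius;

static main_Celsius main_Celsius_Twice(main_Celsius c) {
    return c * 2;
}
```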

Interfaces

Interfaces in So are like Go interfaces, but they don't include runtime type information.

Interface declarations list the required methods:

// so
type Shape interface {
    Area() int
    Perim(n int) int
}

In C, an interface is a struct with a void* self pointer and function pointers for each method (less efficient than using a static method table, but simpler; this might change in the future):

// c
typedef struct main_Shape {
    void* self;
    so_int (*Area)(void* self);
    so_int (*Perim)(void* self, so_int n);
} main_Shape;

Just as in Go, a concrete type implements an interface by providing the necessary methods:

// so
func (r *Rect) Area() int {
    // ...
}

func (r *Rect) Perim(n int) int {
    // ...
}
// c
so_int main_Rect_Area(void* self) {
    // ...
}

so_int main_Rect_Perim(void* self, so_int n) {
    // ...
}

Passing a concrete type to functions that accept interfaces:

// so
func calcShape(s Shape) int {
    return s.Perim(2) + s.Area()
}

r := Rect{width: 10, height: 5}
calcShape(&r)         // implicit conversion
calcShape(Shape(&r))  // explicit conversion
// c
static so_int calcShape(main_Shape s) {
    return s.Perim(s.self, 2) + s.Area(s.self);
}

main_Rect r = (main_Rect){.width = 10, .height = 5};
calcShape((main_Shape){.self = &r,
    .Area = main_Rect_Area,
    .Perim = main_Rect_Perim});
calcShape((main_Shape){.self = &r,
    .Area = main_Rect_Area,
    .Perim = main_Rect_Perim});

Type assertion works for concrete types (v := iface.(*Type)), but not for interfaces (iface.(Interface)). Type switch is not supported.

Empty interfaces (interface{} and any) are translated to void*.

Enums

So supports typed constant groups as enums:

// so
type ServerState string

const (
    StateIdle      ServerState = "idle"
    StateConnected ServerState = "connected"
    StateError     ServerState = "error"
)

Each constant is emitted as a C const:

// main.h
typedef so_String main_ServerState;

extern const main_ServerState main_StateIdle;
extern const main_ServerState main_StateConnected;
extern const main_ServerState main_StateError;

// main.c
const main_ServerState main_StateIdle = so_str("idle");
const main_ServerState main_StateConnected = so_str("connected");
const main_ServerState main_StateError = so_str("error");

iota is supported for integer-typed constants:

// so
type Day int

const (
    Sunday Day = iota
    Monday
    Tuesday
)

Iota values are evaluated at compile time and translated to integer literals:

// c
typedef so_int main_Day;

const main_Day main_Sunday = 0;
const main_Day main_Monday = 1;
const main_Day main_Tuesday = 2;

Errors

Errors use the so_Error type (a pointer):

// c
struct so_Error_ {
    const char* msg;
};
typedef struct so_Error_* so_Error;

So only supports sentinel errors, which are defined at the package level using errors.New (implemented as a compiler built-in):

// so
import "solod.dev/so/errors"

var ErrOutOfTea = errors.New("no more tea available")
// c
#include "so/errors/errors.h"

so_Error main_ErrOutOfTea = errors_New("no more tea available");

Errors are compared using ==. This is an O(1) operation (compares pointers, not strings):

// so
func makeTea(arg int) error {
    if arg == 42 {
        return ErrOutOfTea
    }
    return nil
}

err := makeTea(42)
if err == ErrOutOfTea {
    println("out of tea")
}
// c
static so_Error makeTea(so_int arg) {
    if (arg == 42) {
        return main_ErrOutOfTea;
    }
    return NULL;
}

so_Error err = makeTea(42);
if (err == main_ErrOutOfTea) {
    so_println("%s", "out of tea");
}

Dynamic errors (fmt.Errorf), local error variables (errors.New inside functions), and error wrapping are not supported.

Defer

defer schedules a function or method call to run at the end of the enclosing scope.

The scope can be either a function (as in Go):

// so
func funcScope() {
    xopen(&state)
    defer xclose(&state)
    if state != 1 {
        panic("unexpected state")
    }
}

Or a bare block (unlike Go):

// so
func blockScope() {
    {
        xopen(&state)
        defer xclose(&state)
        if state != 1 {
            panic("unexpected state")
        }
        // xclose(&state) runs here, at block end
    }
    // state is already closed here
}

Deferred calls are emitted inline (before returns, panics, and scope end) in LIFO order:

// c
static void funcScope(void) {
    xopen(&state);
    if (state != 1) {
        xclose(&state);
        so_panic("unexpected state");
    }
    xclose(&state);
}

Defer is not supported inside other scopes like for or if.

C interop

Include a C header file with so:include:

//so:include <stdio.h>

Declare an external C type (excluded from emission) with so:extern:

//so:extern FILE
type os_file struct{}

Declare an external C function:

//so:extern
func fopen(path string, mode string) *os_file { return nil }

When calling extern functions, string and []T arguments are automatically decayed to their C equivalents: string literals become raw C strings ("hello"), string values become char*, and slices become raw pointers. This makes interop cleaner:

// so
f := fopen("/tmp/test.txt", "w")
// c
os_file* f = fopen("/tmp/test.txt", "w");
// not like this:
// fopen(so_str("/tmp/test.txt"), so_str("w"))

The decay behavior can be turned off with the nodecay flag:

//so:extern nodecay
func set_name(acc *Account, name string)

The so/c package includes helpers for converting C pointers back to So string and slice types. The unsafe package is also available and is implemented as compiler built-ins.

Packages

Each Go package is translated into a single .h + .c pair, regardless of how many .go files it contains. Multiple .go files in the same package are merged into one .c file, separated by // -- filename.go -- comments.

Exported symbols (capitalized names) are prefixed with the package name:

// geom/geom.go
package geom

const Pi = 3.14159

func RectArea(width, height float64) float64 {
    return width * height
}

Becomes:

// geom.h
extern const double geom_Pi;
double geom_RectArea(double width, double height);

// geom.c
const double geom_Pi = 3.14159;
double geom_RectArea(double width, double height) { ... }

Unexported symbols (lowercase names) keep their original names and are marked static:

// c
static double rectArea(double width, double height);

Exported symbols are declared in the .h file (with extern for variables). Unexported symbols only appear in the .c file.

Importing a So package translates to a C #include:

// so
import "example/geom"
// c
#include "geom/geom.h"

Calling imported symbols uses the package prefix:

// so
a := geom.RectArea(5, 10)
_ = geom.Pi
// c
double a = geom_RectArea(5, 10);
(void)geom_Pi;

That's it for the language tour!

Compatibility

So generates C11 code that relies on several GCC/Clang extensions:

  • Binary literals (0b1010) in generated code.
  • Statement expressions (({...})) in macros.
  • __attribute__((constructor)) for package-level initialization.
  • __auto_type for local type inference in generated code.
  • __typeof__ for type inference in generic macros.
  • alloca for make() and other dynamic stack allocations.

You can use GCC, Clang, or zig cc to compile the transpiled C code. MSVC is not supported.

Supported operating systems: Linux, macOS, and Windows (core language only).

Design decisions

So is highly opinionated.

Simplicity is key. Fewer features are always better. Every new feature is strongly discouraged by default and should be added only if there are very convincing real-world use cases to support it. This applies to the standard library too — So tries to export as little of Go's stdlib API as possible while still remaining highly useful for real-world use cases.

No heap allocations are allowed in language built-ins (like maps, slices, new, or append). Heap allocations are allowed in the standard library, but they must clearly state when an allocation happens and who owns the allocated data.

Fast and easy C interop. Even though So uses Go syntax, it's basically C with its own standard library. Calling C from So, and So from C, should always be simple to write and run efficiently. The So standard library (translated to C) should be easy to add to any C project.

Readability. There are several languages that claim they can transpile to readable C code. Unfortunately, the C code they generate is usually unreadable or barely readable at best. So isn't perfect in this area either (though it's arguably better than others), but it aims to produce C code that's as readable as possible.

Go compatibility. So code is syntactically valid Go code, with no exceptions. Semantics may differ.

Non-goals:

Raw performance. You can definitely write C code by hand that runs faster than code produced by So. Also, some features in So, like interfaces, are currently implemented in a way that's not very efficient, mainly to keep things simple.

Hiding C entirely. So is a cleaner way to write C, not a replacement for it. You should know C to use So effectively.

Go feature parity. Less is more. Iterators aren't coming, and neither are generic methods.

Frequently asked questions

I've heard these questions several times, so they're worth answering here.

Why not Rust/Zig/Odin/other language?

Because I like C and Go.

Why not TinyGo?

TinyGo is lightweight, but it still has a garbage collector, a runtime, and aims to support all Go features. What I'm after is something even simpler, with no runtime at all, source-level C interop, and eventually, Go's standard library ported to plain C so it can be used in regular C projects.

How does So handle memory?

Everything is stack-allocated by default. There's no garbage collector or reference counting. The standard library provides explicit heap allocation in the so/mem package when you need it.

Is it safe?

So itself has few safeguards other than the default Go type checking. It will panic on out-of-bounds array access, but it won't stop you from returning a dangling pointer or forgetting to free allocated memory.

Most memory-related problems can be caught with AddressSanitizer in modern compilers, so I recommend enabling it during development by adding -fsanitize=address to your CFLAGS.

Is it fast?

Usually on par with Go or faster — see the benchmark link at the end of the post for details.

Can I use So code from C (and vice versa)?

Yes. So compiles to plain C, therefore calling So from C is just calling C from C. Calling C from So is equally straightforward.

Can I compile existing Go packages with So?

Not really. Go uses automatic memory management, while So uses manual memory management. So also supports far fewer features than Go. Neither Go's standard library nor third-party packages will work with So without changes.

How stable is this?

It's not ready for production at the moment.

Where's the standard library?

There is a growing set of high-level packages (so/bytes, so/mem, so/slices, ...). There are also low-level packages that wrap the libc API (so/c/stdlib, so/c/stdio, so/c/cstring, ...). Check the links below for more details.

Final thoughts

Even though So isn't ready for production yet, I encourage you to try it out on a hobby project or just keep an eye on it if you like the concept.

Further reading: Installation and usageLanguage tourStandard librarySo by examplePlaygroundBenchmarksSource code

Allocators from C to Zig
https://antonz.org/allocators/
Thu, 12 Feb 2026 12:00:00 +0000
Exploring allocator design in C, C3, Hare, Odin, Rust, and Zig.

An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. Many C programs use the standard libc allocator or, at best, let you swap it out for another one like jemalloc or mimalloc.

Unlike C, modern systems languages usually treat allocators as first-class citizens. Let's look at how they handle allocation and then create a C allocator following their approach.

Rust · Zig · Odin · C3 · Hare · C · Final thoughts

Rust

Rust is one of the older languages we'll be looking at, and it handles memory allocation in a more traditional way. Right now, it uses a global allocator, but there's an experimental Allocator API implemented behind a feature flag (issue #32838). We'll set the experimental API aside and focus on the stable one.

Global allocator

The documentation begins with a clear statement:

In a given program, the standard library has one "global" memory allocator that is used for example by Box<T> and Vec<T>.

Followed by a vague one:

Currently the default global allocator is unspecified.

Of course, this doesn't mean that allocations in a Rust program behave unpredictably. In practice, Rust uses the system allocator as the global default (the Rust developers just don't want to commit to this, hence the "unspecified" note):

  • malloc on Unix platforms;
  • HeapAlloc on Windows;
  • dlmalloc in WASM.

The global allocator interface is defined by the GlobalAlloc trait in the std::alloc module. It requires the implementor to provide two essential methods — alloc and dealloc — and provides two more built on top of them — alloc_zeroed and realloc:

pub unsafe trait GlobalAlloc {
    // Allocates memory as described by the given `layout`.
    // Returns a pointer to newly-allocated memory,
    // or null to indicate allocation failure.
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;

    // Deallocates the block of memory at the given `ptr`
    // pointer with the given `layout`.
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    // Behaves like `alloc`, but also ensures that the contents
    // are set to zero before being returned.
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        // ...
    }

    // Shrinks or grows a block of memory to the given `new_size` in bytes.
    // The block is described by the given `ptr` pointer and `layout`.
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // ...
    }
}
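Since the post builds toward a C allocator in this style, here's a minimal sketch of the same interface shape expressed in C: a layout struct plus a vtable of function pointers. All names are mine, and the demo implementation is simply backed by libc malloc/free:

```c
#include <stddef.h>
#include <stdlib.h>

// Illustrative C counterpart of the GlobalAlloc shape.
// All names here are made up for the sketch.
typedef struct {
    size_t size;
    size_t align;
} layout_t;

typedef struct {
    void* (*alloc)(layout_t layout);
    void  (*dealloc)(void* ptr, layout_t layout);
} allocator_t;

// Trivial backing implementation: libc malloc/free. (malloc already
// guarantees fundamental alignment, so `align` is unused here.)
static void* libc_alloc(layout_t layout) { return malloc(layout.size); }
static void  libc_dealloc(void* ptr, layout_t layout) { (void)layout; free(ptr); }

static int demo(void) {
    allocator_t a = {libc_alloc, libc_dealloc};
    void* p = a.alloc((layout_t){.size = 16, .align = 8});
    int ok = (p != NULL);
    a.dealloc(p, (layout_t){.size = 16, .align = 8});
    return ok;
}
```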

Layout

The Layout struct describes a piece of memory we want to allocate — its size in bytes and alignment:

pub struct Layout {
    // private fields
    size: usize,
    align: Alignment,
}

Memory alignment

Alignment restricts where a piece of data can start in memory. The memory address for the data has to be a multiple of a certain number, which is always a power of 2.

Alignment depends on the type of data:

  • u8: alignment = 1. Can start at any address (0, 1, 2, 3...).
  • i32: alignment = 4. Must start at addresses divisible by 4 (0, 4, 8, 12...).
  • f64: alignment = 8. Must start at addresses divisible by 8 (0, 8, 16...).

CPUs are designed to read "aligned" memory efficiently. For example, if you read a 4-byte integer starting at address 0x03 (which is unaligned), the CPU has to do two memory reads — one for the first byte and another for the other three bytes — and then combine them. But if the integer starts at address 0x04 (which is aligned), the CPU can read all four bytes at once.

Aligned memory is also needed for vectorized CPU operations (SIMD), where one processor instruction handles a group of values at once instead of just one.
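The same rules are directly observable in C11 through alignof and offsetof; the figures below assume a typical 64-bit target:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

// Each field must start at an offset divisible by its alignment,
// so the compiler inserts padding. Offsets shown hold on typical
// 64-bit targets.
struct example {
    uint8_t a;  // offset 0 (align 1)
    int32_t b;  // offset 4 (align 4) -- 3 bytes of padding before it
    double  c;  // offset 8 (align 8)
};
```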

The compiler knows the size and alignment for each type, so we can use the Layout constructor or helper functions to create a valid layout:

use std::alloc::Layout;

// 64-bit integer.
let i64_layout = Layout::new::<i64>();
println!("{:?}", i64_layout);

// Ten 32-bit integers.
let array_layout = Layout::array::<i32>(10).unwrap();
println!("{:?}", array_layout);

// Custom structure.
struct Cat {
    name: String,
    is_grumpy: bool,
}

let cat_layout = Layout::new::<Cat>();
println!("{:?}", cat_layout);

// Layout from a value.
let fluffy = Cat {
    name: String::from("Fluffy"),
    is_grumpy: true,
};

let fluffy_layout = Layout::for_value(&fluffy);
println!("{:?}", fluffy_layout);
Layout { size: 8, align: 8 (1 << 3) }
Layout { size: 40, align: 4 (1 << 2) }
Layout { size: 32, align: 8 (1 << 3) }
Layout { size: 32, align: 8 (1 << 3) }

Don't be surprised that a Cat takes up 32 bytes. In Rust, the String type can grow, so it stores a data pointer, a length, and a capacity (3 × 8 = 24 bytes). There's also 1 byte for the boolean and 7 bytes of padding (because of 8-byte alignment), making a total of 32 bytes.

System allocator

System is the default memory allocator provided by the operating system. The exact implementation depends on the platform. It implements the GlobalAlloc trait and is used as the global allocator by default, but the documentation does not guarantee this (remember the "unspecified" note?). If you want to explicitly set System as the global allocator, you can use the #[global_allocator] attribute:

use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

fn main() {
    // ...
}

You can also set a custom allocator as global, like jemalloc in this example:

use jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {}

Allocation helpers

To use the global allocator directly, call the alloc and dealloc functions:

use std::alloc::{alloc, dealloc, Layout};

unsafe {
    let layout = Layout::new::<u16>();
    let ptr = alloc(layout); // no OOM check for now
    dealloc(ptr, layout);
}
ok

In practice, people rarely use alloc or dealloc directly. Instead, they work with types like Box, String or Vec that handle allocation for them:

let num = Box::new(42); // allocates
println!("{:?}", num);

let mut vec = Vec::new();
vec.push(1); // allocates
vec.push(2);
println!("{:?}", vec);

// num and vec automatically deallocate
// when they go out of scope.
42
[1, 2]

Error handling

The System allocator doesn't abort if it can't allocate memory; instead, it returns null (which is exactly what GlobalAlloc recommends):

use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

unsafe {
    // Attempt to allocate a ton of memory.
    let layout = Layout::array::<u8>(usize::MAX / 2).unwrap();
    let ptr = alloc(layout);

    if ptr.is_null() {
        println!("Out of memory!");
        // Uncomment to abort.
        // handle_alloc_error(layout);
    } else {
        println!("Allocation succeeded.");
        dealloc(ptr, layout);
    }
}
Out of memory!

The documentation recommends using the handle_alloc_error function to signal out-of-memory errors. It immediately aborts the process, or panics if the binary isn't linked to the standard library.

Unlike the low-level alloc function, types like Box or Vec call handle_alloc_error if allocation fails, so the program usually aborts if it runs out of memory:

let v: Vec<u8> = Vec::with_capacity(usize::MAX/2);
println!("{}", v.len());
memory allocation of 9223372036854775807 bytes failed (exit status 139)

Further reading

Allocator API • Memory allocation APIs

Zig

Memory management in Zig is explicit. There is no default global allocator, and any function that needs to allocate memory accepts an allocator as a separate parameter. This makes the code a bit more verbose, but it matches Zig's goal of giving programmers as much control and transparency as possible.

Allocator interface

An allocator in Zig is a std.mem.Allocator struct with an opaque self-pointer and a method table with four methods:

const Allocator = @This();

ptr: *anyopaque,
vtable: *const VTable,

pub const VTable = struct {
    /// Return a pointer to `len` bytes with specified `alignment`,
    /// or return `null` indicating the allocation failed.
    alloc: *const fn (*anyopaque, len: usize, alignment: Alignment,
                      ret_addr: usize) ?[*]u8,

    /// Attempt to expand or shrink memory in place.
    resize: *const fn (*anyopaque, memory: []u8, alignment: Alignment,
                       new_len: usize, ret_addr: usize) bool,

    /// Attempt to expand or shrink memory, allowing relocation.
    remap: *const fn (*anyopaque, memory: []u8, alignment: Alignment,
                      new_len: usize, ret_addr: usize) ?[*]u8,

    /// Free and invalidate a region of memory.
    free: *const fn (*anyopaque, memory: []u8, alignment: Alignment,
                     ret_addr: usize) void,
};

Unlike Rust's allocator methods, which take a raw pointer and a size as arguments, Zig's allocator methods take a slice of bytes ([]u8) — a type that combines both a pointer and a length.

Another interesting difference is the optional ret_addr parameter, which is the first return address in the allocation call stack. Some allocators, like the DebugAllocator, use it to keep track of which function requested memory. This helps with debugging issues related to memory allocation.

Just like in Rust, allocator methods don't return errors. Instead, alloc and remap return null if they fail.

Allocation helpers

Zig also provides type-safe wrappers that you can use instead of calling the allocator methods directly:

// Allocate / deallocate a single object.
pub fn create(a: Allocator, comptime T: type) Error!*T
pub fn destroy(self: Allocator, ptr: anytype) void

// Allocate / deallocate multiple objects.
pub fn alloc(self: Allocator, comptime T: type, n: usize) Error![]T
pub fn free(self: Allocator, memory: anytype) void

Example:

const allocator = std.heap.page_allocator;

// Create and destroy a single integer.
const num = try allocator.create(i32);
num.* = 42;
allocator.destroy(num);

// Allocate and free a slice of 100 bytes.
const slice = try allocator.alloc(u8, 100);
@memset(slice, 'A');
allocator.free(slice);
ok

Unlike the allocator methods, these allocation functions return an error if they fail.

If a function or method allocates memory, it expects the developer to provide an allocator instance:

const allocator = std.heap.page_allocator;

var list: std.ArrayList(u8) = .empty;
defer list.deinit(allocator);

try list.append(allocator, 'z');
try list.append(allocator, 'i');
try list.append(allocator, 'g');
ok

Standard allocators

Zig's standard library includes several built-in allocators in the std.heap namespace.

page_allocator asks the operating system for entire pages of memory; each allocation is a syscall:

const allocator = std.heap.page_allocator;
const memory = try allocator.alloc(u8, 100);
allocator.free(memory);
ok

FixedBufferAllocator hands out memory from a fixed buffer and doesn't make any heap allocations:

var buffer: [1000]u8 = undefined;
var fba: std.heap.FixedBufferAllocator = .init(&buffer);
const allocator = fba.allocator();

const memory = try allocator.alloc(u8, 100);
allocator.free(memory);
ok

ArenaAllocator wraps a child allocator and allows you to allocate many times and only free once:

var arena: std.heap.ArenaAllocator = .init(std.heap.page_allocator);
defer arena.deinit();

const allocator = arena.allocator();

const mem1 = try allocator.alloc(u8, 100);
const mem2 = try allocator.alloc(u8, 100);
allocator.free(mem1); // not needed
allocator.free(mem2); // not needed
ok

The arena.deinit() call frees all memory. Individual allocator.free() calls are no-ops.

DebugAllocator (aka GeneralPurposeAllocator) is a safe allocator that prevents double-free and use-after-free bugs, and can detect memory leaks:

var gpa: std.heap.DebugAllocator(.{}) = .init;
const allocator = gpa.allocator();

const memory = try allocator.alloc(u8, 100);
allocator.free(memory);
allocator.free(memory); // aborts

SmpAllocator is a general-purpose thread-safe allocator designed for maximum performance on multithreaded machines:

const allocator = std.heap.smp_allocator;
const memory = try allocator.alloc(u8, 100);
allocator.free(memory);
ok

c_allocator is a wrapper around the libc allocator:

const allocator = std.heap.c_allocator; // requires linking libc
const memory = try allocator.alloc(u8, 100);
allocator.free(memory);

Error handling

Zig doesn't panic or abort when it can't allocate memory. An allocation failure is just a regular error that you're expected to handle:

const allocator = std.heap.page_allocator;
const n = std.math.maxInt(i64);
const memory = allocator.alloc(u8, n) catch |err| {
    if (err == error.OutOfMemory) {
        print("Out of memory!\n", .{});
    }
    return err;
};
defer allocator.free(memory);
Out of memory!

Further reading

Allocators • std.mem.Allocator • std.heap

Odin

Odin supports explicit allocators, but unlike Zig, they aren't the only option. In Odin, every scope has an implicit context variable that provides a default allocator:

Context :: struct {
	allocator:          Allocator,
	temp_allocator:     Allocator,
	// ...
}

// Returns the default `context` for each scope
@(require_results)
default_context :: proc "contextless" () -> Context {
	c: Context
	__init_context(&c)
	return c
}

If you don't pass an allocator to a function, it uses the one currently set in the context.

Allocator interface

An allocator in Odin is a runtime.Allocator struct with an opaque self-pointer and a single function pointer:

Allocator_Mode :: enum byte {
	Alloc,
	Free,
	Resize,
	// ...
}

Allocator_Error :: enum byte {
	None                 = 0,
	Out_Of_Memory        = 1,
	// ...
}

Allocator_Proc :: #type proc(
    allocator_data: rawptr,
    mode: Allocator_Mode,
    size, alignment: int,
    old_memory: rawptr,
    old_size: int,
    location: Source_Code_Location = #caller_location,
) -> ([]byte, Allocator_Error)

Allocator :: struct {
	procedure: Allocator_Proc,
	data:      rawptr,
}

Unlike other languages, Odin's allocator uses a single procedure for all allocation tasks. The specific action — like allocating, resizing, or freeing memory — is decided by the mode parameter.

The allocation procedure returns the allocated memory (for .Alloc and .Resize operations) and an error (.None on success).

Allocation helpers

Odin provides low-level wrapper functions in the core:mem package that call the allocator procedure using a specific mode:

alloc :: proc(
    size: int,
    alignment: int = DEFAULT_ALIGNMENT,
    allocator := context.allocator,
    loc := #caller_location,
) -> (rawptr, runtime.Allocator_Error)

free :: proc(
    ptr: rawptr,
    allocator := context.allocator,
    loc := #caller_location,
) -> runtime.Allocator_Error

// and others

There are also type-safe builtins like new/free (for a single object) and make/delete (for multiple objects) that you can use instead of the low-level interface:

num := new(int)
defer free(num)

slice := make([]int, 100)
defer delete(slice)
ok

By default, all builtins use the context allocator, but you can pass a custom allocator as an optional parameter:

ptr := new(int, allocator=context.allocator)
defer free(ptr, allocator=context.allocator)

slice := make([]int, 10, allocator=context.allocator)
defer delete(slice, allocator=context.allocator)
ok

To use a different allocator for a specific block of code, you can reassign it in the context:

alloc := custom_allocator()
context.allocator = alloc

// Uses the custom allocator.
ptr := new(int)
defer free(ptr)

Temp allocator

Odin's context provides two different allocators:

  • context.allocator is for general-purpose allocations. It uses the operating system's heap allocator.
  • context.temp_allocator is for short-lived allocations. It uses a scratch allocator (a kind of growing arena).

// Temporary allocation (no manual free required).
temp_mem, _ := mem.alloc(100, allocator=context.temp_allocator)

// Persistent allocation (requires manual free).
perm_mem, _ := mem.alloc(100, allocator=context.allocator)
defer mem.free(perm_mem, context.allocator)

// Clear the entire scratchpad at the end of the work cycle.
free_all(context.temp_allocator)
ok

When using the temp allocator, you only need a single free_all call to clear all the allocated memory.

Standard allocators

Odin's standard library includes several allocators, found in the base:runtime and core:mem packages.

The heap_allocator procedure returns a general-purpose allocator:

allocator := runtime.heap_allocator()
memory, err := mem.alloc(100, allocator=allocator)
mem.free(memory, allocator=allocator)
ok

Arena uses a single backing buffer for allocations, allowing you to allocate many times and only free once:

arena: mem.Arena
buffer := make([]byte, 1024, runtime.heap_allocator())
mem.arena_init(&arena, buffer)
defer mem.arena_free_all(&arena)

allocator := mem.arena_allocator(&arena)
m1, err1 := mem.alloc(100, allocator=allocator)
m2, err2 := mem.alloc(100, allocator=allocator)
ok

Tracking_Allocator detects leaks and invalid memory access, similar to DebugAllocator in Zig:

track: mem.Tracking_Allocator
mem.tracking_allocator_init(&track, runtime.default_allocator())
defer mem.tracking_allocator_destroy(&track)

allocator := mem.tracking_allocator(&track)
memory, err := mem.alloc(100, allocator=allocator)
free(memory, allocator=allocator)
free(memory, allocator=allocator) // aborts
Tracking allocator error: Bad free of pointer 139851252672688 (exit status 132)

There are also others, such as Stack or Buddy_Allocator.

Error handling

Like Zig, Odin doesn't panic or abort when it can't allocate memory. Instead, it returns an error code as the second return value:

data, err := mem.alloc(1 << 62)
if err != .None {
    fmt.println("Allocation failed:", err)
    return
}
defer mem.free(data)
Allocation failed: Out_Of_Memory

Further reading

Allocators • base:runtime • core:mem

C3

Like Zig and Odin, C3 supports explicit allocators. Like Odin, C3 provides two default allocators: heap and temp.

Allocator interface

An allocator in C3 is a core::mem::allocator::Allocator interface, with an additional option to zero (or not zero) the allocated memory:

enum AllocInitType
{
	NO_ZERO,
	ZERO
}

interface Allocator
{
	<*
	 Acquire memory from the allocator, with the given
     alignment and initialization type.
	*>
	fn void*? acquire(usz size, AllocInitType init_type, usz alignment = 0);

	<*
	 Resize acquired memory from the allocator,
     with the given new size and alignment.
	*>
	fn void*? resize(void* ptr, usz new_size, usz alignment = 0);

	<*
	 Release memory acquired using `acquire` or `resize`.
	*>
	fn void release(void* ptr, bool aligned);
}

Unlike Zig and Odin, the resize and release methods don't take the (old) size as a parameter — neither directly like Odin nor through a slice like Zig. This makes it a bit harder to create custom allocators because the allocator has to keep track of the size along with the allocated memory. On the other hand, this approach makes C interop easier (if you use the default C3 allocator): data allocated in C can be freed in C3 without needing to pass the size parameter from the C code.

Like in Odin, allocator methods return an error if they fail.

Allocation helpers

C3 provides low-level wrapper macros in the core::mem::allocator module that call allocator methods:

macro void* malloc(Allocator allocator, usz size)
macro void*? malloc_try(Allocator allocator, usz size)

macro void* realloc(Allocator allocator, void* ptr, usz new_size)
macro void*? realloc_try(Allocator allocator, void* ptr, usz new_size)

macro void free(Allocator allocator, void* ptr)

// and others

These either return an error (the _try-suffix macros) or abort if they fail.

Example:

// `mem` is the global allocator instance.
int* ptr = allocator::malloc(mem, int.sizeof);
defer allocator::free(mem, ptr);
ok

There are also functions and macros with similar names in the core::mem module that use the global allocator::mem allocator instance:

// Call the core::mem::allocator macros directly.
fn void* malloc(usz size)
fn void free(void* ptr)

// Accept a type instead of a size.
macro new($Type, #init = ...)
macro alloc($Type)

// Allocate multiple objects.
macro new_array($Type, usz elements)
macro alloc_array($Type, usz elements)

// and others

Example:

// `malloc` and `free` are builtins,
// so they don't require the namespace.
int* num = malloc(int.sizeof);
defer free(num);

// `new_array` requires the namespace.
int[] slice = mem::new_array(int, 100);
defer free(slice);
ok

If a function or method allocates memory, it often expects the developer to provide an allocator instance:

List{int} list;
list.init(mem); // use the heap allocator
defer list.free();

list.push(11);
list.push(22);
list.push(33);
ok

Temp allocator

C3 provides two thread-local allocator instances:

  • allocator::mem is for general-purpose allocations. It uses the operating system's heap allocator (typically a libc wrapper).
  • allocator::tmem is for short-lived allocations. It uses an arena allocator.

There are functions and macros in the core::mem module that use the allocator::tmem temporary allocator:

// Calls the core::mem::allocator macro directly.
fn void* tmalloc(usz size, usz alignment = 0)

// Accept a type instead of a size.
macro tnew($Type, #init = ...)
macro talloc($Type)

// Allocate multiple objects.
macro talloc_array($Type, usz elements)

The @pool macro releases all temporary allocations when leaving the scope:

@pool()
{
    int* p1 = tmalloc(int.sizeof);
    int* p2 = tmalloc(int.sizeof);
    int* p3 = tmalloc(int.sizeof);
    // no manual free required
};  // p1, p2, p3 are freed here
ok

Some types, like List or DString, use the temp allocator by default if they are not initialized:

@pool()
{
    List{int} list;
    list.push(11);  // implicitly initialize with the temp allocator
    list.push(22);

    DString str;
    str.appendf("Hello %s", "World");  // same
};
ok

Standard allocators

C3's standard library includes several built-in allocators, found in the core::mem::allocator module.

LibcAllocator is a wrapper around libc's malloc/free:

LibcAllocator libc;
char* memory = allocator::malloc(&libc, 100*char.sizeof);
allocator::free(&libc, memory);
ok

ArenaAllocator uses a single backing buffer for allocations, allowing you to allocate many times and only free once:

char[1024] buf;
ArenaAllocator* arena = allocator::wrap(&buf);
defer arena.clear();

char* m1 = allocator::malloc(arena, 100*char.sizeof);
char* m2 = allocator::malloc(arena, 100*char.sizeof);
ok

TrackingAllocator detects leaks and invalid memory access:

TrackingAllocator track;
track.init(mem);
defer track.clear();

char* memory = allocator::malloc(&track, 100*char.sizeof);
allocator::free(&track, memory);
allocator::free(&track, memory); // aborts
ERROR: 'Attempt to release untracked pointer 0x55f5b0333330, this is likely a bug.'

There are also others, such as BackedArenaAllocator or OnStackAllocator.

Error handling

Like Zig and Odin, C3 can return an error in case of allocation failure:

void*? data = allocator::malloc_try(mem, 1uLL << 62);
if (catch err = data) {
    io::printfn("Allocation failed: %s", err);
    return;
};
defer mem::free(data);
Allocation failed: mem::OUT_OF_MEMORY

C3 can also abort in case of allocation failure:

void* data = allocator::malloc(mem, 1uLL << 62);
// void* data = malloc(1uLL << 62); // same thing
defer free(data);
ERROR: 'Unexpected fault 'mem::OUT_OF_MEMORY' was unwrapped!'

Since the functions and macros in the core::mem module use allocator::malloc instead of allocator::malloc_try, it looks like aborting on failure is the preferred approach.

Further reading

Memory Handling • core::mem::allocator • core::mem

Hare

Unlike other languages, Hare doesn't support explicit allocators. The standard library has multiple allocator implementations, but only one of them is used at runtime.

Global allocator

Hare's compiler expects the runtime to provide malloc and free implementations:

fn malloc(n: size) nullable *opaque;
@symbol("rt.free") fn free_(_p: nullable *opaque) void;

The programmer isn't supposed to access them directly (although it's possible by importing rt and calling rt::malloc or rt::free). Instead, Hare uses them to provide higher-level allocation helpers.

Allocation helpers

Hare offers two high-level allocation helpers that use the global allocator internally: alloc and free.

alloc can allocate individual objects. It takes a value, not a type:

let n: *int = alloc(42)!;
defer free(n);

let s: *str = alloc("hello world")!;
defer free(s);

// coords is defined as struct { x: int, y: int }
let p: *coords = alloc(coords{x=3, y=5})!;
defer free(p);
ok

alloc can also allocate slices if you provide a second parameter (the number of items):

// Allocate a slice of 100 integers.
let nums: []int = alloc([0...], 100)!;
defer free(nums);
ok

free works correctly with both pointers to single objects (like *int) and slices (like []int).

Standard allocators

Hare's standard library has three built-in memory allocators:

  • The default allocator is based on the algorithm from the Verified sequential malloc/free paper.
  • The libc allocator uses the operating system's malloc and free functions from libc.
  • The debug allocator uses a simple mmap-based method for memory allocation.

The allocator that's actually used is selected at compile time.

Error handling

Like other languages, Hare returns an error in case of allocation failure:

match (alloc([0...], 1 << 62)) {
case let nums: []int =>
    defer free(nums);
case nomem =>
    fmt::println("Out of memory")!;
};
Out of memory

You can abort on error with !:

let nums: []int = alloc([0...], 1 << 62)!;
defer free(nums);
Aborted (core dumped) (exit status 134)

Or propagate the error with ?:

let nums: []int = alloc([0...], 1 << 62)?;
defer free(nums);

Further reading

Dynamic memory allocation • malloc.ha

C

Many C programs use the standard libc allocator, or at best let you swap it out for another one using macros:

#define LIB_MALLOC malloc
#define LIB_FREE free

Or using a simple setter:

static void *(*_lib_malloc)(size_t);
static void (*_lib_free)(void*);

void lib_set_allocator(void *(*malloc_fn)(size_t), void (*free_fn)(void*)) {
    _lib_malloc = malloc_fn;
    _lib_free = free_fn;
}

While this might work for switching the libc allocator to jemalloc or mimalloc, it's not very flexible. For example, trying to implement an arena allocator with this kind of API is almost impossible.

Now that we've seen the modern allocator design in Zig, Odin, and C3 — let's try building something similar in C. There are a lot of small choices to make, and I'm going with what I personally prefer. I'm not saying this is the only way to design an allocator — it's just one way out of many.

Allocator interface

Our allocator should return an error instead of NULL if it fails, so we'll need an error enum:

// Allocation errors.
typedef enum {
    Error_None = 0,
    Error_OutOfMemory,
    Error_SizeOverflow,
} Error;

The allocation function needs to return either a tagged union (value | error) or a tuple (value, error). Since C doesn't have these built in, let's use a custom tuple type:

// Allocation result.
typedef struct {
    void* ptr;
    Error err;
} AllocResult;

The next step is the allocator interface. I think Odin's approach of using a single function makes the implementation more complicated than it needs to be, so let's create separate methods like Zig does:

// Allocator interface.
struct _Allocator {
    AllocResult (*alloc)(void* self, size_t size, size_t align);
    AllocResult (*realloc)(void* self, void* ptr, size_t oldSize,
                           size_t newSize, size_t align);
    void (*free)(void* self, void* ptr, size_t size, size_t align);
};

typedef struct {
    const struct _Allocator* m;
    void* self;
} Allocator;

This approach to interface design is explained in detail in a separate post: Interfaces in C.

Zig uses byte slices ([]u8) instead of raw memory pointers. We could make our own byte slice type, but I don't see any real advantage to doing that in C — it would just mean more type casting. So let's keep it simple and stick with void* like our ancestors did.

Allocation helpers

Now let's create generic Alloc and Free wrappers:

// Allocates an item of type T.
// `AllocResult Alloc[T](Allocator a, T)`
#define Alloc(a, T) \
    ((a).m->alloc((a).self, sizeof(T), alignof(T)))

// Frees an item allocated with Alloc.
// Only accepts typed pointers, not void*.
// `void Free[T](Allocator a, T* ptr)`
#define Free(a, ptr) \
    ((a).m->free((a).self, (ptr), sizeof(*(ptr)), alignof(typeof(*(ptr)))))

I'm taking typeof for granted here to keep things simple. A more robust implementation should properly check if it is available or pass the type to Free directly.

We can even create a separate pair of helpers for collections:

// Helper to prevent integer overflow during N-item allocation.
static inline size_t calcSize(size_t size, size_t count) {
    if (count > 0 && size > SIZE_MAX / count) {
        return 0;
    }
    return size * count;
}

// Allocates n items of type T.
// `AllocResult AllocN[T](Allocator a, T, size_t n)`
#define AllocN(a, T, n) \
    ((a).m->alloc((a).self, calcSize(sizeof(T), (n)), alignof(T)))

// Frees n items allocated with AllocN.
// Only accepts typed pointers, not void*.
// `void FreeN[T](Allocator a, T* ptr, size_t n)`
#define FreeN(a, ptr, n)               \
    ((a).m->free(                      \
        (a).self, (ptr),               \
        calcSize(sizeof(*(ptr)), (n)), \
        alignof(typeof(*(ptr)))))

We could use some __VA_ARGS__ macro tricks to make Alloc and Free work for both a single object and a collection. But let's not do that — I prefer to avoid heavy-magic macros in this post.

Libc allocator

As for the custom allocators, let's start with a libc wrapper. It's not particularly interesting, since it ignores most of the parameters, but still:

// The libc allocator wrapper.
// Ignores alignment and treats zero-size allocations as errors.
// Doesn't support reallocation to keep things simple.
AllocResult Libc_Alloc(void* self, size_t size, size_t align) {
    (void)self;
    (void)align;

    if (size == 0) return (AllocResult){NULL, Error_SizeOverflow};
    void* ptr = malloc(size);
    if (!ptr) return (AllocResult){NULL, Error_OutOfMemory};
    return (AllocResult){ptr, Error_None};
}

void Libc_Free(void* self, void* ptr, size_t size, size_t align) {
    (void)self;
    (void)size;
    (void)align;
    free(ptr);
}

Allocator LibcAllocator(void) {
    static const struct _Allocator mtab = {
        .alloc = Libc_Alloc,
        .free = Libc_Free,
    };
    return (Allocator){.m = &mtab, .self = NULL};
}

Usage example:

int main(void) {
    Allocator allocator = LibcAllocator();

    {
        // Allocate a single integer.
        AllocResult res = Alloc(allocator, int64_t);
        if (res.err != Error_None) {
            printf("Error: %d\n", res.err);
            return 1;
        }

        int64_t* x = res.ptr;
        *x = 42;

        Free(allocator, x);
    }

    {
        // Allocate an array of integers.
        size_t n = 100;
        AllocResult res = AllocN(allocator, int64_t, n);
        if (res.err != Error_None) {
            printf("Error: %d\n", res.err);
            return 1;
        }

        int64_t* arr = res.ptr;
        for (size_t i = 0; i < n; i++) {
            arr[i] = i + 1;
        }

        FreeN(allocator, arr, n);
    }
}
ok

Arena allocator

Now let's use that self field to implement an arena allocator backed by a fixed-size buffer:

// A simple arena allocator.
// Doesn't support reallocation.
typedef struct {
    uint8_t* buf;
    size_t cap;
    size_t offset;
} Arena;

Arena NewArena(uint8_t* buf, size_t cap) {
    return (Arena){.buf = buf, .cap = cap, .offset = 0};
}

static AllocResult Arena_Alloc(void* self, size_t size, size_t align) {
    Arena* arena = (Arena*)self;

    // 1. Calculate the alignment padding.
    if (size == 0) return (AllocResult){NULL, Error_SizeOverflow};
    uintptr_t currentPtr = (uintptr_t)arena->buf + arena->offset;
    uintptr_t alignedPtr = (currentPtr + (align - 1)) & ~(align - 1);
    size_t newOffset = (alignedPtr - (uintptr_t)arena->buf) + size;

    // 2. Check for errors.
    if (newOffset < arena->offset) {
        return (AllocResult){NULL, Error_SizeOverflow};
    }
    if (newOffset > arena->cap) {
        return (AllocResult){NULL, Error_OutOfMemory};
    }

    // 3. Commit the allocation.
    arena->offset = newOffset;
    return (AllocResult){(void*)alignedPtr, Error_None};
}

static void Arena_Free(void* self, void* ptr, size_t size, size_t align) {
    // Individual deallocations are no-ops.
    (void)self;
    (void)ptr;
    (void)size;
    (void)align;
}

static void Arena_Reset(Arena* arena) {
    arena->offset = 0;
}

Allocator Arena_Allocator(Arena* arena) {
    static const struct _Allocator mtab = {
        .alloc = Arena_Alloc,
        .free = Arena_Free,
    };
    return (Allocator){.m = &mtab, .self = arena};
}

Usage example:

int main(void) {
    uint8_t buf[1024];
    Arena arena = NewArena(buf, sizeof(buf));
    Allocator allocator = Arena_Allocator(&arena);

    {
        // Allocate a single integer.
        AllocResult res = Alloc(allocator, int64_t);
        if (res.err != Error_None) {
            printf("Error: %d\n", res.err);
            return 1;
        }

        int64_t* x = res.ptr;
        *x = 42;

        // No need for Free.
    }

    {
        // Allocate an array of integers.
        size_t n = 100;
        AllocResult res = AllocN(allocator, int64_t, n);
        if (res.err != Error_None) {
            printf("Error: %d\n", res.err);
            return 1;
        }

        int64_t* arr = res.ptr;
        for (size_t i = 0; i < n; i++) {
            arr[i] = i + 1;
        }

        // No need for FreeN.
    }

    Arena_Reset(&arena);
}
ok

Nice!

Error handling

As shown in the examples above, the allocation method returns an error if something goes wrong. While checking for errors might not be as convenient as it is in Zig or Odin, it's still pretty straightforward:

int main(void) {
    Allocator allocator = LibcAllocator();

    size_t n = SIZE_MAX;
    AllocResult res = AllocN(allocator, int64_t, n);
    if (res.err != Error_None) {
        printf("Allocation failed: %d\n", res.err);
        return 1;
    }

    FreeN(allocator, res.ptr, n);
}
Allocation failed: 2 (exit status 1)

source

Final thoughts

Here's an informal table comparing allocation APIs in the languages we've discussed:

          Single object   Collection
        ┌──────────────────────────────────────────┐
Rust    │ Box::new(42)    vec![0; 100]             │
        │                                          │
Zig     │ a.create(i32)   a.alloc(i32, 100)        │
        │                                          │
Odin    │ new(int)        make([]int, 100)         │
        │ new(int, a)     make([]int, 100, a)      │
        │                                          │
C3      │ mem::new(int)   mem::new_array(int, 100) │
        │                                          │
Hare    │ alloc(42)       alloc([0...], 100)       │
        │                                          │
C       │ Alloc(a, int)   AllocN(a, int, 100)      │
        └──────────────────────────────────────────┘

In Zig, you always have to specify the allocator. In Odin, passing an allocator is optional. In C3, some functions require you to pass an allocator, while others just use the global one. In Hare, there's a single global allocator.

As we've seen, there's nothing magical about the allocators used in modern languages. While they're definitely more ergonomic and safe than C, there's nothing stopping us from using the same techniques in plain C.

(Un)portable defer in C

https://antonz.org/defer-in-c/ • Thu, 05 Feb 2026 12:00:00 +0000

Eight ways to implement defer in C.

Modern system programming languages, from Hare to Zig, seem to agree that defer is a must-have feature. It's hard to argue with that, because defer makes it much easier to free memory and other resources correctly, which is crucial in languages without garbage collection.

The situation in C is different. There was an N2895 proposal by Jens Gustedt and Robert Seacord in 2021, but it wasn't accepted for C23. Now there's another proposal, N3734 by JeanHeyd Meneide, which will probably be accepted into the next version of the standard.

Since defer isn't part of the standard, people have created lots of different implementations. Let's take a quick look at them and see if we can find the best one.

C23/GCC • C11/GCC • GCC/Clang • MSVC • Long jump • For loop • Stack • Simplified GCC/Clang • Final thoughts

C23/GCC

Jens Gustedt offers this brief version:

#define defer __DEFER(__COUNTER__)
#define __DEFER(N) __DEFER_(N)
#define __DEFER_(N) __DEFER__(__DEFER_FUNCTION_##N, __DEFER_VARIABLE_##N)

#define __DEFER__(F, V)        \
    auto void F(int*);         \
    [[gnu::cleanup(F)]] int V; \
    auto void F(int*)

Usage example:

void loud_free(void* p) {
    printf("freeing %p\n", p);
    free(p);
}

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    defer { loud_free(p); }

    *p = 42;
    printf("p = %d\n", *p);
}
p = 42
freeing 0x127e05b30

This approach combines C23 attribute syntax ([[attribute]]) with GCC-specific features: nested functions (auto void F(int*)) and the cleanup attribute. It also uses the non-standard __COUNTER__ macro (supported by GCC, Clang, and MSVC), which expands to an automatically increasing integer value.

Nested functions and cleanup in GCC

A nested function (also known as a local function) is a function defined inside another function:

void outer() {
    int x = 10;

    void inner() {
        x += 10;
    }

    inner();
}

Nested functions can access variables from the enclosing scope, similar to closures in other languages, but they are not first-class citizens and cannot be passed around like function pointers.

The cleanup attribute runs a function when the variable goes out of scope:

void safe_free(int **ptr) {
    if (!ptr || !*ptr) return;
    free(*ptr);
}

int main(void) {
    __attribute__((cleanup(safe_free))) int *p = malloc(sizeof(int));
    if (!p) return 1;
    *p = 42;

    // safe_free(&p) will be called automatically
    // when p goes out of scope.
}

The function should take one parameter, which is a pointer to a type that's compatible with the variable. If the function returns a value, it will be ignored.

On the plus side, this version works just like you'd expect defer to work. On the downside, it's only available in C23+ and only works with GCC (not even Clang supports it, because of the nested function).

Another downside is that using nested functions requires an executable stack, which security experts strongly discourage.

Executable stack vulnerability

When we use nested functions in GCC, the compiler often creates trampolines (small pieces of machine code) on the stack at runtime. These trampolines let the nested function access variables from the parent function's scope. For the CPU to run these code fragments, the stack's memory pages need to be marked as executable.

An executable stack is a serious security risk because it makes buffer overflow attacks much easier. In these attacks, a hacker sends more data than a program can handle, which overwrites the stack with harmful "shellcode". If the stack is non-executable (which is the default today), the CPU won't run that code and the program will just crash. But since our defer macro makes the stack executable, an attacker can jump straight to their injected code and run it, gaining complete control over the process.

C11/GCC

We can easily adapt the above version to use C11:

#define defer _DEFER(__COUNTER__)
#define _DEFER(N) __DEFER(N)
#define __DEFER(N) ___DEFER(__DEFER_FUNC_##N, __DEFER_VAR_##N)

#define ___DEFER(F, V)                                         \
    auto void F(void*);                                        \
    __attribute__((cleanup(F))) int V __attribute__((unused)); \
    auto void F(void* _dummy_ptr)

Usage example:

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    defer { loud_free(p); }

    *p = 42;
    printf("p = %d\n", *p);
}
p = 42
freeing 0x127e05b30

The main downside remains: it's GCC-only.

GCC/Clang

Clang fully supports the cleanup attribute, but it doesn't support nested functions. Instead, it offers the blocks extension, which works somewhat similarly:

void outer() {
    __block int x = 10;

    void (^inner)(void) = ^{
        x += 10;
    };

    inner();
}

We can use Clang blocks to make a defer version that works with both GCC and Clang:

#if defined(__clang__)

// Clang implementation.
#define _DEFER_CONCAT(a, b) a##b
#define _DEFER_NAME(a, b) _DEFER_CONCAT(a, b)

static inline void _defer_cleanup(void (^*block)(void)) {
    if (*block) (*block)();
}

#define defer                                                                   \
    __attribute__((unused)) void (^_DEFER_NAME(_defer_var_, __COUNTER__))(void) \
        __attribute__((cleanup(_defer_cleanup))) = ^

#elif defined(__GNUC__)

// GCC implementation.
#define defer _DEFER(__COUNTER__)
#define _DEFER(N) __DEFER(N)
#define __DEFER(N) ___DEFER(__DEFER_FUNC_##N, __DEFER_VAR_##N)

#define ___DEFER(F, V)                                         \
    auto void F(void*);                                        \
    __attribute__((cleanup(F))) int V __attribute__((unused)); \
    auto void F(void* _dummy_ptr)

#else

// Runtime error for unsupported compilers.
#define defer assert(!"unsupported compiler");

#endif

Usage example:

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    defer { loud_free(p); };

    *p = 42;
    printf("p = %d\n", *p);
}
p = 42
freeing 0x127e05b30

Now it works with Clang, but there are several things to be aware of:

  1. We must compile with -fblocks.
  2. We must put a ; after the closing brace in the deferred block: defer { ... };.
  3. If we need to modify a variable inside the defer block, the variable must be declared with __block:
__block int x = 0;
defer { x += 10; };

On the plus side, this implementation works with both GCC and Clang. The downside is that it's still not standard C, and won't work with other compilers like MSVC.

MSVC

MSVC, of course, doesn't support the cleanup attribute. But it provides "structured exception handling" with the __try and __finally keywords:

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    __try {
        *p = 42;
        printf("p = %d\n", *p);
    }
    __finally {
        loud_free(p);
    }
}

The code in the __finally block will always run, no matter how the __try block exits — whether it finishes normally, returns early, or crashes (for example, from a null pointer dereference).

This isn't the defer we're looking for, but it's a decent alternative if you're only programming for Windows.

Long jump

There are well-known defer implementations by Jens Gustedt and moon-chilled that use setjmp and longjmp. I'm mentioning them for completeness, but honestly, I would never use them in production. The first one is extremely large, and the second one is extremely hacky. Also, I'd rather not use long jumps unless it's absolutely necessary.

Still, here's a usage example from Gustedt's library:

guard {
    void * const p = malloc(25);
    if (!p) break;
    defer free(p);

    void * const q = malloc(25);
    if (!q) break;
    defer free(q);

    if (mtx_lock(&mut)==thrd_error) break;
    defer mtx_unlock(&mut);
}

Here, all deferred statements run at the end of the guarded block, no matter how we exit the block (normally or through break).

For loop

The stc library probably has the simplest defer implementation ever:

#define defer(...) \
    for (int _c_i3 = 0; _c_i3++ == 0; __VA_ARGS__)

Usage example:

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    defer(loud_free(p)) {
        *p = 42;
        printf("p = %d\n", *p);
    }
}
p = 42
freeing 0x127e05b30

Here, the deferred statement is passed as __VA_ARGS__ and is used as the loop increment. The "defer-aware" block of code is the loop body. Since the increment runs after the body, the deferred statement executes after the main code.

This approach works with all mainstream compilers, but it falls apart if you try to exit early with break or return:

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    defer(loud_free(p)) {
        *p = 42;
        if (*p == 42) {
            printf("early exit, defer is not called\n");
            break;
        }
        printf("p = %d\n", *p);
    }
}
early exit, defer is not called

Stack

Dmitriy Kubyshkin provides a defer implementation that adds a "stack frame" of deferred calls to any function that needs them. Here's a simplified version:

#define countof(A) ((sizeof(A)) / (sizeof((A)[0])))

// Deferred function and its argument.
struct _defer_ctx {
    void (*fn)(void*);
    void* arg;
};

// Calls all deferred functions in LIFO order.
static inline void _defer_drain(
    const struct _defer_ctx* it,
    const struct _defer_ctx* end) {
    for (; it != end; it++) it->fn(it->arg);
}

// Initializes the defer stack with the given size
// for the current function.
#define defers(n)                     \
    struct {                          \
        struct _defer_ctx* first;     \
        struct _defer_ctx items[(n)]; \
    } _deferred = {&_deferred.items[(n)], {0}}

// Pushes a deferred function call onto the stack.
#define defer(_fn, _arg)                              \
    do {                                              \
        if (_deferred.first <= &_deferred.items[0]) { \
            assert(!"defer stack overflow");          \
        }                                             \
        struct _defer_ctx* d = --_deferred.first;     \
        d->fn = (void (*)(void*))(_fn);               \
        d->arg = (void*)(_arg);                       \
    } while (0)

// Calls all deferred functions and returns from the current function.
#define returnd                                          \
    while (                                              \
        _defer_drain(                                    \
            _deferred.first,                             \
            &_deferred.items[countof(_deferred.items)]), \
        1) return

Usage example:

int main(void) {
    // The function supports up to 16 deferred calls.
    defers(16);

    int* p = malloc(sizeof(int));
    if (!p) returnd 1;
    defer(loud_free, p);

    *p = 42;
    printf("p = %d\n", *p);

    // We must exit through returnd to
    // ensure deferred functions are called.
    returnd 0;
}
p = 42
freeing 0x127e05b30

This version works with all mainstream compilers. Also, unlike the STC version, defers run correctly in case of early exit:

int main(void) {
    defers(16);

    int* p = malloc(sizeof(int));
    if (!p) returnd 1;
    defer(loud_free, p);

    *p = 42;
    if (*p == 42) {
        printf("early exit\n");
        returnd 0;
    }

    printf("p = %d\n", *p);
    returnd 0;
}
early exit
freeing 0x127e05b30

Unfortunately, there are some drawbacks:

  • Defer only supports single-function calls, not code blocks.
  • We always have to call defers at the start of the function and exit using returnd. In the original implementation, Dmitriy overrides the return keyword, but this won't compile with strict compile flags (which I think we should always use).
  • The deferred function runs before the return value is evaluated, not after.

Simplified GCC/Clang

The Stack version above doesn't support deferring code blocks. In my opinion, that's not a problem, since most defers are just "free this resource" actions, which only need a single function call with one argument.

If we accept this limitation, we can simplify the GCC/Clang version by dropping GCC's nested functions and Clang's blocks:

#define _DEFER_CONCAT(a, b) a##b
#define _DEFER_NAME(a, b) _DEFER_CONCAT(a, b)

// Deferred function and its argument.
struct _defer_ctx {
    void (*fn)(void*);
    void* arg;
};

// Calls the deferred function with its argument.
static inline void _defer_cleanup(struct _defer_ctx* ctx) {
    if (ctx->fn) ctx->fn(ctx->arg);
}

// Create a deferred function call for the current scope.
#define defer(fn, ptr)                                      \
    struct _defer_ctx _DEFER_NAME(_defer_var_, __COUNTER__) \
        __attribute__((cleanup(_defer_cleanup))) =          \
            {(void (*)(void*))(fn), (void*)(ptr)}

Works like a charm:

int main(void) {
    int* p = malloc(sizeof(int));
    if (!p) return 1;
    defer(loud_free, p);

    *p = 42;
    printf("p = %d\n", *p);
}
p = 42
freeing 0x127e05b30

Final thoughts

Personally, I like the simpler GCC/Clang version better. Not having MSVC support isn't a big deal, since we can run GCC on Windows or use the Zig compiler, which works just fine.

But if I really need to support GCC, Clang, and MSVC — I'd probably go with the Stack version.

Anyway, I don't think we need to wait for defer to be added to the C standard. We already have defer at home!

]]>
Interfaces and traits in Chttps://antonz.org/interfaces-in-c/Thu, 22 Jan 2026 12:00:00 +0000https://antonz.org/interfaces-in-c/Implemented with structs and function pointers.Everyone likes interfaces in Go and traits in Rust. Polymorphism without class-based hierarchies or inheritance seems to be the sweet spot. What if we try to implement this in C?

Interfaces in Go • Traits in Rust • Toy example • Interface definition • Interface data • Method table • Method table in implementor • Type assertions • Separate self • Final thoughts

Interfaces in Go

An interface in Go is a convenient way to define a contract for some useful behavior. Take, for example, the honored io.Reader:

// Reader is the interface that wraps the basic Read method.
type Reader interface {
    // Read reads up to len(p) bytes into p. It returns the number of bytes
    // read (0 <= n <= len(p)) and any error encountered.
    Read(p []byte) (n int, err error)
}

Anything that can read data into a byte slice provided by the caller is a Reader. Quite handy, because the code doesn't need to care where the data comes from — whether it's memory, the file system, or the network. All that matters is that it can read the data into a slice:

// work processes the data read from r.
func work(r io.Reader) int {
    buf := make([]byte, 8)
    n, err := r.Read(buf)
    if err != nil && err != io.EOF {
        panic(err)
    }
    // ...
    return n
}

We can provide any kind of reader:

func main() {
    var total int
    b := bytes.NewBufferString("hello world")

    // bytes.Buffer implements io.Reader, so we can use it with work.
    total += work(b)
    total += work(b)

    fmt.Println("total =", total)
}
total = 11

Go's interfaces are structural, which is similar to duck typing. A type doesn't need to explicitly state that it implements io.Reader; it just needs to have a Read method:

// Zeros is an infinite stream of zero bytes.
type Zeros struct{}

func (z Zeros) Read(p []byte) (n int, err error) {
    clear(p)
    return len(p), nil
}

The Go compiler and runtime take care of the rest:

func main() {
    var total int
    var z Zeros

    // Zeros implements io.Reader, so we can use it with work.
    total += work(z)
    total += work(z)

    fmt.Println("total =", total)
}
total = 16

Traits in Rust

A trait in Rust is also a way to define a contract for certain behavior. Here's the std::io::Read trait:

// The Read trait allows for reading bytes from a source.
pub trait Read {
    // Readers are defined by one required method, read(). Each call to read()
    // will attempt to pull bytes from this source into a provided buffer.
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize>;

    // ...
}

Unlike in Go, a type must explicitly state that it implements a trait:

// An infinite stream of zero bytes.
struct Zeros;

impl io::Read for Zeros {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        buf.fill(0);
        Ok(buf.len())
    }
}

The Rust compiler takes care of the rest:

// Processes the data read from r.
fn work(r: &mut dyn io::Read) -> usize {
    let mut buf = [0; 8];
    match r.read(&mut buf) {
        Ok(n) => n,
        Err(e) => panic!("Error: {}", e),
    }
}

fn main() {
    let mut total = 0;
    let mut z = Zeros;

    // Zeros implements Read, so we can use it with work.
    total += work(&mut z);
    total += work(&mut z);

    println!("total = {}", total);
}
total = 16

Either way, whether it's Go or Rust, the caller only cares about the contract (defined as an interface or trait), not the specific implementation.

Toy example

Let's make an even simpler version of Reader — one without any error handling (Go):

// Reader is an interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
type Reader interface {
    Read(p []byte) int
}

Usage example:

// Zeros is an infinite stream of zero bytes.
type Zeros struct {
    total int // total number of bytes read
}

// Read reads len(p) bytes into p.
func (z *Zeros) Read(p []byte) int {
    clear(p)
    z.total += len(p)
    return len(p)
}

// work processes the data read from r.
func work(r Reader) int {
    buf := make([]byte, 8)
    return r.Read(buf)
}

func main() {
    z := new(Zeros)
    work(z)
    work(z)
    fmt.Println("total =", z.total)
}
total = 16

Let's see how we can do this in C!

Interface definition

The main building blocks in C are structs and functions, so let's use them. Our Reader will be a struct with a single field called Read. This field will be a pointer to a function with the right signature:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} Reader;

To make Zeros fully dynamic, let's turn it into a struct with a Read function pointer (I know, I know — just bear with me):

// An infinite stream of zero bytes.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    size_t total;
} Zeros;

Here's the Zeros_Read "method" implementation:

// Reads up to len(p) bytes into p.
size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = (Zeros*)self;
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

The work is pretty obvious:

// Does some work reading from r.
size_t work(Reader* r) {
    uint8_t buf[8];
    return r->Read(r, buf, sizeof(buf));
}

And, finally, the main function:

int main(void) {
    Zeros z = {.Read = Zeros_Read, .total = 0};

    Reader* r = (Reader*)&z;
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}
total = 16

See how easy it is to turn a Zeros into a Reader: all we need is (Reader*)&z. Pretty cool, right?

Not really. Actually, this implementation is seriously flawed in almost every way (except for the Reader definition).

Memory overhead. Each Zeros instance has its own function pointers (8 bytes per function on a 64-bit system) as "methods", which isn't practical even if there are only a few of them. Regular objects should store data, not functions.

Layout dependency. Converting from Zeros* to Reader* like (Reader*)&z only works if both structures have the same Read field as their first member. If we try to implement another interface:

// Reader interface.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} Reader;

// Closer interface.
typedef struct {
    void (*Close)(void* self);
} Closer;

// Zeros implements both Reader and Closer.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    void (*Close)(void* self);
    size_t total;
} Zeros;

Everything will fall apart:

int main(void) {
    Zeros z = {
        .Read = Zeros_Read,
        .Close = Zeros_Close,
        .total = 0,
    };
    Closer* c = (Closer*)&z;  // (X)
    c->Close(c);
}
Segmentation fault: 11

Closer and Zeros have different layouts, so the type conversion marked (X) is invalid and causes undefined behavior.

Lack of type safety. Using a void* as the receiver in Zeros_Read means the caller can pass any type, and the compiler won't even show a warning:

int main(void) {
    int x = 42;
    uint8_t buf[8];
    Zeros_Read(&x, buf, sizeof(buf));  // bad decision
}

size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = (Zeros*)self;
    // ...
    z->total += len;                   // consequences
    return len;
}
Abort trap: 6

C isn't a particularly type-safe language, but this is just too much. Let's try something else.

Interface data

A better way is to store a reference to the actual object in the interface:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    void* self;
} Reader;

We could have the Read method in the interface take a Reader instead of a void*, but that would make the implementation more complicated without any real benefits. So, I'll keep it as void*.

Then Zeros will only have its own fields:

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

We can make the Zeros_Read method type-safe:

// Reads len(p) bytes into p.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

To make this work, we add a Zeros_Reader method that returns the instance wrapped in a Reader interface:

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    return (Reader){
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
        .self = z,
    };
}

Casting function pointers

Technically, casting a function pointer that takes a Zeros* to one that takes a void* is undefined behavior in standard C. The standards-compliant way is to accept void* in Zeros_Read and cast it to Zeros*, as we did in the first version of the program:

size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = (Zeros*)self;
    // ...
}

Reader Zeros_Reader(Zeros* z) {
    return (Reader){.Read = Zeros_Read, .self = z};
}

In practice, the cast works on virtually all architectures because pointers have the same representation. So I'll continue casting for the rest of the article.

The work and main functions remain quite simple:

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return r.Read(r.self, buf, sizeof(buf));
}

int main(void) {
    Zeros z = {0};

    Reader r = Zeros_Reader(&z);
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}
total = 16

This approach is much better than the previous one:

  • The Zeros struct is lean and doesn't have any interface-related fields.
  • The Zeros_Read method takes a Zeros* instead of a void*.
  • The cast from Zeros to Reader is handled inside the Zeros_Reader method.
  • We can implement multiple interfaces if needed.

Since our Zeros type now knows about the Reader interface (through the Zeros_Reader method), our implementation is more like a basic version of a Rust dynamic trait than a true Go interface. For simplicity, I'll keep using the term "interface".

There is one downside, though: each Reader instance has its own function pointer for every interface method. Since Reader only has one method, this isn't an issue. But if an interface has a dozen methods and the program uses a lot of these interface instances, it can become a problem.

Let's fix this.

Method table

Let's extract interface methods into a separate structure — the method table. The interface references its methods through the mtab field:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

Zeros and Zeros_Read don't change at all:

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

// Reads len(p) bytes into p.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

The Zeros_Reader method initializes the static method table and assigns it to the interface instance:

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    // The method table is only initialized once.
    static const ReaderTable impl = {
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
    };
    return (Reader){.mtab = &impl, .self = z};
}

The only difference in work is that it calls the Read method on the interface indirectly using the method table (r.mtab->Read instead of r.Read):

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return r.mtab->Read(r.self, buf, sizeof(buf));
}

main stays the same:

int main(void) {
    Zeros z = {0};

    Reader r = Zeros_Reader(&z);
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}
total = 16

Now the Reader instance always has a single pointer field for its methods. So even for large interfaces, it only uses 16 bytes (mtab + self fields). This approach also keeps all the benefits from the previous version:

  • Lightweight Zeros structure.
  • Easy conversion from Zeros to Reader.
  • Supports multiple interfaces.

We can even add a separate Reader_Read helper so the client doesn't have to deal with the r.mtab->Read implementation detail:

// Reads len(p) bytes into p.
size_t Reader_Read(Reader r, uint8_t* p, size_t len) {
    return r.mtab->Read(r.self, p, len);
}

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return Reader_Read(r, buf, sizeof(buf));
}

Nice!

Method table in implementor

There's another approach I've seen out there. I don't like it, but it's still worth mentioning for completeness.

Instead of embedding the Reader method table in the interface, we can place it in the implementation (Zeros):

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef ReaderTable* Reader;

// An infinite stream of zero bytes.
typedef struct {
    Reader mtab;
    size_t total;
} Zeros;

We initialize the method table in the Zeros constructor:

// Returns a new Zeros instance.
Zeros NewZeros(void) {
    static const ReaderTable impl = {
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
    };
    return (Zeros){
        .mtab = (Reader)&impl,
        .total = 0,
    };
}

work now takes a Reader pointer:

// Does some work reading from r.
size_t work(Reader* r) {
    uint8_t buf[8];
    return (*r)->Read(r, buf, sizeof(buf));
}

And main converts Zeros* to Reader* with a simple type cast:

int main(void) {
    Zeros z = NewZeros();

    Reader* r = (Reader*)&z;
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}
total = 16

This keeps Zeros pretty lightweight, only adding one extra mtab field. But the (Reader*)&z cast only works because Reader mtab is the first field in Zeros. If we try to implement a second interface, things will break — just like in the very first solution.

I think the "method table in the interface" approach is much better.

Type assertions

Go has an io.Copy function that copies data from a source (a reader) to a destination (a writer):

func Copy(dst Writer, src Reader) (written int64, err error)

There's an interesting comment in its documentation:

If src implements WriterTo, the copy is implemented by calling src.WriteTo(dst). Otherwise, if dst implements ReaderFrom, the copy is implemented by calling dst.ReadFrom(src).

Here's what the function looks like:

func Copy(dst Writer, src Reader) (written int64, err error) {
    // If the reader has a WriteTo method, use it to do the copy.
    // Avoids an allocation and a copy.
    if wt, ok := src.(WriterTo); ok {
        return wt.WriteTo(dst)
    }
    // Similarly, if the writer has a ReadFrom method, use it to do the copy.
    if rf, ok := dst.(ReaderFrom); ok {
        return rf.ReadFrom(src)
    }
    // The default implementation using regular Reader and Writer.
    // ...
}

src.(WriterTo) is a type assertion that checks if the src reader is not just a Reader, but also implements the WriterTo interface. The Go runtime handles these kinds of dynamic type checks.

Can we do something like this in C? I'd prefer not to make it fully dynamic, since trying to recreate parts of the Go runtime in C probably isn't a good idea.

What we can do is add an optional AsWriterTo method to the Reader interface:

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    // required
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    // optional
    WriterTo (*AsWriterTo)(void* self);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

Then we can easily check if a given Reader is also a WriterTo:

void work(Reader r) {
    // Check if r implements WriterTo.
    if (r.mtab->AsWriterTo) {
        WriterTo wt = r.mtab->AsWriterTo(r.self);
        // Use r as WriterTo...
        return;
    }
    // Use r as a regular Reader...
    return;
}

Still, this feels a bit like a hack. I'd rather avoid using type assertions unless it's really necessary.

Separate self

Some C programmers don't like the "method table + data" approach to interfaces because it feels too heavy and too "object-oriented". An alternative is to just use a method table as the interface:

// An interface that groups the basic Read and Close methods.
typedef struct {
    // Read reads up to len(p) bytes into p.
    size_t (*Read)(void* self, uint8_t* p, size_t len);
    // Close frees resources associated with the reader.
    void (*Close)(void* self);
} ReadCloser;

The Zeros type and its methods are exactly the same as before (I just changed Zeros* to void* to follow the standard):

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

// Reads len(p) bytes into p.
size_t Zeros_Read(void* self, uint8_t* p, size_t len) {
    Zeros* z = (Zeros*)self;
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

// Closes the Zeros reader.
void Zeros_Close(void* self) {
    // No resources to free for Zeros.
    (void)self;
}

Since the interface no longer holds the data, converting from Zeros to ReadCloser is now a constant instead of a function:

// ReadCloser implementation for Zeros.
const ReadCloser Zeros_ReadCloser = {
    .Read = Zeros_Read,
    .Close = Zeros_Close,
};

Unfortunately, this approach makes work a bit more complicated because it now has to take the instance as a separate parameter:

// Does some work reading from r, then closes it.
size_t work(ReadCloser r, void* self) {
    uint8_t buf[8];
    size_t n = r.Read(self, buf, sizeof(buf));
    r.Close(self);
    return n;
}

int main(void) {
    Zeros z = {0};
    work(Zeros_ReadCloser, &z);
    printf("total = %zu\n", z.total);
}
total = 8

This approach keeps the interface and implementation fairly simple, but it puts more responsibility on the caller, who now has to keep track of the instance and pass it in when calling interface methods.

Final thoughts

Interfaces (dynamic traits, really) in C are possible, but they're not as simple or elegant as in Go or Rust. The method table approach we discussed is a good starting point. It's memory-efficient, as type-safe as possible given C's limitations, and supports polymorphic behavior.

Here's the full source code if you are interested:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// An interface that wraps the basic Read method.
// Read reads up to len(p) bytes into p.
typedef struct {
    size_t (*Read)(void* self, uint8_t* p, size_t len);
} ReaderTable;

typedef struct {
    const ReaderTable* mtab;
    void* self;
} Reader;

// Reads len(p) bytes into p.
size_t Reader_Read(Reader r, uint8_t* p, size_t len) {
    return r.mtab->Read(r.self, p, len);
}

// An infinite stream of zero bytes.
typedef struct {
    size_t total;
} Zeros;

// Reads len(p) bytes into p.
size_t Zeros_Read(Zeros* z, uint8_t* p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
    z->total += len;
    return len;
}

// Returns a Reader implementation for Zeros.
Reader Zeros_Reader(Zeros* z) {
    // The method table is only initialized once.
    static const ReaderTable impl = {
        .Read = (size_t (*)(void*, uint8_t*, size_t))Zeros_Read,
    };
    return (Reader){.mtab = &impl, .self = z};
}

// Does some work reading from r.
size_t work(Reader r) {
    uint8_t buf[8];
    return Reader_Read(r, buf, sizeof(buf));
}

int main(void) {
    Zeros z = {0};

    Reader r = Zeros_Reader(&z);
    work(r);
    work(r);

    printf("total = %zu\n", z.total);
}
total = 16

Cheers!

]]>
Go 1.26 interactive tour
https://antonz.org/go-1-26/
Mon, 05 Jan 2026 13:00:00 +0000
New with expressions, type-safe error checking, and faster everything.

Go 1.26 is out, so it's a good time to explore what's new. The official release notes are pretty dry, so I prepared an interactive version with lots of examples showing what has changed and what the new behavior is.

Read on and see!

new(expr) • Recursive type constraints • Type-safe error checking • Green Tea GC • Faster cgo and syscalls • Faster memory allocation • Vectorized operations • Secret mode • Reader-less cryptography • Hybrid public key encryption • Goroutine leak profile • Goroutine metrics • Reflective iterators • Peek into a buffer • Process handle • Signal as cause • Compare IP subnets • Context-aware dialing • Fake example.com • Optimized fmt.Errorf • Optimized io.ReadAll • Multiple log handlers • Test artifacts • Modernized go fix • Final thoughts

This article is based on the official release notes from The Go Authors and the Go source code, licensed under the BSD-3-Clause license. This is not an exhaustive list; see the official release notes for that.

I provide links to the documentation (𝗗), proposals (𝗣), commits (𝗖𝗟), and authors (𝗔) for the features described. Check them out for motivation, usage, and implementation details. I also have dedicated guides (𝗚) for some of the features.

Error handling is often skipped to keep things simple. Don't do this in production ツ

# new(expr)

Previously, you could only use the new built-in with types:

p := new(int)
*p = 42
fmt.Println(*p)
42

Now you can also use it with expressions:

// Pointer to an int variable with the value 42.
p := new(42)
fmt.Println(*p)
42

If the argument expr is an expression of type T, then new(expr) allocates a variable of type T, initializes it to the value of expr, and returns its address, a value of type *T.

This feature is especially helpful if you use pointer fields in a struct to represent optional values that you marshal to JSON or Protobuf:

type Cat struct {
    Name string `json:"name"`
    Fed  *bool  `json:"is_fed"` // you can never be sure
}

cat := Cat{Name: "Mittens", Fed: new(true)}
data, _ := json.Marshal(cat)
fmt.Println(string(data))
{"name":"Mittens","is_fed":true}

You can use new with composite values:

s := new([]int{11, 12, 13})
fmt.Println(*s)

type Person struct{ name string }
p := new(Person{name: "alice"})
fmt.Println(*p)
[11 12 13]
{alice}

And function calls:

f := func() string { return "go" }
p := new(f())
fmt.Println(*p)
go

Passing nil is still not allowed:

p := new(nil)
// compilation error

𝗗 spec • 𝗣 45624 • 𝗖𝗟 704935, 704737, 704955, 705157 • 𝗔 Alan Donovan

# Recursive type constraints

Generic functions and types take types as parameters:

// A list of values.
type List[T any] struct {}

// Reverses a slice in-place.
func Reverse[T any](s []T)

We can further restrict these type parameters by using type constraints:

// The map key must have a comparable type.
type Map[K comparable, V any] struct {}

// S is a slice with values of a comparable type,
// or a type derived from such a slice (e.g., type MySlice []int).
func Compact[S ~[]E, E comparable](s S) S

Previously, type constraints couldn't directly or indirectly refer back to the generic type:

type T[P T[P]] struct{}
// compile error:
// invalid recursive type: T refers to itself

Now they can:

type T[P T[P]] struct{}
ok

A typical use case is a generic type that supports operations with arguments or results of the same type as itself:

// A value that can be compared to other values
// of the same type using the less-than operation.
type Ordered[T Ordered[T]] interface {
    Less(T) bool
}

Now we can create a generic container with Ordered values and use it with any type that implements Less:

// A tree stores comparable values.
type Tree[T Ordered[T]] struct {
    nodes []T
}

// netip.Addr has a Less method with the right signature,
// so it meets the requirements for Ordered[netip.Addr].
t := Tree[netip.Addr]{}
_ = t
ok

This makes Go's generics a bit more expressive.

𝗣 68162, 75883 • 𝗖𝗟 711420, 711422 • 𝗔 Robert Griesemer

# Type-safe error checking

The new errors.AsType function is a generic version of errors.As:

// go 1.13+
func As(err error, target any) bool
// go 1.26+
func AsType[E error](err error) (E, bool)

It's type-safe and easier to use:

// using errors.As
var target *AppError
if errors.As(err, &target) {
    fmt.Println("application error:", target)
}
application error: database is down
// using errors.AsType
if target, ok := errors.AsType[*AppError](err); ok {
    fmt.Println("application error:", target)
}
application error: database is down

AsType is especially handy when checking for multiple types of errors. It makes the code shorter and keeps error variables scoped to their if blocks:

if connErr, ok := errors.AsType[*net.OpError](err); ok {
    fmt.Println("Network operation failed:", connErr.Op)
} else if dnsErr, ok := errors.AsType[*net.DNSError](err); ok {
    fmt.Println("DNS resolution failed:", dnsErr.Name)
} else {
    fmt.Println("Unknown error")
}
DNS resolution failed: antonz.org

Another issue with As is that it uses reflection and can cause runtime panics if used incorrectly (like if you pass a non-pointer or a type that doesn't implement error):

// using errors.As
var target AppError
if errors.As(err, &target) {
    fmt.Println("application error:", target)
}
panic: errors: *target must be interface or implement error

AsType doesn't cause a runtime panic; it gives a clear compile-time error instead:

// using errors.AsType
if target, ok := errors.AsType[AppError](err); ok {
    fmt.Println("application error:", target)
}
./main.go:24:32: AppError does not satisfy error (method Error has pointer receiver)

AsType doesn't use reflect, executes faster, and allocates less than As:

goos: darwin
goarch: arm64
cpu: Apple M1
BenchmarkAs-8        12606744    95.62 ns/op    40 B/op    2 allocs/op
BenchmarkAsType-8    37961869    30.26 ns/op    24 B/op    1 allocs/op

source

Since AsType can handle everything that As does, it's a recommended drop-in replacement for new code.

𝗗 errors.AsType • 𝗣 51945 • 𝗖𝗟 707235 • 𝗔 Julien Cretel

# Green Tea garbage collector

The new garbage collector (first introduced as experimental in 1.25) is designed to make memory management more efficient on modern computers with many CPU cores.

Motivation

Go's traditional garbage collector algorithm operates on a graph, treating objects as nodes and pointers as edges, without considering their physical location in memory. The scanner jumps between distant memory locations, causing frequent cache misses.

As a result, the CPU spends too much time waiting for data to arrive from memory. More than 35% of the time spent scanning memory is wasted just stalling while waiting for memory accesses. As computers get more CPU cores, this problem gets even worse.

Implementation

Green Tea shifts the focus from being processor-centered to being memory-aware. Instead of scanning individual objects, it scans memory in contiguous 8 KiB blocks called spans. The algorithm focuses on small objects (up to 512 bytes) because they are the most common and hardest to scan efficiently.

Each span is divided into equal slots based on its assigned size class, and it only contains objects of that size class. For example, if a span is assigned to the 32-byte size class, the whole block is split into 32-byte slots, and objects are placed directly into these slots, each starting at the beginning of its slot. Because of this fixed layout, the garbage collector can easily find an object's metadata using simple address arithmetic, without checking the size of each object it finds.

When the algorithm finds an object that needs to be scanned, it marks the object's location in its span but doesn't scan it immediately. Instead, it waits until there are several objects in the same span that need scanning. Then, when the garbage collector processes that span, it scans multiple objects at once. This is much faster than going over the same area of memory multiple times.

To make better use of CPU cores, GC workers share the workload by stealing tasks from each other. Each worker has its own local queue of spans to scan, and if a worker is idle, it can grab tasks from the queues of other busy workers. This decentralized approach removes the need for a central global list, prevents delays, and reduces contention between CPU cores.

Green Tea uses vectorized CPU instructions (only on amd64 architectures) to process memory spans in bulk when there are enough objects.

Benchmarks

Benchmark results vary, but the Go team expects a 10–40% reduction in garbage collection overhead in real-world programs that rely heavily on the garbage collector. Plus, with vectorized implementation, an extra 10% reduction in GC overhead when running on CPUs like Intel Ice Lake or AMD Zen 4 and newer.

Unfortunately, I couldn't find any public benchmark results from the Go team for the latest version of Green Tea, and I wasn't able to create a good synthetic benchmark myself. So, no details this time :(

The new garbage collector is enabled by default. To use the old garbage collector, set GOEXPERIMENT=nogreenteagc at build time (this option is expected to be removed in Go 1.27).

𝗣 73581 • 𝗔 Michael Knyszek

# Faster cgo and syscalls

In the Go runtime, a processor (often referred to as a P) is a resource required to run the code. For a thread (a machine or M) to execute a goroutine (G), it must first acquire a processor.

Processors move through different states. They can be running (executing code), idle (waiting for work), or gcstop (paused because of the garbage collection).

Previously, processors had a state called syscall, used while a goroutine was making a system or cgo call. Now this state has been removed. Instead of using a separate processor state, the runtime checks the status of the goroutine assigned to the processor to see if it's involved in a system call.

This reduces internal runtime overhead and simplifies code paths for cgo and syscalls. The Go release notes say -30% in cgo runtime overhead, and the commit mentions an 18% sec/op improvement:

goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
                   │ before.out  │             after.out              │
                   │   sec/op    │   sec/op     vs base               │
CgoCall-64           43.69n ± 1%   35.83n ± 1%  -17.99% (p=0.002 n=6)
CgoCallParallel-64   5.306n ± 1%   5.338n ± 1%        ~ (p=0.132 n=6)

I decided to run the CgoCall benchmarks locally as well:

goos: darwin
goarch: arm64
cpu: Apple M1
                      │ go1_25.txt  │             go1_26.txt              │
                      │   sec/op    │   sec/op     vs base                │
CgoCall-8               28.55n ± 4%   19.02n ± 2%  -33.40% (p=0.000 n=10)
CgoCallWithCallback-8   72.76n ± 5%   57.38n ± 2%  -21.14% (p=0.000 n=10)
geomean                 45.58n        33.03n       -27.53%

Either way, both a 20% and a 30% improvement are pretty impressive.

And here are the results from a local syscall benchmark:

goos: darwin
goarch: arm64
cpu: Apple M1
          │ go1_25.txt  │             go1_26.txt             │
          │   sec/op    │   sec/op     vs base               │
Syscall-8   195.6n ± 4%   178.1n ± 1%  -8.95% (p=0.000 n=10)
source
func BenchmarkSyscall(b *testing.B) {
    for b.Loop() {
        _, _, _ = syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
    }
}

That's pretty good too.

𝗖𝗟 646198 • 𝗔 Michael Knyszek

# Faster memory allocation

The Go runtime now has specialized versions of its memory allocation function for small objects (from 1 to 512 bytes). It uses jump tables to quickly choose the right function for each size, instead of relying on a single general-purpose implementation.

The Go release notes say "the compiler will now generate calls to size-specialized memory allocation routines". But based on the code, that's not completely accurate: the compiler still emits calls to the general-purpose mallocgc function. Then, at runtime, mallocgc dispatches those calls to the new specialized allocation functions.

This change reduces the cost of small object memory allocations by up to 30%. The Go team expects the overall improvement to be ~1% in real allocation-heavy programs.

I couldn't find any existing benchmarks, so I came up with my own. And indeed, running it on Go 1.25 compared to 1.26 shows a significant improvement:

goos: darwin
goarch: arm64
cpu: Apple M1
           │  go1_25.txt   │              go1_26.txt              │
           │    sec/op     │    sec/op     vs base                │
Alloc1-8      8.190n ±  6%   6.594n ± 28%  -19.48% (p=0.011 n=10)
Alloc8-8      8.648n ± 16%   7.522n ±  4%  -13.02% (p=0.000 n=10)
Alloc64-8     15.70n ± 15%   12.57n ±  4%  -19.88% (p=0.000 n=10)
Alloc128-8    56.80n ±  4%   17.56n ±  4%  -69.08% (p=0.000 n=10)
Alloc512-8    81.50n ± 10%   55.24n ±  5%  -32.23% (p=0.000 n=10)
geomean       21.99n         14.33n        -34.83%
source
var sink *byte

func benchmarkAlloc(b *testing.B, size int) {
    b.ReportAllocs()
    for b.Loop() {
        obj := make([]byte, size)
        sink = &obj[0]
    }
}

func BenchmarkAlloc1(b *testing.B)   { benchmarkAlloc(b, 1) }
func BenchmarkAlloc8(b *testing.B)   { benchmarkAlloc(b, 8) }
func BenchmarkAlloc64(b *testing.B)  { benchmarkAlloc(b, 64) }
func BenchmarkAlloc128(b *testing.B) { benchmarkAlloc(b, 128) }
func BenchmarkAlloc512(b *testing.B) { benchmarkAlloc(b, 512) }

The new implementation is enabled by default. You can disable it by setting GOEXPERIMENT=nosizespecializedmalloc at build time (this option is expected to be removed in Go 1.27).

𝗖𝗟 665835 • 𝗔 Michael Matloob

# Vectorized operations (experimental)

The new simd/archsimd package provides access to architecture-specific vectorized operations (SIMD — single instruction, multiple data). This is a low-level package that exposes hardware-specific functionality. It currently only supports amd64 platforms.

Because different CPU architectures have very different SIMD operations, it's hard to create a single portable API that works for all of them. So the Go team decided to start with a low-level, architecture-specific API first, giving "power users" immediate access to SIMD features on the most common server platform — amd64.

The package defines vector types as structs, like Int8x16 (a 128-bit SIMD vector with sixteen 8-bit integers) and Float64x8 (a 512-bit SIMD vector with eight 64-bit floats). These match the hardware's vector registers. The package supports vectors that are 128, 256, or 512 bits wide.

Most operations are defined as methods on vector types. They usually map directly to hardware instructions with zero overhead.

To give you a taste, here's a custom function that uses SIMD instructions to add 32-bit float vectors:

func Add(a, b []float32) []float32 {
    if len(a) != len(b) {
        panic("slices of different length")
    }

    // If AVX-512 isn't supported, fall back to scalar addition,
    // since the Float32x16.Add method needs the AVX-512 instruction set.
    if !archsimd.X86.AVX512() {
        return fallbackAdd(a, b)
    }

    res := make([]float32, len(a))
    n := len(a)
    i := 0

    // 1. SIMD loop: Process 16 elements at a time.
    for i <= n-16 {
        // Load 16 elements from a and b vectors.
        va := archsimd.LoadFloat32x16Slice(a[i : i+16])
        vb := archsimd.LoadFloat32x16Slice(b[i : i+16])

        // Add all 16 elements in a single instruction
        // and store the results in the result vector.
        vSum := va.Add(vb) // translates to VADDPS asm instruction
        vSum.StoreSlice(res[i : i+16])

        i += 16
    }

    // 2. Scalar tail: Process any remaining elements (0-15).
    for ; i < n; i++ {
        res[i] = a[i] + b[i]
    }

    return res
}

Let's try it on two vectors:

func main() {
    a := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17}
    b := []float32{17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1}
    res := Add(a, b)
    fmt.Println(res)
}
[18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18]

Common operations in the archsimd package include:

  • Load a vector from array/slice, or Store a vector to array/slice.
  • Arithmetic: Add, Sub, Mul, Div, DotProduct.
  • Bitwise: And, Or, Not, Xor, Shift.
  • Comparison: Equal, Greater, Less, Min, Max.
  • Conversion: As, SaturateTo, TruncateTo.
  • Masking: Compress, Masked, Merge.
  • Rearrangement: Permute.

The package uses only AVX instructions, not SSE.

Here's a simple benchmark for adding two vectors (both the "plain" and SIMD versions use pre-allocated slices):

goos: linux
goarch: amd64
cpu: AMD EPYC 9575F 64-Core Processor
BenchmarkAddPlain/1k-2         	 1517698	       889.9 ns/op	13808.74 MB/s
BenchmarkAddPlain/65k-2        	   23448	     52613 ns/op	14947.46 MB/s
BenchmarkAddPlain/1m-2         	    2047	   1005628 ns/op	11932.84 MB/s
BenchmarkAddSIMD/1k-2          	36594340	        33.58 ns/op	365949.74 MB/s
BenchmarkAddSIMD/65k-2         	  410742	      3199 ns/op	245838.52 MB/s
BenchmarkAddSIMD/1m-2          	   12955	     94228 ns/op	127351.33 MB/s

source

The package is experimental and can be enabled by setting GOEXPERIMENT=simd at build time.

𝗗 simd/archsimd • 𝗣 73787 • 𝗖𝗟 701915, 712880, 729900, 732020 • 𝗔 Junyang Shao, Sean Liao, Tom Thorogood

# Secret mode (experimental)

Cryptographic protocols like WireGuard or TLS have a property called "forward secrecy". This means that even if an attacker gains access to long-term secrets (like a private key in TLS), they shouldn't be able to decrypt past communication sessions. To make this work, ephemeral keys (temporary keys used to negotiate the session) need to be erased from memory immediately after the handshake. If there's no reliable way to clear this memory, these keys could stay there indefinitely. An attacker who finds them later could re-derive the session key and decrypt past traffic, breaking forward secrecy.

In Go, the runtime manages memory, and it doesn't guarantee when or how memory is cleared. Sensitive data might remain in heap allocations or stack frames, potentially exposed in core dumps or through memory attacks. Developers often have to use unreliable "hacks" with reflection to try to zero out internal buffers in cryptographic libraries. Even so, some data might still stay in memory where the developer can't reach or control it.

The Go team's solution to this problem is the new runtime/secret package. It lets you run a function in secret mode. After the function finishes, it immediately erases (zeroes out) the registers and stack it used. Heap allocations made by the function are erased as soon as the garbage collector decides they are no longer reachable.

secret.Do(func() {
    // Generate an ephemeral key and
    // use it to negotiate the session.
})

This helps make sure sensitive information doesn't stay in memory longer than needed, lowering the risk of attackers getting to it.

Here's an example that shows how secret.Do might be used in a more or less realistic setting. Let's say you want to generate a session key while keeping the ephemeral private key and shared secret safe:

// DeriveSessionKey does an ephemeral key exchange to create a session key.
func DeriveSessionKey(peerPublicKey *ecdh.PublicKey) (*ecdh.PublicKey, []byte, error) {
    var pubKey *ecdh.PublicKey
    var sessionKey []byte
    var err error

    // Use secret.Do to contain the sensitive data during the handshake.
    // The ephemeral private key and the raw shared secret will be
    // wiped out when this function finishes.
    secret.Do(func() {
        // 1. Generate an ephemeral private key.
        // This is highly sensitive; if leaked later, forward secrecy is broken.
        privKey, e := ecdh.P256().GenerateKey(rand.Reader)
        if e != nil {
            err = e
            return
        }

        // 2. Compute the shared secret (ECDH).
        // This raw secret is also highly sensitive.
        sharedSecret, e := privKey.ECDH(peerPublicKey)
        if e != nil {
            err = e
            return
        }

        // 3. Derive the final session key (e.g., using HKDF).
        // We copy the result out; the inputs (privKey, sharedSecret)
        // will be destroyed by secret.Do when they become unreachable.
        sessionKey = performHKDF(sharedSecret)
        pubKey = privKey.PublicKey()
    })

    // The session key is returned for use, but the "recipe" to recreate it
    // is destroyed. Additionally, because the session key was allocated
    // inside the secret block, the runtime will automatically zero it out
    // when the application is finished using it.
    return pubKey, sessionKey, err
}

Here, the ephemeral private key and the raw shared secret are effectively "toxic waste" — they are necessary to create the final session key, but dangerous to keep around.

If these values stay in the heap and an attacker later gets access to the application's memory (for example, via a core dump or a vulnerability like Heartbleed), they could use these intermediates to re-derive the session key and decrypt past conversations.

By wrapping the calculation in secret.Do, we make sure that as soon as the session key is created, the "ingredients" used to make it are permanently destroyed. This means that even if the server is compromised in the future, this specific past session can't be exposed, which ensures forward secrecy.

func main() {
    // Generate a dummy peer public key.
    priv, _ := ecdh.P256().GenerateKey(nil)
    peerPubKey := priv.PublicKey()

    // Derive the session key.
    pubKey, sessionKey, err := DeriveSessionKey(peerPubKey)
    fmt.Printf("public key = %x...\n", pubKey.Bytes()[:16])
    fmt.Printf("error = %v\n", err)
    var _ = sessionKey
}
public key = 04288d5ade66bab4320a86d80993f628...
error = <nil>

The current secret.Do implementation only supports Linux (amd64 and arm64). On unsupported platforms, Do invokes the function directly. Also, trying to start a goroutine within the function causes a panic (this will be fixed in Go 1.27).

The runtime/secret package is mainly for developers who work on cryptographic libraries. Most apps should use higher-level libraries that use secret.Do behind the scenes.

The package is experimental and can be enabled by setting GOEXPERIMENT=runtimesecret at build time.

𝗗 runtime/secret • 𝗣 21865 • 𝗖𝗟 704615 • 𝗔 Daniel Morsing

# Reader-less cryptography

Current cryptographic APIs, like ecdsa.GenerateKey or rand.Prime, often accept an io.Reader as the source of random data:

// Generate a new ECDSA private key for the specified curve.
key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
fmt.Println(key.D)

// Generate a 64-bit integer that is prime with high probability.
prim, _ := rand.Prime(rand.Reader, 64)
fmt.Println(prim)
31253152889057471714062019675387570049552680140182252615946165331094890182019
17433987073571224703

These APIs don't commit to a specific way of using random bytes from the reader. Any change to underlying cryptographic algorithms can change the sequence or amount of bytes read. Because of this, if the application code (mistakenly) relies on a specific implementation in Go version X, it might fail or behave differently in version X+1.

The Go team chose a pretty bold solution to this problem. Now, most crypto APIs will just ignore the random io.Reader parameter and always use the system random source (crypto/internal/sysrand.Read).

// The reader parameter is no longer used, so you can just pass nil.

// Generate a new ECDSA private key for the specified curve.
key, _ := ecdsa.GenerateKey(elliptic.P256(), nil)
fmt.Println(key.D)

// Generate a 64-bit integer that is prime with high probability.
prim, _ := rand.Prime(nil, 64)
fmt.Println(prim)
16265662996876675161677719946085651215874831846675169870638460773593241527197
14874320216361938581

The change applies to the following crypto subpackages:

// crypto/dsa
func GenerateKey(priv *PrivateKey, rand io.Reader) error

// crypto/ecdh
type Curve interface {
    // ...
    GenerateKey(rand io.Reader) (*PrivateKey, error)
}

// crypto/ecdsa
func GenerateKey(c elliptic.Curve, rand io.Reader) (*PrivateKey, error)
func SignASN1(rand io.Reader, priv *PrivateKey, hash []byte) ([]byte, error)
func Sign(rand io.Reader, priv *PrivateKey, hash []byte) (r, s *big.Int, err error)
func (priv *PrivateKey) Sign(rand io.Reader, digest []byte, opts crypto.SignerOpts) ([]byte, error)

// crypto/rand
func Prime(rand io.Reader, bits int) (*big.Int, error)

// crypto/rsa
func GenerateKey(random io.Reader, bits int) (*PrivateKey, error)
func GenerateMultiPrimeKey(random io.Reader, nprimes int, bits int) (*PrivateKey, error)
func EncryptPKCS1v15(random io.Reader, pub *PublicKey, msg []byte) ([]byte, error)

ed25519.GenerateKey(rand) still uses the random reader if provided. But if rand is nil, it uses an internal secure source of random bytes instead of crypto/rand.Reader (which could be overridden).

To support deterministic testing, there's a new testing/cryptotest package with a single SetGlobalRandom function. It sets a global, deterministic cryptographic randomness source for the duration of the given test:

func Test(t *testing.T) {
    cryptotest.SetGlobalRandom(t, 42)

    // All test runs will generate the same numbers.
    p1, _ := rand.Prime(nil, 32)
    p2, _ := rand.Prime(nil, 32)
    p3, _ := rand.Prime(nil, 32)

    got := [3]int64{p1.Int64(), p2.Int64(), p3.Int64()}
    want := [3]int64{3713413729, 3540452603, 4293217813}
    if got != want {
        t.Errorf("got %v, want %v", got, want)
    }
}
PASS

SetGlobalRandom affects crypto/rand and all implicit sources of cryptographic randomness in the crypto/* packages:

func Test(t *testing.T) {
    cryptotest.SetGlobalRandom(t, 42)

    t.Run("rand.Read", func(t *testing.T) {
        var got [4]byte
        rand.Read(got[:])
        want := [4]byte{34, 48, 31, 184}
        if got != want {
            t.Errorf("got %v, want %v", got, want)
        }
    })

    t.Run("rand.Int", func(t *testing.T) {
        got, _ := rand.Int(rand.Reader, big.NewInt(10000))
        const want = 6185
        if got.Int64() != want {
            t.Errorf("got %v, want %v", got.Int64(), want)
        }
    })
}
PASS

To temporarily restore the old reader-respecting behavior, set GODEBUG=cryptocustomrand=1 (this option will be removed in a future release).

𝗗 testing/cryptotest • 𝗣 70942 • 𝗖𝗟 724480 • 𝗔 Filippo Valsorda, qiulaidongfeng

# Hybrid public key encryption

The new crypto/hpke package implements Hybrid Public Key Encryption (HPKE) as specified in RFC 9180.

HPKE is a relatively new IETF standard for hybrid encryption. Traditional public-key encryption methods, like RSA, are slow and can only handle small amounts of data. HPKE improves on this by combining two types of encryption: it uses asymmetric cryptography (public/private keys) to safely create a shared secret, then uses fast symmetric encryption to protect the actual data. This lets you securely and quickly encrypt large files or messages, while still getting the security benefits of public-key systems.

The "asymmetric" part of HPKE (called Key Encapsulation Mechanism or KEM) can use both traditional algorithms, such as those using elliptic curves, and new post-quantum algorithms, like ML-KEM. ML-KEM is designed to remain secure even against future quantum computers that could break traditional cryptography.

I'm not going to pretend I'm an expert in cryptography, so here's an example I took straight from the Go standard library documentation. It uses ML-KEM-768 combined with X25519 as the key encapsulation mechanism, AES-256-GCM for symmetric encryption, and HKDF-SHA256 as the key derivation function:

// Encrypt a single message from a sender to a recipient using the one-shot API.
kem, kdf, aead := hpke.MLKEM768X25519(), hpke.HKDFSHA256(), hpke.AES256GCM()

// Recipient side
var (
    recipientPrivateKey hpke.PrivateKey
    publicKeyBytes      []byte
)
{
    k, err := kem.GenerateKey()
    if err != nil {
        panic(err)
    }
    recipientPrivateKey = k
    publicKeyBytes = k.PublicKey().Bytes()
}

// Sender side
var ciphertext []byte
{
    publicKey, err := kem.NewPublicKey(publicKeyBytes)
    if err != nil {
        panic(err)
    }

    message := []byte("secret message")
    ct, err := hpke.Seal(publicKey, kdf, aead, []byte("public"), message)
    if err != nil {
        panic(err)
    }

    ciphertext = ct
}

// Recipient side
{
    plaintext, err := hpke.Open(recipientPrivateKey, kdf, aead, []byte("public"), ciphertext)
    if err != nil {
        panic(err)
    }
    fmt.Printf("Decrypted: %s\n", plaintext)
}
Decrypted: secret message

As Filippo Valsorda (the cryptography engineer who maintains Go's crypto packages) says, HPKE is now the right way to do public key encryption.

𝗗 crypto/hpke • 𝗣 75300 • 𝗔 Filippo Valsorda

# Goroutine leak profile (experimental)

A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. Here's a simple example:

func leak() <-chan int {
    out := make(chan int)
    go func() {
        out <- 42 // leaks if nobody reads from out
    }()
    return out
}

If we call leak and don't read from the output channel, the inner leak goroutine will stay blocked trying to send to the channel for the rest of the program:

func main() {
    leak()
    // ...
}
ok

Unlike deadlocks, leaks do not cause panics, so they are much harder to spot. And unlike with data races, Go's tooling did not address them for a long time.

Things started to change in Go 1.24 with the introduction of the synctest package. Not many people talk about it, but synctest is a great tool for catching leaks during testing.

Go 1.26 adds a new experimental goroutineleak profile designed to report leaked goroutines in production. Here's how we can use it in the example above:

func main() {
    prof := pprof.Lookup("goroutineleak")
    leak()
    time.Sleep(50 * time.Millisecond)
    prof.WriteTo(os.Stdout, 2)
    // ...
}
goroutine 7 [chan send (leaked)]:
main.leak.func1()
    /tmp/sandbox/main.go:16 +0x1e
created by main.leak in goroutine 1
    /tmp/sandbox/main.go:15 +0x67

As you can see, we have a nice goroutine stack trace that shows exactly where the leak happens.

The goroutineleak profile finds leaks by using the garbage collector's marking phase to check which blocked goroutines are still connected to active code. It starts with runnable goroutines, marks all sync objects they can reach, and keeps adding any blocked goroutines waiting on those objects. When it can't add any more, any blocked goroutines left are waiting on resources that can't be reached — so they're considered leaked.

Tell me more

Here's the gist of it:

   [ Start: GC mark phase ]
             │ 1. Collect live goroutines
             v
   ┌───────────────────────┐
   │   Initial roots       │ <────────────────┐
   │ (runnable goroutines) │                  │
   └───────────────────────┘                  │
             │                                │
             │ 2. Mark reachable memory       │
             v                                │
   ┌───────────────────────┐                  │
   │   Reachable objects   │                  │
   │  (channels, mutexes)  │                  │
   └───────────────────────┘                  │
             │                                │
             │ 3a. Check blocked goroutines   │
             v                                │
   ┌───────────────────────┐          (Yes)   │
   │ Is blocked G waiting  │ ─────────────────┘
   │ on a reachable obj?   │ 3b. Add G to roots
   └───────────────────────┘
             │ (No - repeat until no new Gs found)
             v
   ┌───────────────────────┐
   │   Remaining blocked   │
   │      goroutines       │
   └───────────────────────┘
             │ 5. Report the leaks
             v
      [   LEAKED!   ]
 (Blocked on unreachable
  synchronization objects)
  1. Collect live goroutines. Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now.
  2. Mark reachable memory. Trace pointers from roots to find which synchronization objects (like channels or wait groups) are currently reachable by these roots.
  3. Resurrect blocked goroutines. Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots.
  4. Iterate. Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects.
  5. Report the leaks. Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.

For even more details, see the paper by Saioc et al.

If you want to see how goroutineleak (and synctest) can catch typical leaks that often happen in production — check out my article on goroutine leaks.

The goroutineleak profile is experimental and can be enabled by setting GOEXPERIMENT=goroutineleakprofile at build time. Enabling the experiment also makes the profile available as a net/http/pprof endpoint, /debug/pprof/goroutineleak.

According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile.

𝗗 runtime/pprof • 𝗚 Detecting leaks • 𝗣 74609, 75280 • 𝗖𝗟 688335 • 𝗔 Vlad Saioc

# Goroutine metrics

New metrics in the runtime/metrics package give better insight into goroutine scheduling:

  • Total number of goroutines since the program started.
  • Number of goroutines in each state.
  • Number of active threads.

Here's the full list:

/sched/goroutines-created:goroutines
    Count of goroutines created since program start.

/sched/goroutines/not-in-go:goroutines
    Approximate count of goroutines running
    or blocked in a system call or cgo call.

/sched/goroutines/runnable:goroutines
    Approximate count of goroutines ready to execute,
    but not executing.

/sched/goroutines/running:goroutines
    Approximate count of goroutines executing.
    Always less than or equal to /sched/gomaxprocs:threads.

/sched/goroutines/waiting:goroutines
    Approximate count of goroutines waiting
    on a resource (I/O or sync primitives).

/sched/threads/total:threads
    The current count of live threads
    that are owned by the Go runtime.

Per-state goroutine metrics can be linked to common production issues. For example, an increasing waiting count can show a lock contention problem. A high not-in-go count means goroutines are stuck in syscalls or cgo. A growing runnable backlog suggests the CPUs can't keep up with demand.

You can read the new metric values using the regular metrics.Read function:

func main() {
    go work() // omitted for brevity
    time.Sleep(100 * time.Millisecond)

    fmt.Println("Goroutine metrics:")
    printMetric("/sched/goroutines-created:goroutines", "Created")
    printMetric("/sched/goroutines:goroutines", "Live")
    printMetric("/sched/goroutines/not-in-go:goroutines", "Syscall/CGO")
    printMetric("/sched/goroutines/runnable:goroutines", "Runnable")
    printMetric("/sched/goroutines/running:goroutines", "Running")
    printMetric("/sched/goroutines/waiting:goroutines", "Waiting")

    fmt.Println("Thread metrics:")
    printMetric("/sched/gomaxprocs:threads", "Max")
    printMetric("/sched/threads/total:threads", "Live")
}

func printMetric(name string, descr string) {
    sample := []metrics.Sample{{Name: name}}
    metrics.Read(sample)
    // Assuming a uint64 value; don't do this in production.
    // Instead, check sample[0].Value.Kind and handle accordingly.
    fmt.Printf("  %s: %v\n", descr, sample[0].Value.Uint64())
}
Goroutine metrics:
  Created: 57
  Live: 21
  Syscall/CGO: 0
  Runnable: 0
  Running: 1
  Waiting: 20
Thread metrics:
  Max: 2
  Live: 4

The per-state numbers (not-in-go + runnable + running + waiting) are not guaranteed to add up to the live goroutine count (/sched/goroutines:goroutines, available since Go 1.16).

All the new metrics are exposed as uint64 values.

𝗗 runtime/metrics • 𝗣 15490 • 𝗖𝗟 690397, 690398, 690399 • 𝗔 Michael Knyszek

# Reflective iterators

The new Type.Fields and Type.Methods methods in the reflect package return iterators for a type's fields and methods:

// List the fields of a struct type.
typ := reflect.TypeFor[http.Client]()
for f := range typ.Fields() {
    fmt.Println(f.Name, f.Type)
}
Transport http.RoundTripper
CheckRedirect func(*http.Request, []*http.Request) error
Jar http.CookieJar
Timeout time.Duration
// List the methods of a struct type.
typ := reflect.TypeFor[*http.Client]()
for m := range typ.Methods() {
    fmt.Println(m.Name, m.Type)
}
CloseIdleConnections func(*http.Client)
Do func(*http.Client, *http.Request) (*http.Response, error)
Get func(*http.Client, string) (*http.Response, error)
Head func(*http.Client, string) (*http.Response, error)
Post func(*http.Client, string, string, io.Reader) (*http.Response, error)
PostForm func(*http.Client, string, url.Values) (*http.Response, error)

The new methods Type.Ins and Type.Outs return iterators for the input and output parameters of a function type:

typ := reflect.TypeFor[filepath.WalkFunc]()

fmt.Println("Inputs:")
for par := range typ.Ins() {
    fmt.Println("-", par.Name())
}

fmt.Println("Outputs:")
for par := range typ.Outs() {
    fmt.Println("-", par.Name())
}
Inputs:
- string
- FileInfo
- error
Outputs:
- error

The new methods Value.Fields and Value.Methods return iterators for a value's fields and methods. Each iteration yields both the type information (StructField or Method) and the value:

client := &http.Client{}
val := reflect.ValueOf(client)

fmt.Println("Fields:")
for f, v := range val.Elem().Fields() {
    fmt.Printf("- name=%s kind=%s\n", f.Name, v.Kind())
}

fmt.Println("Methods:")
for m, v := range val.Methods() {
    fmt.Printf("- name=%s kind=%s\n", m.Name, v.Kind())
}
Fields:
- name=Transport kind=interface
- name=CheckRedirect kind=func
- name=Jar kind=interface
- name=Timeout kind=int64
Methods:
- name=CloseIdleConnections kind=func
- name=Do kind=func
- name=Get kind=func
- name=Head kind=func
- name=Post kind=func
- name=PostForm kind=func

Previously, you could get all this information by using a for-range loop with NumX methods (which is what iterators do internally):

// go 1.25
typ := reflect.TypeFor[http.Client]()
for i := range typ.NumField() {
    field := typ.Field(i)
    fmt.Println(field.Name, field.Type)
}
Transport http.RoundTripper
CheckRedirect func(*http.Request, []*http.Request) error
Jar http.CookieJar
Timeout time.Duration

Using an iterator is more concise. I hope it justifies the increased API surface.

𝗗 reflect • 𝗣 66631 • 𝗖𝗟 707356 • 𝗔 Quentin Quaadgras

# Peek into a buffer

The new Buffer.Peek method in the bytes package returns the next N bytes from the buffer without advancing it:

buf := bytes.NewBufferString("I love bytes")

sample, err := buf.Peek(1)
fmt.Printf("peek=%s err=%v\n", sample, err)

buf.Next(2)

sample, err = buf.Peek(4)
fmt.Printf("peek=%s err=%v\n", sample, err)
peek=I err=<nil>
peek=love err=<nil>

If Peek returns fewer than N bytes, it also returns io.EOF:

buf := bytes.NewBufferString("hello")
sample, err := buf.Peek(10)
fmt.Printf("peek=%s err=%v\n", sample, err)
peek=hello err=EOF

The slice returned by Peek points to the buffer's content and stays valid until the buffer is changed. So, if you change the slice right away, it will affect future reads:

buf := bytes.NewBufferString("car")
sample, err := buf.Peek(3)
fmt.Printf("peek=%s err=%v\n", sample, err)

sample[2] = 't' // changes the underlying buffer

data, err := buf.ReadBytes(0)
fmt.Printf("data=%s err=%v\n", data, err)
peek=car err=<nil>
data=cat err=EOF

The slice returned by Peek is only valid until the next call to a read or write method.

𝗗 Buffer.Peek • 𝗣 73794 • 𝗖𝗟 674415 • 𝗔 Ilia Choly

# Process handle

After you start a process in Go, you can access its ID:

attr := &os.ProcAttr{Files: []*os.File{os.Stdin, os.Stdout, os.Stderr}}
proc, _ := os.StartProcess("/bin/echo", []string{"echo", "hello"}, attr)
defer proc.Wait()

fmt.Println("pid =", proc.Pid)
pid = 41
hello

Internally, the os.Process type uses a process handle instead of the PID (which is just an integer), if the operating system supports it. Specifically, in Linux it uses pidfd, which is a file descriptor that refers to a process. Using the handle instead of the PID makes sure that Process methods always work with the same OS process, and not a different process that just happens to have the same ID.

Previously, you couldn't access the process handle. Now you can, thanks to the new Process.WithHandle method:

func (p *Process) WithHandle(f func(handle uintptr)) error

WithHandle calls a specified function and passes a process handle as an argument:

attr := &os.ProcAttr{Files: []*os.File{os.Stdin, os.Stdout, os.Stderr}}
proc, _ := os.StartProcess("/bin/echo", []string{"echo", "hello"}, attr)
defer proc.Wait()

fmt.Println("pid =", proc.Pid)
proc.WithHandle(func(handle uintptr) {
    fmt.Println("handle =", handle)
})
pid = 49
handle = 6
hello

The handle is guaranteed to refer to the process until the callback function returns, even if the process has already terminated. That's why it's implemented as a callback instead of a Process.Handle field or method.

WithHandle is only supported on Linux 5.4+ and Windows. On other operating systems, it doesn't execute the callback and returns an os.ErrNoHandle error.

𝗗 Process.WithHandle • 𝗣 70352 • 𝗖𝗟 699615 • 𝗔 Kir Kolyshkin

# Signal as cause

signal.NotifyContext returns a context that gets canceled when any of the specified signals is received. Previously, the canceled context only showed the standard "context canceled" cause:

// go 1.25

// The context will be canceled on SIGINT signal.
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
defer stop()

// Send SIGINT to self.
p, _ := os.FindProcess(os.Getpid())
_ = p.Signal(syscall.SIGINT)

// Wait for SIGINT.
<-ctx.Done()
fmt.Println("err =", ctx.Err())
fmt.Println("cause =", context.Cause(ctx))
err = context canceled
cause = context canceled

Now the context's cause shows exactly which signal was received:

// go 1.26

// The context will be canceled on SIGINT signal.
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
defer stop()

// Send SIGINT to self.
p, _ := os.FindProcess(os.Getpid())
_ = p.Signal(syscall.SIGINT)

// Wait for SIGINT.
<-ctx.Done()
fmt.Println("err =", ctx.Err())
fmt.Println("cause =", context.Cause(ctx))
err = context canceled
cause = interrupt signal received

The returned type, signal.signalError, is based on string, so it doesn't provide the actual os.Signal value — just its string representation.

𝗗 signal.NotifyContext • 𝗣 60756 • 𝗖𝗟 721700 • 𝗔 Filippo Valsorda

# Compare IP subnets

An IP address prefix represents an IP subnet. These prefixes are usually written in CIDR notation:

10.0.0.0/16
127.0.0.0/8
169.254.0.0/16
203.0.113.0/24

In Go, an IP prefix is represented by the netip.Prefix type.

The new Prefix.Compare method lets you compare two IP prefixes, making it easy to sort them without having to write your own comparison code:

prefixes := []netip.Prefix{
    netip.MustParsePrefix("10.1.0.0/16"),
    netip.MustParsePrefix("203.0.113.0/24"),
    netip.MustParsePrefix("10.0.0.0/16"),
    netip.MustParsePrefix("169.254.0.0/16"),
    netip.MustParsePrefix("203.0.113.0/8"),
}

slices.SortFunc(prefixes, netip.Prefix.Compare)

for _, p := range prefixes {
    fmt.Println(p.String())
}
10.0.0.0/16
10.1.0.0/16
169.254.0.0/16
203.0.113.0/8
203.0.113.0/24

Compare orders two prefixes as follows:

  • First by validity (invalid before valid).
  • Then by address family (IPv4 before IPv6).
    10.0.0.0/8 < ::/8
  • Then by masked IP address (network IP).
    10.0.0.0/16 < 10.1.0.0/16
  • Then by prefix length.
    10.0.0.0/8 < 10.0.0.0/16
  • Then by unmasked address (original IP).
    10.0.0.0/8 < 10.0.0.1/8

This follows the same order as Python's netaddr.IPNetwork and the standard IANA (Internet Assigned Numbers Authority) convention.

𝗗 Prefix.Compare • 𝗣 61642 • 𝗖𝗟 700355 • 𝗔 database64128

# Context-aware dialing

The net package has top-level functions for connecting to an address using different networks (protocols) — DialTCP, DialUDP, DialIP, and DialUnix. They were made before context.Context was introduced, so they don't support cancellation:

raddr, _ := net.ResolveTCPAddr("tcp", "127.0.0.1:12345")
conn, err := net.DialTCP("tcp", nil, raddr)
fmt.Printf("connected, err=%v\n", err)
defer conn.Close()
connected, err=<nil>

There's also a net.Dialer type with a general-purpose DialContext method. It supports cancellation and can be used to connect to any of the known networks:

var d net.Dialer
ctx := context.Background()
conn, err := d.DialContext(ctx, "tcp", "127.0.0.1:12345")
fmt.Printf("connected, err=%v\n", err)
defer conn.Close()
connected, err=<nil>

However, DialContext is a bit less efficient than network-specific functions like net.DialTCP — because of the extra overhead from address resolution and network type dispatching.

So, network-specific functions in the net package are more efficient, but they don't support cancellation. The Dialer type supports cancellation, but it's less efficient. The Go team decided to resolve this contradiction.

The new context-aware Dialer methods (DialTCP, DialUDP, DialIP, and DialUnix) combine the efficiency of the existing network-specific net functions with the cancellation capabilities of Dialer.DialContext:

var d net.Dialer
ctx := context.Background()
raddr := netip.MustParseAddrPort("127.0.0.1:12345")
conn, err := d.DialTCP(ctx, "tcp", netip.AddrPort{}, raddr)
fmt.Printf("connected, err=%v\n", err)
defer conn.Close()
connected, err=<nil>

I wouldn't say that having three different ways to dial is very convenient, but that's the price of backward compatibility.

𝗗 net.Dialer • 𝗣 49097 • 𝗖𝗟 490975 • 𝗔 Michael Fraenkel

# Fake example.com

The default httptest.Server certificate already lists example.com in its DNSNames (the list of hostnames the certificate is authorized to secure). However, the HTTP client returned by Server.Client only trusts the test server's certificate authority, so requests to the real example.com fail verification:

// go 1.25
func Test(t *testing.T) {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })
    srv := httptest.NewTLSServer(handler)
    defer srv.Close()

    _, err := srv.Client().Get("https://example.com")
    if err != nil {
        t.Fatal(err)
    }
}
--- FAIL: Test (0.29s)
    main_test.go:19: Get "https://example.com":
    tls: failed to verify certificate:
    x509: certificate signed by unknown authority

To fix this issue, the HTTP client returned by httptest.Server.Client now redirects requests for example.com and its subdomains to the test server:

// go 1.26
func Test(t *testing.T) {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })
    srv := httptest.NewTLSServer(handler)
    defer srv.Close()

    resp, err := srv.Client().Get("https://example.com")
    if err != nil {
        t.Fatal(err)
    }

    body, _ := io.ReadAll(resp.Body)
    resp.Body.Close()

    if string(body) != "hello" {
        t.Errorf("Unexpected response body: %s", body)
    }
}
PASS

𝗗 Server.Client • 𝗖𝗟 666855 • 𝗔 Sean Liao

# Optimized fmt.Errorf

People often point out that using fmt.Errorf("x") for plain strings causes more memory allocations than errors.New("x"). Because of this, some suggest switching code from fmt.Errorf to errors.New when formatting isn't needed.

The Go team disagrees. Here's a quote from Russ Cox:

Using fmt.Errorf("foo") is completely fine, especially in a program where all the errors are constructed with fmt.Errorf. Having to mentally switch between two functions based on the argument is unnecessary noise.

With the new Go release, this debate should finally be settled. For unformatted strings, fmt.Errorf now allocates less and generally matches the allocations for errors.New.

Specifically, fmt.Errorf goes from 2 allocations to 0 allocations for a non-escaping error, and from 2 allocations to 1 allocation for an escaping error:

_ = fmt.Errorf("foo")    // non-escaping error
sink = fmt.Errorf("foo") // escaping error

This matches the allocations for errors.New in both cases.

The difference in CPU cost is also much smaller now. Previously, it was ~64ns vs. ~21ns for fmt.Errorf vs. errors.New for escaping errors, now it's ~25ns vs. ~21ns.

Tell me more

Here are the "before and after" benchmarks for the fmt.Errorf change. The non-escaping case is called local, and the escaping case is called sink. If there's just a plain error string, it's no-args. If the error includes formatting, it's int-arg.

Seconds per operation:

goos: linux
goarch: amd64
pkg: fmt
cpu: AMD EPYC 7B13
                         │    old.txt    │        new.txt        │
                         │      sec/op   │   sec/op     vs base  │
Errorf/no-args/local-16     63.76n ± 1%     4.874n ± 0%  -92.36% (n=120)
Errorf/no-args/sink-16      64.25n ± 1%     25.81n ± 0%  -59.83% (n=120)
Errorf/int-arg/local-16     90.86n ± 1%     90.97n ± 1%        ~ (p=0.713 n=120)
Errorf/int-arg/sink-16      91.81n ± 1%     91.10n ± 1%   -0.76% (p=0.036 n=120)

Bytes per operation:

                         │    old.txt    │        new.txt       │
                         │       B/op    │    B/op     vs base  │
Errorf/no-args/local-16      19.00 ± 0%      0.00 ± 0%  -100.00% (n=120)
Errorf/no-args/sink-16       19.00 ± 0%     16.00 ± 0%   -15.79% (n=120)
Errorf/int-arg/local-16      24.00 ± 0%     24.00 ± 0%         ~ (p=1.000 n=120)
Errorf/int-arg/sink-16       24.00 ± 0%     24.00 ± 0%         ~ (p=1.000 n=120)

Allocations per operation:

                         │    old.txt    │        new.txt       │
                         │    allocs/op  │  allocs/op   vs base │
Errorf/no-args/local-16      2.000 ± 0%     0.000 ± 0%  -100.00% (n=120)
Errorf/no-args/sink-16       2.000 ± 0%     1.000 ± 0%   -50.00% (n=120)
Errorf/int-arg/local-16      2.000 ± 0%     2.000 ± 0%         ~ (p=1.000 n=120)
Errorf/int-arg/sink-16       2.000 ± 0%     2.000 ± 0%         ~ (p=1.000 n=120)

source

If you're interested in the details, I highly recommend reading the CL — it's perfectly written.

𝗗 fmt.Errorf • 𝗖𝗟 708836 • 𝗔 thepudds

# Optimized io.ReadAll

Previously, io.ReadAll allocated a lot of intermediate memory as it grew its result slice to the size of the input data. Now, it uses intermediate slices of exponentially growing size, and then copies them into a final perfectly-sized slice at the end.

The new implementation is about twice as fast and uses roughly half the memory for a 65KiB input; it's even more efficient with larger inputs. Here are the geomean results comparing the old and new versions for different input sizes:

                      │     old     │      new       vs base    │
          sec/op           132.2µ        66.32µ     -49.83%
            B/op          645.4Ki       324.6Ki     -49.70%
  final-capacity           178.3k        151.3k     -15.10%
    excess-ratio            1.216         1.033     -15.10%

See the full benchmark results in the commit. Unfortunately, the author didn't provide the benchmark source code.

Ensuring the final slice is minimally sized is also quite helpful. The slice might persist for a long time, and the unused capacity in a backing array (as in the old version) would just waste memory.

As with the fmt.Errorf optimization, I recommend reading the CL — it's very good. Both changes come from thepudds, whose change descriptions are every reviewer's dream come true.

𝗗 io.ReadAll • 𝗖𝗟 722500 • 𝗔 thepudds

# Multiple log handlers

The log/slog package, introduced in version 1.21, offers a reliable, production-ready logging solution. Since its release, many projects have switched from third-party logging packages to use it. However, it was missing one key feature: the ability to send log records to multiple handlers, such as stdout or a log file.

The new MultiHandler type solves this problem. It implements the standard Handler interface and calls all the handlers you set up.

For example, we can create a log handler that writes to stdout:

stdoutHandler := slog.NewTextHandler(os.Stdout, nil)

And another handler that writes to a file:

const flags = os.O_CREATE | os.O_WRONLY | os.O_APPEND
file, _ := os.OpenFile("/tmp/app.log", flags, 0644)
defer file.Close()
fileHandler := slog.NewJSONHandler(file, nil)

Finally, combine them using a MultiHandler:

// MultiHandler that writes to both stdout and app.log.
multiHandler := slog.NewMultiHandler(stdoutHandler, fileHandler)
logger := slog.New(multiHandler)

// Log a sample message.
logger.Info("login",
    slog.String("name", "whoami"),
    slog.Int("id", 42),
)
time=2025-12-31T11:46:14.521Z level=INFO msg=login name=whoami id=42
{"time":"2025-12-31T11:46:14.521126342Z","level":"INFO","msg":"login","name":"whoami","id":42}

I'm also printing the file contents here to show the results.

When the MultiHandler receives a log record, it sends it to each enabled handler one by one. If any handler returns an error, MultiHandler doesn't stop; instead, it combines all the errors using errors.Join:

hInfo := slog.NewTextHandler(
    os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo},
)
hErrorsOnly := slog.NewTextHandler(
    os.Stdout, &slog.HandlerOptions{Level: slog.LevelError},
)
// BrokenHandler is a custom type (definition omitted)
// whose Handle method always returns err.
hBroken := &BrokenHandler{
    Handler: hInfo,
    err:     fmt.Errorf("broken handler"),
}

handler := slog.NewMultiHandler(hBroken, hInfo, hErrorsOnly)
rec := slog.NewRecord(time.Now(), slog.LevelInfo, "hello", 0)

// Calls hInfo and hBroken, skips hErrorsOnly.
// Returns an error from hBroken.
err := handler.Handle(context.Background(), rec)
fmt.Println(err)
time=2025-12-31T13:32:52.110Z level=INFO msg=hello
broken handler

The Enabled method reports whether any of the configured handlers is enabled:

hInfo := slog.NewTextHandler(
    os.Stdout, &slog.HandlerOptions{Level: slog.LevelInfo},
)
hErrors := slog.NewTextHandler(
    os.Stdout, &slog.HandlerOptions{Level: slog.LevelError},
)
handler := slog.NewMultiHandler(hInfo, hErrors)

// hInfo is enabled.
enabled := handler.Enabled(context.Background(), slog.LevelInfo)
fmt.Println(enabled)
true

Other methods, WithAttrs and WithGroup, call the corresponding methods on each of the configured handlers.

𝗗 slog.MultiHandler • 𝗣 65954 • 𝗖𝗟 692237 • 𝗔 Jes Cok

# Test artifacts

Test artifacts are files created by tests or benchmarks, such as execution logs, memory dumps, or analysis reports. They are important for debugging failures in remote environments (like CI), where developers can't step through the code manually.

Previously, the Go test framework and tools didn't support test artifacts. Now they do.

The new methods T.ArtifactDir, B.ArtifactDir, and F.ArtifactDir return a directory where you can write test output files:

func TestFunc(t *testing.T) {
    dir := t.ArtifactDir()
    logFile := filepath.Join(dir, "app.log")
    content := []byte("Loading user_id=123...\nERROR: Connection failed\n")
    os.WriteFile(logFile, content, 0644)
    t.Log("Saved app.log")
}

If you use go test with -artifacts, this directory will be inside the output directory (specified by -outputdir, or the current directory by default):

go test -v -artifacts -outputdir=/tmp/output
=== RUN   TestFunc
=== ARTIFACTS TestFunc /tmp/output/_artifacts/2933211134
    artifacts_test.go:14: Saved app.log
--- PASS: TestFunc (0.00s)

As you can see, the first time ArtifactDir is called, it writes the directory location to the test log, which is quite handy.

If you don't use -artifacts, artifacts are stored in a temporary directory which is deleted after the test completes.

Each test or subtest within each package has its own unique artifact directory. Subtest outputs are not stored inside the parent test's output directory — all artifact directories for a given package are created at the same level:

func TestFunc(t *testing.T) {
    t.ArtifactDir()
    t.Run("subtest 1", func(t *testing.T) {
        t.ArtifactDir()
    })
    t.Run("subtest 2", func(t *testing.T) {
        t.ArtifactDir()
    })
}
=== RUN   TestFunc
=== ARTIFACTS TestFunc /tmp/output/_artifacts/2878232317
=== RUN   TestFunc/subtest_1
=== ARTIFACTS TestFunc/subtest_1 /tmp/output/_artifacts/1651881503
=== RUN   TestFunc/subtest_2
=== ARTIFACTS TestFunc/subtest_2 /tmp/output/_artifacts/3341607601

The artifact directory path normally looks like this:

<output dir>/_artifacts/<test package>/<test name>/<random>

But if this path can't be safely converted into a local file path (which, for some reason, always happens on my machine), the path will simply be:

<output dir>/_artifacts/<random>

(which is what happens in the examples above)

Repeated calls to ArtifactDir in the same test or subtest return the same directory.

𝗗 T.ArtifactDir • 𝗣 71287 • 𝗖𝗟 696399 • 𝗔 Damien Neil

# Modernized go fix

Over the years, the go fix command became a sad, neglected bag of rewrites for very ancient Go features. But now, it's making a comeback.

The new go fix is re-implemented using the Go analysis framework — the same one go vet uses.

While go fix and go vet now use the same infrastructure, they have different purposes and use different sets of analyzers:

  • Vet is for reporting problems. Its analyzers describe actual issues, but they don't always suggest fixes, and the fixes aren't always safe to apply.
  • Fix is (mostly) for modernizing the code to use newer language and library features. Its analyzers produce fixes that are always safe to apply, but don't necessarily indicate problems with the code.

Here's the command usage:

usage: go fix [build flags] [-fixtool prog] [fix flags] [packages]

Fix runs the Go fix tool (cmd/fix) on the named packages
and applies suggested fixes.

It supports these flags:

  -diff
        instead of applying each fix, print the patch as a unified diff

The -fixtool=prog flag selects a different analysis tool with
alternative or additional fixers.

By default, go fix runs a full set of analyzers (currently, there are more than 20). To choose specific analyzers, use the -NAME flag for each one, or use -NAME=false to run all analyzers except the ones you turned off.

For example, here we only enable the forvar analyzer:

go fix -forvar .

And here, we enable all analyzers except omitzero:

go fix -omitzero=false .

Currently, there's no way to suppress specific analyzers for certain files or sections of code.

To give you a taste of go fix analyzers, here's one of them in action. It replaces loops with slices.Contains or slices.ContainsFunc:

// before go fix
func find(s []int, x int) bool {
    for _, v := range s {
        if x == v {
            return true
        }
    }
    return false
}
// after go fix
func find(s []int, x int) bool {
    return slices.Contains(s, x)
}

If you're interested, check out the dedicated blog post for the full list of analyzers with examples.

𝗗 cmd/fix • 𝗚 go fix • 𝗣 71859 • 𝗔 Alan Donovan

# Final thoughts

Go 1.26 is incredibly big — it's the largest release I've ever seen, and for good reason:

  • It brings a lot of useful updates, like the improved new builtin, type-safe error checking, and goroutine leak detector.
  • There are also many performance upgrades, including the new garbage collector, faster cgo and memory allocation, and optimized fmt.Errorf and io.ReadAll.
  • On top of that, it adds quality-of-life features like multiple log handlers, test artifacts, and the updated go fix tool.
  • Finally, there are two specialized experimental packages: one with SIMD support and another with protected mode for forward secrecy.

All in all, a great release!

You might be wondering about the json/v2 package that was introduced as experimental in 1.25. It's still experimental and available with the GOEXPERIMENT=jsonv2 flag.

P.S. To catch up on other Go releases, check out the Go features by version list or explore the interactive tours for Go 1.25 and 1.24.

P.P.S. Want to learn more about Go? Check out my interactive book on concurrency

Fear is not advocacy
https://antonz.org/ai-advocacy/
Sun, 04 Jan 2026 12:00:00 +0000

And you are going to be fine.

AI advocates seem to be the only kind of technology advocates who feel this imminent urge to constantly criticize developers for not being excited enough about their tech.

It would be crazy if I presented new Go features like this:

If you still don't use the synctest package, all your systems will eventually succumb to concurrency bugs.

or

If you don't use iterators, you have absolutely nothing interesting to build.

The job of an advocate is to spark interest, not to reproach people or instill FOMO. And yet that's exactly what AI advocates do.

What a weird way to advocate.

It's okay not to be early

This whole "devote your life to AI right now, or you'll be out of a job soon" narrative is false.

You don't have to be a world-class algorithm expert to write good software. You don't have to be a Linux expert to use containers. And you don't have to spend all your time now trying to become an expert in chasing ever-changing AI tech.

As with any new technology, developers adopting AI typically fall into four groups: early adopters, early majority, late majority, and laggards. Right now, AI advocates are trying to shame everyone into becoming early adopters. But it's perfectly okay to wait if you're sceptical. Being part of the late majority is a safe and reasonable choice. If anything, you'll have fewer bugs to deal with.

As the industry adopts AI practices, you'll naturally absorb just the right amount of them.

You are going to be fine.

2026
https://antonz.org/2026/
Thu, 01 Jan 2026 00:00:00 +0000

'Better C' playgrounds
https://antonz.org/better-c/
Fri, 26 Dec 2025 19:00:00 +0000

Playgrounds for C3, Hare, Odin, V, and Zig.

I have a soft spot for the "better C" family of languages: C3, Hare, Odin, V, and Zig.

I'm not saying these languages are actually better than C — they're just different. But I needed to come up with an umbrella term for them, and "better C" was the only thing that came to mind.

I believe playgrounds and interactive documentation make programming languages easier for more people to learn. That's why I created online sandboxes for these langs. You can try them out below, embed them on your own website, or self-host and customize them.

If you're already familiar with one of these languages, maybe you could even create an interactive guide for it? I'm happy to help if you want to give it a try.

C3 • Hare • Odin • V • Zig • Editors

C3

An ergonomic, safe, and familiar evolution of C.

import std::io;

fn void greet(String name)
{
    io::printfn("Hello, %s!", name);
}

fn void main()
{
    greet("World");
}
Hello, World!

⛫ homepage • αω tutorial • ⚘ community

Hare

A systems programming language designed to be simple, stable, and robust.

use fmt;

fn greet(user: str) void = {
	fmt::printfln("Hello, {}!", user)!;
};

export fn main() void = {
	greet("World");
};
Hello, World!

⛫ homepage • αω tutorial • ⚘ community

Odin

A high-performance, data-oriented systems programming language.

package main

import "core:fmt"

greet :: proc(name: string) {
    fmt.printf("Hello, %s!\n", name)
}

main :: proc() {
    greet("World")
}
Hello, World!

⛫ homepage • αω tutorial • ⚘ community

V

A language with C-level performance and rapid compilation speeds.

fn greet(name string) {
	println('Hello, ${name}!')
}

fn main() {
    greet("World")
}
Hello, World!

⛫ homepage • αω tutorial • ⚘ community

Zig

A language designed for performance and explicit control with powerful metaprogramming.

const std = @import("std");

pub fn greet(name: []const u8) void {
    std.debug.print("Hello, {s}!\n", .{name});
}

pub fn main() void {
    greet("World");
}
Hello, World!

⛫ homepage • αω tutorial • ⚘ community

Editors

If you want to do more than just "hello world," there are also full-size online editors. They're pretty basic, but still can be useful.
