Gist of Go: Concurrency testing

This is a chapter from my book on Go concurrency, which teaches the topic from the ground up through interactive examples.

Testing concurrent programs is a lot like testing single-task programs. If the code is well-designed, you can test the state of a concurrent program with standard tools like channels, wait groups, and other abstractions built on top of them.

But if you've made it this far, you know that concurrency is never that easy. In this chapter, we'll go over common testing problems and the solutions that Go offers.

Waiting for goroutines to finish

Let's say we want to test this function:

// Calc calculates something asynchronously.
func Calc() <-chan int {
    out := make(chan int, 1)
    go func() {
        out <- 42
    }()
    return out
}

Calculations run asynchronously in a separate goroutine. However, the function returns a result channel, so this isn't a problem:

func Test(t *testing.T) {
    got := <-Calc() // (X)
    if got != 42 {
        t.Errorf("got: %v; want: 42", got)
    }
}

PASS

At point ⓧ, the test is guaranteed to wait for the inner goroutine to finish. The rest of the test code doesn't need to know anything about how concurrency works inside the Calc function. Overall, the test isn't any more complicated than if Calc were synchronous.

But we're lucky that Calc returns a channel. What if it doesn't?

Naive approach

Let's say the Calc function looks like this:

var state atomic.Int32

// Calc calculates something asynchronously.
func Calc() {
    go func() {
        state.Store(42)
    }()
}

We write a simple test and run it:

func TestNaive(t *testing.T) {
    Calc()
    got := state.Load() // (X)
    if got != 42 {
        t.Errorf("got: %v; want: 42", got)
    }
}

=== RUN   TestNaive
    main_test.go:27: got: 0; want: 42
--- FAIL: TestNaive (0.00s)

The assertion fails because at point ⓧ, we didn't wait for the inner Calc goroutine to finish. In other words, we didn't synchronize the TestNaive and Calc goroutines. That's why state still has its initial value (0) when we do the check.

Waiting with time.Sleep

We can add a short delay with time.Sleep:

func TestSleep(t *testing.T) {
    Calc()

    // Wait for the goroutine to finish (if we're lucky).
    time.Sleep(50 * time.Millisecond)

    got := state.Load()
    if got != 42 {
        t.Errorf("got: %v; want: 42", got)
    }
}

=== RUN   TestSleep
--- PASS: TestSleep (0.05s)

The test is now passing. But using time.Sleep to sync goroutines isn't a great idea, even in tests. We don't want to set a custom delay for every function we're testing. Also, the function's execution time may be different on the local machine compared to a CI server. If we use a longer delay just to be safe, the tests will end up taking too long to run.

Sometimes you can't avoid using time.Sleep in tests, but since Go 1.25, the synctest package has made these cases much less common. Let's see how it works.

Waiting with synctest

The synctest package has a lot going on under the hood, but its public API is very simple:

func Test(t *testing.T, f func(*testing.T))
func Wait()

The synctest.Test function creates an isolated bubble where you can control time to some extent. Any new goroutines started inside this bubble become part of the bubble. So, if we wrap the test code with synctest.Test, everything will run inside the bubble — the test code, the Calc function we're testing, and its goroutine.

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        Calc()

        // (X)

        got := state.Load()
        if got != 42 {
            t.Errorf("got: %v; want: 42", got)
        }
    })
}

At point ⓧ, we want to wait for the Calc goroutine to finish. The synctest.Wait function comes to the rescue! It blocks the calling goroutine until all other goroutines in the bubble are finished. (It's actually a bit more complicated than that, but we'll talk about it later.)

In our case, there's only one other goroutine (the inner Calc goroutine), so Wait will pause until it finishes, and then the test will move on.

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        Calc()

        // Wait for the goroutine to finish.
        synctest.Wait()

        got := state.Load()
        if got != 42 {
            t.Errorf("got: %v; want: 42", got)
        }
    })
}

=== RUN   TestSync
--- PASS: TestSync (0.00s)

Now the test passes instantly. That's better!

✎ Exercise: Wait until done

Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it.

If you are okay with just theory for now, let's continue.

Checking the channel state

As we've seen, you can use synctest.Wait to wait for the tested goroutine to finish, and then check the state of the data you are interested in. You can also use it to check the state of channels.

Let's say there's a function that generates N numbers like 11, 22, 33, and so on:

// Generate produces n numbers like 11, 22, 33, ...
func Generate(n int) <-chan int {
    out := make(chan int)
    go func() {
        for i := range n {
            out <- (i+1)*10 + (i + 1)
        }
    }()
    return out
}

And a simple test:

func Test(t *testing.T) {
    out := Generate(2)
    var got int

    got = <-out
    if got != 11 {
        t.Errorf("#1: got %v, want 11", got)
    }
    got = <-out
    if got != 22 {
        t.Errorf("#2: got %v, want 22", got)
    }
}

PASS

Set N=2, get the first number from the generator's output channel, then get the second number. The test passed, so the function works correctly. But does it really?

Let's use Generate in "production":

func main() {
    for v := range Generate(3) {
        fmt.Print(v, " ")
    }
}

11 22 33 fatal error: all goroutines are asleep - deadlock!

Deadlock! We forgot to close the out channel when exiting the inner Generate goroutine, so the for-range loop waiting on that channel got stuck. With no runnable goroutines left, the runtime aborted the program with a fatal error.

Let's fix the code:

// Generate produces n numbers like 11, 22, 33, ...
func Generate(n int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for i := range n {
            out <- (i+1)*10 + (i + 1)
        }
    }()
    return out
}

And add a test for the out channel state:

func Test(t *testing.T) {
    out := Generate(2)
    <-out // 11
    <-out // 22

    // (X)

    // Check that the channel is closed.
    select {
    case _, ok := <-out:
        if ok {
            t.Errorf("expected channel to be closed")
        }
    default:
        t.Errorf("expected channel to be closed")
    }
}

--- FAIL: Test (0.00s)
    main_test.go:41: expected channel to be closed

The test is still failing, even though we're now closing the channel when the Generate goroutine exits.

This is a familiar problem: at point ⓧ, we didn't wait for the inner Generate goroutine to finish. So when we check the out channel, it hasn't closed yet. That's why the test fails.

We can delay the check using time.After:

func Test(t *testing.T) {
    out := Generate(2)
    <-out
    <-out

    // Check that the channel is closed.
    select {
    case _, ok := <-out:
        if ok {
            t.Errorf("expected channel to be closed")
        }
    case <-time.After(50 * time.Millisecond):
        t.Fatalf("timeout waiting for channel to close")
    }
}

PASS

But it's better to use synctest:

func TestClose(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        out := Generate(2)
        <-out
        <-out

        // Wait for the goroutine to finish.
        synctest.Wait() // (X)

        // Check that the channel is closed.
        select {
        case _, ok := <-out:
            if ok {
                t.Errorf("expected channel to be closed")
            }
        default:
            t.Errorf("expected channel to be closed")
        }
    })
}

PASS

At point ⓧ, synctest.Wait blocks the test until the only other goroutine (the inner Generate goroutine) finishes. Once the goroutine has exited, the channel is already closed. So, in the select statement, the <-out case triggers with ok set to false, allowing the test to pass.

As you can see, the synctest package helped us avoid delays in the test, and the test itself didn't get much more complicated.
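
If the closed-channel check shows up in several tests, it can be pulled into a small helper. The sketch below is only an illustration: isClosed is a hypothetical name, not part of the book's code or of any standard package, and it assumes an int channel like the one Generate returns.

```go
package main

import "fmt"

// isClosed reports whether ch is closed, without blocking.
// Hypothetical helper for illustration only.
// Caveat: if a value is pending on the channel, this check consumes it.
func isClosed(ch <-chan int) bool {
	select {
	case _, ok := <-ch:
		return !ok // ok is false only when the channel is closed and drained
	default:
		return false // nothing to receive, so the channel is not closed yet
	}
}

func main() {
	ch := make(chan int)
	fmt.Println(isClosed(ch)) // false
	close(ch)
	fmt.Println(isClosed(ch)) // true
}
```

Like the select in the test, the helper gives a reliable answer only after the sending goroutine has finished, which is exactly what synctest.Wait guarantees.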

Checking for goroutine leaks

As we've seen, you can use synctest.Wait to wait for the tested goroutine to finish, and then check the state of the data or channels. You can also use it to detect goroutine leaks.

Let's say there's a function that runs the given functions concurrently and sends their results to an output channel:

// Map runs the given functions concurrently.
func Map(funcs ...func() int) <-chan int {
    out := make(chan int)
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}

And a simple test:

func Test(t *testing.T) {
    out := Map(
        func() int { return 11 },
        func() int { return 22 },
        func() int { return 33 },
    )

    got := <-out
    if got != 11 && got != 22 && got != 33 {
        t.Errorf("got %v, want 11, 22 or 33", got)
    }
}

PASS

Send three functions to be executed, get the first result from the output channel, and check it. The test passed, so the function works correctly. But does it really?

Let's run Map three times, passing three functions each time:

func main() {
    for range 3 {
        Map(
            func() int { return 11 },
            func() int { return 22 },
            func() int { return 33 },
        )
    }

    time.Sleep(50 * time.Millisecond)
    nGoro := runtime.NumGoroutine() - 1 // minus the main goroutine
    fmt.Println("nGoro =", nGoro)
}

nGoro = 9

After 50 ms, when all the functions should definitely have finished, there are still 9 live goroutines (runtime.NumGoroutine). In other words, all the goroutines are stuck.

The reason is that the out channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside Map get blocked when they try to send the result of f() to out.
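
To see the difference between the two kinds of channels without leaking anything, here is a minimal sketch that probes a channel with a non-blocking send. The trySend helper is hypothetical and exists only for this illustration. Wherever it returns false, a plain send (like the one inside Map) would block.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// Hypothetical helper for illustration only.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // a plain `ch <- v` would block here
	}
}

func main() {
	unbuffered := make(chan int)
	fmt.Println(trySend(unbuffered, 11)) // false: no receiver is waiting

	buffered := make(chan int, 1)
	fmt.Println(trySend(buffered, 11)) // true: the value goes into the buffer
	fmt.Println(trySend(buffered, 22)) // false: the buffer is full
}
```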

Let's fix this by adding a buffer of the right size to the channel:

// Map runs the given functions concurrently.
func Map(funcs ...func() int) <-chan int {
    out := make(chan int, len(funcs))
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}

Then add a test to check the number of goroutines:

func Test(t *testing.T) {
    for range 3 {
        Map(
            func() int { return 11 },
            func() int { return 22 },
            func() int { return 33 },
        )
    }

    // (X)

    nGoro := runtime.NumGoroutine() - 2 // minus the main and Test goroutines

    if nGoro != 0 {
        t.Fatalf("expected 0 goroutines, got %d", nGoro)
    }
}

--- FAIL: Test (0.00s)
    main_test.go:44: expected 0 goroutines, got 9

The test is still failing, even though the channel is now buffered, and the goroutines shouldn't block on sending to it.

This is a familiar problem: at point ⓧ, we didn't wait for the running Map goroutines to finish. So nGoro is greater than zero, which makes the test fail.

We can delay the check using time.Sleep (not recommended), or use a third-party package like goleak (a better option):

func Test(t *testing.T) {
    defer goleak.VerifyNone(t)

    for range 3 {
        Map(
            func() int { return 11 },
            func() int { return 22 },
            func() int { return 33 },
        )
    }
}

PASS

The test passes now.

By the way, goleak also uses time.Sleep internally, but it does so much more efficiently. It tries up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly.

Even better, we can check for leaks without any third-party packages by using synctest:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        for range 3 {
            Map(
                func() int { return 11 },
                func() int { return 22 },
                func() int { return 33 },
            )
        }
        synctest.Wait()
    })
}

PASS

Earlier, I said that synctest.Wait blocks the calling goroutine until all other goroutines finish. Actually, it's a bit more complicated. synctest.Wait blocks until all other goroutines either finish or become durably blocked.

We'll talk about "durably" later. For now, let's focus on "become blocked." Let's temporarily remove the buffer from the channel and check the test results:

// Map runs the given functions concurrently.
func Map(funcs ...func() int) <-chan int {
    out := make(chan int)
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        for range 3 {
            Map(
                func() int { return 11 },
                func() int { return 22 },
                func() int { return 33 },
            )
        }
        synctest.Wait()
    })
}

--- FAIL: Test (0.00s)
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]

Here's what happens:

Three calls to Map start 9 goroutines.
The call to synctest.Wait blocks the root bubble goroutine (synctest.Test).
One of the goroutines finishes its work, tries to write to out, and gets blocked (because no one is reading from out).
The same thing happens to the other 8 goroutines.
synctest.Wait sees that all the child goroutines in the bubble are blocked, so it unblocks the root goroutine.
The root goroutine finishes.

Next, synctest.Test comes into play. It not only starts the bubble goroutine, but also tries to wait for all child goroutines to finish before it returns. If Test sees that some goroutines are stuck (in our case, all 9 are blocked trying to send to the channel), it panics:

main bubble goroutine has exited but blocked goroutines remain

So, we found the leak without using time.Sleep or goleak, thanks to the useful features of synctest.Wait and synctest.Test:

synctest.Wait unblocks as soon as all other goroutines are durably blocked.
synctest.Test panics when finished if there are still blocked goroutines left in the bubble.

Now let's make the channel buffered and run the test again:

=== RUN   Test
--- PASS: Test (0.00s)

Perfect!

Durable blocking

As we've found, synctest.Wait blocks until all goroutines in the bubble — except the one that called Wait — have either finished or are durably blocked. Let's figure out what "durably blocked" means.

For synctest, a goroutine inside a bubble is considered durably blocked if it is blocked by any of the following operations:

Sending to or receiving from a channel created within the bubble.
A select statement where every case is a channel created within the bubble.
Calling WaitGroup.Wait if all WaitGroup.Add calls were made inside the bubble.
Calling Cond.Wait.
Calling time.Sleep.

Other blocking operations are not considered durable, and synctest.Wait ignores them. For example:

Sending to or receiving from a channel created outside the bubble.
Calling Mutex.Lock or RWMutex.Lock.
I/O operations (like reading a file from disk or waiting for a network response).
System calls and cgo calls.

The distinction between "durable" and other types of blocks is just an implementation detail of the synctest package. It's not a fundamental property of the blocking operations themselves. In real-world applications, this distinction doesn't exist, and "durable" blocks are neither better nor worse than any others.

Let's look at an example.

Asynchronous processor

Let's say there's a Proc type that performs some asynchronous computation:

// Proc calculates something asynchronously.
type Proc struct {
    // ...
}

// NewProc starts the calculation in a separate goroutine.
// The calculation keeps running until Stop is called.
func NewProc() *Proc

// Res returns the current calculation result.
// It's only available until Stop is called; after that, it resets to zero.
func (p *Proc) Res() int

// Stop terminates the calculation.
func (p *Proc) Stop()

Our goal is to write a test that checks the result while the calculation is still running. Let's see how the test changes depending on how Proc is implemented (except for the time.Sleep version — we'll cover that one a bit later).

Blocking on a channel

Let's say Proc is implemented using a done channel:

// Proc calculates something asynchronously.
type Proc struct {
    res  int
    done chan struct{}
}

// NewProc starts the calculation.
func NewProc() *Proc {
    p := &Proc{done: make(chan struct{})}
    go func() {
        p.res = 42
        <-p.done // (X)
        p.res = 0
    }()
    return p
}

// Stop terminates the calculation.
func (p *Proc) Stop() {
    close(p.done)
}

Naive test:

func TestNaive(t *testing.T) {
    p := NewProc()
    defer p.Stop()

    if got := p.Res(); got != 42 {
        t.Fatalf("got %v, want 42", got)
    }
}

--- FAIL: TestNaive (0.00s)
    main_test.go:52: got 0, want 42

The check fails because when p.Res() is called, the goroutine in NewProc hasn't set p.res = 42 yet.

Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        p := NewProc()
        defer p.Stop()

        // Wait for the goroutine to block at point X.
        synctest.Wait()
        if got := p.Res(); got != 42 {
            t.Fatalf("got %v, want 42", got)
        }
    })
}

PASS

At point ⓧ, the goroutine is blocked on reading from the p.done channel. This channel is created inside the bubble, so the block is durable. The synctest.Wait call in the test returns as soon as the goroutine blocks on <-p.done, and we get the current value of p.res.

Blocking on a select

Let's say Proc is implemented using select:

// Proc calculates something asynchronously.
type Proc struct {
    res  int
    in   chan int
    done chan struct{}
}

// NewProc starts the calculation.
func NewProc() *Proc {
    p := &Proc{
        res:  0,
        in:   make(chan int),
        done: make(chan struct{}),
    }
    go func() {
        p.res = 42
        select { // (X)
        case n := <-p.in:
            p.res = n
        case <-p.done:
        }
    }()
    return p
}

// Stop terminates the calculation.
func (p *Proc) Stop() {
    close(p.done)
}

Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        p := NewProc()
        defer p.Stop()

        // Wait for the goroutine to block at point X.
        synctest.Wait()
        if got := p.Res(); got != 42 {
            t.Fatalf("got %v, want 42", got)
        }
    })
}

PASS

At point ⓧ, the goroutine is blocked on a select statement. Both channels used in the select (p.in and p.done) are created inside the bubble, so the block is durable. The synctest.Wait call in the test returns as soon as the goroutine blocks on the select, and we get the current value of p.res.

Blocking on a wait group

Let's say Proc is implemented using a wait group:

// Proc calculates something asynchronously.
type Proc struct {
    res int
    wg  sync.WaitGroup
}

// NewProc starts the calculation.
func NewProc() *Proc {
    p := &Proc{}
    p.wg.Add(1)
    go func() {
        p.res = 42
        p.wg.Wait() // (X)
        p.res = 0
    }()
    return p
}

// Stop terminates the calculation.
func (p *Proc) Stop() {
    p.wg.Done()
}

Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        p := NewProc()
        defer p.Stop()

        // Wait for the goroutine to block at point X.
        synctest.Wait()
        if got := p.Res(); got != 42 {
            t.Fatalf("got %v, want 42", got)
        }
    })
}

PASS

At point ⓧ, the goroutine is blocked on the wait group's p.wg.Wait() call. The group's Add method was called inside the bubble, so this is a durable block. The synctest.Wait call in the test returns as soon as the goroutine blocks on p.wg.Wait(), and we get the current value of p.res.

Blocking on a condition variable

Let's say Proc is implemented using a condition variable:

// Proc calculates something asynchronously.
type Proc struct {
    res  int
    cond *sync.Cond
}

// NewProc starts the calculation.
func NewProc() *Proc {
    p := &Proc{
        cond: sync.NewCond(&sync.Mutex{}),
    }
    go func() {
        p.cond.L.Lock()
        p.res = 42
        p.cond.Wait() // (X)
        p.res = 0
        p.cond.L.Unlock()
    }()
    return p
}

// Stop terminates the calculation.
func (p *Proc) Stop() {
    p.cond.Signal()
}

Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        p := NewProc()
        defer p.Stop()

        // Wait for the goroutine to block at point X.
        synctest.Wait()
        if got := p.Res(); got != 42 {
            t.Fatalf("got %v, want 42", got)
        }
    })
}

PASS

At point ⓧ, the goroutine is blocked on the condition variable's p.cond.Wait() call. This is a durable block. The synctest.Wait call returns as soon as the goroutine blocks on p.cond.Wait(), and we get the current value of p.res.

Blocking on a mutex

Let's say Proc is implemented using a mutex:

// Proc calculates something asynchronously.
type Proc struct {
    res int
    mu  sync.Mutex
}

// NewProc starts the calculation.
func NewProc() *Proc {
    p := &Proc{}
    p.mu.Lock()
    go func() {
        p.res = 42
        p.mu.Lock() // (X)
        p.res = 0
        p.mu.Unlock()
    }()
    return p
}

// Stop terminates the calculation.
func (p *Proc) Stop() {
    p.mu.Unlock()
}

Let's try using synctest.Wait to wait until the goroutine is blocked at point ⓧ:

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        p := NewProc()
        defer p.Stop()

        // Hangs because synctest ignores blocking on a mutex.
        synctest.Wait()
        if got := p.Res(); got != 42 {
            t.Fatalf("got %v, want 42", got)
        }
    })
}

code execution timeout

At point ⓧ, the goroutine is blocked on the mutex's p.mu.Lock() call. synctest doesn't consider blocking on a mutex to be durable, so the synctest.Wait call ignores the block and never returns. The test hangs and only fails when the overall go test timeout is reached.

You might be wondering why the synctest authors didn't consider blocking on mutexes to be durable. There are a couple of reasons:

Mutexes are usually used to protect shared state, not to coordinate goroutines (the example above is completely unrealistic). In tests, you usually don't need to pause before locking a mutex to check something.
Mutex locks are usually held for a very short time, and mutexes themselves need to be as fast as possible. Adding extra logic to support synctest could slow them down in normal (non-test) situations.

⌘ ⌘ ⌘

Let's go back to the original question: how does the test change depending on how Proc is implemented? It doesn't change at all. We used the exact same test code every time:

func TestSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        p := NewProc()
        defer p.Stop()

        synctest.Wait()
        if got := p.Res(); got != 42 {
            t.Fatalf("got %v, want 42", got)
        }
    })
}

If your program uses durably blocking operations, synctest.Wait always works the same way:

It waits until all other goroutines in the bubble are blocked.
Then, it unblocks the goroutine that called it.

Very convenient!

Instant waiting

Inside the synctest.Test bubble, time works differently. Instead of using a regular wall clock, the bubble uses a fake clock that can jump forward to any point in the future. This can be quite handy when testing time-sensitive code.

Let's say we want to test this function:

// Calc processes a value from the input channel.
// Times out if no input is received after 3 seconds.
func Calc(in chan int) (int, error) {
    select {
    case v := <-in:
        return v * 2, nil
    case <-time.After(3 * time.Second):
        return 0, ErrTimeout
    }
}

The positive scenario is straightforward: send a value to the channel, call the function, and check the result:

func TestCalc_result(t *testing.T) {
    ch := make(chan int)
    go func() { ch <- 11 }()
    got, err := Calc(ch)

    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if got != 22 {
        t.Errorf("got: %v; want: 22", got)
    }
}

PASS

The negative scenario, where the function times out, is also pretty straightforward. But the test takes the full three seconds to complete:

func TestCalc_timeout_naive(t *testing.T) {
    ch := make(chan int)
    got, err := Calc(ch) // runs for 3 seconds

    if err != ErrTimeout {
        t.Errorf("got: %v; want: %v", err, ErrTimeout)
    }
    if got != 0 {
        t.Errorf("got: %v; want: 0", got)
    }
}

=== RUN   TestCalc_timeout_naive
--- PASS: TestCalc_timeout_naive (3.00s)

We're actually lucky the timeout is only three seconds. It could have been as long as sixty!
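
Without a fake clock, a common workaround is to make the timeout configurable, so tests can pass a tiny value. Here's a minimal sketch of that idea. CalcTimeout is a hypothetical variant of Calc, not something from the book, and the "timeout" error text is assumed from the test output shown in this chapter.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTimeout is assumed to be defined like this.
var ErrTimeout = errors.New("timeout")

// CalcTimeout is a hypothetical variant of Calc with an injectable timeout.
func CalcTimeout(in chan int, timeout time.Duration) (int, error) {
	select {
	case v := <-in:
		return v * 2, nil
	case <-time.After(timeout):
		return 0, ErrTimeout
	}
}

func main() {
	// Production code would pass 3*time.Second; a test passes a tiny value.
	got, err := CalcTimeout(make(chan int), 10*time.Millisecond)
	fmt.Println(got, err) // 0 timeout
}
```

The downside is that the function signature changes just to please the tests, and the test still spends real time waiting.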

To make the test run instantly, let's wrap it in synctest.Test:

func TestCalc_timeout_synctest(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ch := make(chan int)
        got, err := Calc(ch) // runs instantly

        if err != ErrTimeout {
            t.Errorf("got: %v; want: %v", err, ErrTimeout)
        }
        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

=== RUN   TestCalc_timeout_synctest
--- PASS: TestCalc_timeout_synctest (0.00s)

Note that there is no synctest.Wait call here, and the only goroutine in the bubble (the root one) gets durably blocked on a select statement in Calc. Here's what happens next:

The bubble checks if the goroutine can be unblocked by waiting. In our case, it can — we just need to wait 3 seconds.
The bubble's clock instantly jumps forward 3 seconds.
The select in Calc chooses the timeout case, and the function returns ErrTimeout.
The test assertions for err and got both pass successfully.

Thanks to the fake clock, the test runs instantly instead of taking three seconds like it would with the "naive" approach.

You might have noticed that quite a few circumstances coincided here:

There's no synctest.Wait call.
There's only one goroutine.
The goroutine is durably blocked.
It will be unblocked at a certain point in the future.

We'll look at the alternatives next.

Time inside the bubble

The fake clock in synctest.Test can be tricky. It moves forward only if: ➊ all goroutines in the bubble are durably blocked; ➋ there's a future moment when at least one goroutine will unblock; and ➌ synctest.Wait isn't running.

Let's look at the alternatives. I'll say right away, this isn't an easy topic. But when has time travel ever been easy? :)

Not all goroutines are blocked

Here's the Calc function we're testing:

// Calc processes a value from the input channel.
// Times out if no input is received after 3 seconds.
func Calc(in chan int) (int, error) {
    select {
    case v := <-in:
        return v * 2, nil
    case <-time.After(3 * time.Second):
        return 0, ErrTimeout
    }
}

Let's run Calc in a separate goroutine, so there will be two goroutines in the bubble:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        var got int
        var err error

        go func() {
            ch := make(chan int)
            got, err = Calc(ch)
        }()

        if err != ErrTimeout {
            t.Errorf("got: %v; want: %v", err, ErrTimeout)
        }
        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

--- FAIL: Test (0.00s)
    main_test.go:45: got: <nil>; want: timeout
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]

synctest.Test panicked because the root bubble goroutine finished while the Calc goroutine was still blocked on a select.

Reason: synctest.Test only advances the clock if all goroutines are blocked — including the root bubble goroutine.

How to fix: Use time.Sleep to make sure the root goroutine is also durably blocked.

func Test_fixed(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ch := make(chan int)
        var got int

        go func() {
            got, _ = Calc(ch)
        }()

        // Wait for the Calc goroutine to finish.
        time.Sleep(5 * time.Second)

        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

PASS

Now all three conditions are met again (all goroutines are durably blocked; the moment of future unblocking is known; there is no call to synctest.Wait). The fake clock moves forward 3 seconds, which unblocks the Calc goroutine. The goroutine finishes, leaving only the root one, which is still blocked on time.Sleep. The clock moves forward another 2 seconds, unblocking the root goroutine. The assertion passes, and the test completes successfully.

But if we run the test with the race detector enabled (using the -race flag), it reports a data race on the got variable:

race detected during execution of test

Logically, using time.Sleep in the root goroutine doesn't guarantee that the Calc goroutine (which writes to the got variable) will finish before the root goroutine reads from got. That's why the race detector reports a problem. Technically, the test passes because of how synctest is implemented, but the race still exists in the code. The right way to handle this is to call synctest.Wait after time.Sleep:

func Test_fixed(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ch := make(chan int)
        var got int

        go func() {
            got, _ = Calc(ch)
        }()

        // Wait for the Calc goroutine to finish.
        time.Sleep(3 * time.Second)
        synctest.Wait()

        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

PASS

Calling synctest.Wait ensures that the Calc goroutine finishes before the root goroutine reads got, so there's no data race anymore.
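
The clock jumps in the first version of Test_fixed (the one with the 5-second sleep) can be pictured with a toy model. This is only a mental aid under simplifying assumptions, not how synctest is implemented: fakeClock, sleep, and advance are hypothetical names, and the model ignores goroutines entirely and only tracks pending wake-up moments.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// fakeClock is a toy model of a bubble's clock.
// Hypothetical type for illustration; not synctest's implementation.
type fakeClock struct {
	now    time.Duration   // time elapsed since the bubble started
	timers []time.Duration // pending wake-up moments, sorted
}

// sleep registers a wake-up d from now, as a blocked goroutine would.
func (c *fakeClock) sleep(d time.Duration) {
	c.timers = append(c.timers, c.now+d)
	sort.Slice(c.timers, func(i, j int) bool { return c.timers[i] < c.timers[j] })
}

// advance jumps to the earliest pending wake-up and reports how far it moved.
// It returns false if nothing is pending: in this model, that corresponds
// to the case where there is no future moment to jump to.
func (c *fakeClock) advance() (time.Duration, bool) {
	if len(c.timers) == 0 {
		return 0, false
	}
	next := c.timers[0]
	c.timers = c.timers[1:]
	step := next - c.now
	c.now = next
	return step, true
}

func main() {
	var c fakeClock
	c.sleep(3 * time.Second) // time.After in Calc
	c.sleep(5 * time.Second) // time.Sleep in the root goroutine

	for {
		step, ok := c.advance()
		if !ok {
			break
		}
		fmt.Printf("jumped %v, now at %v\n", step, c.now)
	}
	// jumped 3s, now at 3s
	// jumped 2s, now at 5s
}
```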

synctest.Wait is running

Here's the Calc function we're testing:

// Calc processes a value from the input channel.
// Times out if no input is received after 3 seconds.
func Calc(in chan int) (int, error) {
    select {
    case v := <-in:
        return v * 2, nil
    case <-time.After(3 * time.Second):
        return 0, ErrTimeout
    }
}

Let's replace time.Sleep() in the root goroutine with synctest.Wait():

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        var got int
        var err error

        go func() {
            ch := make(chan int)
            got, err = Calc(ch)
        }()

        // Doesn't wait for the Calc goroutine to finish.
        synctest.Wait()

        if err != ErrTimeout {
            t.Errorf("got: %v; want: %v", err, ErrTimeout)
        }
        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

--- FAIL: Test (0.00s)
    main_test.go:48: got: <nil>; want: timeout
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]

synctest.Test panicked because the root bubble goroutine finished while the Calc goroutine was still blocked on a select.

Reason: synctest.Test only advances the clock if there is no active synctest.Wait running.

If all bubble goroutines are durably blocked but a synctest.Wait is running, synctest.Test won't advance the clock. Instead, it will simply finish the synctest.Wait call and return control to the goroutine that called it (in this case, the root bubble goroutine).

How to fix: Don't rely on synctest.Wait alone to get past a timer. Call time.Sleep first so the clock can advance.

The moment of unblocking is unclear

Let's update Calc to use context cancellation instead of a timer:

// Calc processes a value from the input channel.
// Exits if the context is canceled.
func Calc(in chan int, ctx context.Context) (int, error) {
    select {
    case v := <-in:
        return v * 2, nil
    case <-ctx.Done():
        return 0, ctx.Err()
    }
}

We won't cancel the context in the test:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ch := make(chan int)
        ctx, _ := context.WithCancel(context.Background())
        got, err := Calc(ch, ctx)

        if err != nil {
            t.Errorf("got: %v; want: nil", err)
        }
        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

--- FAIL: Test (0.00s)
panic: deadlock: all goroutines in bubble are blocked [recovered, repanicked]

synctest.Test panicked because all goroutines in the bubble are hopelessly blocked.

Reason: synctest.Test only advances the clock if it knows how much to advance it. In this case, there is no future moment that would unblock the select in Calc.

How to fix: Manually unblock the goroutine and call synctest.Wait to wait for it to finish.

func Test_fixed(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        var got int
        var err error
        ctx, cancel := context.WithCancel(context.Background())

        go func() {
            ch := make(chan int)
            got, err = Calc(ch, ctx)
        }()

        // Unblock the Calc goroutine.
        cancel()
        // Wait for it to finish.
        synctest.Wait()

        if err != context.Canceled {
            t.Errorf("got: %v; want: %v", err, context.Canceled)
        }
        if got != 0 {
            t.Errorf("got: %v; want: 0", got)
        }
    })
}

PASS

Now, cancel() cancels the context and unblocks the select in Calc, while synctest.Wait makes sure the Calc goroutine finishes before the test checks got and err.

The goroutine isn't durably blocked

Let's update Calc to lock the mutex before doing any calculations:

// Calc processes a value and returns the result.
func Calc(v int, mu *sync.Mutex) int {
    mu.Lock()
    defer mu.Unlock()
    v = v * 2
    return v
}

In the test, we'll lock the mutex before calling Calc, so it will block:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        var mu sync.Mutex
        mu.Lock()

        go func() {
            time.Sleep(10 * time.Millisecond)
            mu.Unlock()
        }()

        got := Calc(11, &mu)

        if got != 22 {
            t.Errorf("got: %v; want: 22", got)
        }
    })
}

code execution timeout

The test failed because it hit the overall timeout set in go test.

Reason: synctest.Test only works with durable blocks. Blocking on a mutex lock isn't considered durable, so the bubble can't do anything about it — even though the sleeping inner goroutine would have unlocked the mutex in 10 ms if the bubble had used the wall clock.

How to fix: Don't use synctest.

func Test_fixed(t *testing.T) {
    var mu sync.Mutex
    mu.Lock()

    go func() {
        time.Sleep(10 * time.Millisecond)
        mu.Unlock()
    }()

    got := Calc(11, &mu)

    if got != 22 {
        t.Errorf("got: %v; want: 22", got)
    }
}

PASS

Now the mutex unlocks after 10 milliseconds (wall clock), Calc finishes successfully, and the got check passes.

Summary

The clock inside the bubble won't move forward if:

There are any goroutines that aren't durably blocked.
It's unclear how much time to advance.
synctest.Wait is running.

Phew.

✎ Exercise: Asynchronous repeater

Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it.

If you are okay with just theory for now, let's continue.

✎ Thoughts on time 1

Let's practice understanding time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground.

Here's a function that performs synchronous work:

var done atomic.Bool

// workSync performs synchronous work.
func workSync() {
    time.Sleep(3 * time.Second)
    done.Store(true)
}

And a test for it:

func TestWorkSync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        workSync()

        // (X)

        if !done.Load() {
            t.Errorf("work not done")
        }
    })
}

What is the test missing at point ⓧ?

synctest.Wait()
time.Sleep(3 * time.Second)
synctest.Wait, then time.Sleep
time.Sleep, then synctest.Wait
Nothing.

✎ Thoughts on time 2

Let's keep practicing our understanding of time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground.

Here's a function that performs asynchronous work:

var done atomic.Bool

// workAsync performs asynchronous work.
func workAsync() {
    go func() {
        time.Sleep(3 * time.Second)
        done.Store(true)
    }()
}

And a test for it:

func TestWorkAsync(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        workAsync()

        // (X)

        if !done.Load() {
            t.Errorf("work not done")
        }
    })
}

What is the test missing at point ⓧ?

synctest.Wait()
time.Sleep(3 * time.Second)
synctest.Wait, then time.Sleep
time.Sleep, then synctest.Wait
Nothing.

Checking for cancellation and stopping

Sometimes you need to test objects that use resources and should be able to release them. For example, this could be a server that, when started, creates a pool of network connections, connects to a database, and writes file caches. When stopped, it should clean all this up.

Let's see how we can make sure everything is properly stopped in the tests.

Delayed stop

We're going to test this server:

// IncServer produces consecutive integers starting from 0.
type IncServer struct {
    // ...
}

// NewIncServer creates a new server.
func NewIncServer() *IncServer

// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int)

// Stop shuts down the server.
func (s *IncServer) Stop()

Let's say we wrote a basic functional test:

func Test(t *testing.T) {
    nums := make(chan int)

    srv := NewIncServer()
    srv.Start(nums)
    defer srv.Stop()

    got := [3]int{<-nums, <-nums, <-nums}
    want := [3]int{0, 1, 2}
    if got != want {
        t.Errorf("First 3: got: %v; want: %v", got, want)
    }
}

PASS

The test passes, but does that really mean the server stopped when we called Stop? Not necessarily. For example, here's a buggy implementation where our test would still pass:

// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int) {
    go func() {
        for {
            out <- s.current
            s.current++
        }
    }()
}

// Stop shuts down the server.
func (s *IncServer) Stop() {}

As you can see, the author simply forgot to stop the server here. To detect the problem, we can wrap the test in synctest.Test and see it panic:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        nums := make(chan int)

        srv := NewIncServer()
        srv.Start(nums)
        defer srv.Stop()

        got := [3]int{<-nums, <-nums, <-nums}
        want := [3]int{0, 1, 2}
        if got != want {
            t.Errorf("First 3: got: %v; want: %v", got, want)
        }
    })
}

panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

The server ignores the Stop call and doesn't stop the goroutine running inside Start. Because of this, the goroutine gets blocked while writing to the out channel. When synctest.Test finishes, it detects the blocked goroutine and panics.

Let's fix the server code (to keep things simple, we won't support multiple Start or Stop calls):

// IncServer produces consecutive integers starting from 0.
type IncServer struct {
    current int
    done    chan struct{}
}

// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int) {
    go func() {
        for {
            select {
            case out <- s.current:
                s.current++
            case <-s.done:
                // Release used resources.
                close(out)
                return
            }
        }
    }()
}

// Stop shuts down the server.
func (s *IncServer) Stop() {
    close(s.done)
}

PASS

Now the test passes. Here's how it works:

The main test code runs.
Before the test finishes, the deferred srv.Stop() is called.
In the server goroutine, the <-s.done case in the select statement triggers, and the goroutine ends.
synctest.Test sees that there are no blocked goroutines and finishes without panicking.
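The fixed server only works if the done channel is initialized before Stop closes it (closing a nil channel panics). The chapter doesn't show NewIncServer, so here's a minimal sketch of what the constructor could look like, together with the fixed Start and Stop. The constructor body is an assumption, not the book's code.

```go
package main

// IncServer produces consecutive integers starting from 0.
type IncServer struct {
    current int
    done    chan struct{}
}

// NewIncServer creates a new server.
// Assumed implementation: it only has to allocate
// the done channel that Stop will close later.
func NewIncServer() *IncServer {
    return &IncServer{done: make(chan struct{})}
}

// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int) {
    go func() {
        for {
            select {
            case out <- s.current:
                s.current++
            case <-s.done:
                // Release used resources.
                close(out)
                return
            }
        }
    }()
}

// Stop shuts down the server.
// Calling it twice would panic (close of a closed channel),
// which matches the "no multiple Stop calls" simplification.
func (s *IncServer) Stop() {
    close(s.done)
}
```

Only the server goroutine touches the current field, so it doesn't need a mutex or an atomic.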

T.Cleanup

Instead of using defer to stop something, it's common to use the T.Cleanup method. It registers a function that will run when the test finishes:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        nums := make(chan int)

        srv := NewIncServer()
        srv.Start(nums)
        t.Cleanup(srv.Stop)

        got := [3]int{<-nums, <-nums, <-nums}
        want := [3]int{0, 1, 2}
        if got != want {
            t.Errorf("First 3: got: %v; want: %v", got, want)
        }
    })
}

PASS

Functions registered with Cleanup run in last-in, first-out (LIFO) order, after all deferred functions have executed.
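To see this ordering in action, here's a small sketch that records the order of calls. The trace helper and the test name are made up for illustration. It uses a subtest (outside of any bubble) so that the outer test can inspect the log after the inner test's cleanups have run.

```go
package main

import (
    "slices"
    "testing"
)

// trace returns a function that appends msg to log when called.
// It's a hypothetical helper used only to record the call order.
func trace(log *[]string, msg string) func() {
    return func() { *log = append(*log, msg) }
}

func TestCleanupOrder(t *testing.T) {
    var log []string

    t.Run("inner", func(t *testing.T) {
        t.Cleanup(trace(&log, "cleanup 1"))
        t.Cleanup(trace(&log, "cleanup 2"))
        defer trace(&log, "defer")()
    })

    // By now the subtest has finished, including its cleanups:
    // the deferred call ran first, then the cleanups in LIFO order.
    want := []string{"defer", "cleanup 2", "cleanup 1"}
    if !slices.Equal(log, want) {
        t.Errorf("got: %v; want: %v", log, want)
    }
}
```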

In the test above, there's not much difference between using defer and Cleanup. But the difference becomes important if we move the server setup into a separate helper function, so we don't have to repeat the setup code in different tests:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        nums := newServer(t)
        got := [3]int{<-nums, <-nums, <-nums}
        want := [3]int{0, 1, 2}
        if got != want {
            t.Errorf("First 3: got: %v; want: %v", got, want)
        }
    })
}

The defer approach doesn't work because it calls Stop when newServer returns — before the test assertions run:

func newServer(t *testing.T) <-chan int {
    t.Helper()
    nums := make(chan int)

    srv := NewIncServer()
    srv.Start(nums)
    defer srv.Stop()

    return nums
}

--- FAIL: Test (0.00s)
    main_test.go:48: First 3: got: [0 0 0]; want: [0 1 2]

The t.Cleanup approach works because it calls Stop just before synctest.Test finishes, after all the assertions have already run:

func newServer(t *testing.T) <-chan int {
    t.Helper()
    nums := make(chan int)

    srv := NewIncServer()
    srv.Start(nums)
    t.Cleanup(srv.Stop)

    return nums
}

PASS

T.Context

Sometimes, a context (context.Context) is used to stop the server instead of a separate method. In that case, our server interface might look like this:

// IncServer produces consecutive integers starting from 0.
type IncServer struct {
    // ...
}

// Start runs the server in a separate goroutine and
// sends numbers to the out channel until the context is canceled.
func (s *IncServer) Start(ctx context.Context, out chan<- int)

Now we don't even need to use defer or t.Cleanup to check whether the server stops when the context is canceled. Just pass t.Context() as the context:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        nums := make(chan int)
        server := new(IncServer)
        server.Start(t.Context(), nums)

        got := [3]int{<-nums, <-nums, <-nums}
        want := [3]int{0, 1, 2}
        if got != want {
            t.Errorf("First 3: got: %v; want: %v", got, want)
        }
    })
}

PASS

t.Context() returns a context that is automatically created when the test starts and is automatically canceled when the test finishes.

Here's how it works:

The main test code runs.
Before the test finishes, the t.Context() context is automatically canceled.
The server goroutine stops (as long as the server is implemented correctly and checks for context cancellation).
synctest.Test sees that there are no blocked goroutines and finishes without panicking.
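The chapter shows only the Start signature for the context-based server. A possible implementation (an assumption, not the book's code) could select on ctx.Done() where the earlier version used a done channel. The firstN helper is hypothetical. It goes through the same start, read, and cancel cycle that t.Context() performs automatically in the test above.

```go
package main

import "context"

// IncServer produces consecutive integers starting from 0.
type IncServer struct {
    current int
}

// Start runs the server in a separate goroutine and
// sends numbers to the out channel until the context is canceled.
func (s *IncServer) Start(ctx context.Context, out chan<- int) {
    go func() {
        // Release used resources on exit.
        defer close(out)
        for {
            select {
            case out <- s.current:
                s.current++
            case <-ctx.Done():
                return
            }
        }
    }()
}

// firstN starts a server, reads the first n numbers,
// then cancels the context and waits for the server to stop.
func firstN(n int) []int {
    ctx, cancel := context.WithCancel(context.Background())
    nums := make(chan int)
    new(IncServer).Start(ctx, nums)

    got := make([]int, 0, n)
    for i := 0; i < n; i++ {
        got = append(got, <-nums)
    }

    cancel()
    // Drain until the server notices the cancellation
    // and closes the channel.
    for range nums {
    }
    return got
}
```

If the server ignored ctx.Done(), the drain loop in firstN would never end, and inside a bubble synctest.Test would panic when it found the server goroutine still blocked.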

Summary

To check for stopping via a method or function, use defer or t.Cleanup().

To check for cancellation or stopping via context, use t.Context().

Inside a bubble, t.Context() returns a context whose Done channel is associated with the bubble. The context is automatically canceled when synctest.Test ends.

Functions registered with t.Cleanup() inside the bubble run just before synctest.Test finishes.

Bubble rules

Let's go over the rules for living in the synctest bubble.

General:

A bubble is created by calling synctest.Test. Each call creates a separate bubble.
Goroutines started inside the bubble become part of it.
The bubble can only manage durable blocks. Other types of blocks are invisible to it.

synctest.Test:

If all goroutines in the bubble are durably blocked with no way to unblock them (such as by advancing the clock or returning from a synctest.Wait call), Test panics.
When Test finishes, it tries to wait for all child goroutines to complete. However, if even a single goroutine is durably blocked, Test panics.
Calling t.Context() returns a context whose Done channel is associated with the bubble.
Functions registered with t.Cleanup() run inside the bubble, immediately before Test returns.

synctest.Wait:

Calling Wait in a bubble blocks the goroutine that called it.
Wait returns when all other goroutines in the bubble are durably blocked.
Wait returns when all other goroutines in the bubble have finished.

Time:

The bubble uses a fake clock (starting at 2000-01-01 00:00:00 UTC).
Time in the bubble only moves forward if all goroutines are durably blocked.
Time advances by the smallest amount needed to unblock at least one goroutine.
If the bubble has to choose between moving time forward or returning from a running synctest.Wait, it returns from Wait.

The following operations durably block a goroutine:

A blocking send or receive on a channel created within the bubble.
A blocking select statement where every case is a channel created within the bubble.
Calling Cond.Wait.
Calling WaitGroup.Wait if all WaitGroup.Add calls were made inside the bubble.
Calling time.Sleep.

Limitations

The synctest limitations are quite logical, and you probably won't run into them.

Don't create channels or objects that contain channels (like tickers or timers) outside the bubble. Otherwise, the bubble won't be able to manage them, and the test will hang:

func Test(t *testing.T) {
    ch := make(chan int)
    synctest.Test(t, func(t *testing.T) {
        go func() { <-ch }()
        synctest.Wait()
        close(ch)
    })
}

panic: test timed out after 3s

Don't access synchronization primitives associated with a bubble from outside the bubble:

func Test(t *testing.T) {
    var ch chan int
    synctest.Test(t, func(t *testing.T) {
        ch = make(chan int)
    })
    close(ch)
}

panic: close of synctest channel from outside bubble

Don't call T.Run, T.Parallel, or T.Deadline inside a bubble:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        t.Run("subtest", func(t *testing.T) {
            t.Log("ok")
        })
    })
}

panic: testing: t.Run called inside synctest bubble

Don't call synctest.Test inside the bubble:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        synctest.Test(t, func(t *testing.T) {
            t.Log("ok")
        })
    })
}

panic: synctest.Run called from within a synctest bubble

Don't call synctest.Wait from outside the bubble:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        t.Log("ok")
    })
    synctest.Wait()
}

panic: goroutine is not in a bubble [recovered, repanicked]

Don't call synctest.Wait concurrently from multiple goroutines:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        go synctest.Wait()
        go synctest.Wait()
    })
}

panic: wait already in progress

That's it!

Keep it up

The synctest package is a complicated beast. But now that you've studied it, you can test concurrent programs no matter what synchronization tools they use—channels, selects, wait groups, timers or tickers, or even time.Sleep.

In the next chapter, we'll talk about concurrency internals (coming soon).
