Gist of Go: Concurrency testing
This is a chapter from my book on Go concurrency, which teaches the topic from the ground up through interactive examples.
Testing concurrent programs is a lot like testing single-task programs. If the code is well-designed, you can test the state of a concurrent program with standard tools like channels, wait groups, and other abstractions built on top of them.
But if you've made it this far, you know that concurrency is never that easy. In this chapter, we'll go over common testing problems and the solutions that Go offers.
Waiting for goroutines • Checking channels • Checking for leaks • Durable blocking • Instant waiting • Time inside the bubble • Thoughts on time 1 ✎ • Thoughts on time 2 ✎ • Checking for cleanup • Bubble rules • Keep it up
Waiting for goroutines to finish
Let's say we want to test this function:
// Calc calculates something asynchronously.
func Calc() <-chan int {
out := make(chan int, 1)
go func() {
out <- 42
}()
return out
}
Calculations run asynchronously in a separate goroutine. However, the function returns a result channel, so this isn't a problem:
func Test(t *testing.T) {
got := <-Calc() // (X)
if got != 42 {
t.Errorf("got: %v; want: 42", got)
}
}
PASS
At point ⓧ, the test is guaranteed to wait for the inner goroutine to finish. The rest of the test code doesn't need to know anything about how concurrency works inside the Calc function. Overall, the test isn't any more complicated than if Calc were synchronous.
But we're lucky that Calc returns a channel. What if it doesn't?
Naive approach
Let's say the Calc function looks like this:
var state atomic.Int32
// Calc calculates something asynchronously.
func Calc() {
go func() {
state.Store(42)
}()
}
We write a simple test and run it:
func TestNaive(t *testing.T) {
Calc()
got := state.Load() // (X)
if got != 42 {
t.Errorf("got: %v; want: 42", got)
}
}
=== RUN TestNaive
main_test.go:27: got: 0; want: 42
--- FAIL: TestNaive (0.00s)
The assertion fails because at point ⓧ, we didn't wait for the inner Calc goroutine to finish. In other words, we didn't synchronize the TestNaive and Calc goroutines. That's why state still has its initial value (0) when we do the check.
Waiting with time.Sleep
We can add a short delay with time.Sleep:
func TestSleep(t *testing.T) {
Calc()
// Wait for the goroutine to finish (if we're lucky).
time.Sleep(50 * time.Millisecond)
got := state.Load()
if got != 42 {
t.Errorf("got: %v; want: 42", got)
}
}
=== RUN TestSleep
--- PASS: TestSleep (0.05s)
The test is now passing. But using time.Sleep to sync goroutines isn't a great idea, even in tests. We don't want to set a custom delay for every function we're testing. Also, the function's execution time may be different on the local machine compared to a CI server. If we use a longer delay just to be safe, the tests will end up taking too long to run.
Sometimes you can't avoid using time.Sleep in tests, but since Go 1.25, the synctest package has made these cases much less common. Let's see how it works.
Waiting with synctest
The synctest package has a lot going on under the hood, but its public API is very simple:
func Test(t *testing.T, f func(*testing.T))
func Wait()
The synctest.Test function creates an isolated bubble where you can control time to some extent. Any new goroutines started inside this bubble become part of the bubble. So, if we wrap the test code with synctest.Test, everything will run inside the bubble — the test code, the Calc function we're testing, and its goroutine.
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
Calc()
// (X)
got := state.Load()
if got != 42 {
t.Errorf("got: %v; want: 42", got)
}
})
}
At point ⓧ, we want to wait for the Calc goroutine to finish. The synctest.Wait function comes to the rescue! It blocks the calling goroutine until all other goroutines in the bubble are finished. (It's actually a bit more complicated than that, but we'll talk about it later.)
In our case, there's only one other goroutine (the inner Calc goroutine), so Wait will pause until it finishes, and then the test will move on.
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
Calc()
// Wait for the goroutine to finish.
synctest.Wait()
got := state.Load()
if got != 42 {
t.Errorf("got: %v; want: 42", got)
}
})
}
=== RUN TestSync
--- PASS: TestSync (0.00s)
Now the test passes instantly. That's better!
✎ Exercise: Wait until done
Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it.
If you are okay with just theory for now, let's continue.
Checking the channel state
As we've seen, you can use synctest.Wait to wait for the tested goroutine to finish, and then check the state of the data you are interested in. You can also use it to check the state of channels.
Let's say there's a function that generates N numbers like 11, 22, 33, and so on:
// Generate produces n numbers like 11, 22, 33, ...
func Generate(n int) <-chan int {
out := make(chan int)
go func() {
for i := range n {
out <- (i+1)*10 + (i + 1)
}
}()
return out
}
And a simple test:
func Test(t *testing.T) {
out := Generate(2)
var got int
got = <-out
if got != 11 {
t.Errorf("#1: got %v, want 11", got)
}
got = <-out
if got != 22 {
t.Errorf("#2: got %v, want 22", got)
}
}
PASS
Set N=2, get the first number from the generator's output channel, then get the second number. The test passed, so the function works correctly. But does it really?
Let's use Generate in "production":
func main() {
for v := range Generate(3) {
fmt.Print(v, " ")
}
}
11 22 33 fatal error: all goroutines are asleep - deadlock!
Deadlock! We forgot to close the out channel when exiting the inner Generate goroutine, so the for-range loop waiting on that channel got stuck.
Let's fix the code:
// Generate produces n numbers like 11, 22, 33, ...
func Generate(n int) <-chan int {
out := make(chan int)
go func() {
defer close(out)
for i := range n {
out <- (i+1)*10 + (i + 1)
}
}()
return out
}
And add a test for the out channel state:
func Test(t *testing.T) {
out := Generate(2)
<-out // 11
<-out // 22
// (X)
// Check that the channel is closed.
select {
case _, ok := <-out:
if ok {
t.Errorf("expected channel to be closed")
}
default:
t.Errorf("expected channel to be closed")
}
}
--- FAIL: Test (0.00s)
main_test.go:41: expected channel to be closed
The test is still failing, even though we're now closing the channel when the Generate goroutine exits.
This is a familiar problem: at point ⓧ, we didn't wait for the inner Generate goroutine to finish. So when we check the out channel, it hasn't closed yet. That's why the test fails.
We can delay the check using time.After:
func Test(t *testing.T) {
out := Generate(2)
<-out
<-out
// Check that the channel is closed.
select {
case _, ok := <-out:
if ok {
t.Errorf("expected channel to be closed")
}
case <-time.After(50 * time.Millisecond):
t.Fatalf("timeout waiting for channel to close")
}
}
PASS
But it's better to use synctest:
func TestClose(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
out := Generate(2)
<-out
<-out
// Wait for the goroutine to finish.
synctest.Wait()
// Check that the channel is closed.
select {
case _, ok := <-out:
if ok {
t.Errorf("expected channel to be closed")
}
default:
t.Errorf("expected channel to be closed")
}
})
}
PASS
Here, synctest.Wait blocks the test until the only other goroutine (the inner Generate goroutine) finishes. By the time the goroutine has exited, the channel is already closed. So, in the select statement, the <-out case triggers with ok set to false, allowing the test to pass.
As you can see, the synctest package helped us avoid delays in the test, and the test itself didn't get much more complicated.
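If the same closed-channel check shows up in several tests, the select with a default case can be pulled into a small helper. This is only a sketch: isClosed is a hypothetical name, not something the synctest package provides, and it has a caveat noted in the comment.

```go
package main

import "fmt"

// isClosed reports whether ch is closed, without blocking.
// Caveat: if ch is open and a value is ready, that value is consumed
// and the function returns false.
func isClosed(ch <-chan int) bool {
	select {
	case _, ok := <-ch:
		return !ok
	default:
		// Nothing to receive right now, so the channel is still open.
		return false
	}
}

func main() {
	ch := make(chan int)
	fmt.Println(isClosed(ch)) // false
	close(ch)
	fmt.Println(isClosed(ch)) // true
}
```

Like the inline select, the helper only gives a reliable answer once the sending goroutine has had a chance to finish, which is what synctest.Wait takes care of.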
Checking for goroutine leaks
As we've seen, you can use synctest.Wait to wait for the tested goroutine to finish, and then check the state of the data or channels. You can also use it to detect goroutine leaks.
Let's say there's a function that runs the given functions concurrently and sends their results to an output channel:
// Map runs the given functions concurrently.
func Map(funcs ...func() int) <-chan int {
out := make(chan int)
for _, f := range funcs {
go func() {
out <- f()
}()
}
return out
}
And a simple test:
func Test(t *testing.T) {
out := Map(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
got := <-out
if got != 11 && got != 22 && got != 33 {
t.Errorf("got %v, want 11, 22 or 33", got)
}
}
PASS
Send three functions to be executed, get the first result from the output channel, and check it. The test passed, so the function works correctly. But does it really?
Let's run Map three times, passing three functions each time:
func main() {
for range 3 {
Map(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
}
time.Sleep(50 * time.Millisecond)
nGoro := runtime.NumGoroutine() - 1 // minus the main goroutine
fmt.Println("nGoro =", nGoro)
}
nGoro = 9
After 50 ms — when all the functions should definitely have finished — there are still 9 running goroutines (runtime.NumGoroutine). In other words, all the goroutines are stuck.
The reason is that the out channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside Map get blocked when they try to send the result of f() to out.
Let's fix this by adding a buffer of the right size to the channel:
// Map runs the given functions concurrently.
func Map(funcs ...func() int) <-chan int {
out := make(chan int, len(funcs))
for _, f := range funcs {
go func() {
out <- f()
}()
}
return out
}
Then add a test to check the number of goroutines:
func Test(t *testing.T) {
for range 3 {
Map(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
}
// (X)
nGoro := runtime.NumGoroutine() - 2 // minus the main and Test goroutines
if nGoro != 0 {
t.Fatalf("expected 0 goroutines, got %d", nGoro)
}
}
--- FAIL: Test (0.00s)
main_test.go:44: expected 0 goroutines, got 9
The test is still failing, even though the channel is now buffered, and the goroutines shouldn't block on sending to it.
This is a familiar problem: at point ⓧ, we didn't wait for the running Map goroutines to finish. So nGoro is greater than zero, which makes the test fail.
We can delay the check using time.Sleep (not recommended), or use a third-party package like goleak (a better option):
func Test(t *testing.T) {
defer goleak.VerifyNone(t)
for range 3 {
Map(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
}
}
PASS
The test passes now.
By the way, goleak also uses time.Sleep internally, but it does so much more efficiently. It tries up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly.
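The retry strategy described above can be sketched roughly like this. It is not goleak's actual code: the retry function, its parameters, and the ready condition are all hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// retry calls check up to maxAttempts times, sleeping between attempts.
// The delay starts at first and doubles after every failed attempt,
// but never exceeds maxSleep. It returns the number of attempts made
// and whether check eventually succeeded.
func retry(check func() bool, maxAttempts int, first, maxSleep time.Duration) (int, bool) {
	delay := first
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if check() {
			return attempt, true
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay = min(delay*2, maxSleep)
		}
	}
	return maxAttempts, false
}

func main() {
	calls := 0
	// Hypothetical condition that becomes true on the third check.
	ready := func() bool {
		calls++
		return calls >= 3
	}
	attempts, ok := retry(ready, 20, time.Microsecond, 100*time.Millisecond)
	fmt.Println(attempts, ok) // 3 true
}
```

Because the first delays are tiny, a check that succeeds quickly costs almost nothing, while a slow one still gets a reasonable total waiting time.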
Even better, we can check for leaks without any third-party packages by using synctest:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
for range 3 {
Map(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
}
synctest.Wait()
})
}
PASS
Earlier, I said that synctest.Wait blocks the calling goroutine until all other goroutines finish. Actually, it's a bit more complicated. synctest.Wait blocks until all other goroutines either finish or become durably blocked.
We'll talk about "durably" later. For now, let's focus on "become blocked." Let's temporarily remove the buffer from the channel and check the test results:
// Map runs the given functions concurrently.
func Map(funcs ...func() int) <-chan int {
out := make(chan int)
for _, f := range funcs {
go func() {
out <- f()
}()
}
return out
}
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
for range 3 {
Map(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
}
synctest.Wait()
})
}
--- FAIL: Test (0.00s)
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]
Here's what happens:
- Three calls to Map start 9 goroutines.
- The call to synctest.Wait blocks the root bubble goroutine (synctest.Test).
- One of the goroutines finishes its work, tries to write to out, and gets blocked (because no one is reading from out).
- The same thing happens to the other 8 goroutines.
- synctest.Wait sees that all the child goroutines in the bubble are blocked, so it unblocks the root goroutine.
- The root goroutine finishes.
Next, synctest.Test comes into play. It not only starts the bubble goroutine, but also tries to wait for all child goroutines to finish before it returns. If Test sees that some goroutines are stuck (in our case, all 9 are blocked trying to send to the channel), it panics:
main bubble goroutine has exited but blocked goroutines remain
So, we found the leak without using time.Sleep or goleak, thanks to the useful features of synctest.Wait and synctest.Test:
- synctest.Wait unblocks as soon as all other goroutines are durably blocked.
- synctest.Test panics when finished if there are still blocked goroutines left in the bubble.
Now let's make the channel buffered and run the test again:
=== RUN Test
--- PASS: Test (0.00s)
Perfect!
Durable blocking
As we've found, synctest.Wait blocks until all goroutines in the bubble — except the one that called Wait — have either finished or are durably blocked. Let's figure out what "durably blocked" means.
For synctest, a goroutine inside a bubble is considered durably blocked if it is blocked by any of the following operations:
- Sending to or receiving from a channel created within the bubble.
- A select statement where every case is a channel created within the bubble.
- Calling WaitGroup.Wait if all WaitGroup.Add calls were made inside the bubble.
- Calling Cond.Wait.
- Calling time.Sleep.
Other blocking operations are not considered durable, and synctest.Wait ignores them. For example:
- Sending to or receiving from a channel created outside the bubble.
- Calling Mutex.Lock or RWMutex.Lock.
- I/O operations (like reading a file from disk or waiting for a network response).
- System calls and cgo calls.
The distinction between "durable" and other types of blocks is just an implementation detail of the synctest package. It's not a fundamental property of the blocking operations themselves. In real-world applications, this distinction doesn't exist, and "durable" blocks are neither better nor worse than any others.
Let's look at an example.
Asynchronous processor
Let's say there's a Proc type that performs some asynchronous computation:
// Proc calculates something asynchronously.
type Proc struct {
// ...
}
// NewProc starts the calculation in a separate goroutine.
// The calculation keeps running until Stop is called.
func NewProc() *Proc
// Res returns the current calculation result.
// It's only available until Stop is called; after that, it resets to zero.
func (p *Proc) Res() int
// Stop terminates the calculation.
func (p *Proc) Stop()
Our goal is to write a test that checks the result while the calculation is still running. Let's see how the test changes depending on how Proc is implemented (except for the time.Sleep version — we'll cover that one a bit later).
Blocking on a channel
Let's say Proc is implemented using a done channel:
// Proc calculates something asynchronously.
type Proc struct {
res int
done chan struct{}
}
// NewProc starts the calculation.
func NewProc() *Proc {
p := &Proc{done: make(chan struct{})}
go func() {
p.res = 42
<-p.done // (X)
p.res = 0
}()
return p
}
// Stop terminates the calculation.
func (p *Proc) Stop() {
close(p.done)
}
Naive test:
func TestNaive(t *testing.T) {
p := NewProc()
defer p.Stop()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
}
--- FAIL: TestNaive (0.00s)
main_test.go:52: got 0, want 42
The check fails because when p.Res() is called, the goroutine in NewProc hasn't set p.res = 42 yet.
Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
p := NewProc()
defer p.Stop()
// Wait for the goroutine to block at point X.
synctest.Wait()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
})
}
PASS
At point ⓧ, the goroutine is blocked on reading from the p.done channel. This channel is created inside the bubble, so the block is durable. The synctest.Wait call in the test returns as soon as the goroutine blocks on <-p.done, and we get the current value of p.res.
Blocking on a select
Let's say Proc is implemented using select:
// Proc calculates something asynchronously.
type Proc struct {
res int
in chan int
done chan struct{}
}
// NewProc starts the calculation.
func NewProc() *Proc {
p := &Proc{
res: 0,
in: make(chan int),
done: make(chan struct{}),
}
go func() {
p.res = 42
select { // (X)
case n := <-p.in:
p.res = n
case <-p.done:
}
}()
return p
}
// Stop terminates the calculation.
func (p *Proc) Stop() {
close(p.done)
}
Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
p := NewProc()
defer p.Stop()
// Wait for the goroutine to block at point X.
synctest.Wait()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
})
}
PASS
At point ⓧ, the goroutine is blocked on a select statement. Both channels used in the select (p.in and p.done) are created inside the bubble, so the block is durable. The synctest.Wait call in the test returns as soon as the goroutine blocks on the select, and we get the current value of p.res.
Blocking on a wait group
Let's say Proc is implemented using a wait group:
// Proc calculates something asynchronously.
type Proc struct {
res int
wg sync.WaitGroup
}
// NewProc starts the calculation.
func NewProc() *Proc {
p := &Proc{}
p.wg.Add(1)
go func() {
p.res = 42
p.wg.Wait() // (X)
p.res = 0
}()
return p
}
// Stop terminates the calculation.
func (p *Proc) Stop() {
p.wg.Done()
}
Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
p := NewProc()
defer p.Stop()
// Wait for the goroutine to block at point X.
synctest.Wait()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
})
}
PASS
At point ⓧ, the goroutine is blocked on the wait group's p.wg.Wait() call. The group's Add method was called inside the bubble, so this is a durable block. The synctest.Wait call in the test returns as soon as the goroutine blocks on p.wg.Wait(), and we get the current value of p.res.
Blocking on a condition variable
Let's say Proc is implemented using a condition variable:
// Proc calculates something asynchronously.
type Proc struct {
res int
cond *sync.Cond
}
// NewProc starts the calculation.
func NewProc() *Proc {
p := &Proc{
cond: sync.NewCond(&sync.Mutex{}),
}
go func() {
p.cond.L.Lock()
p.res = 42
p.cond.Wait() // (X)
p.res = 0
p.cond.L.Unlock()
}()
return p
}
// Stop terminates the calculation.
func (p *Proc) Stop() {
p.cond.Signal()
}
Let's use synctest.Wait to wait until the goroutine is blocked at point ⓧ:
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
p := NewProc()
defer p.Stop()
// Wait for the goroutine to block at point X.
synctest.Wait()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
})
}
PASS
At point ⓧ, the goroutine is blocked on the condition variable's p.cond.Wait() call. This is a durable block. The synctest.Wait call returns as soon as the goroutine blocks on p.cond.Wait(), and we get the current value of p.res.
Blocking on a mutex
Let's say Proc is implemented using a mutex:
// Proc calculates something asynchronously.
type Proc struct {
res int
mu sync.Mutex
}
// NewProc starts the calculation.
func NewProc() *Proc {
p := &Proc{}
p.mu.Lock()
go func() {
p.res = 42
p.mu.Lock() // (X)
p.res = 0
p.mu.Unlock()
}()
return p
}
// Stop terminates the calculation.
func (p *Proc) Stop() {
p.mu.Unlock()
}
Let's try using synctest.Wait to wait until the goroutine is blocked at point ⓧ:
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
p := NewProc()
defer p.Stop()
// Hangs because synctest ignores blocking on a mutex.
synctest.Wait()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
})
}
code execution timeout
At point ⓧ, the goroutine is blocked on the mutex's p.mu.Lock() call. synctest doesn't consider blocking on a mutex to be durable, so the synctest.Wait call keeps waiting for the goroutine to finish or block durably, and never returns. The test hangs and only fails when the overall go test timeout is reached.
You might be wondering why the synctest authors didn't consider blocking on mutexes to be durable. There are a couple of reasons:
- Mutexes are usually used to protect shared state, not to coordinate goroutines (the example above is completely unrealistic). In tests, you usually don't need to pause before locking a mutex to check something.
- Mutex locks are usually held for a very short time, and mutexes themselves need to be as fast as possible. Adding extra logic to support synctest could slow them down in normal (non-test) situations.
⌘ ⌘ ⌘
Let's go back to the original question: how does the test change depending on how Proc is implemented? It doesn't change at all. We used the exact same test code every time:
func TestSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
p := NewProc()
defer p.Stop()
synctest.Wait()
if got := p.Res(); got != 42 {
t.Fatalf("got %v, want 42", got)
}
})
}
If your program uses durably blocking operations, synctest.Wait always works the same way:
- It waits until all other goroutines in the bubble are blocked.
- Then, it unblocks the goroutine that called it.
Very convenient!
✎ Exercise: Blocking queue
Instant waiting
Inside the synctest.Test bubble, time works differently. Instead of using a regular wall clock, the bubble uses a fake clock that can jump forward to any point in the future. This can be quite handy when testing time-sensitive code.
Let's say we want to test this function:
// Calc processes a value from the input channel.
// Times out if no input is received after 3 seconds.
func Calc(in chan int) (int, error) {
select {
case v := <-in:
return v * 2, nil
case <-time.After(3 * time.Second):
return 0, ErrTimeout
}
}
The positive scenario is straightforward: send a value to the channel, call the function, and check the result:
func TestCalc_result(t *testing.T) {
ch := make(chan int)
go func() { ch <- 11 }()
got, err := Calc(ch)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if got != 22 {
t.Errorf("got: %v; want: 22", got)
}
}
PASS
The negative scenario, where the function times out, is also pretty straightforward. But the test takes the full three seconds to complete:
func TestCalc_timeout_naive(t *testing.T) {
ch := make(chan int)
got, err := Calc(ch) // runs for 3 seconds
if err != ErrTimeout {
t.Errorf("got: %v; want: %v", err, ErrTimeout)
}
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
}
=== RUN TestCalc_timeout_naive
--- PASS: TestCalc_timeout_naive (3.00s)
We're actually lucky the timeout is only three seconds. It could have been as long as sixty!
To make the test run instantly, let's wrap it in synctest.Test:
func TestCalc_timeout_synctest(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
ch := make(chan int)
got, err := Calc(ch) // runs instantly
if err != ErrTimeout {
t.Errorf("got: %v; want: %v", err, ErrTimeout)
}
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
=== RUN TestCalc_timeout_synctest
--- PASS: TestCalc_timeout_synctest (0.00s)
Note that there is no synctest.Wait call here, and the only goroutine in the bubble (the root one) gets durably blocked on a select statement in Calc. Here's what happens next:
- The bubble checks if the goroutine can be unblocked by waiting. In our case, it can: we just need to wait 3 seconds.
- The bubble's clock instantly jumps forward 3 seconds.
- The select in Calc chooses the timeout case, and the function returns ErrTimeout.
- The test assertions for err and got both pass successfully.
Thanks to the fake clock, the test runs instantly instead of taking three seconds like it would with the "naive" approach.
You might have noticed that quite a few circumstances coincided here:
- There's no synctest.Wait call.
- There's only one goroutine.
- The goroutine is durably blocked.
- It will be unblocked at a certain point in the future.
We'll look at the alternatives soon, but first, here's a quick exercise.
✎ Exercise: Wait, repeat
Time inside the bubble
The fake clock in synctest.Test can be tricky. It moves forward only if: ➊ all goroutines in the bubble are durably blocked; ➋ there's a future moment when at least one goroutine will unblock; and ➌ synctest.Wait isn't running.
Let's look at the alternatives. I'll say right away, this isn't an easy topic. But when has time travel ever been easy? :)
Not all goroutines are blocked
Here's the Calc function we're testing:
// Calc processes a value from the input channel.
// Times out if no input is received after 3 seconds.
func Calc(in chan int) (int, error) {
select {
case v := <-in:
return v * 2, nil
case <-time.After(3 * time.Second):
return 0, ErrTimeout
}
}
Let's run Calc in a separate goroutine, so there will be two goroutines in the bubble:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
var got int
var err error
go func() {
ch := make(chan int)
got, err = Calc(ch)
}()
if err != ErrTimeout {
t.Errorf("got: %v; want: %v", err, ErrTimeout)
}
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
--- FAIL: Test (0.00s)
main_test.go:45: got: <nil>; want: timeout
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]
synctest.Test panicked because the root bubble goroutine finished while the Calc goroutine was still blocked on a select.
Reason: synctest.Test only advances the clock if all goroutines are blocked — including the root bubble goroutine.
How to fix: Use time.Sleep to make sure the root goroutine is also durably blocked.
func Test_fixed(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
ch := make(chan int)
var got int
go func() {
got, _ = Calc(ch)
}()
// Wait for the Calc goroutine to finish.
time.Sleep(5 * time.Second)
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
PASS
Now all three conditions are met again (all goroutines are durably blocked; the moment of future unblocking is known; there is no call to synctest.Wait). The fake clock moves forward 3 seconds, which unblocks the Calc goroutine. The goroutine finishes, leaving only the root one, which is still blocked on time.Sleep. The clock moves forward another 2 seconds, unblocking the root goroutine. The assertion passes, and the test completes successfully.
But if we run the test with the race detector enabled (using the -race flag), it reports a data race on the got variable:
race detected during execution of test
Logically, using time.Sleep in the root goroutine doesn't guarantee that the Calc goroutine (which writes to the got variable) will finish before the root goroutine reads from got. That's why the race detector reports a problem. Technically, the test passes because of how synctest is implemented, but the race still exists in the code. The right way to handle this is to call synctest.Wait after time.Sleep:
func Test_fixed(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
ch := make(chan int)
var got int
go func() {
got, _ = Calc(ch)
}()
// Wait for the Calc goroutine to finish.
time.Sleep(3 * time.Second)
synctest.Wait()
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
PASS
Calling synctest.Wait ensures that the Calc goroutine finishes before the root goroutine reads got, so there's no data race anymore.
synctest.Wait is running
Here's the Calc function we're testing:
// Calc processes a value from the input channel.
// Times out if no input is received after 3 seconds.
func Calc(in chan int) (int, error) {
select {
case v := <-in:
return v * 2, nil
case <-time.After(3 * time.Second):
return 0, ErrTimeout
}
}
Let's replace time.Sleep() in the root goroutine with synctest.Wait():
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
var got int
var err error
go func() {
ch := make(chan int)
got, err = Calc(ch)
}()
// Doesn't wait for the Calc goroutine to finish.
synctest.Wait()
if err != ErrTimeout {
t.Errorf("got: %v; want: %v", err, ErrTimeout)
}
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
--- FAIL: Test (0.00s)
main_test.go:48: got: <nil>; want: timeout
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]
synctest.Test panicked because the root bubble goroutine finished while the Calc goroutine was still blocked on a select.
Reason: synctest.Test only advances the clock if there is no active synctest.Wait running.
If all bubble goroutines are durably blocked but a synctest.Wait is running, synctest.Test won't advance the clock. Instead, it will simply finish the synctest.Wait call and return control to the goroutine that called it (in this case, the root bubble goroutine).
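How to fix: Don't rely on synctest.Wait to advance the clock. Block the root goroutine with time.Sleep first, as in the previous example, and call synctest.Wait only after that.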
The moment of unblocking is unclear
Let's update Calc to use context cancellation instead of a timer:
// Calc processes a value from the input channel.
// Exits if the context is canceled.
func Calc(in chan int, ctx context.Context) (int, error) {
select {
case v := <-in:
return v * 2, nil
case <-ctx.Done():
return 0, ctx.Err()
}
}
We won't cancel the context in the test:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
ch := make(chan int)
ctx, _ := context.WithCancel(context.Background())
got, err := Calc(ch, ctx)
if err != nil {
t.Errorf("got: %v; want: nil", err)
}
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
--- FAIL: Test (0.00s)
panic: deadlock: all goroutines in bubble are blocked [recovered, repanicked]
synctest.Test panicked because all goroutines in the bubble are hopelessly blocked.
Reason: synctest.Test only advances the clock if it knows how much to advance it. In this case, there is no future moment that would unblock the select in Calc.
How to fix: Manually unblock the goroutine and call synctest.Wait to wait for it to finish.
func Test_fixed(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
var got int
var err error
ctx, cancel := context.WithCancel(context.Background())
go func() {
ch := make(chan int)
got, err = Calc(ch, ctx)
}()
// Unblock the Calc goroutine.
cancel()
// Wait for it to finish.
synctest.Wait()
if err != context.Canceled {
t.Errorf("got: %v; want: %v", err, context.Canceled)
}
if got != 0 {
t.Errorf("got: %v; want: 0", got)
}
})
}
PASS
Now, cancel() cancels the context and unblocks the select in Calc, while synctest.Wait makes sure the Calc goroutine finishes before the test checks got and err.
The goroutine isn't durably blocked
Let's update Calc to lock the mutex before doing any calculations:
// Calc processes a value and returns the result.
func Calc(v int, mu *sync.Mutex) int {
mu.Lock()
defer mu.Unlock()
v = v * 2
return v
}
In the test, we'll lock the mutex before calling Calc, so it will block:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
var mu sync.Mutex
mu.Lock()
go func() {
time.Sleep(10 * time.Millisecond)
mu.Unlock()
}()
got := Calc(11, &mu)
if got != 22 {
t.Errorf("got: %v; want: 22", got)
}
})
}
code execution timeout
The test failed because it hit the overall timeout set in go test.
Reason: synctest.Test only works with durable blocks. Blocking on a mutex lock isn't considered durable, so the bubble can't do anything about it — even though the sleeping inner goroutine would have unlocked the mutex in 10 ms if the bubble had used the wall clock.
How to fix: Don't use synctest.
func Test_fixed(t *testing.T) {
var mu sync.Mutex
mu.Lock()
go func() {
time.Sleep(10 * time.Millisecond)
mu.Unlock()
}()
got := Calc(11, &mu)
if got != 22 {
t.Errorf("got: %v; want: 22", got)
}
}
PASS
Now the mutex unlocks after 10 milliseconds (wall clock), Calc finishes successfully, and the got check passes.
Summary
The clock inside the bubble won't move forward if:
- There are any goroutines that aren't durably blocked.
- It's unclear how much time to advance.
- synctest.Wait is running.
Phew.
✎ Exercise: Asynchronous repeater
Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it.
If you are okay with just theory for now, let's continue.
✎ Thoughts on time 1
Let's practice understanding time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground.
Here's a function that performs synchronous work:
var done atomic.Bool
// workSync performs synchronous work.
func workSync() {
time.Sleep(3 * time.Second)
done.Store(true)
}
And a test for it:
func TestWorkSync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
workSync()
// (X)
if !done.Load() {
t.Errorf("work not done")
}
})
}
What is the test missing at point ⓧ?
- synctest.Wait()
- time.Sleep(3 * time.Second)
- synctest.Wait, then time.Sleep
- time.Sleep, then synctest.Wait
- Nothing.
✓ Thoughts on time 1
There's only one goroutine in the test, so when workSync gets blocked by time.Sleep, the time in the bubble jumps forward by 3 seconds. Then workSync sets done to true and finishes. Finally, the test checks done and passes successfully.
No need to add anything.
✎ Thoughts on time 2
Let's keep practicing our understanding of time in the bubble with some thinking exercises. Try to solve the problem in your head before using the playground.
Here's a function that performs asynchronous work:
var done atomic.Bool
// workAsync performs asynchronous work.
func workAsync() {
go func() {
time.Sleep(3 * time.Second)
done.Store(true)
}()
}
And a test for it:
func TestWorkAsync(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
workAsync()
// (X)
if !done.Load() {
t.Errorf("work not done")
}
})
}
What is the test missing at point ⓧ?
- synctest.Wait()
- time.Sleep(3 * time.Second)
- synctest.Wait, then time.Sleep
- time.Sleep, then synctest.Wait
- Nothing.
✓ Thoughts on time 2
Let's go over the options.
✘ synctest.Wait
This won't help because Wait returns as soon as time.Sleep inside workAsync is called. The done check fails, and synctest.Test panics with the error: "main bubble goroutine has exited but blocked goroutines remain".
✘ time.Sleep
Because of the time.Sleep call in the root goroutine, the wait inside time.Sleep in workAsync is already over by the time done is checked. However, there's no guarantee that done.Store(true) has run yet. That's why the test might pass or might fail.
✘ synctest.Wait, then time.Sleep
This option is basically the same as just using time.Sleep, because synctest.Wait returns as soon as the workAsync goroutine blocks in time.Sleep, well before that sleep ends. The test might pass or might fail.
✓ time.Sleep, then synctest.Wait
This is the correct answer:
- Because of the time.Sleep call in the root goroutine, the wait inside time.Sleep in workAsync is already over by the time done is checked.
- Because of the synctest.Wait call, the workAsync goroutine is guaranteed to finish (and hence to call done.Store(true)) before done is checked.
✘ Nothing
Since the root goroutine isn't blocked, it checks done while the workAsync goroutine is blocked by the time.Sleep call. The check fails, and synctest.Test panics with the message: "main bubble goroutine has exited but blocked goroutines remain".
Checking for cancellation and stopping
Sometimes you need to test objects that use resources and should be able to release them. For example, this could be a server that, when started, creates a pool of network connections, connects to a database, and writes file caches. When stopped, it should clean all this up.
Let's see how we can make sure everything is properly stopped in the tests.
Delayed stop
We're going to test this server:
// IncServer produces consecutive integers starting from 0.
type IncServer struct {
// ...
}
// NewIncServer creates a new server.
func NewIncServer() *IncServer
// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int)
// Stop shuts down the server.
func (s *IncServer) Stop()
Let's say we wrote a basic functional test:
func Test(t *testing.T) {
nums := make(chan int)
srv := NewIncServer()
srv.Start(nums)
defer srv.Stop()
got := [3]int{<-nums, <-nums, <-nums}
want := [3]int{0, 1, 2}
if got != want {
t.Errorf("First 3: got: %v; want: %v", got, want)
}
}
PASS
The test passes, but does that really mean the server stopped when we called Stop? Not necessarily. For example, here's a buggy implementation where our test would still pass:
// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int) {
go func() {
for {
out <- s.current
s.current++
}
}()
}
// Stop shuts down the server.
func (s *IncServer) Stop() {}
As you can see, the author simply forgot to stop the server here. To detect the problem, we can wrap the test in synctest.Test and see it panic:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
nums := make(chan int)
srv := NewIncServer()
srv.Start(nums)
defer srv.Stop()
got := [3]int{<-nums, <-nums, <-nums}
want := [3]int{0, 1, 2}
if got != want {
t.Errorf("First 3: got: %v; want: %v", got, want)
}
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
The server ignores the Stop call and doesn't stop the goroutine running inside Start. Because of this, the goroutine gets blocked while writing to the out channel. When synctest.Test finishes, it detects the blocked goroutine and panics.
Let's fix the server code (to keep things simple, we won't support multiple Start or Stop calls):
// IncServer produces consecutive integers starting from 0.
type IncServer struct {
current int
done chan struct{}
}
// Start runs the server in a separate goroutine and
// sends numbers to the out channel until Stop is called.
func (s *IncServer) Start(out chan<- int) {
go func() {
for {
select {
case out <- s.current:
s.current++
case <-s.done:
// Release used resources.
close(out)
return
}
}
}()
}
// Stop shuts down the server.
func (s *IncServer) Stop() {
close(s.done)
}
PASS
Now the test passes. Here's how it works:
- The main test code runs.
- Before the test finishes, the deferred srv.Stop() is called.
- In the server goroutine, the <-s.done case in the select statement triggers, and the goroutine ends.
- synctest.Test sees that there are no blocked goroutines and finishes without panicking.
T.Cleanup
Instead of using defer to stop something, it's common to use the T.Cleanup method. It registers a function that will run when the test finishes:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
nums := make(chan int)
srv := NewIncServer()
srv.Start(nums)
t.Cleanup(srv.Stop)
got := [3]int{<-nums, <-nums, <-nums}
want := [3]int{0, 1, 2}
if got != want {
t.Errorf("First 3: got: %v; want: %v", got, want)
}
})
}
PASS
Functions registered with Cleanup run in last-in, first-out (LIFO) order, after all deferred functions have executed.
In the test above, there's not much difference between using defer and Cleanup. But the difference becomes important if we move the server setup into a separate helper function, so we don't have to repeat the setup code in different tests:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
nums := newServer(t)
got := [3]int{<-nums, <-nums, <-nums}
want := [3]int{0, 1, 2}
if got != want {
t.Errorf("First 3: got: %v; want: %v", got, want)
}
})
}
The defer approach doesn't work because it calls Stop when newServer returns — before the test assertions run:
func newServer(t *testing.T) <-chan int {
t.Helper()
nums := make(chan int)
srv := NewIncServer()
srv.Start(nums)
defer srv.Stop()
return nums
}
--- FAIL: Test (0.00s)
main_test.go:48: First 3: got: [0 0 0]; want: [0 1 2]
The t.Cleanup approach works because it calls Stop just before synctest.Test finishes, after all the assertions have already run:
func newServer(t *testing.T) <-chan int {
t.Helper()
nums := make(chan int)
srv := NewIncServer()
srv.Start(nums)
t.Cleanup(srv.Stop)
return nums
}
PASS
T.Context
Sometimes, a context (context.Context) is used to stop the server instead of a separate method. In that case, our server interface might look like this:
// IncServer produces consecutive integers starting from 0.
type IncServer struct {
// ...
}
// Start runs the server in a separate goroutine and
// sends numbers to the out channel until the context is canceled.
func (s *IncServer) Start(ctx context.Context, out chan<- int)
Now we don't even need to use defer or t.Cleanup to check whether the server stops when the context is canceled. Just pass t.Context() as the context:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
nums := make(chan int)
server := new(IncServer)
server.Start(t.Context(), nums)
got := [3]int{<-nums, <-nums, <-nums}
want := [3]int{0, 1, 2}
if got != want {
t.Errorf("First 3: got: %v; want: %v", got, want)
}
})
}
PASS
t.Context() returns a context that is automatically created when the test starts and is automatically canceled when the test finishes.
Here's how it works:
- The main test code runs.
- Before the test finishes, the t.Context() context is automatically canceled.
- The server goroutine stops (as long as the server is implemented correctly and checks for context cancellation).
- synctest.Test sees that there are no blocked goroutines and finishes without panicking.
Summary
To check for stopping via a method or function, use defer or t.Cleanup().
To check for cancellation or stopping via context, use t.Context().
Inside a bubble, t.Context() returns a context whose channel is associated with the bubble. The context is automatically canceled when synctest.Test ends.
Functions registered with t.Cleanup() inside the bubble run just before synctest.Test finishes.
Bubble rules
Let's go over the rules for living in the synctest bubble.
General:
- A bubble is created by calling synctest.Test. Each call creates a separate bubble.
- Goroutines started inside the bubble become part of it.
- The bubble can only manage durable blocks. Other types of blocks are invisible to it.
synctest.Test:
- If all goroutines in the bubble are durably blocked with no way to unblock them (such as by advancing the clock or returning from a synctest.Wait call), Test panics.
- When Test finishes, it tries to wait for all child goroutines to complete. However, if even a single goroutine is durably blocked, Test panics.
- Calling t.Context() returns a context whose channel is associated with the bubble.
- Functions registered with t.Cleanup() run inside the bubble, immediately before Test returns.
synctest.Wait:
- Calling Wait in a bubble blocks the goroutine that called it.
- Wait returns when all other goroutines in the bubble are durably blocked.
- Wait returns when all other goroutines in the bubble have finished.
Time:
- The bubble uses a fake clock (starting at 2000-01-01 00:00:00 UTC).
- Time in the bubble only moves forward if all goroutines are durably blocked.
- Time advances by the smallest amount needed to unblock at least one goroutine.
- If the bubble has to choose between moving time forward or returning from a running synctest.Wait, it returns from Wait.
The following operations durably block a goroutine:
- A blocking send or receive on a channel created within the bubble.
- A blocking select statement where every case is a channel created within the bubble.
- Calling Cond.Wait.
- Calling WaitGroup.Wait if all WaitGroup.Add calls were made inside the bubble.
- Calling time.Sleep.
Limitations
The synctest limitations are quite logical, and you probably won't run into them.
Don't create channels or objects that contain channels (like tickers or timers) outside the bubble. Otherwise, the bubble won't be able to manage them, and the test will hang:
func Test(t *testing.T) {
ch := make(chan int)
synctest.Test(t, func(t *testing.T) {
go func() { <-ch }()
synctest.Wait()
close(ch)
})
}
panic: test timed out after 3s
Don't access synchronization primitives associated with a bubble from outside the bubble:
func Test(t *testing.T) {
var ch chan int
synctest.Test(t, func(t *testing.T) {
ch = make(chan int)
})
close(ch)
}
panic: close of synctest channel from outside bubble
Don't call T.Run, T.Parallel, or T.Deadline inside a bubble:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
t.Run("subtest", func(t *testing.T) {
t.Log("ok")
})
})
}
panic: testing: t.Run called inside synctest bubble
Don't call synctest.Test inside the bubble:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
t.Log("ok")
})
})
}
panic: synctest.Run called from within a synctest bubble
Don't call synctest.Wait from outside the bubble:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
t.Log("ok")
})
synctest.Wait()
}
panic: goroutine is not in a bubble [recovered, repanicked]
Don't call synctest.Wait concurrently from multiple goroutines:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
go synctest.Wait()
go synctest.Wait()
})
}
panic: wait already in progress
That's it!
✎ Exercise: Testing a pipeline
Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it.
If you are okay with just theory for now, let's continue.
Keep it up
The synctest package is a complicated beast. But now that you've studied it, you can test concurrent programs no matter what synchronization tools they use—channels, selects, wait groups, timers or tickers, or even time.Sleep.
In the next chapter, we'll talk about concurrency internals (coming soon).