Gist of Go: Wait groups
This is a chapter from my book on Go concurrency, which teaches the topic from the ground up through interactive examples.
Channels are a multi-purpose concurrency tool in Go. In Part 1 of the book, we covered their main use cases:
- Transferring data between goroutines.
- Synchronizing goroutines (the done channel).
- Canceling goroutines (the cancel channel).
Transferring data is what channels were designed for, and they excel at it. For canceling goroutines, there is a special tool besides channels — a context (which we've also discussed). For synchronizing goroutines, there is also a special tool — a wait group. Let's talk about it.
Wait group
A wait group lets you wait for one or more goroutines to finish. We started with a wait group in the very first chapter on goroutines, and now we'll go into more detail.
Suppose we want to start a goroutine and wait for it to complete. Here's how to do it with a done channel:
func main() {
done := make(chan struct{}, 1)
go func() {
time.Sleep(50 * time.Millisecond)
fmt.Print(".")
done <- struct{}{}
}()
<-done
fmt.Println("done")
}
.done
And here's how to do it with a wait group:
func main() {
var wg sync.WaitGroup
wg.Add(1)
go func() {
time.Sleep(50 * time.Millisecond)
fmt.Print(".")
wg.Done()
}()
wg.Wait()
fmt.Println("done")
}
.done
Interestingly, a WaitGroup doesn't know anything about the goroutines it manages. It works with an internal counter. Calling wg.Add(1) increments the counter by one, while wg.Done() decrements it. wg.Wait() blocks the calling goroutine (in this case, main) until the counter reaches zero. So, main() waits for the called goroutine to finish before exiting.
Typically, if you just need to wait for goroutines to complete without needing a result from them, you use a wait group instead of a done channel.
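The same pattern scales to any number of goroutines, which is where a wait group starts to pay off compared to a done channel. Here is a minimal sketch; the countFinished name and the atomic counter are made up for illustration and are not part of the chapter's code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countFinished starts n goroutines, waits for all of them,
// and returns how many reported completion.
// The name and the atomic counter are illustrative assumptions.
func countFinished(n int) int {
	var wg sync.WaitGroup
	var finished atomic.Int64

	for i := 0; i < n; i++ {
		wg.Add(1) // register the goroutine before starting it
		go func() {
			defer wg.Done() // runs even if the body returns early
			finished.Add(1)
		}()
	}

	wg.Wait() // blocks until every goroutine has called Done
	return int(finished.Load())
}

func main() {
	fmt.Println(countFinished(5)) // 5
}
```

With a done channel, main would have to receive from the channel once per goroutine. With a wait group, a single Wait call covers them all.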
✎ Exercise: From channel to wait group
Practice is crucial in turning abstract knowledge into skills, making theory alone insufficient. The full version of the book contains a lot of exercises — that's why I recommend getting it.
If you are okay with just theory for now, let's continue.
Inner world
As we discussed, the wait group knows nothing about goroutines and works with a counter instead. This simplifies the implementation a lot. Conceptually, you can think of the wait group like this:
// A WaitGroup waits for a collection of goroutines to finish.
type WaitGroup struct {
n int
}
// Add adds delta to the WaitGroup counter.
func (wg *WaitGroup) Add(delta int) {
wg.n += delta
if wg.n < 0 {
panic("negative counter")
}
}
// Done decrements the WaitGroup counter by one.
func (wg *WaitGroup) Done() {
wg.Add(-1)
}
// Wait blocks until the WaitGroup counter is zero.
func (wg *WaitGroup) Wait() {
for wg.n > 0 {
}
}
func main() {
var wg WaitGroup
wg.Add(1)
go func() {
time.Sleep(50 * time.Millisecond)
fmt.Print(".")
wg.Done()
}()
wg.Wait()
fmt.Println("done")
}
.done
Of course, in practice it's more complicated:
- All methods can be called concurrently from multiple goroutines. Modifying the shared variable n from multiple goroutines is unsafe: concurrent access can corrupt data (we'll talk more about this in the chapter on data races).
- A loop-based Wait implementation will max out a CPU core until the loop finishes (this type of waiting is also known as busy waiting). Such code is strongly discouraged in production.
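To make these two problems concrete, here is one possible way to address them, using a mutex to protect the counter and a condition variable to avoid busy waiting. This is only a sketch under my own assumptions (the CondWaitGroup type and runWorkers helper are hypothetical), not how the real sync.WaitGroup is implemented:

```go
package main

import (
	"fmt"
	"sync"
)

// CondWaitGroup is a hypothetical, simplified wait group.
// It is an illustration only, not the sync.WaitGroup implementation.
type CondWaitGroup struct {
	mu   sync.Mutex
	cond *sync.Cond
	n    int
}

// NewCondWaitGroup creates a wait group with a zero counter.
func NewCondWaitGroup() *CondWaitGroup {
	wg := &CondWaitGroup{}
	wg.cond = sync.NewCond(&wg.mu)
	return wg
}

// Add changes the counter under the mutex and wakes up
// all waiters when the counter reaches zero.
func (wg *CondWaitGroup) Add(delta int) {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	wg.n += delta
	if wg.n < 0 {
		panic("negative counter")
	}
	if wg.n == 0 {
		wg.cond.Broadcast()
	}
}

// Done decrements the counter by one.
func (wg *CondWaitGroup) Done() { wg.Add(-1) }

// Wait sleeps (instead of spinning) until the counter is zero.
func (wg *CondWaitGroup) Wait() {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	for wg.n > 0 {
		wg.cond.Wait() // releases the mutex while sleeping
	}
}

// runWorkers starts n goroutines and returns how many finished.
func runWorkers(n int) int {
	wg := NewCondWaitGroup()
	var mu sync.Mutex
	finished := 0
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return finished
}

func main() {
	fmt.Println(runWorkers(3)) // 3
}
```

The interface stays the same as in the naive version; only the waiting and counting mechanics change.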
However, our naive implementation shows the properties of a wait group that are also present in the actual sync.WaitGroup:
- Add increments or decrements (if delta < 0) the counter. Positive deltas are much more common, but technically nothing prevents you from calling Add(-1).
- Wait blocks execution until the counter reaches 0. So if you call Wait before the first Add, the goroutine won't block.
- After Wait completes, the wait group returns to its initial state (counter is 0). You can then reuse it.
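These properties are easy to check against the real sync.WaitGroup. The sketch below uses two helper functions I made up for the purpose (rounds and negativePanics). It also shows that the standard wait group panics if the counter goes negative, just like our naive version:

```go
package main

import (
	"fmt"
	"sync"
)

// rounds reuses one wait group for n Add/Wait cycles
// and returns how many cycles completed.
func rounds(n int) int {
	var wg sync.WaitGroup
	wg.Wait() // counter is zero, so this returns immediately

	completed := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { defer wg.Done() }()
		wg.Wait() // counter is back to zero, the group can be reused
		completed++
	}
	return completed
}

// negativePanics reports whether driving the counter
// below zero causes a panic.
func negativePanics() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var wg sync.WaitGroup
	wg.Add(-1) // counter would become -1
	return false
}

func main() {
	fmt.Println(rounds(3))        // 3
	fmt.Println(negativePanics()) // true
}
```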
Value vs. pointer
Another important implementation nuance: you should pass the wait group as a pointer (*WaitGroup), not as a value (WaitGroup). Otherwise, each recipient will get its own copy with a duplicate counter, and synchronization won't work.
Here's an example of passing a value:
func runWork(wg sync.WaitGroup) {
wg.Add(1)
go func() {
time.Sleep(50 * time.Millisecond)
fmt.Println("work done")
wg.Done()
}()
}
func main() {
var wg sync.WaitGroup
runWork(wg)
wg.Wait()
fmt.Println("all done")
}
all done
runWork got a copy of the group and increased its counter with Add. Meanwhile, main has its own copy with a zero counter, so Wait didn't block execution. As a result, main finished without waiting for the runWork goroutine to complete.
Here's an example of passing a pointer:
func runWork(wg *sync.WaitGroup) {
wg.Add(1)
go func() {
time.Sleep(50 * time.Millisecond)
fmt.Println("work done")
wg.Done()
}()
}
func main() {
var wg sync.WaitGroup
runWork(&wg)
wg.Wait()
fmt.Println("all done")
}
work done
all done
Now runWork and main share the same instance of the group, so everything works as it should.
An even better approach would be not to pass the wait group around at all. Instead, we can encapsulate it in a separate type that hides the implementation details and provides a nice interface. Let's see how to do that.
Encapsulation
In Go, it's considered a good practice to hide synchronization details from clients calling your code. Fellow developers won't thank you for forcing them to deal with wait groups. It's better to encapsulate the synchronization logic in a separate function or type, and provide a convenient interface.
Wrapper functions
Let's say I wrote a function called RunConc that runs a set of given functions concurrently:
// RunConc executes functions concurrently.
func RunConc(wg *sync.WaitGroup, funcs ...func()) {
wg.Add(len(funcs))
for _, fn := range funcs {
go func() {
defer wg.Done()
fn()
}()
}
}
And I suggest calling it this way:
func main() {
work := func() {
time.Sleep(50 * time.Millisecond)
fmt.Print(".")
}
start := time.Now()
var wg sync.WaitGroup
RunConc(&wg, work, work, work)
wg.Wait()
elapsed := time.Since(start).Milliseconds()
fmt.Printf("took %d ms\n", elapsed)
}
...took 50 ms
Is this convenient, given that the client just wants to run functions concurrently and wait for them to finish? Not really.
It's better to hide the wait group inside a function:
// RunConc executes functions concurrently and waits for them to finish.
func RunConc(funcs ...func()) {
var wg sync.WaitGroup
wg.Add(len(funcs))
for _, fn := range funcs {
go func() {
defer wg.Done()
fn()
}()
}
wg.Wait()
}
Now you can call it like this:
func main() {
work := func() {
time.Sleep(50 * time.Millisecond)
fmt.Print(".")
}
start := time.Now()
RunConc(work, work, work)
elapsed := time.Since(start).Milliseconds()
fmt.Printf("took %d ms\n", elapsed)
}
...took 50 ms
The client doesn't need to know how RunConc does its job. It just works, and that's great.
Wrapper types
Suppose other developers tried RunConc and didn't like it. They say they prefer to add functions one at a time and then run them all together later. They also want to run a set of functions multiple times.
Okay, I'll rewrite RunConc as a ConcRunner type:
// ConcRunner executes functions concurrently.
type ConcRunner struct {
wg sync.WaitGroup
funcs []func()
}
// NewConcRunner creates a new ConcRunner instance.
func NewConcRunner() *ConcRunner {
return &ConcRunner{}
}
// Add adds a function without executing it.
func (cg *ConcRunner) Add(fn func()) {
cg.funcs = append(cg.funcs, fn)
}
// Run executes functions concurrently and waits for them to finish.
func (cg *ConcRunner) Run() {
cg.wg.Add(len(cg.funcs))
for _, fn := range cg.funcs {
go func() {
defer cg.wg.Done()
fn()
}()
}
cg.wg.Wait()
}
You might ask: why is the wg field in ConcRunner defined as a WaitGroup value instead of a *WaitGroup pointer? It's because ConcRunner itself is used as a pointer: the constructor returns a *ConcRunner, and methods are defined on it. So the methods use the same wg value, avoiding counter issues.
The wait group is hidden in the type's fields, while the client still has a clean interface without the messy details:
func main() {
cr := NewConcRunner()
// add functions to the runner
cr.Add(work)
cr.Add(work)
cr.Add(work)
// run the functions concurrently
timeit(cr)
// and again
timeit(cr)
}
...took 50 ms
...took 50 ms
In rare cases, a client may want to explicitly access your code's synchronization machinery. But usually it's better to encapsulate the synchronization logic.
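The main function above relies on two helpers, work and timeit, that are not shown in this section. Here is a self-contained sketch of how the whole thing might fit together. The bodies of work and timeit are my guesses based on the earlier RunConc example, and the calls counter is added purely for illustration:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// ConcRunner executes functions concurrently (same as above).
type ConcRunner struct {
	wg    sync.WaitGroup
	funcs []func()
}

func NewConcRunner() *ConcRunner { return &ConcRunner{} }

func (cg *ConcRunner) Add(fn func()) { cg.funcs = append(cg.funcs, fn) }

func (cg *ConcRunner) Run() {
	cg.wg.Add(len(cg.funcs))
	for _, fn := range cg.funcs {
		go func(fn func()) {
			defer cg.wg.Done()
			fn()
		}(fn)
	}
	cg.wg.Wait()
}

// calls counts how many times work ran (illustration only).
var calls atomic.Int64

// work is an assumed helper: it sleeps, then prints a dot.
func work() {
	time.Sleep(50 * time.Millisecond)
	calls.Add(1)
	fmt.Print(".")
}

// timeit is a hypothetical helper: it runs the runner,
// prints the elapsed time, and returns it.
func timeit(cr *ConcRunner) time.Duration {
	start := time.Now()
	cr.Run()
	elapsed := time.Since(start)
	fmt.Printf("took %d ms\n", elapsed.Milliseconds())
	return elapsed
}

func main() {
	cr := NewConcRunner()
	cr.Add(work)
	cr.Add(work)
	cr.Add(work)
	timeit(cr) // first run
	timeit(cr) // the same wait group is reused for the second run
}
```

The second timeit call works because the wait group's counter is back to zero after the first Run returns.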
✎ Exercise: Concurrent group
Add after Wait
Normally, all Add calls happen before Wait. But technically, there's nothing stopping us from doing some of the Add calls before Wait and some after (from another goroutine).
Let's say we have a function runWork that does its job in a separate goroutine:
// runWork performs work in a goroutine.
func runWork(wg *sync.WaitGroup) {
wg.Add(1)
fmt.Println("starting work...")
go func() {
time.Sleep(50 * time.Millisecond)
fmt.Println("work done")
wg.Done()
}()
}
We'll do the following:
- Start a runWork goroutine (worker);
- Start another goroutine to wait for the work to finish (waiter);
- Start two more workers;
- When all three workers have finished, the waiter will wake up and signal completion to the main function.
func main() {
// main wait group
var wgMain sync.WaitGroup
// worker wait group
var wgWork sync.WaitGroup
// run the first worker
runWork(&wgWork)
// the waiter goroutine waits for all workers to finish,
// and then completes the main wait group
wgMain.Add(1)
go func() {
fmt.Println("waiting for work to be done...")
wgWork.Wait()
fmt.Println("all work done")
wgMain.Done()
}()
// run two more workers after a while
time.Sleep(10 * time.Millisecond)
runWork(&wgWork)
runWork(&wgWork)
// executes when the waiter goroutine finishes
wgMain.Wait()
}
starting work...
waiting for work to be done...
starting work...
starting work...
work done
work done
work done
all work done
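This is rarely used in practice. It also works here only because the counter is still above zero when the later Add calls run: the first worker needs 50 ms to finish, and the other two are added after 10 ms. The sync.WaitGroup documentation requires that Add calls with a positive delta made while the counter is zero happen before Wait.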
Multiple Waits
Another not-so-popular WaitGroup feature: you can call Wait from multiple goroutines. They will all block until the group's counter reaches zero.
For example, we can start one worker and three waiters:
func main() {
var wg sync.WaitGroup
// worker
wg.Add(1)
go func() {
// do stuff
time.Sleep(50 * time.Millisecond)
fmt.Println("work done")
wg.Done()
}()
// first waiter
go func() {
wg.Wait()
fmt.Println("waiter 1 done")
}()
// second waiter
go func() {
wg.Wait()
fmt.Println("waiter 2 done")
}()
// main waiter
wg.Wait()
fmt.Println("main waiter done")
}
work done
waiter 1 done
waiter 2 done
main waiter done
All waiters unblock after the worker calls wg.Done(). But the order in which this happens is not guaranteed. Could be this:
work done
waiter 1 done
waiter 2 done
main waiter done
Or this:
work done
waiter 1 done
main waiter done
waiter 2 done
Or even this:
work done
main waiter done
In the last case, the main waiter finished first, and then main exited before the other waiters could even print anything.
We'll see another use case for multiple Waits in the chapter on semaphores.
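If the waiters must finish before main exits, one option is to track them with a second wait group. The sketch below is just one way to do it; the wakeWaiters function and the woken counter are hypothetical names introduced for the example:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// wakeWaiters starts one worker and n waiter goroutines,
// and returns how many waiters woke up before it returned.
func wakeWaiters(n int) int {
	var work sync.WaitGroup    // tracks the worker
	var waiters sync.WaitGroup // tracks the waiter goroutines
	var woken atomic.Int64

	// worker
	work.Add(1)
	go func() {
		defer work.Done()
		time.Sleep(10 * time.Millisecond) // do stuff
	}()

	// waiters: each one blocks on the same worker group
	waiters.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer waiters.Done()
			work.Wait()
			woken.Add(1)
		}()
	}

	// wait for the waiters themselves, so none of them is cut off
	waiters.Wait()
	return int(woken.Load())
}

func main() {
	fmt.Println(wakeWaiters(3)) // 3
}
```

The order in which the waiters wake up is still not guaranteed, but all of them are guaranteed to finish before wakeWaiters returns.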
✎ Exercise: Waiting for worker
Panic
If multiple goroutines are involved in the wait group, there are multiple possible panic sources.
Let's say there's a work function that panics on even numbers:
func work() {
if n := rand.Intn(9) + 1; n%2 == 0 {
panic(fmt.Errorf("bad number: %d", n))
}
// do stuff
}
We start four work goroutines:
func main() {
var wg sync.WaitGroup
for range 4 {
wg.Add(1)
go func() {
work()
wg.Done()
}()
}
wg.Wait()
fmt.Println("work done")
}
panic: bad number: 8
goroutine 9 [running]:
main.work()
/sandbox/src/main.go:19 +0x6e
main.main.func1()
/sandbox/src/main.go:29 +0x1c
created by main.main in goroutine 1
/sandbox/src/main.go:28 +0x30 (exit status 2)
And we face a panic (unless we are very lucky).
Shared recover
Let's add recover to catch the panic and run the program again:
func main() {
defer func() {
val := recover()
if val == nil {
fmt.Println("work done")
} else {
fmt.Println("panicked!")
}
}()
var wg sync.WaitGroup
for range 4 {
wg.Add(1)
go func() {
work()
wg.Done()
}()
}
wg.Wait()
}
panic: bad number: 6
goroutine 21 [running]:
main.work()
/sandbox/src/main.go:19 +0x73
main.main.func2()
/sandbox/src/main.go:38 +0x1c
created by main.main in goroutine 1
/sandbox/src/main.go:37 +0x4d (exit status 2)
Nope. You might expect recover to catch the panic and print "panicked!". But instead we get the same unhandled panic as before.
The problem is that recover has an important limitation: it only works within the same goroutine that caused the panic. In our case, the panic comes from the work goroutines, while recover runs in the main goroutine, so it doesn't catch the panic. Goroutines are completely independent, remember? You can only catch the panic happening in those goroutines themselves.
Per-goroutine recover
Let's move recover inside the work goroutines:
func main() {
var wg sync.WaitGroup
panicked := false
for range 4 {
wg.Add(1)
go func() {
defer func() {
err := recover()
if err != nil {
panicked = true
}
wg.Done()
}()
work()
}()
}
wg.Wait()
if !panicked {
fmt.Println("work done")
} else {
fmt.Println("panicked!")
}
}
panicked!
Now the panic is caught in its own goroutine, which then sets the panicked flag declared in main. The program works fine and prints "panicked!" as we expected.
Here we are modifying the shared panicked variable from multiple goroutines. In general, this is not a good practice because it leads to data races (we'll talk about them in the next chapter). But in this particular case, there's no real harm from races: every goroutine writes the same value (true), and main reads the flag only after wg.Wait() returns. The race detector can still report the concurrent writes, though.
Key takeaway: you cannot catch a panic from "child" goroutines in the "parent" goroutine. If you want to catch a panic, do it in the goroutine where it happens.
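If you'd rather not share a plain variable between goroutines, one option is an atomic flag. Below is a minimal sketch of per-goroutine recover with sync/atomic; the anyPanicked function is a name I made up for illustration, not something from the chapter:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// anyPanicked runs the functions concurrently, recovers from
// panics inside each goroutine, and reports whether any occurred.
func anyPanicked(funcs ...func()) bool {
	var wg sync.WaitGroup
	var panicked atomic.Bool

	wg.Add(len(funcs))
	for _, fn := range funcs {
		go func(fn func()) {
			// Deferred calls run in reverse order:
			// recover first, then Done.
			defer wg.Done()
			defer func() {
				if recover() != nil {
					panicked.Store(true)
				}
			}()
			fn()
		}(fn)
	}

	wg.Wait()
	return panicked.Load()
}

func main() {
	ok := func() {}
	bad := func() { panic("bad number") }

	fmt.Println(anyPanicked(ok, ok))      // false
	fmt.Println(anyPanicked(ok, bad, ok)) // true
}
```

The recover call still lives in the goroutine where the panic happens. Only the way the result is reported back has changed.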
✎ Exercise: Concurrent group with panic handling
Keep it up
The wait group is used to wait for goroutines to finish. Now you understand how it works and how to apply it. In the next chapter, we'll talk about data races (coming soon).