Detecting goroutine leaks in modern Go

Deadlocks, race conditions, and goroutine leaks are probably the three most common problems in concurrent Go programming. Deadlocks usually crash the program (when every goroutine is stuck, the runtime exits with a fatal "all goroutines are asleep" error), so they're easier to spot. The race detector can help find data races (although it doesn't catch everything and doesn't help with other types of race conditions). As for goroutine leaks, Go's tooling did not address them for a long time.

A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. We'll look at some examples shortly.

Things started to change in Go 1.24 with the introduction of the synctest package. There will be even bigger changes in Go 1.26, which adds a new experimental profile, goleakprofile, that reports leaked goroutines. Let's take a look!

A simple leak • Detection: goleak • Detection: synctest • Detection: pprof • Algorithm • Range over channel • Double send • Early return • Take first • Cancel/timeout • Orphans • Final thoughts

A simple leak

Let's say there's a function that runs the given functions concurrently and sends their results to an output channel:

// Gather runs the given functions concurrently
// and collects the results.
func Gather(funcs ...func() int) <-chan int {
    out := make(chan int)
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}

And a simple test:

func Test(t *testing.T) {
    out := Gather(
        func() int { return 11 },
        func() int { return 22 },
        func() int { return 33 },
    )

    total := 0
    for range 3 {
        total += <-out
    }

    if total != 66 {
        t.Errorf("got %v, want 66", total)
    }
}
PASS

Send three functions to be executed and collect the results from the output channel. The test passed, so the function works correctly. But does it really?

Let's pass three functions to Gather without collecting the results, and count the goroutines:

func main() {
    Gather(
        func() int { return 11 },
        func() int { return 22 },
        func() int { return 33 },
    )

    time.Sleep(50 * time.Millisecond)
    nGoro := runtime.NumGoroutine() - 1 // minus the main goroutine
    fmt.Println("nGoro =", nGoro)
}
nGoro = 3

After 50 ms — when all the functions should definitely have finished — there are still three live goroutines (runtime.NumGoroutine). In other words, all the goroutines are stuck.

The reason is that the out channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside Gather get blocked on sending the f() result to out.

Let's modify the test to catch the leak.

Detecting the leak: goleak

Obviously, we don't want to rely on runtime.NumGoroutine in tests — such a check is too fragile. Let's use the third-party goleak package instead:

// Gather runs the given functions concurrently
// and collects the results.
func Gather(funcs ...func() int) <-chan int {
    out := make(chan int)
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}

func Test(t *testing.T) {
    defer goleak.VerifyNone(t)

    Gather(
        func() int { return 11 },
        func() int { return 22 },
        func() int { return 33 },
    )
}

--- FAIL: Test (0.44s)
goleak_test.go:28: found unexpected goroutines:

Goroutine 8 in state chan send, with play.Gather.func1 on top of the stack:
play.Gather.func1()
    /tmp/sandbox4216740326/prog_test.go:16 +0x37
created by play.Gather in goroutine 7
    /tmp/sandbox4216740326/prog_test.go:15 +0x45

Goroutine 9 in state chan send, with play.Gather.func1 on top of the stack:
play.Gather.func1()
    /tmp/sandbox4216740326/prog_test.go:16 +0x37
created by play.Gather in goroutine 7
    /tmp/sandbox4216740326/prog_test.go:15 +0x45

Goroutine 10 in state chan send, with play.Gather.func1 on top of the stack:
play.Gather.func1()
    /tmp/sandbox4216740326/prog_test.go:16 +0x37
created by play.Gather in goroutine 7
    /tmp/sandbox4216740326/prog_test.go:15 +0x45

The test output clearly shows where the leak occurs.

Goleak uses time.Sleep internally, but it does so quite efficiently. It inspects the goroutine stacks up to 20 times, with the wait time between checks increasing exponentially — starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly.

Still, I'd prefer not to depend on third-party packages or time.Sleep.

Detecting the leak: synctest

Let's check for leaks without any third-party packages by using the synctest package (experimental in Go 1.24, production-ready in Go 1.25+):

// Gather runs the given functions concurrently
// and collects the results.
func Gather(funcs ...func() int) <-chan int {
    out := make(chan int)
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        Gather(
            func() int { return 11 },
            func() int { return 22 },
            func() int { return 33 },
        )
        synctest.Wait()
    })
}
--- FAIL: Test (0.00s)
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]

goroutine 10 [chan send (durable), synctest bubble 1]:
sandbox.Gather.func1()
    /tmp/sandbox/main_test.go:34 +0x37
created by sandbox.Gather in goroutine 9
    /tmp/sandbox/main_test.go:33 +0x45

goroutine 11 [chan send (durable), synctest bubble 1]:
sandbox.Gather.func1()
    /tmp/sandbox/main_test.go:34 +0x37
created by sandbox.Gather in goroutine 9
    /tmp/sandbox/main_test.go:33 +0x45

goroutine 12 [chan send (durable), synctest bubble 1]:
sandbox.Gather.func1()
    /tmp/sandbox/main_test.go:34 +0x37
created by sandbox.Gather in goroutine 9
    /tmp/sandbox/main_test.go:33 +0x45

I'll keep this explanation short since synctest isn't the main focus of this article. If you want to learn more about it, check out the Concurrency testing guide. I highly recommend it — synctest is super useful!

Here's what happens:

  1. The call to synctest.Test starts a testing bubble in a separate goroutine.
  2. The call to Gather starts three goroutines.
  3. The call to synctest.Wait blocks the root bubble goroutine.
  4. One of the goroutines executes f, tries to write to out, and gets blocked (because no one is reading from out).
  5. The same thing happens to the other two goroutines.
  6. synctest.Wait sees that all the child goroutines in the bubble are durably blocked, so it unblocks the root goroutine.
  7. The inner test function finishes.

Next, synctest.Test comes into play. It tries to wait for all child goroutines to finish before it returns. But if it sees that some goroutines are durably blocked (in our case, all three are blocked trying to send to the channel), it panics:

main bubble goroutine has exited but blocked goroutines remain

So, here we found the leak without using time.Sleep or goleak. Pretty useful!

Detecting the leak: pprof

Let's check for leaks using the new goleakprofile profile type (experimental in Go 1.26; registered in pprof under the name goroutineleak). We'll use a helper function to run the profiled code and print the results when the profile is ready:

func printLeaks(f func()) {
    prof := pprof.Lookup("goroutineleak")

    defer func() {
        time.Sleep(50 * time.Millisecond)
        var content strings.Builder
        prof.WriteTo(&content, 2)
        // Print only the leaked goroutines.
        goros := strings.Split(content.String(), "\n\n")
        for _, goro := range goros {
            if strings.Contains(goro, "(leaked)") {
                fmt.Println(goro + "\n")
            }
        }
    }()

    f()
}

Call Gather with three functions and observe all three leaks:

func main() {
    printLeaks(func() {
        Gather(
            func() int { return 11 },
            func() int { return 22 },
            func() int { return 33 },
        )
    })
}
goroutine 5 [chan send (leaked)]:
main.Gather.func1()
    /tmp/sandbox/main.go:35 +0x37
created by main.Gather in goroutine 1
    /tmp/sandbox/main.go:34 +0x45

goroutine 6 [chan send (leaked)]:
main.Gather.func1()
    /tmp/sandbox/main.go:35 +0x37
created by main.Gather in goroutine 1
    /tmp/sandbox/main.go:34 +0x45

goroutine 7 [chan send (leaked)]:
main.Gather.func1()
    /tmp/sandbox/main.go:35 +0x37
created by main.Gather in goroutine 1
    /tmp/sandbox/main.go:34 +0x45

We have a nice goroutine stack trace that shows exactly where the leak happens. Unfortunately, we had to use time.Sleep again, so this probably isn't the best way to test — unless we combine it with synctest to use the fake clock.

On the other hand, we can collect a goleakprofile from a running program, which makes it really useful for finding leaks in production systems (unlike synctest). Pretty neat.

Leak detection algorithm

The goleakprofile profile uses the garbage collector's marking phase to find goroutines that are permanently blocked (leaked). The approach is explained in detail in the proposal and the paper by Saioc et al. — check it out if you're interested.

Here's the gist of it:

   [ Start: GC mark phase ]
             │ 1. Collect live goroutines
             v
   ┌───────────────────────┐
   │   Initial roots       │ <────────────────┐
   │ (runnable goroutines) │                  │
   └───────────────────────┘                  │
             │                                │
             │ 2. Mark reachable memory       │
             v                                │
   ┌───────────────────────┐                  │
   │   Reachable objects   │                  │
   │  (channels, mutexes)  │                  │
   └───────────────────────┘                  │
             │                                │
             │ 3a. Check blocked goroutines   │
             v                                │
   ┌───────────────────────┐          (Yes)   │
   │ Is blocked G waiting  │ ─────────────────┘
   │ on a reachable obj?   │ 3b. Add G to roots
   └───────────────────────┘
             │ (No - repeat until no new Gs found)
             v
   ┌───────────────────────┐
   │   Remaining blocked   │
   │      goroutines       │
   └───────────────────────┘
             │ 5. Report the leaks
             v
      [   LEAKED!   ]
 (Blocked on unreachable
  synchronization objects)

  1. Collect live goroutines. Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now.
  2. Mark reachable memory. Trace pointers from roots to find which memory objects (like channels or mutexes) are currently reachable by these roots.
  3. Resurrect blocked goroutines. Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots.
  4. Iterate. Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects.
  5. Report the leaks. Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.

In the rest of the article, we'll review the different types of leaks often observed in production and see whether synctest and goleakprofile are able to detect each of them (spoiler: they are).

Based on the code examples from the common-goroutine-leak-patterns repository by Georgian-Vlad Saioc, licensed under the Apache-2.0 license.

Range over channel

One or more goroutines receive from a channel using range, but the sender never closes the channel, so all the receivers eventually leak:

func RangeOverChan(list []any, workers int) {
    ch := make(chan any)

    // Launch workers.
    for range workers {
        go func() {
            // Each worker processes items one by one.
            // The channel is never closed, so every worker leaks
            // once there are no more items left to process.
            for item := range ch {
                _ = item
            }
        }()
    }

    // Send items for processing.
    for _, item := range list {
        ch <- item
    }

    // close(ch) // (X) uncomment to fix
}

Using synctest:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        RangeOverChan([]any{11, 22, 33, 44}, 2)
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 10 [chan receive (durable), synctest bubble 1]:
sandbox.RangeOverChan.func1()
    /tmp/sandbox/main_test.go:36 +0x34
created by sandbox.RangeOverChan in goroutine 9
    /tmp/sandbox/main_test.go:34 +0x45

goroutine 11 [chan receive (durable), synctest bubble 1]:
sandbox.RangeOverChan.func1()
    /tmp/sandbox/main_test.go:36 +0x34
created by sandbox.RangeOverChan in goroutine 9
    /tmp/sandbox/main_test.go:34 +0x45

Using goleakprofile:

func main() {
    printLeaks(func() {
        RangeOverChan([]any{11, 22, 33, 44}, 2)
    })
}
goroutine 19 [chan receive (leaked)]:
main.RangeOverChan.func1()
    /tmp/sandbox/main.go:36 +0x34
created by main.RangeOverChan in goroutine 1
    /tmp/sandbox/main.go:34 +0x45

goroutine 20 [chan receive (leaked)]:
main.RangeOverChan.func1()
    /tmp/sandbox/main.go:36 +0x34
created by main.RangeOverChan in goroutine 1
    /tmp/sandbox/main.go:34 +0x45

Notice how synctest and goleakprofile give almost the same stack traces, clearly showing the root cause of the problem. You'll see this in the next examples as well.

Fix: The sender should close the channel after it finishes sending.

Try uncommenting the ⓧ line and see if both checks pass.

Double send

The sender accidentally sends more values to a channel than intended, and leaks:

func DoubleSend() <-chan any {
    ch := make(chan any)

    go func() {
        res, err := work(13)
        if err != nil {
            // In case of an error, send nil.
            ch <- nil
            // return // (X) uncomment to fix
        }
        // Otherwise, continue with normal behaviour.
        // This leaks if err != nil.
        ch <- res
    }()

    return ch
}

Using synctest:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        <-DoubleSend()
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 22 [chan send (durable), synctest bubble 1]:
sandbox.DoubleSend.func1()
    /tmp/sandbox/main_test.go:42 +0x4c
created by sandbox.DoubleSend in goroutine 21
    /tmp/sandbox/main_test.go:32 +0x5f

Using goleakprofile:

func main() {
    printLeaks(func() {
        <-DoubleSend()
    })
}
goroutine 19 [chan send (leaked)]:
main.DoubleSend.func1()
    /tmp/sandbox/main.go:42 +0x4c
created by main.DoubleSend in goroutine 1
    /tmp/sandbox/main.go:32 +0x67

Fix: Make sure that each possible path in the code sends to the channel no more times than the receiver is ready for. Alternatively, make the channel's buffer large enough to handle all possible sends.

Try uncommenting the ⓧ line and see if both checks pass.

Early return

The parent goroutine exits without receiving a value from the child goroutine, so the child leaks:

func EarlyReturn() {
    ch := make(chan any) // (X) should be buffered

    go func() {
        res, _ := work(42)
        // Leaks if the parent goroutine terminates early.
        ch <- res
    }()

    _, err := work(13)
    if err != nil {
        // Early return in case of error.
        // The child goroutine leaks.
        return
    }

    // Only receive if there is no error.
    <-ch
}

Using synctest:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        EarlyReturn()
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 22 [chan send (durable), synctest bubble 1]:
sandbox.EarlyReturn.func1()
    /tmp/sandbox/main_test.go:35 +0x45
created by sandbox.EarlyReturn in goroutine 21
    /tmp/sandbox/main_test.go:32 +0x5f

Using goleakprofile:

func main() {
    printLeaks(func() {
        EarlyReturn()
    })
}
goroutine 7 [chan send (leaked)]:
main.EarlyReturn.func1()
    /tmp/sandbox/main.go:35 +0x45
created by main.EarlyReturn in goroutine 1
    /tmp/sandbox/main.go:32 +0x67

Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending.

Try making the channel buffered at line ⓧ and see if both checks pass.

Cancel/timeout

Similar to "early return". If the parent is canceled before receiving a value from the child goroutine, the child leaks:

func Canceled(ctx context.Context) {
    ch := make(chan any) // (X) should be buffered

    go func() {
        res, _ := work(100)
        // Leaks if the parent goroutine gets canceled.
        ch <- res
    }()

    // Wait for the result or for cancellation.
    select {
    case <-ctx.Done():
        // The child goroutine leaks.
        return
    case res := <-ch:
        // Process the result.
        _ = res
    }
}

Using synctest:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx, cancel := context.WithCancel(t.Context())
        cancel()
        Canceled(ctx)

        time.Sleep(time.Second)
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 22 [chan send (durable), synctest bubble 1]:
sandbox.Canceled.func1()
    /tmp/sandbox/main_test.go:35 +0x45
created by sandbox.Canceled in goroutine 21
    /tmp/sandbox/main_test.go:32 +0x76

Using goleakprofile:

func main() {
    printLeaks(func() {
        ctx, cancel := context.WithCancel(context.Background())
        cancel()
        Canceled(ctx)
    })
}
goroutine 19 [chan send (leaked)]:
main.Canceled.func1()
    /tmp/sandbox/main.go:35 +0x45
created by main.Canceled in goroutine 1
    /tmp/sandbox/main.go:32 +0x7b

Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending.

Try making the channel buffered at line ⓧ and see if both checks pass.

Take first

The parent launches N child goroutines, but is only interested in the first result. The remaining N-1 children leak:

func TakeFirst(items []any) {
    ch := make(chan any)

    // Iterate over every item.
    for _, item := range items {
        go func() {
            ch <- process(item)
        }()
    }

    // Retrieve the first result. All other children leak.
    // Also, the parent leaks if len(items) == 0.
    <-ch
}

Using synctest (zero items, the parent leaks):

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        go TakeFirst(nil)
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 22 [chan receive (durable), synctest bubble 1]:
sandbox.TakeFirst({0x0, 0x0, 0x0?})
    /tmp/sandbox/main_test.go:40 +0xdd
created by sandbox.Test.func1 in goroutine 21
    /tmp/sandbox/main_test.go:44 +0x1a

Using synctest (multiple items, children leak):

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        go TakeFirst([]any{11, 22, 33})
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 10 [chan send (durable), synctest bubble 1]:
sandbox.TakeFirst.func1()
    /tmp/sandbox/main_test.go:35 +0x2e
created by sandbox.TakeFirst in goroutine 9
    /tmp/sandbox/main_test.go:34 +0x51

goroutine 11 [chan send (durable), synctest bubble 1]:
sandbox.TakeFirst.func1()
    /tmp/sandbox/main_test.go:35 +0x2e
created by sandbox.TakeFirst in goroutine 9
    /tmp/sandbox/main_test.go:34 +0x51

Using goleakprofile (zero items, the parent leaks):

func main() {
    printLeaks(func() {
        go TakeFirst(nil)
    })
}
goroutine 19 [chan receive (leaked)]:
main.TakeFirst({0x0, 0x0, 0x0?})
    /tmp/sandbox/main.go:40 +0xeb
created by main.main.func1 in goroutine 1
    /tmp/sandbox/main.go:44 +0x1a

Using goleakprofile (multiple items, children leak):

func main() {
    printLeaks(func() {
        go TakeFirst([]any{11, 22, 33})
    })
}
goroutine 20 [chan send (leaked)]:
main.TakeFirst.func1()
    /tmp/sandbox/main.go:35 +0x2e
created by main.TakeFirst in goroutine 19
    /tmp/sandbox/main.go:34 +0x51

goroutine 21 [chan send (leaked)]:
main.TakeFirst.func1()
    /tmp/sandbox/main.go:35 +0x2e
created by main.TakeFirst in goroutine 19
    /tmp/sandbox/main.go:34 +0x51

Fix: Make the channel's buffer large enough to hold values from all child goroutines. Also, return early if the source collection is empty.

Try changing the TakeFirst implementation as follows and see if both checks pass:

func TakeFirst(items []any) {
    if len(items) == 0 {
        // Return early if the source collection is empty.
        return
    }
    // Make the channel's buffer large enough.
    ch := make(chan any, len(items))

    // Iterate over every item
    for _, item := range items {
        go func() {
            ch <- process(item)
        }()
    }

    // Retrieve first result.
    <-ch
}

Orphans

Inner goroutines leak because the client doesn't follow the contract described in the type's interface and documentation.

Let's say we have a Worker type with the following contract:

// A worker processes a queue of items one by one in the background.
// A started worker must eventually be stopped.
// Failing to stop a worker results in a goroutine leak.
type Worker struct {
    // ...
}

// NewWorker creates a new worker.
func NewWorker() *Worker

// Start starts the processing.
func (w *Worker) Start()

// Stop stops the processing.
func (w *Worker) Stop()

// Push adds an item to the processing queue.
func (w *Worker) Push(item any)

The implementation isn't particularly important — what really matters is the public contract.

Let's say the client breaks the contract and doesn't stop the worker:

func Orphans() {
    w := NewWorker()
    w.Start()
    // defer w.Stop() // (X) uncomment to fix

    items := make([]any, 10)
    for _, item := range items {
        w.Push(item)
    }
}

Then the worker goroutines will leak, just like the documentation says.

Using synctest:

func Test(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        Orphans()
        synctest.Wait()
    })
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain

goroutine 10 [select (durable), synctest bubble 1]:
sandbox.(*Worker).run(0xc00009c190)
    /tmp/sandbox/main_test.go:113 +0xcc
created by sandbox.(*Worker).Start.func1 in goroutine 9
    /tmp/sandbox/main_test.go:89 +0xb6

goroutine 11 [select (durable), synctest bubble 1]:
sandbox.(*Worker).run(0xc00009c190)
    /tmp/sandbox/main_test.go:113 +0xcc
created by sandbox.(*Worker).Start.func1 in goroutine 9
    /tmp/sandbox/main_test.go:90 +0xf6

Using goleakprofile:

func main() {
    printLeaks(func() {
        Orphans()
    })
}
goroutine 19 [select (leaked)]:
main.(*Worker).run(0x147fe4630000)
    /tmp/sandbox/main.go:112 +0xce
created by main.(*Worker).Start.func1 in goroutine 1
    /tmp/sandbox/main.go:88 +0xba

goroutine 20 [select (leaked)]:
main.(*Worker).run(0x147fe4630000)
    /tmp/sandbox/main.go:112 +0xce
created by main.(*Worker).Start.func1 in goroutine 1
    /tmp/sandbox/main.go:89 +0x105

Fix: Follow the contract and stop the worker to make sure all goroutines are stopped.

Try uncommenting the ⓧ line and see if both checks pass.

Final thoughts

Thanks to improvements in Go 1.24-1.26, it's now much easier to catch goroutine leaks, both during testing and in production.

The synctest package is available in 1.24 (experimental) and 1.25+ (production-ready). If you're interested, I have a detailed interactive guide on it.

The goleakprofile profile will be available in 1.26 (experimental). According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile.

Check the proposal and the commits for more details on goleakprofile.

P.S. If you are into concurrency, check out my interactive book.
