Go proposal: Hashers

Part of the Accepted! series, explaining the upcoming Go changes in simple terms.

Provide a consistent approach to hashing and equality checks in custom data structures.

Ver. 1.26 • Stdlib • Medium impact

Summary

The new maphash.Hasher interface is the standard way to hash and compare elements in custom collections:

// A Hasher implements hashing and equality for type T.
type Hasher[T any] interface {
    Hash(hash *maphash.Hash, value T)
    Equal(T, T) bool
}

The ComparableHasher type is the default hasher implementation for comparable types, like numbers, strings, and structs with comparable fields.

Motivation

The maphash package offers hash functions for byte slices and strings, but it doesn't provide any guidance on how to create custom hash-based data structures.

The proposal aims to improve this by introducing hasher — a standardized interface for hashing and comparing the members of a collection, along with a default implementation.

Description

Add the hasher interface to the maphash package:

// A Hasher is a type that implements hashing and equality for type T.
//
// A Hasher must be stateless and their zero value must be valid.
// Hence, typically, a Hasher will be an empty struct.
type Hasher[T any] interface {
    // Hash updates hash to reflect the contents of value.
    //
    // If two values are [Equal], they must also Hash the same.
    // Specifically, if Equal(a, b) is true, then Hash(h, a) and Hash(h, b)
    // must write identical streams to h.
    Hash(hash *maphash.Hash, value T)
    Equal(T, T) bool
}

Along with the default hasher implementation for comparable types:

// ComparableHasher is an implementation of [Hasher] for comparable types.
// Its Equal(x, y) method is consistent with x == y.
type ComparableHasher[T comparable] struct{}
func (h ComparableHasher[T]) Hash(hash *Hash, value T)
func (h ComparableHasher[T]) Equal(x, y T) bool

Example

Here's a case-insensitive string hasher:

// CaseInsensitive is a Hasher for case-insensitive string comparison.
type CaseInsensitive struct{}

func (CaseInsensitive) Hash(h *maphash.Hash, s string) {
    h.WriteString(strings.ToLower(s))
}

func (CaseInsensitive) Equal(a, b string) bool {
    return strings.EqualFold(a, b)
}

And a generic Set that uses a pluggable hasher for custom equality and hashing:

// Set is a generic set implementation with a pluggable hasher.
// Stores values in a map of buckets identified by the value's hash.
// If multiple values end up in the same bucket,
// uses linear search to find the right one.
type Set[H maphash.Hasher[V], V any] struct {
    seed   maphash.Seed   // random seed for hashing
    hasher H              // performs hashing and equality checks
    data   map[uint64][]V // buckets of values indexed by their hash
}

// NewSet creates a new Set instance with the provided hasher.
func NewSet[H maphash.Hasher[V], V any](hasher H) *Set[H, V] {
    return &Set[H, V]{
        seed:   maphash.MakeSeed(),
        hasher: hasher,
        data:   make(map[uint64][]V),
    }
}

The calcHash helper method uses the hasher to compute the hash of a value:

// calcHash computes the hash of a value.
func (s *Set[H, V]) calcHash(val V) uint64 {
    var h maphash.Hash
    h.SetSeed(s.seed)
    s.hasher.Hash(&h, val)
    return h.Sum64()
}

This hash is used in the Has and Add methods. It acts as a key in the bucket map to find the right bucket for a value.

Has checks if the value exists in the corresponding bucket:

// Has returns true if the given value is present in the set.
func (s *Set[H, V]) Has(val V) bool {
    hash := s.calcHash(val)
    if bucket, ok := s.data[hash]; ok {
        for _, item := range bucket {
            if s.hasher.Equal(val, item) {
                return true
            }
        }
    }
    return false
}

Add adds a value to the corresponding bucket:

// Add adds the value to the set if it is not already present.
func (s *Set[H, V]) Add(val V) {
    if s.Has(val) {
        return
    }
    hash := s.calcHash(val)
    s.data[hash] = append(s.data[hash], val)
}

Now we can create a case-insensitive string set:

func main() {
    set := NewSet(CaseInsensitive{})

    set.Add("hello")
    set.Add("world")

    fmt.Println(set.Has("HELLO")) // true
    fmt.Println(set.Has("world")) // true
}
true
true

Or a regular string set using maphash.ComparableHasher:

func main() {
    set := NewSet(maphash.ComparableHasher[string]{})

    set.Add("hello")
    set.Add("world")

    fmt.Println(set.Has("HELLO")) // false
    fmt.Println(set.Has("world")) // true
}
false
true

Further reading

𝗣 70471 • 𝗖𝗟 657296 (in progress)

★ Subscribe to keep up with new posts.