Compact objects in Python
Working with objects in Python is all nice and cozy until you run out of memory holding 10 million instances. Let's discuss how to reduce memory usage.
All measurements are done using Python 3.12:
import sys
print(sys.version)
3.12.3 (main, May 14 2024, 07:34:56) [GCC 12.2.0]
Tuples
Imagine you have a simple Pet object with the name (string) and price (integer) attributes. Intuitively, it seems that the most compact representation is a tuple:
("Frank the Pigeon", 50000)
Let's measure how much memory this beauty eats:
import random
import string
from pympler.asizeof import asizeof
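def fields():
    # Random 10-letter uppercase name and a 5-digit price.
    name_gen = (random.choice(string.ascii_uppercase) for _ in range(10))
    name = "".join(name_gen)
    price = random.randint(10000, 99999)
    return (name, price)

def measure(name, fn, n=10_000, baseline=None):
    # Average deep size of one object in a list of n objects.
    pets = [fn() for _ in range(n)]
    size = round(asizeof(pets) / n)
    print(f"Pet size ({name}) = {size} bytes")
    if baseline is not None:
        # Report the size relative to the tuple baseline.
        print(f"x{size/baseline:.2f} to baseline")
    return size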
baseline = measure("tuple", fields)
Pet size (tuple) = 153 bytes
153 bytes. We'll use that as a baseline for further comparison.
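To see roughly where those bytes go, here is a minimal sketch that adds up the shallow sizes of the tuple and the two objects it points to. It uses only sys.getsizeof, so the total will not match asizeof exactly (asizeof also accounts for the list that holds the pets), and the exact numbers depend on the Python version.

```python
import sys

pet = ("Frank the Pigeon", 50000)

# sys.getsizeof is shallow: it counts the tuple header and its two
# pointers, but not the objects those pointers refer to.
shallow = sys.getsizeof(pet)

# Adding the referenced string and integer gives a rough deep size.
parts = {
    "tuple": sys.getsizeof(pet),
    "name": sys.getsizeof(pet[0]),
    "price": sys.getsizeof(pet[1]),
}
deep = sum(parts.values())

for label, size in parts.items():
    print(f"{label:>5}: {size} bytes")
print(f"total: {deep} bytes")
```

The tuple itself is only part of the cost; the string and the integer are separate objects with their own headers.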
Dataclasses vs named tuples
Who works with tuples these days? You would probably choose a dataclass:
from dataclasses import dataclass
@dataclass
class PetData:
    name: str
    price: int
fn = lambda: PetData(*fields())
base = measure("baseline", fields)
measure("dataclass", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (dataclass) = 233 bytes
x1.52 to baseline
The thing is, it's 1.5 times larger than a tuple.
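The extra weight comes from the per-instance attribute dictionary. A small sketch to confirm it is there (the printed size varies between Python versions):

```python
import sys
from dataclasses import dataclass

@dataclass
class PetData:
    name: str
    price: int

pet = PetData("Frank the Pigeon", 50000)

# A regular dataclass instance keeps its fields in a per-instance dict.
attrs = pet.__dict__
print(attrs)
print(sys.getsizeof(attrs), "bytes for the attribute dict alone")
```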
Let's try a named tuple then:
from typing import NamedTuple
class PetTuple(NamedTuple):
    name: str
    price: int
fn = lambda: PetTuple(*fields())
base = measure("baseline", fields)
measure("named tuple", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (named tuple) = 153 bytes
x1.00 to baseline
Looks like a dataclass, works like a tuple. Perfect. Or not?
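A quick sketch of what "works like a tuple" means in practice, including the parts that may surprise you: fields are read-only, and a named tuple compares equal to a plain tuple with the same values.

```python
from typing import NamedTuple

class PetTuple(NamedTuple):
    name: str
    price: int

pet = PetTuple("Frank the Pigeon", 50000)

is_tuple = isinstance(pet, tuple)      # a named tuple is a real tuple subclass
by_name, by_index = pet.name, pet[0]   # both access styles work
has_dict = hasattr(pet, "__dict__")    # no per-instance dict
equals_plain = pet == ("Frank the Pigeon", 50000)

# Fields are read-only; _replace builds a modified copy instead.
try:
    pet.price = 1
    mutable = True
except AttributeError:
    mutable = False
cheaper = pet._replace(price=100)
```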
Slots
Python 3.10 added a slots option to dataclasses:
@dataclass(slots=True)
class PetData:
    name: str
    price: int
fn = lambda: PetData(*fields())
base = measure("baseline", fields)
measure("dataclass w/slots", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (dataclass w/slots) = 145 bytes
x0.95 to baseline
Wow! Slots magic creates skinny objects without an underlying dictionary, unlike regular Python objects. Such a dataclass is even lighter than a tuple.
What if Python 3.10 is not an option yet? Use NamedTuple. Or add a slots dunder manually:
@dataclass
class PetData:
    __slots__ = ("name", "price")
    name: str
    price: int
Slot objects have their own shortcomings. But they are great for simple cases (without inheritance and other complex stuff).
numpy arrays
The real winner, of course, is the numpy array:
import numpy as np
PetNumpy = np.dtype([("name", "S10"), ("price", "i4")])
n = 10_000
generator = (fields() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
base = measure("baseline", fields)
print(f"Pet size (numpy array) = {size} bytes\nx{size/base:.2f} to baseline")
Pet size (baseline) = 153 bytes
Pet size (numpy array) = 14 bytes
x0.09 to baseline
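The 14 bytes are simply the record layout: 10 bytes for the name plus 4 bytes for the price, with no per-object headers. As a rough illustration that does not need numpy, here is the same layout sketched with the standard struct module (a stand-in for the numpy record, not how numpy is implemented):

```python
import struct

# Stand-in for the numpy record: 10 raw bytes + a 4-byte int,
# packed without padding ("=" means standard sizes, no alignment).
record = struct.Struct("=10si")

packed = record.pack(b"FRANKPIGEO", 50000)
name, price = record.unpack(packed)

print(record.size)  # bytes per record
```

In numpy itself, the dtype's itemsize attribute reports the size of one record.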
This is not a flawless victory, though. If names are unicode (U type instead of S), the advantage is not so impressive:
import numpy as np
PetNumpy = np.dtype([("name", "U10"), ("price", "i4")])
n = 10_000
generator = (fields() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
base = measure("baseline", fields)
print(f"Pet size (numpy U10) = {size} bytes\nx{size/base:.2f} to baseline")
Pet size (baseline) = 153 bytes
Pet size (numpy U10) = 44 bytes
x0.29 to baseline
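The jump from 14 to 44 bytes follows from the character width: the S type stores one byte per character, while the U type stores four. A small sketch of the arithmetic, using a 4-byte text encoding as a stand-in for numpy's internal storage:

```python
name = "FRANKPIGEO"  # 10 ASCII characters

as_bytes = len(name.encode("ascii"))      # 1 byte per character, like S10
as_ucs4 = len(name.encode("utf-32-le"))   # 4 bytes per character, like U10

price_bytes = 4  # the i4 field
print(as_bytes + price_bytes, as_ucs4 + price_bytes)
```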
If the name length is not fixed at 10 characters but varies, say, up to 50 characters (U50 instead of U10), the advantage disappears completely, because every record reserves space for the longest allowed name:
import random
import numpy as np
def fields_var_name():
    name_len = random.randint(10, 50)
    name_gen = (random.choice(string.ascii_uppercase) for _ in range(name_len))
    name = "".join(name_gen)
    price = random.randint(10000, 99999)
    return (name, price)
PetNumpy = np.dtype([("name", "U50"), ("price", "i4")])
n = 10_000
generator = (fields_var_name() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
base = measure("baseline", fields)
print(f"Pet size (numpy U50) = {size} bytes\nx{size/base:.2f} to baseline")
Pet size (baseline) = 153 bytes
Pet size (numpy U50) = 204 bytes
x1.33 to baseline
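The 204 bytes are paid for every record, no matter how short the actual name is. A back-of-the-envelope sketch of how much of each record is padding (the constants mirror the U50 and i4 fields above; the helper name is made up):

```python
CHAR_BYTES = 4    # numpy U type: 4 bytes per character
MAX_LEN = 50      # U50
PRICE_BYTES = 4   # i4

record_bytes = MAX_LEN * CHAR_BYTES + PRICE_BYTES

def wasted(name_len: int) -> int:
    """Bytes reserved but unused for a name of the given length."""
    return (MAX_LEN - name_len) * CHAR_BYTES

for n in (10, 30, 50):
    print(f"name of {n:>2} chars: {wasted(n):>3} of {record_bytes} bytes are padding")
```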
Others
Let's consider alternatives for completeness.
A regular class is no different than a dataclass:
class PetClass:
    def __init__(self, name: str, price: int):
        self.name = name
        self.price = price
fn = lambda: PetClass(*fields())
base = measure("baseline", fields)
measure("class", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (class) = 233 bytes
x1.52 to baseline
And a frozen (immutable) dataclass too:
@dataclass(frozen=True)
class PetDataFrozen:
    name: str
    price: int
fn = lambda: PetDataFrozen(*fields())
base = measure("baseline", fields)
measure("frozen dataclass", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (frozen dataclass) = 233 bytes
x1.52 to baseline
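Freezing changes what you can do with the object, not how it is stored. A minimal sketch showing that a frozen instance still carries its attribute dictionary:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class PetDataFrozen:
    name: str
    price: int

pet = PetDataFrozen("Frank the Pigeon", 50000)

# The fields still live in a per-instance dict, hence the same size.
has_dict = hasattr(pet, "__dict__")

# Assignment is blocked by the generated __setattr__.
try:
    pet.price = 1
    frozen = False
except FrozenInstanceError:
    frozen = True
```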
A dict is even worse:
names = ("name", "price")
fn = lambda: dict(zip(names, fields()))
base = measure("baseline", fields)
measure("dict", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (dict) = 281 bytes
x1.84 to baseline
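The difference is visible even before counting the values. This sketch compares only the containers (shallow sizes, which vary between Python versions):

```python
import sys

as_tuple = ("Frank the Pigeon", 50000)
as_dict = {"name": "Frank the Pigeon", "price": 50000}

# Shallow sizes: the containers only, not the name and price objects.
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(f"tuple: {tuple_size} bytes, dict: {dict_size} bytes")
```

A tuple is a header plus two pointers, while a dict also needs a hash table with spare capacity.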
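A Pydantic model sets an anti-record (no wonder: besides the attribute dictionary, each instance stores extra bookkeeping, such as the set of explicitly assigned fields):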
from pydantic import BaseModel
class PetModel(BaseModel):
    name: str
    price: int
names = ("name", "price")
fn = lambda: PetModel(**dict(zip(names, fields())))
base = measure("baseline", fields)
measure("pydantic", fn, baseline=base)
Pet size (baseline) = 153 bytes
Pet size (pydantic) = 353 bytes
x2.31 to baseline
Summary
Here are some Python object implementations, ranked from most to least compact:
- numpy array (specific use cases only).
- Slotted dataclass.
- Named tuple / regular tuple.
- Dataclass / regular class.
- Dictionary.
- Pydantic model.
Slotted dataclasses are a safe default with Python 3.10+, but named tuples remain my favorite for their simplicity.