Compact objects in Python
Python is an object language. This is nice and cozy until you are out of memory holding 10 million objects at once. Let’s talk about how to reduce appetite.
Visit the Playground to try out the code samples
Tuples
Imagine you have a simple Pet
object with the name
(string) and price
(integer) attributes. Intuitively, it seems that the most compact representation is a tuple:
("Frank the Pigeon", 50000)
Let’s measure how much memory this beauty eats:
import random
from pympler.asizeof import asizeof
def fields():
name_gen = (random.choice(string.ascii_uppercase) for _ in range(10))
name = "".join(name_gen)
price = random.randint(10000, 99999)
return (name, price)
def measure(name, fn, n=10_000):
pets = [fn() for _ in range(n)]
size = round(asizeof(pets) / n)
print(f"Pet size ({name}) = {size} bytes")
return size
baseline = measure("tuple", fields)
Pet size (tuple) = 161 bytes
161 bytes. Let’s use it as a baseline for further comparison.
Dataclasses vs named tuples
But who works with tuples these days? You would probably choose a dataclass:
from dataclasses import dataclass
@dataclass
class PetData:
name: str
price: int
fn = lambda: PetData(*fields())
measure("dataclass", fn)
Pet size (dataclass) = 257 bytes
x1.60 to baseline
Thing is, it’s 1.6 times larger than a tuple.
Let’s try a named tuple then:
from typing import NamedTuple
class PetTuple(NamedTuple):
name: str
price: int
fn = lambda: PetTuple(*fields())
measure("named tuple", fn)
Pet size (named tuple) = 161 bytes
x1.00 to baseline
Looks like a dataclass, works like a tuple. Perfect. Or not?
Slots
Python 3.10 received dataclasses with slots:
@dataclass(slots=True)
class PetData:
name: str
price: int
fn = lambda: PetData(*fields())
measure("dataclass w/slots", fn)
Pet size (dataclass w/slots) = 153 bytes
x0.95 to baseline
Wow! Slots magic creates special skinny objects without an underlying dictionary, unlike regular Python objects. Such dataclass is even lighter than a tuple.
What if 3.10 is out of the question yet? Use NamedTuple
. Or add a slots dunder manually:
@dataclass
class PetData:
__slots__ = ("name", "price")
name: str
price: int
Slot objects have their own shortcomings. But they are great for simple cases (without inheritance and other complex stuff).
numpy arrays
The real winner, of course, is the numpy
array:
import string
import numpy as np
PetNumpy = np.dtype([("name", "S10"), ("price", "i4")])
generator = (fields() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
Pet size (numpy array) = 14 bytes
x0.09 to baseline
This is not a flawless victory, though. If names are unicode (U
type instead of S
), the advantage is not so impressive:
PetNumpy = np.dtype([("name", "U10"), ("price", "i4")])
Pet size (numpy U10) = 44 bytes
x0.27 to baseline
If the name length is not strictly 10 characters, but varies, say, up to 50 characters (U50
instead of U10
) — the advantage disappears completely:
def fields():
name_len = random.randint(10, 50)
name_gen = (random.choice(string.ascii_uppercase) for _ in range(name_len))
# ...
PetNumpy = np.dtype([("name", "U50"), ("price", "i4")])
Pet size (tuple) = 179 bytes
Pet size (numpy U50) = 204 bytes
x1.14 to baseline
Others
Let’s consider alternatives for completeness.
A regular class is no different than a dataclass:
class PetClass:
def __init__(self, name: str, price: int):
self.name = name
self.price = price
Pet size (class) = 257 bytes
x1.60 to baseline
And a frozen (immutable) dataclass too:
@dataclass(frozen=True)
class PetDataFrozen:
name: str
price: int
Pet size (frozen dataclass) = 257 bytes
x1.60 to baseline
A dict is even worse:
names = ("name", "price")
fn = lambda: dict(zip(names, fields()))
measure("dict", fn)
Pet size (dict) = 355 bytes
x1.98 to baseline
Pydantic model sets an anti-record (no wonder, it uses inheritance):
from pydantic import BaseModel
class PetModel(BaseModel):
name: str
price: int
Pet size (pydantic) = 385 bytes
x2.39 to baseline
⌘ ⌘ ⌘
Compact (and not so compact) objects in Python:






Follow @ohmypy on Twitter and subscribe by email to keep up with new posts 🚀