Compact objects in Python
In Python, everything is an object. This is all nice and cozy until you run out of memory holding 10 million objects at once. Let's talk about how to reduce its appetite.
Tuples
Imagine you have a simple `Pet` object with the `name` (string) and `price` (integer) attributes. Intuitively, it seems that the most compact representation is a tuple:
```python
("Frank the Pigeon", 50000)
```
Let's measure how much memory this beauty eats:
```python
import random
import string
from pympler.asizeof import asizeof

def fields():
    # random 10-letter name and 5-digit price
    name_gen = (random.choice(string.ascii_uppercase) for _ in range(10))
    name = "".join(name_gen)
    price = random.randint(10000, 99999)
    return (name, price)

def measure(name, fn, n=10_000, baseline=None):
    # average deep size of one pet, measured over n pets held in a list
    pets = [fn() for _ in range(n)]
    size = round(asizeof(pets) / n)
    print(f"Pet size ({name}) = {size} bytes")
    if baseline:
        print(f"x{size/baseline:.2f} to baseline")
    return size

baseline = measure("tuple", fields)
```
161 bytes. Let's use that as a baseline for further comparison.
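Where do the 161 bytes come from? The standard library's `sys.getsizeof` reports only the shallow size of an object, so a rough deep size of one pet is the tuple plus its two elements. The `flat_deep_size` helper below is a hypothetical sketch, not how pympler works internally. pympler also rounds each object up to an alignment (8 bytes by default), and the per-pet average includes the list's pointer to each tuple, so its figure comes out a bit higher. Exact byte counts vary between Python versions.

```python
import random
import string
import sys

def fields():
    # same shape as the article's helper: 10-letter name, 5-digit price
    name = "".join(random.choice(string.ascii_uppercase) for _ in range(10))
    price = random.randint(10000, 99999)
    return (name, price)

def flat_deep_size(obj):
    """Shallow size of a flat tuple plus the shallow size of each element.

    Only correct for tuples of scalars: it does not recurse any further.
    """
    return sys.getsizeof(obj) + sum(sys.getsizeof(item) for item in obj)

pet = fields()
print("tuple alone:", sys.getsizeof(pet), "bytes")
print("tuple + name + price:", flat_deep_size(pet), "bytes")

# getsizeof ignores contents: a 2-tuple costs the same whatever it holds
print(sys.getsizeof(("X" * 1000, 1)) == sys.getsizeof((1, 2)))  # True
```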
Dataclasses vs named tuples
But who works with tuples these days? You would probably choose a dataclass:
```python
from dataclasses import dataclass

@dataclass
class PetData:
    name: str
    price: int

fn = lambda: PetData(*fields())
measure("dataclass", fn, baseline=baseline)
```
Thing is, it's 1.6 times larger than a tuple.
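Most of the extra weight comes from the per-instance `__dict__`. A regular dataclass stores its attributes in a dictionary, while a tuple holds references to them directly. Here is a quick check that uses only the standard library. Sizes reported by `sys.getsizeof` are shallow and version-dependent.

```python
import sys
from dataclasses import dataclass

@dataclass
class PetData:
    name: str
    price: int

pet = PetData("Frank the Pigeon", 50000)

# attributes live in a per-instance dictionary
print(vars(pet))  # {'name': 'Frank the Pigeon', 'price': 50000}

# the instance and its __dict__ are two separate objects, each with its own size
print("instance:", sys.getsizeof(pet), "bytes")
print("__dict__:", sys.getsizeof(pet.__dict__), "bytes")
```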
Let's try a named tuple then:
```python
from typing import NamedTuple

class PetTuple(NamedTuple):
    name: str
    price: int

fn = lambda: PetTuple(*fields())
measure("named tuple", fn, baseline=baseline)
```
Looks like a dataclass, works like a tuple. Perfect. Or not?
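A named tuple should cost the same as a plain tuple, because it *is* one. `typing.NamedTuple` builds a tuple subclass with empty `__slots__`, so field names are class-level accessors rather than per-instance data. A small stdlib check:

```python
import sys
from typing import NamedTuple

class PetTuple(NamedTuple):
    name: str
    price: int

pet = PetTuple("Frank the Pigeon", 50000)

print(isinstance(pet, tuple))               # True
print(pet.name == pet[0])                   # True: fields are just tuple positions
print(pet == ("Frank the Pigeon", 50000))   # True: compares like a plain tuple
print(hasattr(pet, "__dict__"))             # False: no per-instance dictionary
print(sys.getsizeof(pet) == sys.getsizeof(tuple(pet)))  # True
```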
Slots
Python 3.10 added slotted dataclasses (`slots=True`):
```python
@dataclass(slots=True)
class PetData:
    name: str
    price: int

fn = lambda: PetData(*fields())
measure("dataclass w/slots", fn, baseline=baseline)
```
Wow! Slots magic creates special skinny objects without an underlying dictionary, unlike regular Python objects. Such a dataclass is even lighter than a tuple.
What if 3.10 is not an option yet? Use `NamedTuple`. Or add the `__slots__` dunder manually:
```python
@dataclass
class PetData:
    __slots__ = ("name", "price")
    name: str
    price: int
```
Slot objects have their own shortcomings. For example, you can't add an attribute that isn't listed in `__slots__`. But they are great for simple cases (without inheritance and other complex stuff).
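Here is the most visible shortcoming in action, using the manual `__slots__` variant that also works before Python 3.10. `can_add_attribute` is a hypothetical helper written for this illustration.

```python
from dataclasses import dataclass

@dataclass
class PetData:
    # manual slots: works on Python versions without slots=True
    __slots__ = ("name", "price")
    name: str
    price: int

def can_add_attribute(obj, attr, value):
    """Return True if a brand-new attribute can be attached to obj."""
    try:
        setattr(obj, attr, value)
    except AttributeError:
        return False
    return True

pet = PetData("Frank the Pigeon", 50000)
print(can_add_attribute(pet, "color", "grey"))  # False: "color" is not in __slots__
pet.price = 45000  # existing slots are still writable
print(pet)  # PetData(name='Frank the Pigeon', price=45000)
```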
numpy arrays
The real winner, of course, is the `numpy` array:
```python
import numpy as np

# structured dtype: fixed-width 10-byte name + 4-byte integer price
PetNumpy = np.dtype([("name", "S10"), ("price", "i4")])
n = 10_000
generator = (fields() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
print(f"Pet size (numpy array) = {size} bytes\nx{size/baseline:.2f} to baseline")
```
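Why is it so small? A structured array keeps all pets in one contiguous buffer of fixed-width records: 10 bytes for the `S10` name plus 4 bytes for the `i4` price, with no per-pet object headers. The sketch below imitates that layout with the standard library's `struct` module instead of numpy. It is a swapped-in illustration, not how numpy is implemented, and `RECORD` and `pack_pets` are hypothetical names.

```python
import random
import string
import struct

# "<" = little-endian, no padding; "10s" = 10 raw bytes; "i" = 4-byte signed int.
# This mirrors the field sizes of np.dtype([("name", "S10"), ("price", "i4")]).
RECORD = struct.Struct("<10si")

def fields():
    name = "".join(random.choice(string.ascii_uppercase) for _ in range(10))
    price = random.randint(10000, 99999)
    return (name, price)

def pack_pets(n):
    """Pack n random pets into a single bytearray of fixed-width records."""
    buf = bytearray(RECORD.size * n)
    for i in range(n):
        name, price = fields()
        RECORD.pack_into(buf, i * RECORD.size, name.encode("ascii"), price)
    return buf

pets = pack_pets(10_000)
print(RECORD.size)  # 14 bytes per pet
name, price = RECORD.unpack_from(pets, 3 * RECORD.size)  # read back the 4th pet
print(name.decode("ascii"), price)
```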
This is not a flawless victory, though. If names are Unicode (`U` type instead of `S`), the advantage is not so impressive, because numpy stores `U` strings with 4 bytes per character:
```python
import numpy as np

# U10 = up to 10 Unicode characters, 4 bytes each
PetNumpy = np.dtype([("name", "U10"), ("price", "i4")])
n = 10_000
generator = (fields() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
print(f"Pet size (numpy U10) = {size} bytes\nx{size/baseline:.2f} to baseline")
```
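The arithmetic behind this: numpy's `S` dtype stores one byte per character, `U` stores four (UCS-4), and both are padded to the declared width. The helper below is a hypothetical back-of-the-envelope calculator that does not call numpy. It assumes the packed, unaligned layout numpy uses for structured dtypes by default.

```python
# Bytes per character for numpy's fixed-width string dtypes
BYTES_PER_CHAR = {"S": 1, "U": 4}
PRICE_BYTES = 4  # the "i4" field

def record_size(kind, width):
    """Bytes per record for [("name", f"{kind}{width}"), ("price", "i4")]."""
    return BYTES_PER_CHAR[kind] * width + PRICE_BYTES

for kind in ("S", "U"):
    print(f"{kind}10: {record_size(kind, 10)} bytes per pet")
# S10: 14 bytes per pet
# U10: 44 bytes per pet
```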
If the name length is not strictly 10 characters but varies, say, up to 50 characters (`U50` instead of `U10`), the advantage disappears completely. numpy strings are fixed-width, so every record reserves room for the longest possible name:
```python
import random
import string
import numpy as np

def fields_var_name():
    # like fields(), but the name is 10 to 50 letters long
    name_len = random.randint(10, 50)
    name_gen = (random.choice(string.ascii_uppercase) for _ in range(name_len))
    name = "".join(name_gen)
    price = random.randint(10000, 99999)
    return (name, price)

PetNumpy = np.dtype([("name", "U50"), ("price", "i4")])
n = 10_000
generator = (fields_var_name() for _ in range(n))
pets = np.fromiter(generator, dtype=PetNumpy)
size = round(asizeof(pets) / n)
print(f"Pet size (numpy U50) = {size} bytes\nx{size/baseline:.2f} to baseline")
```
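Fixed width means padding: every `U50` record reserves 50 * 4 = 200 bytes for the name, however short the actual name is. Below is a rough stdlib estimate of how much of that space real names fill. It is a hypothetical sketch that counts characters at 4 bytes each, as numpy's `U` dtype does.

```python
import random
import string

WIDTH = 50          # declared width of the "U50" field
BYTES_PER_CHAR = 4  # numpy "U" strings are UCS-4

def random_name():
    # same distribution as fields_var_name(): 10 to 50 uppercase letters
    length = random.randint(10, 50)
    return "".join(random.choice(string.ascii_uppercase) for _ in range(length))

names = [random_name() for _ in range(10_000)]
reserved = WIDTH * BYTES_PER_CHAR * len(names)
used = BYTES_PER_CHAR * sum(len(name) for name in names)
print(f"reserved for names: {reserved} bytes")
print(f"filled by characters: {used} bytes ({used / reserved:.0%})")
```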
Others
Let's consider alternatives for completeness.
A regular class is no different from a dataclass in size:
```python
class PetClass:
    def __init__(self, name: str, price: int):
        self.name = name
        self.price = price

fn = lambda: PetClass(*fields())
measure("class", fn, baseline=baseline)
```
And a frozen (immutable) dataclass too:
```python
@dataclass(frozen=True)
class PetDataFrozen:
    name: str
    price: int

fn = lambda: PetDataFrozen(*fields())
measure("frozen dataclass", fn, baseline=baseline)
```
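That makes sense: `frozen=True` only blocks assignment after `__init__`. It doesn't change how attributes are stored, so each instance still carries a `__dict__`. A quick standard-library check:

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class PetDataFrozen:
    name: str
    price: int

pet = PetDataFrozen("Frank the Pigeon", 50000)
print(vars(pet))  # {'name': 'Frank the Pigeon', 'price': 50000}

try:
    pet.price = 45000
    reassigned = True
except FrozenInstanceError:
    reassigned = False
print(reassigned)  # False
```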
A dict is even worse:
```python
names = ("name", "price")
fn = lambda: dict(zip(names, fields()))
measure("dict", fn, baseline=baseline)
```
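A dict pays for a hash table (keys, values, and spare room to stay sparse) just to hold two fields, while a tuple is a fixed array of two references. Comparing shallow sizes with `sys.getsizeof` makes the container overhead visible. Exact numbers depend on the Python version.

```python
import sys

name, price = "Frank the Pigeon", 50000
as_tuple = (name, price)
as_dict = {"name": name, "price": price}

# shallow sizes: the containers only, not the strings and ints inside
print("tuple:", sys.getsizeof(as_tuple), "bytes")
print("dict: ", sys.getsizeof(as_dict), "bytes")
```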
A Pydantic model is the heaviest of all (no wonder: it inherits a lot of machinery from `BaseModel`):
```python
from pydantic import BaseModel

class PetModel(BaseModel):
    name: str
    price: int

names = ("name", "price")
fn = lambda: PetModel(**dict(zip(names, fields())))
measure("pydantic", fn, baseline=baseline)
```
Summary
Here are some Python object implementations, ranked from most to least compact:
- numpy array (specific use cases only).
- Slotted dataclass.
- Named tuple / ordinary tuple.
- Dataclass / regular class.
- Dictionary.
- Pydantic model.