JSON Lines
Worked with the JSON Lines format the other day. It's a CSV on steroids:
- each entry is a separate line, as in CSV;
- at the same time it is a full-fledged JSON.
For example:
{ "id":11, "name":"Diane", "city":"London", "department":"hr", "salary":70 }
{ "id":12, "name":"Bob", "city":"London", "department":"hr", "salary":78 }
{ "id":21, "name":"Emma", "city":"London", "department":"it", "salary":84 }
{ "id":22, "name":"Grace", "city":"Berlin", "department":"it", "salary":90}
{ "id":23, "name":"Henry", "city":"London", "department":"it", "salary":104}
Great stuff:
- Suitable for objects of complex structure (unlike csv);
- Easy to stream read without loading the entire file into memory (unlike json);
- Easy to append new entries to an existing file (unlike json).
JSON can also be streamed. But look how much easier it is with JSON Lines:
import json
from typing import Iterator
def jl_reader(fname: str) -> Iterator[dict]:
with open(fname) as file:
for line in file:
obj = json.loads(line.strip())
yield obj
if __name__ == "__main__":
reader = jl_reader("employees.jl")
for employee in reader:
id = employee["id"]
name = employee["name"]
dept = employee["department"]
print(f"#{id} - {name} ({dept})")
#11 - Diane (hr)
#12 - Bob (hr)
#21 - Emma (it)
#22 - Grace (it)
#23 - Henry (it)
Great fit for logs and data processing pipelines.
★ Subscribe to keep up with new posts.