I really like SQLite. It’s a miniature embedded database, perfect for both exploratory data analysis and as a storage for small apps (I’ve blogged about that previously).
It has a minor drawback though. There are few built-in functions compared to PostgreSQL or Oracle. Fortunately, the authors provided an extension mechanism, which allows doing almost anything. As a result, there are a lot of SQLite extensions out there, but they are incomplete, inconsistent and scattered across the internet.
I wanted more consistency. So I started the sqlean project, which brings the extensions together, neatly packaged into domain modules, documented, tested, and built for Linux, Windows and macOS. Something like a standard library in Python or Go, only for SQLite.
I plan to write in detail about each module in a separate article, but for now — here’s a brief overview.
The main set
These are the most popular functions missing in SQLite:
- crypto: cryptographic hashes like MD5 or SHA-256.
- fileio: read and write files and catalogs.
- fuzzy: fuzzy string matching and phonetics.
- ipaddr: IP address manipulation.
- json1: JSON functions.
- math: math functions.
- re: regular expressions.
- stats: math statistics — median, percentiles, etc.
- text: string functions.
- unicode: Unicode support.
- uuid: Universally Unique IDentifiers.
- vsv: CSV files as virtual tables.
There are precompiled binaries for Windows, Linix and macOS.
These extensions haven’t yet made their way to the main set. They may be too broad, too narrow, or without a well-thought API. I’m gradually refactoring and merging them into the main set:
- array: one-dimensional arrays.
- besttype: convert string value to numeric.
- bloom: a fast way to tell if a value is already in a table.
- btreeinfo, memstat, recsize and stmt: various database introspection features.
- cbrt and math2: additional math functions and bit arithmetics.
- classifier: binary classifier via logistic regression.
- closure: navigate hierarchic tables with parent/child relationships.
- compress and sqlar: compress / uncompress data.
- cron: match dates against cron patterns.
- dbdump: export database as SQL.
- decimal, fcmp and ieee754: decimal and floating-point arithmetic.
- define: create scalar and table-valued functions from SQL.
- envfuncs: read environment variables.
- eval: run arbitrary SQL statements.
- isodate: additional date and time functions.
- pearson: Pearson correlation coefficient between two data sets.
- pivotvtab: pivot tables.
- prefixes: generate string prefixes.
- rotate: string obfuscation.
- spellfix: search a large vocabulary for close matches.
- stats2 and stats3: additional math statistics functions.
- text2: additional string functions.
- uint: natural string sorting and comparison.
- unhex: reverse for
- unionvtab: union similar tables into one.
- xmltojson: convert XML to JSON string.
- zipfile: read and write zip files.
- zorder: map multidimensional data to a single dimension.
Vote for your favorites! Popular ones will make their way into the main set faster.
Incubator extensions are also available for download.
How to load an extension
There are three ways to do it. If you are using SQLite CLI (
sqlite> .load ./stats sqlite> select median(value) from generate_series(1, 99);
If you are using a tool like DB Browser for SQLite, SQLite Expert or DBeaver:
select load_extension('c:\Users\anton\sqlite\stats.dll'); select median(value) from generate_series(1, 99);
If you are using Python (other languages provide similar means):
import sqlite3 connection = sqlite3.connect(":memory:") connection.enable_load_extension(True) connection.load_extension("./stats.so") connection.execute("select median(value) from generate_series(1, 99)") connection.close()
If you feel that you are missing some function in SQLite, check the sqlean repository — you’ll probably find one.
I keep adding new extensions to the incubator. I also refactor the extensions from the incubator and merge them into the main set. I plan to write a separate article for each main module, so stay tuned.