Building a Pivot Table in SQLite
Suppose we have a sales
table with product incomes for the years 2020-2023:
┌─────────┬──────┬────────┐
│ product │ year │ income │
├─────────┼──────┼────────┤
│ alpha │ 2020 │ 100 │
│ alpha │ 2021 │ 120 │
│ alpha │ 2022 │ 130 │
│ alpha │ 2023 │ 140 │
│ beta │ 2020 │ 10 │
│ beta │ 2021 │ 20 │
│ beta │ 2022 │ 40 │
│ beta │ 2023 │ 80 │
│ gamma │ 2020 │ 80 │
│ gamma │ 2021 │ 75 │
│ gamma │ 2022 │ 78 │
│ gamma │ 2023 │ 80 │
└─────────┴──────┴────────┘
And we want to transform it into a so-called pivot table, where products serve as rows and years serve as columns:
┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha │ 100 │ 120 │ 130 │ 140 │
│ beta │ 10 │ 20 │ 40 │ 80 │
│ gamma │ 80 │ 75 │ 78 │ 80 │
└─────────┴──────┴──────┴──────┴──────┘
Some DBMS, like SQL Server, have a custom pivot
operator to do this. SQLite does not. Still, there are multiple ways to solve the problem. Let’s examine them.
1. Filtered totals
Let’s manually extract each year in a separate column and calculate a filtered total income for that year:
select
product,
sum(income) filter (where year = 2020) as "2020",
sum(income) filter (where year = 2021) as "2021",
sum(income) filter (where year = 2022) as "2022",
sum(income) filter (where year = 2023) as "2023"
from sales
group by product
order by product;
Here is our pivot table:
┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha │ 100 │ 120 │ 130 │ 140 │
│ beta │ 10 │ 20 │ 40 │ 80 │
│ gamma │ 80 │ 75 │ 78 │ 80 │
└─────────┴──────┴──────┴──────┴──────┘
This universal method works in every DBMS, not only SQLite. Even if your DB engine does not support filter
, you can always resort to using case
:
select
product,
sum(case when year = 2020 then income end) as "2020",
sum(case when year = 2021 then income end) as "2021",
sum(case when year = 2022 then income end) as "2022",
sum(case when year = 2023 then income end) as "2023"
from sales
group by product
order by product;
explain
Human-verified explanation
This SQL query shows the total income of each product for each year.
The first line of the query selects the product
column from the table.
The next four lines use the sum
function to calculate the total income
for each year (2020, 2021, 2022, and 2023) using a case
statement to only include income values where the year
matches the specified year. The as
keyword is used to give each calculated sum a column alias that corresponds to the year.
The from
clause specifies that the data is being selected from the sales
table.
The group by
clause groups the data by product
, which means that the query will return one row for each unique product
value in the sales
table.
Finally, the order by
clause orders the results by product
in ascending order.
Using filter
is probably the easiest way when we know the columns in advance. But what if we don’t?
2. Dynamic SQL
Let’s build our query dynamically without hardcoding year values:
- Extract distinct year values.
- Generate a
...filter (where year = X) as "X"
query line for each year.
with years as (
select distinct year as year
from sales
),
lines as (
select 'select product ' as part
union all
select ', sum(income) filter (where year = ' || year || ') as "' || year || '" '
from years
union all
select 'from sales group by product order by product;'
)
select group_concat(part, '')
from lines;
This query returns the same SQL we wrote manually at the previous step (minus formatting):
select product , sum(income) filter (where year = 2020) as "2020" , sum(income) filter (where year = 2021) as "2021" , sum(income) filter (where year = 2022) as "2022" , sum(income) filter (where year = 2023) as "2023" from sales group by product order by product;
Now we have to execute it. For that, let’s use the eval(sql)
function available as part of the define
extension:
select load_extension('./define');
with years as (
select distinct year as year
from sales
),
lines as (
select 'drop view if exists v_sales; ' as part
union all
select 'create view v_sales as '
union all
select 'select product '
union all
select ', sum(income) filter (where year = ' || year || ') as "' || year || '" '
from years
union all
select 'from sales group by product order by product;'
)
select eval(group_concat(part, ''))
from lines;
explain
Human-verified explanation
This SQL query creates a view named v_sales
that shows the total income of each product for each year. The query uses a common table expression (CTE) to generate a list of distinct years from the sales
table. Then, it defines another CTE named lines
that contains a series of unioned select statements to generate the SQL commands to create the view.
The first select statement in lines
generates a drop view if exists
command to ensure that any previous version of the view is removed before creating the new one.
The second select statement generates a create view
command to create the v_sales
view.
The third select statement generates a select
command to select the product column from the sales
table.
The fourth select statement generates a sum
command that sums the income for each product for a given year. It uses the filter
clause to filter the sum by year. The year value is passed in from the years
CTE using string concatenation.
The fifth select statement specifies the from
and group by
clauses for the SQL statement. It groups the results by product and orders them by product.
Finally, the last select statement concatenates all the SQL commands generated in the lines
CTE using the group_concat
function and evaluates the resulting SQL string using the eval
function. The resulting SQL commands create the v_sales
view with the desired structure.
Here, we are building a v_sales
view which executes the query we’ve constructed previously. Let’s select the data from it:
select * from v_sales;
┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha │ 100 │ 120 │ 130 │ 140 │
│ beta │ 10 │ 20 │ 40 │ 80 │
│ gamma │ 80 │ 75 │ 78 │ 80 │
└─────────┴──────┴──────┴──────┴──────┘
Note: extensions do not work in the playground, so you’ll have to use your local SQLite to reproduce this step.
Nice!
3. Pivot extension
If dynamic SQL seems too much for you, there is a more straightforward solution — the pivotvtab
extension.
With it, we only have to provide three selects to build a pivot:
- Select row values.
- Select column values.
- Select cell values.
select load_extension('./pivotvtab');
create virtual table v_sales using pivot_vtab (
-- rows
(select distinct product from sales),
-- columns
(select distinct year, year from sales),
-- cells
(select sum(income) from sales where product = ?1 and year = ?2)
);
explain
Human-verified explanation
This SQL query creates a virtual table called v_sales
using the pivot_vtab
module from the pivotvtab
extension. A virtual table is a special type of table that does not store data on disk, but rather generates it on the fly based on a query.
The pivot_vtab
command takes three arguments:
- A subquery that specifies the rows of the virtual table. In this case, the subquery selects all distinct values from the
product
column of thesales
table. - A subquery that specifies the columns of the virtual table. In this case, the subquery selects all distinct values from the
year
column of thesales
table, and duplicates them. This is because thepivot_vtab
module expects two values for each column, one for the column value and one for the column name. By duplicating theyear
value, we are effectively using it as both the value and name for the column. - A subquery that specifies the values of the virtual table. In this case, the subquery selects the sum of the
income
column from thesales
table for a specificproduct
andyear
combination. The?1
and?2
placeholders in the subquery represent parameters that will be filled in at runtime with the actual values forproduct
andyear
.
Overall, this query creates a virtual table that pivots the data from the sales
table, with the product
values as rows, the year
values as columns, and the sum of income
as the cell values.
The extension does the rest:
select * from v_sales;
┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha │ 100 │ 120 │ 130 │ 140 │
│ beta │ 10 │ 20 │ 40 │ 80 │
│ gamma │ 80 │ 75 │ 78 │ 80 │
└─────────┴──────┴──────┴──────┴──────┘
That’s even easier than the pivot
operator in SQL Server!
Summary
There are three ways to build a pivot table in SQLite:
- Using plain SQL and
sum()
+filter
expressions. - Building and executing a dynamic query with the
eval()
function. - Utilizing the
pivotvtab
extension.
P.S. Interested in using SQL for data analytics? Check out my book — SQL Window Functions Explained
Follow @ohmypy on Twitter to keep up with new posts 🚀