Building a Pivot Table in SQLite

Suppose we have a sales table with product incomes for the years 2020-2023:

┌─────────┬──────┬────────┐
│ product │ year │ income │
├─────────┼──────┼────────┤
│ alpha   │ 2020 │ 100    │
│ alpha   │ 2021 │ 120    │
│ alpha   │ 2022 │ 130    │
│ alpha   │ 2023 │ 140    │
│ beta    │ 2020 │ 10     │
│ beta    │ 2021 │ 20     │
│ beta    │ 2022 │ 40     │
│ beta    │ 2023 │ 80     │
│ gamma   │ 2020 │ 80     │
│ gamma   │ 2021 │ 75     │
│ gamma   │ 2022 │ 78     │
│ gamma   │ 2023 │ 80     │
└─────────┴──────┴────────┘

playgrounddownload

And we want to transform it into a so-called pivot table, where products serve as rows and years serve as columns:

┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha   │ 100  │ 120  │ 130  │ 140  │
│ beta    │ 10   │ 20   │ 40   │ 80   │
│ gamma   │ 80   │ 75   │ 78   │ 80   │
└─────────┴──────┴──────┴──────┴──────┘

Some DBMS, like SQL Server, have a custom pivot operator to do this. SQLite does not. Still, there are multiple ways to solve the problem. Let’s examine them.

1. Filtered totals

Let’s manually extract each year in a separate column and calculate a filtered total income for that year:

select
  product,
  sum(income) filter (where year = 2020) as "2020",
  sum(income) filter (where year = 2021) as "2021",
  sum(income) filter (where year = 2022) as "2022",
  sum(income) filter (where year = 2023) as "2023"
from sales
group by product
order by product;

Here is our pivot table:

┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha   │ 100  │ 120  │ 130  │ 140  │
│ beta    │ 10   │ 20   │ 40   │ 80   │
│ gamma   │ 80   │ 75   │ 78   │ 80   │
└─────────┴──────┴──────┴──────┴──────┘

This universal method works in every DBMS, not only SQLite. Even if your DB engine does not support filter, you can always resort to using case:

select
  product,
  sum(case when year = 2020 then income end) as "2020",
  sum(case when year = 2021 then income end) as "2021",
  sum(case when year = 2022 then income end) as "2022",
  sum(case when year = 2023 then income end) as "2023"
from sales
group by product
order by product;
explain

Human-verified explanation

This SQL query shows the total income of each product for each year.

The first line of the query selects the product column from the table.

The next four lines use the sum function to calculate the total income for each year (2020, 2021, 2022, and 2023) using a case statement to only include income values where the year matches the specified year. The as keyword is used to give each calculated sum a column alias that corresponds to the year.

The from clause specifies that the data is being selected from the sales table.

The group by clause groups the data by product, which means that the query will return one row for each unique product value in the sales table.

Finally, the order by clause orders the results by product in ascending order.

Using filter is probably the easiest way when we know the columns in advance. But what if we don’t?

2. Dynamic SQL

Let’s build our query dynamically without hardcoding year values:

  1. Extract distinct year values.
  2. Generate a ...filter (where year = X) as "X" query line for each year.
with years as (
  select distinct year as year
  from sales
),
lines as (
  select 'select product ' as part
  union all
  select ', sum(income) filter (where year = ' || year || ') as "' || year || '" '
  from years
  union all
  select 'from sales group by product order by product;'
)
select group_concat(part, '')
from lines;

This query returns the same SQL we wrote manually at the previous step (minus formatting):

select product , sum(income) filter (where year = 2020) as "2020" , sum(income) filter (where year = 2021) as "2021" , sum(income) filter (where year = 2022) as "2022" , sum(income) filter (where year = 2023) as "2023" from sales group by product order by product;

Now we have to execute it. For that, let’s use the eval(sql) function available as part of the define extension:

select load_extension('./define');

with years as (
  select distinct year as year
  from sales
),
lines as (
  select 'drop view if exists v_sales; ' as part
  union all
  select 'create view v_sales as '
  union all
  select 'select product '
  union all
  select ', sum(income) filter (where year = ' || year || ') as "' || year || '" '
  from years
  union all
  select 'from sales group by product order by product;'
)
select eval(group_concat(part, ''))
from lines;
explain

Human-verified explanation

This SQL query creates a view named v_sales that shows the total income of each product for each year. The query uses a common table expression (CTE) to generate a list of distinct years from the sales table. Then, it defines another CTE named lines that contains a series of unioned select statements to generate the SQL commands to create the view.

The first select statement in lines generates a drop view if exists command to ensure that any previous version of the view is removed before creating the new one.

The second select statement generates a create view command to create the v_sales view.

The third select statement generates a select command to select the product column from the sales table.

The fourth select statement generates a sum command that sums the income for each product for a given year. It uses the filter clause to filter the sum by year. The year value is passed in from the years CTE using string concatenation.

The fifth select statement specifies the from and group by clauses for the SQL statement. It groups the results by product and orders them by product.

Finally, the last select statement concatenates all the SQL commands generated in the lines CTE using the group_concat function and evaluates the resulting SQL string using the eval function. The resulting SQL commands create the v_sales view with the desired structure.

Here, we are building a v_sales view which executes the query we’ve constructed previously. Let’s select the data from it:

select * from v_sales;
┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha   │ 100  │ 120  │ 130  │ 140  │
│ beta    │ 10   │ 20   │ 40   │ 80   │
│ gamma   │ 80   │ 75   │ 78   │ 80   │
└─────────┴──────┴──────┴──────┴──────┘

Note: extensions do not work in the playground, so you’ll have to use your local SQLite to reproduce this step.

Nice!

3. Pivot extension

If dynamic SQL seems too much for you, there is a more straightforward solution — the pivotvtab extension.

With it, we only have to provide three selects to build a pivot:

  1. Select row values.
  2. Select column values.
  3. Select cell values.
select load_extension('./pivotvtab');

create virtual table v_sales using pivot_vtab (
  -- rows
  (select distinct product from sales),
  -- columns
  (select distinct year, year from sales),
  -- cells
  (select sum(income) from sales where product = ?1 and year = ?2)
);
explain

Human-verified explanation

This SQL query creates a virtual table called v_sales using the pivot_vtab module from the pivotvtab extension. A virtual table is a special type of table that does not store data on disk, but rather generates it on the fly based on a query.

The pivot_vtab command takes three arguments:

  1. A subquery that specifies the rows of the virtual table. In this case, the subquery selects all distinct values from the product column of the sales table.
  2. A subquery that specifies the columns of the virtual table. In this case, the subquery selects all distinct values from the year column of the sales table, and duplicates them. This is because the pivot_vtab module expects two values for each column, one for the column value and one for the column name. By duplicating the year value, we are effectively using it as both the value and name for the column.
  3. A subquery that specifies the values of the virtual table. In this case, the subquery selects the sum of the income column from the sales table for a specific product and year combination. The ?1 and ?2 placeholders in the subquery represent parameters that will be filled in at runtime with the actual values for product and year.

Overall, this query creates a virtual table that pivots the data from the sales table, with the product values as rows, the year values as columns, and the sum of income as the cell values.

The extension does the rest:

select * from v_sales;
┌─────────┬──────┬──────┬──────┬──────┐
│ product │ 2020 │ 2021 │ 2022 │ 2023 │
├─────────┼──────┼──────┼──────┼──────┤
│ alpha   │ 100  │ 120  │ 130  │ 140  │
│ beta    │ 10   │ 20   │ 40   │ 80   │
│ gamma   │ 80   │ 75   │ 78   │ 80   │
└─────────┴──────┴──────┴──────┴──────┘

That’s even easier than the pivot operator in SQL Server!

Summary

There are three ways to build a pivot table in SQLite:

  1. Using plain SQL and sum() + filter expressions.
  2. Building and executing a dynamic query with the eval() function.
  3. Utilizing the pivotvtab extension.

P.S. Interested in using SQL for data analytics? Check out my book — SQL Window Functions Explained

Follow @ohmypy on Twitter to keep up with new posts 🚀