3 min read

Paging One-to-Many Results in SQL

Pagination in databases, the division of query results into manageable subsets, is often implemented in terms of SQL constructs like LIMIT or FETCH FIRST, and OFFSET. This works well when the result is compromised of one-to-one relationships. On the other hand, hypothesise what happens if the relationship is a one-to-many as in the following DDL [1]:

An author can write n books as expressed in the foreign key constraint declared on Book’s authorId column. A SELECT * FROM statement joining the two tables by the author’s ID will produce a row for each book the author has written. This means that if you have 3 authors and each one has authored 3 books, the query will return 9 rows:

An attempt at formulating a query that sorts the result by author name and fetches the result’s first page, were the page can have at most 2 authors, might go like this:

Perhaps to one’s surprise, the above SELECT statement, with a LIMIT of 2 and an OFFSET of 0, will only return books authored by Antonopoulos:

The books authored by Chomsky as well as Antonopoulos’s book, ‘Mastering Ethereum’, weren’t included in the result because it’s the many-side of the relationship (i.e., books) which is paginated as opposed to the one-side (i.e., authors). Paging the one-side of a one-to-many relationship isn’t a trivial problem to solve in pure SQL without re-writing part of the query as a sub-select:

In practice, I find it inconvenient to formulate a query in this way in order to accommodate pagination. It doesn’t lend itself easily to runtime string manipulation, and therefore, automating paging, so that the developer isn’t forced to think about limits/offsets every time he writes a query, is harder.

Most popular vendor DBMSs offer window functions for performing calculations over ranges of rows. One such function is DENSE_RANK. This function ranks each row against the rest of the rows to produce a sequence of numbers starting from 1. Rows equal in rank have the same sequence number. DENSE_RANK is an alternative solution for the one-to-many pagination problem as it helps us simulate logical counts and offsets based on the author’s primary key column:

This SELECT statement uses DENSE_RANK combined with ORDER BY to produce rows such that row x and y have the same rank if and only if row x’s author ID and name are equal to row y’s author’s ID and name:

With the offset_ rank column, we can scroll forwards or backwards taking into account that duplicate authors in the result have the same rank. Say we want to get records starting from offset 1. Keeping in mind rank = offset + 1, this requirement would be simply expressed as:

To add a logical count, another rank, representing the count, is re-computed over the sub-select’s result:

In the beginning of this post, we wanted to fetch the first page of the result with a limit of 2 authors. Here’s how to achieve this with DENSE_RANK:

Note how DENSE_RANK allows us to more or less wrap around the main query rather than changing it. We can even take this one step further:

By moving the computation of the offset rank outside of the main query, we have the possibility to easily write string manipulation code in the application that dynamically, and transparently, applies pagination to queries.

Edit (27/06/2017): Upon revisiting the final query, I realised that it can be reduced to:

1: All SQL examples target PostgreSQL.
2: The described solution won't solve the one-to-many paging problem should the result be ordered by a column in the Book table. I'll leave it as an exercise for the reader to figure out why that's the case.
comments powered by Disqus