The N+1 queries problem

October 21st, 2023

This is a really typical problem that you face as a backend engineer. I was discussing with a colleage how we could improve the performance of our system when we have this situation. As a possible solution they were proposing an interesting solution that didn’t sound right to me initially, but then I understood the point.

Let me quickly introduce the “N+1 query” problem: This performance issue happens when you need to query in a loop. A query to fetch a list of items and enrich or hydrate with some metadata, for example, a query to fetch a list of articles and get the article's author. So after the first query, the process iterates the list of articles, and with the author-id, the system performs N queries to get each author's metadata.

There are a few solutions to this problem: you can join tables in a single query, extract the IDs of the authors and do a “SELECT .. WHERE IN (…)”, …

So, the interesting part was the proposal to "order the author’s IDs” to get a better performance in a query. Even if the order does not matter for a single query, as the indexes are stored in B-Trees, this will increase the likelihood of fetching the same data from the bufferpool (cache) vs going to disk, or re-using the same page you loaded from the disk to serve the following rows.

I thought it was a cool way to get better performance. This is one of the learnings from the "Database Internals", the book we are currently reading in the Book Club.

Miguel Ángel Martín