The N+1 queries problem

This is a really typical problem that you face as a backend engineer. I was discussing with a colleage how we could improve the performance of our system when we have this situation. As a possible solution they were proposing an interesting solution that didn’t sound right to me initially, but then I understood the point.

Let me quickly introduce the “N+1 query” problem: This performance issue happens when you need to query in a loop. A query to fetch a list of items and enrich or hydrate with some metadata, for example, a query to fetch a list of articles and get the article's author. So after the first query, the process iterates the list of articles, and with the author-id, the system performs N queries to get each author's metadata.

There are a few solutions to this problem: you can join tables in a single query, extract the IDs of the authors and do a “SELECT .. WHERE IN (…)”, …

So, the interesting part was the proposal to "order the author’s IDs” to get a better performance in a query. Even if the order does not matter for a single query, as the indexes are stored in B-Trees, this will increase the likelihood of fetching the same data from the bufferpool (cache) vs going to disk, or re-using the same page you loaded from the disk to serve the following rows.

I thought it was a cool way to get better performance. This is one of the learnings from the "Database Internals", the book we are currently reading in the Book Club.

Miguel Ángel Martín

Software Engineer writing code since 2010 and working remotely since 2013, I talk about Software Engineering, Management and Leadership. Subscribe to my newsletter if you want to receive my articles directly in your inbox.