Quick note #1 - Partial indexes

Quick note #1 - Partial indexes

Recently, I faced the classical task scheduling issue, where a record needs to be processed after a specific date. Frequently, this scenario is implemented using a database table with a scheduled_at column coupled with a SELECT FOR UPDATE query. In this story, I'll use PostgreSQL syntax as an example.

Example table:

my_tasks

id scheduled_at status
123 2024-06-14 05:32:00 COMPLETED
123 2024-06-15 07:46:00 PENDING

With this example query:

SELECT * FROM my_tasks WHERE scheduled_at < NOW() AND status = 'PENDING' FOR UPDATE SKIP LOCKED LIMIT 1;

Example query

And, of course, a compound index covering scheduled_at and status:

CREATE INDEX idx_scheduled_at_status ON my_tasks (scheduled_at, status);

Default create index statement

The issue

The main concern with this approach is that the table will grow over time. If you don't clean this table frequently, it will be filled with COMPLETED rows. When most rows are COMPLETED, the query cost increases, putting more load on your database instance.

Solution: Partial Indexes

While frequent clean-up is one solution, creating a partial index can significantly improve query performance:

For example:

CREATE INDEX idx_scheduled_at_status_pending ON my_tasks (scheduled_at, status) WHERE status = 'PENDING';

Create index statement

Results

In my case, this reduced the query cost from 70k to around 18k. While still somewhat costly, it considerably decreased the load on the PgSQL instance.

💡
Quick Notes are where I share valuable lessons I've learned.