Quick note #1 - Partial indexes
Recently, I faced the classical task scheduling issue, where a record needs to be processed after a specific date. Frequently, this scenario is implemented using a database table with a scheduled_at
column coupled with a SELECT FOR UPDATE
query. In this story, I'll use PostgreSQL syntax as an example.
Example table:
my_tasks
id | scheduled_at | status |
---|---|---|
123 | 2024-06-14 05:32:00 | COMPLETED |
123 | 2024-06-15 07:46:00 | PENDING |
With this example query:
SELECT * FROM my_tasks WHERE scheduled_at < NOW() AND status = 'PENDING' FOR UPDATE SKIP LOCKED LIMIT 1;
Example query
And, of course, a compound index covering scheduled_at
and status
:
CREATE INDEX idx_scheduled_at_status ON my_tasks (scheduled_at, status);
Default create index statement
The issue
The main concern with this approach is that the table will grow over time. If you don't clean this table frequently, it will be filled with COMPLETED
rows. When most rows are COMPLETED
, the query cost increases, putting more load on your database instance.
Solution: Partial Indexes
While frequent clean-up is one solution, creating a partial index can significantly improve query performance:
For example:
CREATE INDEX idx_scheduled_at_status_pending ON my_tasks (scheduled_at, status) WHERE status = 'PENDING';
Create index statement
Results
In my case, this reduced the query cost from 70k to around 18k. While still somewhat costly, it considerably decreased the load on the PgSQL instance.