Vacuuming update-heavy tables
In my previous post I gave an overview of some general parameters that will work well in a standard setting. They might be aggressive, but will allow other processes to work without interruption. Update-heavy tables need special attention when it comes to VACUUM, so in this post I will focus on that.
Luckily, Postgres allows setting up VACUUM parameters for each table separately, so that even with a general, relatively tight configuration, separate tables can still be configured even more aggressively.
What are update-heavy tables?
Update-heavy tables are tables that include data that, based on a specific business-need, is expected to be updated very frequently to remain relevant. A good example for this is clients’ bank account balance, number of passengers on the train, available theatre tickets, number of available taxis circulating in a particular area etc.
Now let’s take a step back to consider, why frequently updated tables even require special attention?
How does the VACUUM work?
Under MVCC (multi-version concurrency control) paradigm, PostgreSQL keeps previous versions of modified records. These versions still occupy space in that table although they aren’t visible for transactions.
The purpose of VACUUM (and AUTOVACUUM) is to reinstate the space that is taken up by these outdated records, so that it could be reutilized.
In short, the VACUUM process marks up the space that is taken up by the expired records as free and ready to be reutilized. In addition to this there are additional supporting processes that take place: visibility maps are being updated, free space maps, etc.
To update a record PostgreSQL reviews the page with that record in search for free space of an appropriate size. If the space has been identified the new record is entered onto the same page. If the space has not been found the table is reviewed from top to bottom in search for an empty space of an appropriate size.
If the space has been found — the new record is placed there. If not — the data is entered at the bottom of the table. If needed, the new data page is created.
At the same time, for the previous version of the record the system column xmax is set making the former record version invisible to newer transactions.
VACUUM reviews data and checks its visibility. If the data is expired the space occupied by this data is marked as free. This is not the only function that is performed during VACUUM-ing, however for our purposes the most important part is that it marks up this space as available.
For normal tables VACUUM manages to process the data and therefore the new record or the new version of the record will be inserted into that available “gap”.
VACUUM for frequently updated tables
For update-heavy tables VACUUM doesn’t go through all the data in the table due to time constraint, so a new record (or a newer version of the record) is being placed at the end of the table. This results in unnecessary records being kept and a typical table bloat. This impacts not only the table in question, but also TOAST and indexes, so that each additional operation is slowed down due to higher volume of data that needs to be processed.
Optimising VACUUM for update-heavy tables
Due to the process described above, in order to avoid table bloat, for frequently updated tables VACUUM should be optimized even more aggressively, so that these tables are reviewed by AUTOVACUUM even more frequently.
This can be achieved by setting parameters for a specific table, so that the following command is executed:
ALTER TABLE table_name SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 1000);
This command triggers AUTOVACUUM following an update or after removal of 1000 rows, regardless of how many rows there are in the table in total. These parameters can be set up at the stage of table creation.
The full list of storage parameters can be found here. Due to such an extensive list of parameters it is possible to finetune AUTOVACUUM settings as needed.
Another parameter that I would like to highlight here is fillfactor. It is responsible for regulating the occupancy of pages with data, before a new page is created. This way during the normal data insert process the table will continue growing at a faster rate since there will be planned gaps on the page.
However, at the time of the update, if there is enough space an attempt will be made to place the new version of the data on the same page. Since in that case the available space is allocated in advance, the possibility of inserting data into the same page significantly increases.
This increases the performance compared to creation of insert into an empty space in a different page or insert into the newly added page. For tables that require frequent updating the value of this parameter can be set at below 100.
Comprehensive audit of table parameters and an occasional review to identify bloat will allow you to keep the frequently updated tables intact and optimize your database performance.
VACUUM configuration is not the only recommendation for frequently updated tables. Bloat needs to be identified and eliminated and I’d recommend reviewing this post about the process.
This entry has been created as a response to the comment to my previous post. If you have any PostgreSQL questions that you can’t find the answer to – comment below.
Have any interesting use cases that you’d like to share? – We’d love to see them in the comments.