

Optimizing Postgres for high write workloads - Citus Con

Slides from my talk at Citus Con - An Event for Postgres 2023 on Optimizing Postgres for high write workloads.

Talk Abstract below:

As you ramp up the write load on your Postgres database, you will often benefit from tuning checkpoint- and WAL-related configs for better performance. In the last decade, I’ve personally recommended adjusting parameters like max_wal_size, checkpoint_timeout and checkpoint_completion_target to dozens of customers. However, what they’ve found more useful is understanding the concepts, so that they can reason on their own about how to tune these for their workload and also understand how other parameters (e.g. full_page_writes) influence checkpoints and WAL generation.

In this talk, you’ll learn what checkpoints are, when they are triggered and how they impact database performance. We’ll also go over the most important configs which impact checkpoints and WAL generation, recommend how to set them, and show how to monitor and adapt them as your workload changes.

Samay Sharma

April 12, 2023


Transcript

  1. Samay Sharma Manager, PostgreSQL Open Source team @ Microsoft Optimizing

    Postgres for write-heavy workloads ft. checkpointer & WAL configs
  2. About Me • Manager of the Open Source PG team at

    Microsoft • Working with PostgreSQL for > 10 years • Previously solutions & customer engineer at Microsoft and Citus • Want to make Postgres performance and tuning easier to understand and do
  3. Optimize Postgres, but before that… • Ensure low latency between app

    and database • Deal with connections effectively • Make sure you have the right hardware [Diagram: PostgreSQL client and server]
  4. Writing performant applications for PG • Manage Indexes carefully •

    Parallelize your loading job with concurrent connections • Use multi-value INSERTs or COPY to speed up throughput • Consider partitioning to reduce index sizes • Use unlogged tables for unimportant data
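    For illustration, a minimal SQL sketch of multi-value INSERTs, COPY and unlogged tables; the table events_staging and the CSV path are hypothetical:

      -- Unlogged tables skip WAL entirely: fast to load, but lost on a crash.
      CREATE UNLOGGED TABLE events_staging (id bigint, payload text);

      -- Multi-value INSERT: one statement and one round trip for many rows.
      INSERT INTO events_staging (id, payload)
      VALUES (1, 'a'), (2, 'b'), (3, 'c');

      -- COPY is usually the fastest way to bulk-load (server-side path;
      -- use \copy in psql for a client-side file).
      COPY events_staging (id, payload) FROM '/tmp/events.csv' WITH (FORMAT csv);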
  5. Journey of a Postgres write When a write comes in,

    it is first written to WAL for durability reasons. Then the change is made in shared buffers (not to the data file yet!). At some point, those pages are written to disk by the background writer or checkpointer. [Diagram: INSERT INTO … → WAL (disk) and shared buffers → data files (disk), flushed by the BG writer and checkpointer]
  6. Getting rid of WAL - Checkpoints • After a crash,

    recovery requires you to replay WAL. • To make recovery faster and avoid maintaining TBs of WAL, Postgres has checkpoints • A checkpoint guarantees all dirty buffers up to a point are written to disk • Recovery can start from the last checkpoint • WAL before that point can be recycled
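    To see where recovery would start from, a small sketch using the built-in pg_control_checkpoint() function:

      -- LSN and time of the last completed checkpoint, and the redo point
      -- from which crash recovery would begin replaying WAL.
      SELECT checkpoint_lsn, redo_lsn, checkpoint_time
      FROM pg_control_checkpoint();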
  7. Autovacuum • PostgreSQL’s way of removing unneeded row versions • Checks if

    tables are significantly modified and, if so, runs VACUUM and ANALYZE • Defaults are generally too conservative for large workloads [Diagram: heap page before and after VACUUM, with deleted and superseded row versions removed]
  8. Common VACUUM problems and solutions • VACUUM isn’t triggered often enough

    for large tables • Reduce autovacuum_vacuum_scale_factor, autovacuum_vacuum_insert_scale_factor • VACUUM running slow • Adjust / disable cost limits • Increase maintenance_work_mem / autovacuum_work_mem • Adjust max_parallel_maintenance_workers for parallel index vacuuming • Dead rows not reducing • Long running queries • Unused replication slots • Uncommitted prepared transactions
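    A hedged sketch of per-table autovacuum tuning; big_table and the values are illustrative, not recommendations:

      -- Trigger autovacuum after ~1% of the table changes instead of the
      -- default 20% scale factor (insert scale factor requires PG 13+).
      ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01);
      ALTER TABLE big_table SET (autovacuum_vacuum_insert_scale_factor = 0.01);

      -- postgresql.conf: give (auto)vacuum more memory and parallel workers
      -- maintenance_work_mem = 1GB
      -- autovacuum_work_mem = 1GB
      -- max_parallel_maintenance_workers = 4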
  9. Healthy Checkpointer operation • Checkpoints are frequent enough to •

    Ensure a reasonable recovery time • Limit disk utilization from WAL accumulation • Checkpoints are spaced out enough to • Ensure that the I/O impact on the user workload is low • Less WAL is generated and pages are not flushed repeatedly
  10. How checkpoints are triggered • Manual actions • Running the CHECKPOINT command

    • Commands which run CHECKPOINT for you, e.g. pg_start_backup, pg_ctl stop, etc. • When a set amount of time passes since the last checkpoint (checkpoint_timeout) • When you hit the maximum limit of how much WAL you can accumulate (max_wal_size)
  11. Best practice for tuning checkpoint triggering • If checkpoints are

    triggered too often, you will get this message in your log:
      LOG:  checkpoints are occurring too frequently (9 seconds apart)
      HINT:  Consider increasing the configuration parameter "max_wal_size"
    • Checkpoints should ideally be triggered by checkpoint_timeout • max_wal_size should be set such that it is rarely reached
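    To check how your checkpoints are being triggered, a small sketch (these columns apply to PostgreSQL 16 and earlier; PostgreSQL 17 moves the counters to pg_stat_checkpointer):

      -- checkpoints_timed should dominate; many checkpoints_req means
      -- max_wal_size is being hit before checkpoint_timeout expires.
      SELECT checkpoints_timed, checkpoints_req
      FROM pg_stat_bgwriter;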
  12. Setting checkpoint_timeout • checkpoint_timeout depends on your RTO • Hard to

    predict a specific recovery time • 5 min (default) is generally too small – 15-30 min are common • Low values may lead to write amplification due to full_page_writes (more later)
  13. Setting max_wal_size • Measure WAL generated over a checkpoint_timeout period •

    Can use pg_stat_bgwriter or the output of log_checkpoints to determine this • Set to 2-3x the WAL you expect to generate • (Small workloads) Bound by the size of the disk holding WAL – it is a soft limit
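    One way to measure the WAL volume, as a sketch (the sample LSN is illustrative): record the current LSN, wait one checkpoint_timeout period, then diff:

      -- Step 1: record the current WAL position.
      SELECT pg_current_wal_lsn();          -- e.g. returns 0/5000000

      -- Step 2: after one checkpoint_timeout period, compute the WAL generated.
      SELECT pg_size_pretty(
               pg_wal_lsn_diff(pg_current_wal_lsn(), '0/5000000'::pg_lsn));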
  14. Tuning checkpointing rate • Checkpointing can lead to huge I/O

    spikes if done all at once • Spread checkpoints out by setting checkpoint_completion_target • 0.9 is generally a good value (PG will aim to complete checkpointing in 90% of the time between checkpoints)
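    Pulling slides 11-14 together, an illustrative postgresql.conf starting point (values are examples to adapt, not universal recommendations):

      checkpoint_timeout = 15min           # default 5min; set based on your RTO
      max_wal_size = 16GB                  # 2-3x the WAL generated per checkpoint_timeout
      checkpoint_completion_target = 0.9   # spread writes over 90% of the interval
      log_checkpoints = on                 # log timing and WAL volume per checkpoint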
  15. Impact of full_page_writes • Setting full_page_writes = on prevents corruption

    when the OS crashes • Writes the content of the entire page to WAL during the first modification after a checkpoint • Greatly magnifies WAL volume with frequent checkpoints [Diagram: an 8kB Postgres page stored as two 4kB OS pages, illustrating the torn-page risk]
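    On PostgreSQL 14 and later, pg_stat_wal shows how much of your WAL is full-page images; a small sketch:

      -- wal_fpi counts full-page images; compare it to wal_records to see
      -- how much checkpoint frequency is amplifying WAL volume.
      SELECT wal_records, wal_fpi, pg_size_pretty(wal_bytes) AS wal_size
      FROM pg_stat_wal;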
  16. Other WAL tuning parameters • wal_compression • Decreases WAL

    volume by compressing it • Useful to enable to reduce WAL I/O • wal_buffers • Amount of shared memory used for WAL • Beneficial to increase (up to 128MB) if you have bulk writes or many concurrent writes • min_wal_size • Recycle WAL files until this size is reached instead of removing them • Helpful to avoid new WAL file creation during spikes. Set based on log_checkpoints output
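    An illustrative sketch of these settings in postgresql.conf (values are examples to measure against, not recommendations):

      wal_compression = on      # PG 15+ also accepts lz4 or zstd
      wal_buffers = 64MB        # default -1 = 1/32 of shared_buffers, capped at 16MB
      min_wal_size = 4GB        # recycle this much WAL instead of removing it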
  17. Other WAL tuning parameters – Contd. • backend_flush_after • Attempts

    to force the OS to issue writes once this amount of data is written by a backend • If you see I/O spikes and have a lot of concurrent writes, this is worth considering. See the caveats. • wal_segment_size • Sets the size of individual WAL files • When you have high WAL volume, the number of WAL files per directory can become an issue; increasing this helps. • A bunch of other parameters in the docs – not as commonly used.
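    Note that wal_segment_size is fixed at cluster creation time; a hedged sketch (the data directory path is hypothetical):

      # 64MB WAL segments instead of the default 16MB
      initdb --wal-segsize=64 -D /path/to/data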
  18. My reads can do writes? • Any pages needed for

    reads go into shared_buffers • When there isn’t space to put a buffer into shared_buffers, a page needs to be evicted • If that page is "dirty", then it must be written to disk synchronously
      Bitmap Heap Scan on t0 (cost=38054.04..2162680.07 rows=2867546 width=36) (actual time=1310.569..2181906.171 rows=2859373 loops=1)
        Recheck Cond: ((col1 = 1) AND (col2 IS NULL))
        Rows Removed by Index Recheck: 3617136
        Heap Blocks: exact=38631 lossy=859715
        Buffers: shared hit=113729 read=792973 written=24959
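    The plan above comes from running EXPLAIN with the BUFFERS option; a sketch of the kind of command that produces it (the query predicate is taken from the plan):

      -- "written=..." in the Buffers line shows pages this read query had to
      -- flush to disk itself because it evicted dirty buffers.
      EXPLAIN (ANALYZE, BUFFERS)
      SELECT * FROM t0 WHERE col1 = 1 AND col2 IS NULL;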
  19. In comes bgwriter • Bgwriter issues writes of “dirty” shared

    buffers • When clean shared buffers are insufficient, bgwriter writes dirty buffers and marks them as clean • Increases total I/O • Reduces likelihood of user queries having to write buffers themselves
  20. Bgwriter tuning parameters • bgwriter_delay : Time interval at which

    bgwriter wakes up to write dirty buffers • bgwriter_lru_multiplier : Chooses how many buffers should be written per cycle. • Number of buffers is average recent need * bgwriter_lru_multiplier. • Default is 2 (giving some cushion for spikes). • Values < 1 indicate writes will be done by server processes. • bgwriter_lru_maxpages : Max limit on how many pages bgwriter can write in each round.
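    An illustrative postgresql.conf sketch for a more aggressive bgwriter (values are examples, not recommendations):

      bgwriter_delay = 100ms          # default 200ms; wake up more often
      bgwriter_lru_multiplier = 4.0   # default 2.0; write more buffers per round
      bgwriter_lru_maxpages = 1000    # default 100; raise the per-round cap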
  21. Concepts for bgwriter tuning • Tradeoff between increased total I/O vs

    I/O done by user queries • Goal: You see writes by backends and want lower latency → Solution: Make bgwriter more aggressive → How: Decrease bgwriter_delay and/or increase bgwriter_lru_multiplier • Goal: Less total I/O → Solution: Make bgwriter less aggressive → How: Increase bgwriter_delay and decrease bgwriter_lru_multiplier
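    To see who is doing the writes before and after tuning, a sketch against pg_stat_bgwriter (columns as of PostgreSQL 16 and earlier; PostgreSQL 17 moves backend writes to pg_stat_io):

      -- High buffers_backend relative to buffers_clean means user queries are
      -- writing dirty pages themselves; maxwritten_clean > 0 means bgwriter is
      -- hitting bgwriter_lru_maxpages.
      SELECT buffers_checkpoint, buffers_clean, maxwritten_clean, buffers_backend
      FROM pg_stat_bgwriter;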
  22. There’s more • shared_buffers tuning – larger for write-heavy

    workloads • OS parameters – data=writeback (filesystem mount option), vm.dirty_background_bytes, etc. • Group commit settings • And others…
  23. Tuning write workloads is tricky • Defaults are not good for

    large workloads • EXPLAIN doesn’t tell you as much • Requires tuning background processes rather than individual statements • Problems are more long-term
  24. Optimize Postgres for write workloads • Setup: Make sure you are

    set up well and follow application best practices • Monitor: Understand the bottleneck by monitoring your database & logs • Tune: Tune Postgres processes like autovacuum, checkpointer and background writer to improve performance • Adapt: Keep adapting based on changing workload demands
  25. © Copyright Microsoft Corporation. All rights reserved. danke schön dank

    u merci धन्यवाद teşekkürler thank you grazie gracias tack @samay_sharma [email protected] aka.ms/open-source-discord, #cituscon channel Samay Sharma