
PostgreSQL 16 High Availability and Horizontal Scalability: A Production-Ready Guide with Patroni, etcd, and HAProxy

This comprehensive guide, authored by Shiv Iyer and updated for 2025, provides a battle-tested blueprint for building fault-tolerant PostgreSQL 16 clusters. It details a 5-node architecture combining PostgreSQL 16’s enhanced query parallelism and logical replication with Patroni for automated failover, etcd for distributed consensus, and HAProxy for intelligent load balancing.

Shiv Iyer

April 04, 2025
Transcript

  1. PostgreSQL 16 High Availability and Horizontal Scalability Guide Welcome to

    this comprehensive guide on building robust PostgreSQL 16 high availability and horizontal scalability solutions. In this presentation, we'll explore how to implement a fault-tolerant PostgreSQL environment using Patroni, etcd, and HAProxy to ensure your critical database services remain operational even during hardware failures or maintenance periods. PostgreSQL 16, released in September 2023, introduces significant performance improvements in query parallelism, bulk data loading, and logical replication. These enhancements make it an ideal candidate for mission-critical applications requiring continuous availability and scalability. by Shiv Iyer
  2. Presentation Agenda 1 Architecture Overview Understanding the components and their

    interactions 2 Setup & Configuration Detailed installation steps for each component 3 Operation & Management Testing, verification, and maintenance procedures Scalability Strategies Options for horizontal scaling and high-throughput solutions This presentation provides a detailed runbook for PostgreSQL DBAs to implement a production-ready high availability setup. We'll start with the architecture overview, proceed through the installation of each component, verify the cluster functionality, and explore advanced topics like scalability and maintenance operations.
  3. Architecture Overview PostgreSQL Data Nodes (3) Three PostgreSQL servers running

    version 16, managed by Patroni, with one primary and two replicas providing redundancy and failover capabilities. etcd Node (1) Distributed key-value store that maintains cluster state information and enables consensus for leader election during failover scenarios. HAProxy Node (1) Load balancer that provides a single endpoint for client connections and intelligently routes traffic to the appropriate database node based on health checks. This 5-node architecture creates a robust system where Patroni acts as a template for PostgreSQL high availability, managing replication, failover, and cluster configuration. The etcd server provides a reliable distributed storage layer for cluster state information, while HAProxy ensures client connections are always directed to the active primary node.
  4. Component Roles and Interactions 1 PostgreSQL 16 The core database

    engine that stores and processes data. Version 16 brings improvements in query parallelism, bulk loading, and logical replication. Patroni Manages PostgreSQL configuration, monitors node health, coordinates failover, and ensures data consistency across the cluster. etcd Provides distributed consensus and stores cluster state information, enabling Patroni to make consistent decisions about primary node election. HAProxy Distributes client connections, performs health checks, and ensures applications always connect to the active primary node. When working together, these components create a self-healing PostgreSQL environment. If the primary node fails, Patroni coordinates with etcd to elect a new primary from the available replicas, while HAProxy automatically redirects client connections to the newly elected primary.
  5. System Requirements Server Count 5 Ubuntu 24.04 LTS servers CPU

    Minimum 2 cores per server Memory Minimum 4GB RAM per server Storage At least 20GB for each PostgreSQL node Network Full connectivity between all nodes IP Addressing Static IP addresses required for all nodes These specifications represent the minimum requirements for a functional high-availability cluster. For production environments supporting significant workloads, consider increasing these specifications, particularly CPU cores, RAM, and storage capacity. Additionally, implementing redundant network connections is recommended to minimize the risk of network-related failures.
  6. Network Configuration Prerequisites # Sample /etc/hosts file configuration 192.168.1.101 data-node1

    192.168.1.102 data-node2 192.168.1.103 data-node3 192.168.1.104 etcd-node 192.168.1.105 haproxy-node Hostname Resolution Configure all servers to recognize each other by hostname by updating the /etc/hosts file on each server as shown above. Firewall Configuration Ensure required ports are open: PostgreSQL (5432), Patroni API (8008), etcd (2379, 2380), and HAProxy (5000, 7000). Connectivity Testing Verify network connectivity between all nodes using ping, telnet, or netcat before proceeding with installation. Proper network configuration is crucial for cluster functionality. All nodes must be able to communicate with each other to maintain synchronization, handle failover events, and process client requests. Any network segmentation or firewall rules must account for these communication paths.
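
    As a quick pre-flight check, a small loop can confirm that every node resolves and that the required ports answer once the corresponding services are up. This is only a sketch that assumes the example hostnames and ports from this slide and that ping and netcat (nc) are installed.

    # Verify hostname resolution and basic reachability from any node
    for host in data-node1 data-node2 data-node3 etcd-node haproxy-node; do
      ping -c1 -W1 "$host" >/dev/null && echo "OK   ping $host" || echo "FAIL ping $host"
    done
    # Port checks (these only succeed after the services are installed and running)
    for target in data-node1:5432 data-node1:8008 etcd-node:2379 etcd-node:2380 haproxy-node:5000 haproxy-node:7000; do
      nc -z -w2 "${target%:*}" "${target#*:}" && echo "OK   $target" || echo "FAIL $target"
    done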
  7. Setting Up the etcd Node Download and Extract etcd Download

    and extract the etcd package from GitHub. Version 3.4.32 or newer is recommended for compatibility with Patroni. Configure Environment Set the ETCD_NODE_PRIVATE_IP environment variable to your etcd node's IP address for use in configuration scripts. Create Systemd Service Configure etcd as a systemd service to ensure it starts automatically after system reboots. Enable and Start Service Enable the systemd service and start etcd to make it available for Patroni's cluster management. etcd plays a critical role in our high-availability architecture by providing a consistent view of the cluster state. It stores information about which node is currently the primary, synchronization status of replicas, and other crucial metadata that Patroni uses to make failover decisions.
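
    A minimal download-and-extract sketch, assuming the v3.4.32 release and the standard etcd release artifact naming on GitHub; adjust the version, architecture, and install path to your environment.

    ETCD_VER=v3.4.32
    curl -L -o /tmp/etcd-${ETCD_VER}.tar.gz \
      https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
    tar xzf /tmp/etcd-${ETCD_VER}.tar.gz -C /tmp
    sudo mv /tmp/etcd-${ETCD_VER}-linux-amd64/etcd /tmp/etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/
    etcd --version
    # Environment variable referenced by the configuration scripts in this guide
    export ETCD_NODE_PRIVATE_IP=192.168.1.104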
  8. etcd Configuration Parameters 2379 Client Port The port etcd uses

    for client communications, including connections from Patroni nodes 2380 Peer Port The port used for etcd cluster communication when running multi- node etcd 5s Default TTL Time-to-live for key-value pairs # Sample etcd systemd service file [Unit] Description=etcd distributed key-value store Documentation=https://github.com/etcd-io/etcd After=network.target [Service] Type=simple ExecStart=/path/to/etcd --listen-peer-urls="http://192.168.1.104:2380" \ --listen-client-urls="http://localhost:2379,http://192.168.1.104:2379" \ --initial-advertise-peer-urls="http://192.168.1.104:2380" \ --initial-cluster="default=http://192.168.1.104:2380," \ --advertise-client-urls="http://192.168.1.104:2379" \ --initial-cluster-token="etcd-cluster" \ --initial-cluster-state="new" \ --enable-v2=true Restart=always RestartSec=10s LimitNOFILE=40000 [Install] WantedBy=multi-user.target
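
    Once the unit file is in place (for example at /etc/systemd/system/etcd.service), the enable/start/verify sequence looks like the following sketch, using the example etcd IP from this guide:

    sudo systemctl daemon-reload
    sudo systemctl enable etcd
    sudo systemctl start etcd
    # Basic health check against the client port
    curl -s http://192.168.1.104:2379/health
    # Expected: {"health":"true"}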
  9. PostgreSQL and Patroni Installation Prepare OS Environment Set hostname and

    update package repositories Install PostgreSQL 16 Install PostgreSQL packages and create symbolic links Set Up Python Environment Install Python dependencies and create virtual environment These installation steps need to be performed on all three PostgreSQL data nodes. PostgreSQL 16 brings significant performance improvements over previous versions, including enhanced query parallelism, faster bulk data loading operations, and improved logical replication capabilities. These features make it an excellent choice for high-throughput database environments. Installing Patroni in a Python virtual environment isolates its dependencies from the system Python packages, preventing potential conflicts with other applications and simplifying future upgrades.
  10. PostgreSQL Installation Commands # Set hostname (replace N with node

    number) sudo hostnamectl set-hostname data-nodeN # Update package repositories sudo apt-get update # Install PostgreSQL repository configuration sudo apt-get install postgresql-common -y sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh # Install PostgreSQL 16 (pin the major version so the PGDG metapackage does not pull a newer release) sudo apt-get install -y postgresql-16 # Stop PostgreSQL service (we'll manage it with Patroni) sudo systemctl stop postgresql # Create symbolic links to PostgreSQL binaries sudo ln -s /usr/lib/postgresql/16/bin/* /usr/sbin/ # Install required dependencies sudo apt-get install net-tools python3-pip python3-dev \ libpq-dev python3-venv -y These commands first install the PostgreSQL repository configuration, then install PostgreSQL 16 from the official repository, pinning the major version explicitly so a newer release is not pulled in by the metapackage. We stop the default PostgreSQL service since Patroni will handle starting and managing the database. Creating symbolic links makes PostgreSQL commands easily accessible in the system path. The required dependencies include networking tools, Python development libraries, and PostgreSQL client libraries, which are necessary for Patroni to interact with PostgreSQL effectively.
  11. Patroni Installation Process 1 Create Virtual Environment Isolate Patroni dependencies

    with Python venv Install Patroni Packages Use pip to install Patroni and dependencies Create Configuration Set up patroni.yml with cluster settings Configure as Service Create systemd service for automatic startup # Switch to postgres user sudo su postgres cd $HOME # Create Python virtual environment python3 -m venv patroni-packages source patroni-packages/bin/activate # Install Patroni and dependencies pip3 install --upgrade setuptools pip pip install psycopg[binary] patroni python-etcd # Create directories for Patroni mkdir -p patroni-packages/data chmod 700 patroni-packages/data touch patroni-packages/patroni.yml
  12. Patroni Configuration File Cluster Identity Define the cluster scope, namespace,

    and node name to uniquely identify the node within the cluster. etcd Configuration Specify the etcd endpoint that Patroni should use for storing cluster state information. PostgreSQL Settings Configure PostgreSQL-specific parameters, including data directory, connection settings, authentication details, and tuning parameters. REST API Configuration Set up the REST API endpoint that Patroni exposes for health checks and administrative operations. # Excerpt from patroni.yml scope: postgres namespace: Cluster name: data-node1 # Change for each node restapi: listen: 192.168.1.101:8008 # Use node's IP connect_address: 192.168.1.101:8008 etcd: host: 192.168.1.104:2379 # etcd node IP and port
  13. PostgreSQL Configuration in Patroni # PostgreSQL configuration section in patroni.yml

    postgresql: listen: 192.168.1.101:5432 connect_address: 192.168.1.101:5432 data_dir: /var/lib/postgresql/patroni-packages/data pgpass: /tmp/pgpass authentication: replication: username: replicator password: '123' superuser: username: postgres password: '123' parameters: unix_socket_directories: '.' The PostgreSQL configuration in Patroni controls both the initialization parameters used when first creating the database cluster and the runtime parameters used during normal operation. It also defines authentication settings for various user roles, including the superuser and replication accounts. Bootstrap Settings Initial configuration parameters for cluster creation PostgreSQL Parameters Runtime configuration for PostgreSQL instances Authentication Rules Access control configuration via pg_hba entries User Management Administrative and replication user setup
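
    The bootstrap and pg_hba pieces referenced above are not shown in full on this slide. The following is a hedged sketch of what that section of patroni.yml might look like, written here as a shell heredoc appending to the file used by the systemd unit; the TTL, timeouts, and replication network are illustrative values to adapt to your cluster.

    sudo -u postgres tee -a /var/lib/postgresql/patroni-packages/patroni.yml > /dev/null <<'EOF'
    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
          use_pg_rewind: true
      initdb:
        - encoding: UTF8
        - data-checksums
      pg_hba:
        - host replication replicator 192.168.1.0/24 scram-sha-256
        - host all all 192.168.1.0/24 scram-sha-256
    EOF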
  14. Patroni System Service Configuration # Patroni systemd service file [Unit]

    Description=High availability PostgreSQL Cluster After=syslog.target network.target [Service] Type=simple User=postgres Group=postgres ExecStart=/var/lib/postgresql/patroni-packages/bin/patroni \ /var/lib/postgresql/patroni-packages/patroni.yml KillMode=process TimeoutSec=30 Restart=no [Install] WantedBy=multi-user.target Create Service File Create the systemd service file at /etc/systemd/system/patroni.service with the content shown above. Reload Daemon Execute sudo systemctl daemon-reload to update systemd's configuration. Enable Service Run sudo systemctl enable patroni to ensure Patroni starts automatically at system boot. Start Service Start the Patroni service with sudo systemctl start patroni.
  15. HAProxy Installation and Configuration Install HAProxy Package Run sudo apt-get

    install haproxy -y to install the HAProxy package from the Ubuntu repositories. Configure HAProxy Edit the configuration file at /etc/haproxy/haproxy.cfg to define the PostgreSQL service and health checks. Restart Service Apply the configuration by restarting HAProxy with sudo systemctl restart haproxy. HAProxy serves as the entry point for client applications connecting to the PostgreSQL cluster. It performs health checks on PostgreSQL nodes and routes traffic to the active primary node. If the primary node fails, HAProxy automatically redirects connections to the newly promoted primary, providing seamless failover for client applications. HAProxy achieves this by using HTTP health checks that query Patroni's REST API on each node to determine which node is currently the primary. This approach ensures that client connections are always directed to the correct node.
  16. HAProxy Configuration Details global maxconn 100 log 127.0.0.1 local2 defaults

    log global mode tcp retries 2 timeout client 30m timeout connect 4s timeout server 30m timeout check 5s listen stats mode http bind *:7000 stats enable stats uri / listen postgres bind *:5000 option httpchk http-check expect status 200 default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions server node1 192.168.1.101:5432 maxconn 100 check port 8008 server node2 192.168.1.102:5432 maxconn 100 check port 8008 server node3 192.168.1.103:5432 maxconn 100 check port 8008 TCP Mode HAProxy operates in TCP mode for PostgreSQL traffic, providing optimal performance for database connections. HTTP Health Checks Uses HTTP health checks against Patroni's REST API (port 8008) to determine node status. Statistics Interface Provides a web-based statistics dashboard on port 7000 for monitoring connection status.
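
    You can observe the health check HAProxy relies on by querying Patroni's REST API directly: only the node currently acting as primary answers HTTP 200 on the /primary endpoint, while replicas answer 200 on /replica. A quick sketch using the example node IPs:

    # Only the current primary returns 200 here; replicas return 503
    for ip in 192.168.1.101 192.168.1.102 192.168.1.103; do
      code=$(curl -s -o /dev/null -w '%{http_code}' http://$ip:8008/primary)
      echo "$ip /primary -> HTTP $code"
    done
    # The same idea with /replica identifies the read-only members
    echo "HAProxy statistics dashboard: http://192.168.1.105:7000/"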
  17. Verifying Cluster Setup

    sudo su postgres
    cd ~/patroni-packages
    bin/patronictl -c patroni.yml list

    Check Cluster Status: This command displays all nodes in the cluster, their roles (Leader or Replica), and their current state. Verify that one node shows as Leader and the others as Replica.

    + Cluster: postgres (6896192691322738263) ----+---------+----+-----------+
    | Member     | Host          | Role    | State   | TL | Lag in MB |
    +------------+---------------+---------+---------+----+-----------+
    | data-node1 | 192.168.1.101 | Leader  | running |  1 |           |
    | data-node2 | 192.168.1.102 | Replica | running |  1 |         0 |
    | data-node3 | 192.168.1.103 | Replica | running |  1 |         0 |
    +------------+---------------+---------+---------+----+-----------+

    Expected Output: A healthy cluster shows all nodes in the "running" state with minimal or no replication lag (shown in the "Lag in MB" column). This verification step is crucial to confirm that the PostgreSQL cluster is correctly configured and all nodes are properly communicating. The patronictl command provides real-time information about the cluster state, making it an essential tool for both setup verification and ongoing monitoring.
  18. Connecting to the Cluster via HAProxy # Connect to the

    cluster through HAProxy psql -h 192.168.1.105 -p 5000 -d postgres -U postgres Password for user postgres: 123 # Test database creation postgres=# CREATE DATABASE testdb; CREATE DATABASE # List databases to verify creation postgres=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+----------+---------+-------+----------------------- postgres | postgres | UTF8 | C.UTF-8 | C.UTF-8 | template0 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres testdb | postgres | UTF8 | C.UTF-8 | C.UTF-8 | (4 rows) This test confirms that client applications can successfully connect to the PostgreSQL cluster through HAProxy and perform database operations. HAProxy transparently routes the connection to the current primary node, providing a single stable endpoint for applications regardless of which physical node is currently serving as the primary. During failover events, applications may experience a brief connection interruption, but reconnection attempts will be routed to the newly promoted primary node, minimizing downtime and application impact.
  19. Testing Automatic Failover sudo systemctl stop patroni Step 1: Stop

    Patroni on Primary Identify the current primary node using patronictl list, then stop the Patroni service on that node to simulate a failure: sudo su postgres cd ~/patroni-packages bin/patronictl -c patroni.yml list Step 2: Verify Failover Check the cluster status from another node to confirm that a replica has been automatically promoted to the Leader role: During the failover process, Patroni detects that the primary node is unavailable and coordinates with etcd to elect a new primary from the available replicas. The election process selects the replica with the most up-to-date data to minimize potential data loss. A successful failover test will show that one of the previous replicas now has the "Leader" role, and HAProxy has adjusted its routing to direct connections to the newly promoted primary. This confirms that the high-availability mechanisms are functioning correctly and can handle actual node failures in production.
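
    A simple way to observe the failover from the client side is to ask, through HAProxy, which backend address is actually serving the connection before and after stopping Patroni on the primary. A sketch assuming the example HAProxy endpoint and the password used earlier in this guide:

    # Which node is HAProxy routing writes to right now?
    PGPASSWORD=123 psql -h 192.168.1.105 -p 5000 -U postgres -d postgres -Atc "SELECT inet_server_addr();"
    # ...stop Patroni on the reported primary, give the cluster a few seconds, then ask again:
    sleep 10
    PGPASSWORD=123 psql -h 192.168.1.105 -p 5000 -U postgres -d postgres -Atc "SELECT inet_server_addr();"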
  20. Verifying Application Continuity # Connect to the cluster through HAProxy

    after failover psql -h 192.168.1.105 -p 5000 -d postgres -U postgres Password for user postgres: 123 # Verify you can still perform operations postgres=# CREATE DATABASE failover_test; CREATE DATABASE postgres=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges ----------------+----------+----------+---------+-------+----------------------- failover_test | postgres | UTF8 | C.UTF-8 | C.UTF-8 | postgres | postgres | UTF8 | C.UTF-8 | C.UTF-8 | template0 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | C.UTF-8 | C.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres testdb | postgres | UTF8 | C.UTF-8 | C.UTF-8 | (5 rows) This verification step confirms that client applications can continue to operate despite a primary node failure. While there may be a brief interruption during the failover process (typically a few seconds), applications that implement proper connection retry logic will automatically reconnect to the new primary through HAProxy. Applications should be designed to handle temporary connection failures gracefully, with exponential backoff retry mechanisms and appropriate error handling to provide the best user experience during failover events.
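
    As an illustration of the retry behaviour described above, here is a minimal shell-level sketch of an exponential-backoff reconnect loop; real applications would implement this in their database driver or connection pool rather than in a shell script.

    attempt=0
    until PGPASSWORD=123 psql -h 192.168.1.105 -p 5000 -U postgres -d postgres -c "SELECT 1;" >/dev/null 2>&1; do
      attempt=$((attempt + 1))
      wait=$((2 ** attempt))
      [ "$wait" -gt 30 ] && wait=30        # cap the backoff at 30 seconds
      echo "Connection failed (attempt $attempt), retrying in ${wait}s..."
      sleep "$wait"
    done
    echo "Connected after $attempt failed attempt(s)."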
  21. PostgreSQL 16 Performance Improvements PostgreSQL 16 introduces significant performance improvements

    that benefit high-availability deployments. Enhanced query parallelism allows better utilization of multiple CPU cores, while faster bulk loading capabilities enable quicker data migrations and ETL processes. The improvements to logical replication provide more efficient and reliable data synchronization between database instances. The new pg_stat_io view offers deeper insights into I/O performance, helping DBAs identify and resolve bottlenecks more effectively. Together, these enhancements make PostgreSQL 16 an excellent choice for demanding enterprise applications. Enhanced Parallelism Improved parallel query execution for faster analytics workloads Faster Bulk Loading Optimized COPY operations for improved data ingestion Logical Replication Enhanced performance and feature set for logical replication Monitoring View New pg_stat_io view for detailed I/O performance analysis
  22. Read Scalability with HAProxy # Add this to /etc/haproxy/haproxy.cfg listen

    postgres_read bind *:5001 option httpchk OPTIONS /replica http-check expect status 200 default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions server node1 192.168.1.101:5432 maxconn 100 check port 8008 server node2 192.168.1.102:5432 maxconn 100 check port 8008 server node3 192.168.1.103:5432 maxconn 100 check port 8008 Primary Connections (port 5000) Write operations Transactions requiring consistency Read operations needing latest data Read Pool Connections (port 5001) Read-only queries Reporting workloads Analytics operations Application features that can tolerate slight data lag The architecture already supports read scalability by directing read-only queries to replica nodes. This configuration creates a separate HAProxy endpoint on port 5001 specifically for read operations, allowing applications to explicitly choose between connecting to the primary for write operations or to replicas for read-heavy workloads.
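
    After adding the read listener and restarting HAProxy, you can confirm that port 5001 lands on a replica (pg_is_in_recovery() returns true) while port 5000 still lands on the primary. A sketch with the example endpoint and credentials:

    sudo systemctl restart haproxy
    # Primary endpoint: expect "f" (not in recovery)
    PGPASSWORD=123 psql -h 192.168.1.105 -p 5000 -U postgres -d postgres -Atc "SELECT pg_is_in_recovery();"
    # Read pool endpoint: expect "t" (a streaming replica)
    PGPASSWORD=123 psql -h 192.168.1.105 -p 5001 -U postgres -d postgres -Atc "SELECT pg_is_in_recovery();"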
  23. Write Scalability: Data Sharding Partition data across clusters Split tables

    by logical boundaries Connect with Foreign Data Wrappers Use postgres_fdw to query across shards Implement application-level routing Direct queries to appropriate shards Data sharding is a horizontal scaling approach that involves partitioning your database across multiple PostgreSQL clusters. Each cluster is responsible for a subset of data, typically divided by a key like customer ID, geography, or date ranges. PostgreSQL's declarative partitioning feature provides built-in support for table partitioning, while Foreign Data Wrappers (FDW) allow queries across multiple shards. This approach dramatically increases write capacity by distributing write operations across multiple primary nodes, each responsible for a different data partition. However, it introduces complexity in query planning and may require application changes to route queries to the appropriate shard.
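
    To make the FDW idea concrete, here is a hedged SQL sketch (run via psql) that attaches one shard as a foreign server; the server name, shard host, credentials, and table definition are illustrative only.

    PGPASSWORD=123 psql -h 192.168.1.105 -p 5000 -U postgres -d postgres <<'SQL'
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- Illustrative shard holding one subset of the data
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host '192.168.2.101', port '5432', dbname 'appdb');

    CREATE USER MAPPING FOR postgres SERVER shard1
      OPTIONS (user 'app_user', password 'secret');

    -- Expose the remote table locally so cross-shard queries are possible
    CREATE FOREIGN TABLE orders_shard1 (
      order_id    bigint,
      customer_id bigint,
      total       numeric
    ) SERVER shard1 OPTIONS (schema_name 'public', table_name 'orders');

    SELECT count(*) FROM orders_shard1;
    SQL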
  24. Write Scalability: Logical Replication 1 2 3 4 -- On

    source database CREATE PUBLICATION sales_pub FOR TABLE orders, customers; -- On target database CREATE SUBSCRIPTION sales_sub CONNECTION 'host=192.168.1.101 port=5432 dbname=sales user=repl_user password=secret' PUBLICATION sales_pub; Logical replication in PostgreSQL allows selective replication of database objects between different instances. Unlike physical replication, which replicates the entire data directory, logical replication operates at the table level and can replicate between different PostgreSQL versions. For horizontal scalability, logical replication can be configured bidirectionally between multiple PostgreSQL instances for specific tables, enabling write distribution across multiple nodes. This approach requires careful conflict resolution strategies but provides flexibility in how data is distributed. Publication Creation Define which tables to replicate Subscription Setup Configure target databases to receive changes Conflict Resolution Implement strategies for handling data conflicts Monitoring Track replication lag and health
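
    For the monitoring step, both sides expose catalog views you can poll. A short sketch follows; the views are standard, while the hosts, database, and credentials are the illustrative values from this slide (replace the subscriber host with your own).

    # On the publisher: confirm the logical slot exists and measure retained WAL
    PGPASSWORD=secret psql -h 192.168.1.101 -p 5432 -U postgres -d sales -c \
      "SELECT slot_name, active,
              pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS retained_wal
         FROM pg_replication_slots WHERE slot_type = 'logical';"

    # On the subscriber: check the apply worker and the last received LSN
    PGPASSWORD=secret psql -h 192.168.1.201 -p 5432 -U postgres -d sales -c \
      "SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time FROM pg_stat_subscription;"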
  25. Comparing Scaling Approaches

    Approach            | Write Scalability | Query Complexity | Implementation Effort | Consistency Model
    Read Replicas       | Low               | Low              | Low                   | Eventually Consistent
    Data Sharding       | High              | High             | High                  | Strongly Consistent Within Shard
    Logical Replication | Medium            | Medium           | Medium                | Eventually Consistent
    Citus Extension     | High              | Medium           | Medium                | Strongly Consistent Within Shard

    The optimal scaling approach depends on your specific requirements. Read replicas provide the simplest solution for read-heavy workloads but don't improve write capacity. Data sharding offers the highest write scalability but increases query complexity. Logical replication provides a middle ground with moderate write scaling and implementation effort. For many applications, a hybrid approach works best: using read replicas for read scalability while implementing selective sharding for write-intensive tables or implementing logical replication for specific use cases requiring distributed writes.
  26. Adding a New Node to the Cluster Install PostgreSQL and

    Patroni Follow the same installation steps used for the original nodes. Configure Patroni Create patroni.yml with a unique node name and appropriate IP addresses. Start Patroni Service Enable and start the Patroni service on the new node. Update HAProxy Configuration Add the new node to HAProxy's server list and restart HAProxy. When adding a new node to an existing cluster, Patroni automatically handles the bootstrapping process. It connects to etcd to obtain the current cluster state, then initiates a base backup from the primary node to synchronize the data. Once the initial synchronization is complete, the node joins the cluster as a replica and begins streaming replication. Adding nodes increases read capacity and improves fault tolerance by providing additional failover candidates. The process is non-disruptive to existing operations, allowing you to scale your cluster horizontally as demand grows.
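
    The HAProxy side of step 4 is a one-line addition per listener, followed by a reload and a cluster check. A sketch assuming the new node is data-node4 at 192.168.1.106:

    # Add under "listen postgres" (and "listen postgres_read" if configured) in /etc/haproxy/haproxy.cfg:
    #   server node4 192.168.1.106:5432 maxconn 100 check port 8008
    sudo systemctl reload haproxy

    # Confirm the new member has joined and finished its initial base backup
    sudo su postgres
    cd ~/patroni-packages
    bin/patronictl -c patroni.yml list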
  27. Backup Strategies Physical Backups pg_basebackup for full cluster backups Continuous

    archiving of WAL files Point-in-time recovery capabilities Logical Backups pg_dump for single database export pg_dumpall for entire cluster export Selective object restoration Backup Tools pgBackRest for advanced backup management Barman for disaster recovery WAL-G for cloud-optimized backups # Example: Taking a physical backup with pg_basebackup pg_basebackup -h 192.168.1.101 -D /path/to/backup -U replicator -P -v Implementing a robust backup strategy is essential for data protection. Physical backups capture the entire database cluster and support point-in-time recovery, while logical backups provide flexibility for selective restoration and cross-version migrations.
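
    The logical-backup commands referenced above, sketched against the HAProxy endpoint used throughout this guide; the /backup paths and the table name in the selective restore are illustrative.

    # Single database, custom format (compressed, restorable object-by-object)
    PGPASSWORD=123 pg_dump -h 192.168.1.105 -p 5000 -U postgres -Fc -f /backup/testdb.dump testdb

    # Whole cluster: globals (roles, tablespaces) plus all databases as plain SQL
    PGPASSWORD=123 pg_dumpall -h 192.168.1.105 -p 5000 -U postgres -f /backup/cluster.sql

    # Selective restore of a single table from the custom-format dump
    PGPASSWORD=123 pg_restore -h 192.168.1.105 -p 5000 -U postgres -d testdb -t my_table /backup/testdb.dump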
  28. Implementing WAL Archiving Transaction Logging PostgreSQL records changes in WAL

    files Archive Command Custom script copies WAL segments to archive storage Archive Storage WAL files stored in a secure location Recovery Restore base backup and replay WAL files # Add these settings to postgresql.conf via Patroni parameters: archive_mode: 'on' archive_command: 'cp %p /archive/%f' archive_timeout: '60' WAL (Write-Ahead Log) archiving is a crucial component of a comprehensive backup strategy. By continuously archiving transaction log files, you can recover to any point in time between full backups, minimizing potential data loss in disaster scenarios. The archive_command can be customized to copy WAL files to various storage destinations, including local disks, network file systems, or cloud storage. For production environments, consider using tools like pgBackRest or WAL-G that provide compression, encryption, and parallel transfer capabilities.
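
    The recovery step works by restoring a base backup and then replaying archived WAL up to a target. A hedged point-in-time-recovery sketch on a scratch server follows; the backup tarball, restore directory, and target timestamp are illustrative, and the base backup is assumed to have been taken in tar format.

    # 1. Restore the base backup into an empty data directory
    sudo -u postgres mkdir -p /var/lib/postgresql/restore
    sudo -u postgres tar -xf /backup/base.tar -C /var/lib/postgresql/restore

    # 2. Point recovery at the WAL archive and choose a target time
    sudo -u postgres tee -a /var/lib/postgresql/restore/postgresql.conf > /dev/null <<'EOF'
    restore_command = 'cp /archive/%f %p'
    recovery_target_time = '2025-04-04 12:00:00'
    EOF
    sudo -u postgres touch /var/lib/postgresql/restore/recovery.signal

    # 3. Start PostgreSQL against the restored directory and let it replay WAL
    sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl -D /var/lib/postgresql/restore start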
  29. Monitoring the PostgreSQL Cluster HAProxy Stats Access the HAProxy statistics

    dashboard at http://192.168.1.105:7000/ to monitor connection status and distribution. Patroni REST API Query the Patroni API endpoints to check node health and cluster status: curl http://192.168.1.101:8008/cluster PostgreSQL Metrics Monitor database performance using pg_stat_io, pg_stat_database, and other system views. Alerting Tools Set up Prometheus with postgres_exporter and Grafana for comprehensive monitoring and alerting. Effective monitoring is essential for maintaining a healthy PostgreSQL cluster. At a minimum, monitor replication lag, connection counts, query performance, disk usage, and system resource utilization. PostgreSQL 16's new pg_stat_io view provides valuable insights into I/O performance across different components of the database.
  30. Key Metrics to Monitor <5s Replication Lag Time delay between

    primary and replica nodes 85% CPU Utilization Processor usage across database nodes 70% Disk Usage Storage capacity utilization 200ms Query Response Time Average time to complete queries -- Check replication lag in PostgreSQL SELECT client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes, NOW() - state_change AS lag_time FROM pg_stat_replication; -- Monitor I/O statistics in PostgreSQL 16 SELECT * FROM pg_stat_io; These metrics provide critical insights into the health and performance of your PostgreSQL cluster. Replication lag indicates how far behind replicas are from the primary, which affects both data freshness for read operations and potential data loss during failover. CPU and disk utilization metrics help identify resource constraints, while query response time directly impacts application performance.
  31. Setting Up Prometheus and Grafana Install postgres_exporter Deploy postgres_exporter on

    each PostgreSQL node to collect and expose metrics. Configure Prometheus Set up Prometheus to scrape metrics from postgres_exporter endpoints and store time-series data. Install Grafana Deploy Grafana for visualization and create dashboards to display PostgreSQL metrics. Configure Alerting Set up alert rules in Prometheus and notification channels in Grafana for automated monitoring. # Example Prometheus configuration for PostgreSQL monitoring - job_name: 'postgres' static_configs: - targets: ['192.168.1.101:9187', '192.168.1.102:9187', '192.168.1.103:9187'] scrape_interval: 15s Prometheus and Grafana provide a powerful monitoring solution for PostgreSQL clusters. Prometheus collects and stores time- series metrics, while Grafana creates visually appealing dashboards. The combination enables comprehensive monitoring of both database-specific metrics and system-level resources.
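
    A hedged sketch of running postgres_exporter on a data node as a systemd unit, written via a heredoc. The prometheus-community exporter reads its connection string from the DATA_SOURCE_NAME environment variable; the binary path and the superuser DSN below are assumptions, and a dedicated monitoring role with limited privileges is preferable to the superuser in production.

    sudo tee /etc/systemd/system/postgres_exporter.service > /dev/null <<'EOF'
    [Unit]
    Description=Prometheus PostgreSQL exporter
    After=network.target

    [Service]
    User=postgres
    Environment="DATA_SOURCE_NAME=postgresql://postgres:123@localhost:5432/postgres?sslmode=disable"
    ExecStart=/usr/local/bin/postgres_exporter --web.listen-address=:9187
    Restart=always

    [Install]
    WantedBy=multi-user.target
    EOF
    sudo systemctl daemon-reload && sudo systemctl enable --now postgres_exporter
    curl -s http://localhost:9187/metrics | head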
  32. Troubleshooting: Patroni Cannot Connect to etcd Verify etcd is Running

    Check etcd service status: sudo systemctl status etcd 2 Check Network Connectivity Ensure network access between Patroni and etcd: telnet 192.168.1.104 2379 Test etcd Health Verify etcd is healthy: curl -L http://192.168.1.104:2379/health 4 Validate Configuration Confirm etcd host in patroni.yml matches the actual etcd endpoint # Expected output from etcd health check {"health":"true"} # Check etcd member list curl -L http://192.168.1.104:2379/v2/members Connection issues between Patroni and etcd can prevent proper cluster formation and operation. When troubleshooting these issues, first ensure that etcd is running and accessible from the PostgreSQL nodes. Network connectivity problems, firewall rules, or incorrect configuration settings are common causes of connection failures.
  33. Troubleshooting: Replication Lag Issues 1 2 3 -- Check replication

    lag in bytes SELECT client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication; -- Check for resource-intensive queries SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC LIMIT 10; Excessive replication lag can impact both read scalability and disaster recovery capabilities. When replicas fall too far behind the primary, read queries may return outdated data, and failover operations risk greater data loss. Common causes of replication lag include insufficient resources on replica servers, network bandwidth limitations, or intensive write workloads on the primary. Identify Lag Measure replication lag using PostgreSQL metrics Check Resources Verify sufficient CPU, memory, disk I/O on replicas Network Bandwidth Ensure adequate network throughput between nodes Analyze Workload Examine write intensity on primary node
  34. Troubleshooting: Split-Brain Scenarios Stop All Nodes Stop Patroni services on

    all PostgreSQL nodes to prevent further divergence. Identify Latest Node Determine which node has the most recent and complete data set. Reset Cluster Remove the DCS keys to allow reinitializing the cluster. Restart Sequence Start Patroni on the most up-to-date node first, then other nodes. # Reset cluster in etcd curl -X DELETE http://192.168.1.104:2379/v2/keys/postgres?recursive=true # Start Patroni on the most up-to-date node sudo systemctl start patroni Split-brain scenarios occur when multiple nodes believe they are the primary, typically due to network partitions or etcd failures. While Patroni and etcd are designed to prevent split-brain situations through distributed consensus, they can still occur in certain edge cases. Resolving split-brain situations requires careful analysis to identify the node with the most current data, followed by a controlled restart sequence to re-establish proper cluster roles. This process may involve some data loss if conflicting writes occurred during the split-brain period.
  35. Advanced: Multi-Region High Availability Global Load Balancing Route users to

    nearest healthy region Regional Clusters Independent HA clusters in each region Cross-Region Replication Logical or physical replication between regions Regional Infrastructure Separate etcd clusters and HAProxy in each region For global applications requiring high availability across geographic regions, consider implementing a multi-region PostgreSQL architecture. This approach involves creating independent high-availability clusters in each region, then establishing cross- region replication to synchronize data between regions. Cross-region replication can be implemented using PostgreSQL's logical replication for selective data synchronization or physical replication for complete database mirroring. Global load balancers direct users to the nearest healthy region, while regional failover mechanisms handle localized failures.
  36. PostgreSQL 16's New pg_stat_io View

    [Chart: read and write operation counts broken down by object type]

    -- Query the new pg_stat_io view
    SELECT * FROM pg_stat_io;
    -- Sample output columns include:
    -- backend_type, object, context, reads, read_time, writes, write_time,
    -- extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time

    PostgreSQL 16 introduces the pg_stat_io view, which provides detailed statistics about I/O operations across various database components. This new monitoring capability allows database administrators to identify I/O bottlenecks with greater precision than previously possible. The view breaks down I/O operations by backend type (e.g., background writer, checkpointer), object type (data files, temporary files, etc.), and context (normal, vacuum, etc.). Metrics include operation counts, processing times, and bytes transferred, providing comprehensive insight into database I/O patterns.
  37. Major Changes in PostgreSQL 16 Performance Improvements Significant improvements in

    query parallelism, inlining SQL procedures, and commit throughput for busy systems. Logical Replication Enhancements Support for streaming large in-progress transactions, two-phase commit, and row filtering provide more flexibility and performance. Monitoring and Observability Introduction of pg_stat_io for detailed I/O statistics and enhanced WAL archiving status tracking. Security Enhancements Finer-grained control over role membership and privilege administration, plus improvements to existing security mechanisms like SCRAM. PostgreSQL 16 represents a significant evolution in the world's most advanced open-source database. The performance improvements alone make upgrading worthwhile for many high-throughput applications, while the enhancements to logical replication provide more flexibility for distributed database architectures.
  38. Security Considerations Network Security Implement firewall rules to restrict access

    to PostgreSQL ports Use TLS/SSL encryption for all database connections Set up private VLANs or VPCs for internal cluster communication Authentication Use SCRAM-SHA-256 authentication instead of MD5 Implement strong password policies Consider certificate-based authentication for internal services Authorization Apply principle of least privilege to database roles Use row-level security for multi-tenant applications Implement column-level encryption for sensitive data Security should be a top priority when deploying PostgreSQL in production environments. The default Patroni configuration uses password-based authentication for simplicity, but production deployments should implement stronger security measures. PostgreSQL 16 also provides more granular control over role membership and privilege administration, which helps enforce least-privilege access.
  39. Enabling SSL/TLS Encryption Generate SSL Certificates Create server certificates for

    secure connections 2 Configure PostgreSQL Parameters Set SSL-related parameters in Patroni configuration Configure Client Connections Update HAProxy and client applications to use SSL # Add these SSL parameters to patroni.yml parameters: ssl: 'on' ssl_cert_file: '/etc/ssl/certs/postgres.crt' ssl_key_file: '/etc/ssl/private/postgres.key' ssl_ca_file: '/etc/ssl/certs/ca.crt' ssl_prefer_server_ciphers: 'on' # Update HAProxy configuration for SSL mode tcp option ssl-hello-chk server node1 192.168.1.101:5432 check port 8008 check-ssl verify none Encrypting PostgreSQL connections with SSL/TLS is essential for protecting data in transit, especially when database traffic traverses untrusted networks. The configuration involves generating server certificates, configuring PostgreSQL to use those certificates, and updating client applications to establish SSL connections.
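
    A self-signed certificate sketch for the first step; in production you would use your internal CA. The certificate paths match the patroni.yml example above, while the subject name is illustrative.

    # Generate a self-signed server certificate and key
    sudo openssl req -new -x509 -days 365 -nodes \
      -subj "/CN=data-node1" \
      -out /etc/ssl/certs/postgres.crt \
      -keyout /etc/ssl/private/postgres.key
    sudo chown postgres:postgres /etc/ssl/private/postgres.key
    sudo chmod 600 /etc/ssl/private/postgres.key

    # Verify an encrypted connection from a client
    PGPASSWORD=123 psql "host=192.168.1.105 port=5000 dbname=postgres user=postgres sslmode=require" \
      -c "SELECT ssl, version FROM pg_stat_ssl WHERE pid = pg_backend_pid();"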
  40. Secure Authentication Configuration Authentication Methods SCRAM-SHA-256: Modern password-based authentication Certificate:

    Client certificate authentication LDAP: Enterprise directory integration GSSAPI: Kerberos-based authentication # In patroni.yml bootstrap: pg_hba: - host all all 0.0.0.0/0 scram-sha-256 - hostssl all all 0.0.0.0/0 cert - host replication replicator 127.0.0.1/32 scram-sha-256 postgresql: authentication: replication: username: replicator password: 'secure_password' method: 'scram-sha-256' superuser: username: postgres password: 'secure_password' method: 'scram-sha-256' Implementation in Patroni Update the authentication section to specify SCRAM authentication: Secure authentication is critical for protecting your PostgreSQL cluster from unauthorized access. SCRAM-SHA-256 is the recommended password-based authentication method, as it offers significantly better security than the older MD5 method. For production environments, consider using certificate-based authentication for internal services and integrating with enterprise directory services for user authentication.
  41. Testing and Validation Strategies 1 Comprehensive testing is essential before

    deploying a PostgreSQL high-availability cluster to production. Start with validating the configuration of each component to ensure it matches your requirements and best practices. Then, perform controlled failover tests to verify that automatic failover mechanisms work correctly under various failure scenarios. Performance testing helps identify potential bottlenecks and validate that the cluster can handle your expected workload. Finally, recovery testing ensures that your backup and restoration procedures are effective and meet your recovery time and point objectives (RTO/RPO). Configuration Validation Verify settings across all components Failover Testing Simulate various failure scenarios Performance Testing Benchmark under expected workloads Recovery Testing Validate backup and restoration procedures
  42. Failover Scenario Testing

    Failure Scenario | Test Method | Expected Outcome | Validation Steps
    Primary Node Crash | Stop Patroni service | Automatic promotion of a replica | Check patronictl output, verify HAProxy redirects
    Network Partition | Block network traffic using iptables | Majority side elects new primary | Verify cluster continues operations on correct side
    etcd Failure | Stop etcd service | No failover possible; by default Patroni also demotes the primary to read-only when it cannot update the leader lock (unless DCS failsafe mode is enabled) | Verify behavior during the outage and confirm the cluster recovers once etcd is restarted
    HAProxy Failure | Stop HAProxy service | Applications using direct node connections continue | Test direct connections to known primary

    Systematically testing different failure scenarios is crucial for understanding how your PostgreSQL cluster will behave during actual incidents. For each scenario, document the expected outcome, test method, and validation steps to ensure the cluster responds correctly. After each test, analyze the results to identify potential improvements in your configuration or operational procedures. Regular failover testing helps build confidence in your high-availability setup and ensures that your team is familiar with the behavior of the system during failures.
  43. Performance Tuning Memory Configuration Optimize shared_buffers, work_mem, and maintenance_work_mem based

    on available RAM and workload characteristics. Autovacuum Tuning Adjust autovacuum settings to balance between keeping tables clean and minimizing performance impact. Checkpoint Settings Configure checkpoint segments and timeouts to minimize I/O spikes while ensuring data durability. Parallelism Set max_parallel_workers and related parameters to leverage multiple CPU cores effectively. # Example performance-tuned parameters in patroni.yml parameters: # Memory Settings shared_buffers: '1GB' work_mem: '32MB' maintenance_work_mem: '256MB' # Checkpoint Settings max_wal_size: '2GB' min_wal_size: '1GB' checkpoint_timeout: '15min' # Parallelism max_worker_processes: 8 max_parallel_workers_per_gather: 4 max_parallel_workers: 8 # Autovacuum autovacuum_max_workers: 3 autovacuum_naptime: '1min' autovacuum_vacuum_threshold: 50 autovacuum_analyze_threshold: 50
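
    Because Patroni owns postgresql.conf, parameter changes like these are applied through the DCS rather than by editing files on each node. A sketch of the workflow, following the same patronictl invocation style used earlier in this guide:

    sudo su postgres
    cd ~/patroni-packages
    bin/patronictl -c patroni.yml edit-config       # adjust values under postgresql.parameters
    bin/patronictl -c patroni.yml reload postgres   # apply dynamic settings cluster-wide
    bin/patronictl -c patroni.yml restart postgres  # required for settings such as shared_buffers or max_worker_processes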
  44. Connection Pooling with PgBouncer Client Applications Multiple applications connect to

    PgBouncer instead of directly to PostgreSQL Connection Pooler PgBouncer manages a pool of database connections 3 HAProxy Load balancer routes connections to the primary node 4 PostgreSQL Cluster Database handles fewer connections, improving efficiency # Example PgBouncer configuration (pgbouncer.ini) [databases] * = host=192.168.1.105 port=5000 dbname=postgres [pgbouncer] listen_addr = * listen_port = 6432 auth_type = md5 auth_file = /etc/pgbouncer/userlist.txt pool_mode = transaction max_client_conn = 1000 default_pool_size = 20 Connection pooling with PgBouncer significantly reduces the overhead of establishing new PostgreSQL connections, especially in environments with many short-lived connections. By maintaining a pool of persistent connections to the database, PgBouncer allows applications to quickly obtain a connection from the pool rather than establishing a new one.
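
    To complete the sketch above, PgBouncer needs a userlist entry matching auth_type = md5, and clients then point at port 6432 instead of 5000. The commands below are a hedged sketch: the md5 entry is derived from md5(password || username) using the example password, and 127.0.0.1 stands in for wherever PgBouncer actually runs.

    # /etc/pgbouncer/userlist.txt expects: "username" "md5<md5 of password||username>"
    echo "\"postgres\" \"md5$(echo -n '123postgres' | md5sum | cut -d' ' -f1)\"" | sudo tee /etc/pgbouncer/userlist.txt
    sudo systemctl restart pgbouncer

    # Applications now connect through PgBouncer instead of HAProxy directly
    PGPASSWORD=123 psql -h 127.0.0.1 -p 6432 -U postgres -d postgres -c "SELECT 1;"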
  45. Multi-Tenant Database Strategies Shared Database, Shared Schema All tenants share

    a single database and schema, with a tenant identifier column in each table. Lowest cost and simplest to maintain Limited tenant isolation Row-level security can provide data separation Shared Database, Separate Schemas All tenants share a database but have their own schema. Better isolation with schema-level separation Simplified backup and maintenance vs. separate databases Can apply different permissions per tenant Separate Databases Each tenant gets a dedicated database instance. Strongest isolation between tenants Flexibility in resource allocation and scaling Highest operational complexity and cost When implementing multi-tenant applications with PostgreSQL, the choice of tenancy model significantly impacts security, performance, scalability, and operational complexity. Each approach has its trade-offs, and the best choice depends on your specific requirements regarding tenant isolation, resource allocation, and operational overhead.
  46. PostgreSQL Feature Comparison

    [Chart: relative scores for query parallelism, logical replication, monitoring tools, and security features across PostgreSQL 14, 15, and 16]

    PostgreSQL 16 brings significant improvements across multiple areas compared to previous versions. Query parallelism enhancements deliver better performance for analytical workloads, while logical replication improvements provide more flexibility for distributed database architectures. The new pg_stat_io view enhances monitoring capabilities, and finer-grained role and privilege management adds granularity to security controls. For high-availability deployments, PostgreSQL 16's improved logical replication is particularly valuable, enabling more sophisticated multi-region and active-active configurations. The performance improvements also contribute to better throughput and lower latency during normal operations.
  47. Upgrade Considerations Test Upgrade in Staging Thoroughly test upgrade process

    and application compatibility Comprehensive Backup Create full backups before attempting upgrade 3 Application Testing Verify application compatibility with PostgreSQL 16 Detailed Upgrade Plan Document every step with rollback procedures Upgrading to PostgreSQL 16 in a high-availability environment requires careful planning to minimize downtime and risk. The upgrade process typically involves building a new PostgreSQL 16 cluster alongside the existing one, then using logical replication or a similar method to migrate data from the old cluster to the new one. Patroni provides built-in support for major version upgrades through its pg_upgrade integration. This allows you to perform in- place upgrades of PostgreSQL with minimal downtime, though thorough testing in a staging environment is still essential before upgrading production systems.
  48. Resource Sizing Guidelines

    Database Size | Concurrent Connections | CPU Cores | Memory (RAM) | Storage
    Small (<100GB) | <100 | 2-4 cores | 8-16GB | SSD, 3x data size
    Medium (100GB-1TB) | 100-500 | 4-8 cores | 16-64GB | SSD, 3x data size
    Large (1TB-10TB) | 500-2000 | 8-16 cores | 64-256GB | SSD/NVMe, 2-3x data size
    Very Large (>10TB) | >2000 | 16+ cores | 256GB+ | NVMe, 2x data size

    Properly sizing your PostgreSQL cluster is crucial for performance and stability. These guidelines provide a starting point, but actual requirements will vary based on workload characteristics, query complexity, and performance expectations. CPU cores impact query parallelism and concurrency, while memory directly affects caching efficiency and query performance. Storage performance is particularly important for PostgreSQL. SSDs or NVMe drives are strongly recommended, with sufficient capacity for the database, indexes, and growth over time. For critical workloads, consider provisioned IOPS or similar performance guarantees to ensure consistent I/O performance.
  49. Recommended Operational Practices Regular Maintenance Windows Schedule routine maintenance operations

    like VACUUM FULL and REINDEX during low-traffic periods to minimize impact on performance. Systematic Backup Strategy Implement regular backups with retention policies aligned with your recovery point objective (RPO). Test restoration procedures periodically. Proactive Monitoring Monitor key metrics with alerting thresholds set below critical levels to address issues before they impact service. Documentation and Runbooks Maintain detailed documentation and runbooks for common operational tasks and emergency procedures. Successful PostgreSQL operations require a combination of proactive monitoring, regular maintenance, robust backup strategies, and well-documented procedures. Patroni automates many aspects of high availability, but operational excellence still requires human oversight and intervention for tasks that cannot be fully automated. Investing time in creating comprehensive runbooks and documentation pays dividends during incident response, when clear procedures can significantly reduce mean time to recovery (MTTR) and prevent human errors under pressure.
  50. Summary and Key Takeaways Architecture A robust PostgreSQL high-availability cluster

    combines PostgreSQL 16, Patroni, etcd, and HAProxy to provide automatic failover capabilities with minimal downtime. Scalability Horizontal scaling can be achieved through read replicas, data sharding, or logical replication, depending on your specific requirements for read vs. write scalability. Operations Effective operations require comprehensive monitoring, regular testing of failover procedures, and well- documented maintenance processes. PostgreSQL 16 Benefits The latest version provides significant performance improvements in query parallelism, bulk data loading, and logical replication, enhancing both standard operations and disaster recovery capabilities. This presentation has provided a comprehensive guide to implementing a PostgreSQL 16 high-availability cluster with Patroni, etcd, and HAProxy. By following these best practices, you can create a robust, scalable, and resilient database infrastructure capable of supporting mission-critical applications. The combination of PostgreSQL 16's performance improvements with Patroni's automated failover capabilities creates a powerful foundation for databases that require both high availability and scalability. As you implement this architecture, remember that thorough testing and ongoing operational discipline are key to maintaining the reliability and performance of your PostgreSQL environment.