https://github.com/vitabaks/postgresql_cluster
PostgreSQL High-Availability Cluster
Production-ready PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automated with Ansible.
This Ansible playbook is designed for deploying a PostgreSQL high availability cluster on dedicated physical servers for a production environment. The cluster can also be deployed on virtual machines and in the Cloud.
In addition to deploying new clusters, this playbook also supports deploying a cluster on top of an already existing and running PostgreSQL installation. You can convert your basic PostgreSQL installation to a high-availability cluster: just specify the variable postgresql_exists='true' in the inventory file. Attention! Your PostgreSQL will be stopped before running in cluster mode (please plan for a short downtime of databases).
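A minimal sketch of enabling this mode (where exactly you set the variable depends on your inventory layout; a host or group variables file works as well):

```yaml
# Sketch: convert an existing standalone PostgreSQL installation into a cluster member.
# The running PostgreSQL instance will be stopped before it is started under Patroni.
postgresql_exists: true
```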
Use the sponsoring program to get personalized support, or just to contribute to this project.
Index
- Cluster types
- Compatibility
- Requirements
- Port requirements
- Recommendations
- Deployment: quick start
- Variables
- Cluster Scaling
- Restore and Cloning
- Maintenance
- Disaster Recovery
- How to start from scratch
- License
- Author
- Sponsor this project
- Feedback, bug-reports, requests, ...
Cluster types
You have three schemes available for deployment:
[Type A] PostgreSQL High-Availability with HAProxy Load Balancing
To use this scheme, specify
with_haproxy_load_balancing: true
in the variable file vars/main.yml
This scheme provides the ability to distribute the load on reading. This also allows us to scale out the cluster (with read-only replicas).
- port 5000 (read / write) master
- port 5001 (read only) all replicas
If the variable "synchronous_mode" is 'true' (vars/main.yml):
- port 5002 (read only) synchronous replica only
- port 5003 (read only) asynchronous replicas only
❗ Your application must support sending read requests to a custom port (e.g. 5001) and write requests to another (e.g. 5000).
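A minimal vars/main.yml sketch for this scheme (everything beyond the ports listed above is an assumption):

```yaml
# vars/main.yml — Type A sketch
with_haproxy_load_balancing: true
synchronous_mode: 'true'        # optional; enables the 5002/5003 read-only split
# Applications then connect to the HAProxy ports:
#   read/write -> <haproxy address or cluster_vip>:5000
#   read-only  -> <haproxy address or cluster_vip>:5001
```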
Components of high availability:
Patroni is a template for you to create your own customized, high-availability solution using Python and - for maximum accessibility - a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes. It is used to automate the management of PostgreSQL instances and auto failover.
etcd is a distributed reliable key-value store for the most critical data of a distributed system. etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log. It is used by Patroni to store information about the status of the cluster and PostgreSQL configuration parameters.
What is Distributed Consensus?
Components of load balancing:
HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications.
confd manages local application configuration files using templates and data from etcd or consul. It is used to automate HAProxy configuration file management.
Keepalived provides a virtual high-availability IP address (VIP) and a single entry point for database access. It implements VRRP (Virtual Router Redundancy Protocol) for Linux. In our configuration, keepalived checks the status of the HAProxy service and, in case of a failure, delegates the VIP to another server in the cluster.
PgBouncer is a connection pooler for PostgreSQL.
[Type B] PostgreSQL High-Availability only
This is a simple scheme without load balancing, used by default.
To provide a single entry point (VIP) for database access, "vip-manager" is used if the variable cluster_vip is specified (optional).
vip-manager is a service that gets started on all cluster nodes and connects to the DCS. If the local node owns the leader-key, vip-manager starts the configured VIP. In case of a failover, vip-manager removes the VIP on the old leader and the corresponding service on the new leader starts it there.
Written in Go. Cybertec Schönig & Schönig GmbH https://www.cybertec-postgresql.com
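A minimal vars/main.yml sketch for this scheme (the VIP address below is only an illustrative assumption):

```yaml
# vars/main.yml — Type B sketch (default scheme)
with_haproxy_load_balancing: false   # default
cluster_vip: 10.128.64.145           # example address; vip-manager keeps this VIP on the current leader
```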
[Type C] PostgreSQL High-Availability with Consul Service Discovery (DNS)
To use this scheme, specify
dcs_type: consul
in the variable file vars/main.yml
This scheme is suitable for master-only access and for load balancing (using DNS) for reading across replicas. Consul Service Discovery with DNS resolving is used as a client access point to the database.
Client access point (example):
master.postgres-cluster.service.consul
replica.postgres-cluster.service.consul
In addition, this can be useful for a distributed cluster across different data centers. We can specify in advance which data center the database server is located in and then use this for applications running in the same data center.
Example: replica.postgres-cluster.service.dc1.consul, replica.postgres-cluster.service.dc2.consul
This requires installing Consul in client mode on each application server for service DNS resolution (or forwarding DNS to a remote Consul server instead of installing a local Consul client).
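A minimal vars/main.yml sketch for this scheme (the connection strings in the comments are illustrative):

```yaml
# vars/main.yml — Type C sketch
dcs_type: consul
# Applications resolve the access points via Consul DNS, e.g. (libpq-style):
#   "host=master.postgres-cluster.service.consul port=5432"    # writes
#   "host=replica.postgres-cluster.service.consul port=5432"   # reads
```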
Compatibility
RedHat and Debian based distros (x86_64)
Supported Linux Distributions:
- Debian: 10, 11, 12
- Ubuntu: 18.04, 20.04, 22.04
- CentOS: 7, 8
- CentOS Stream: 8, 9
- Oracle Linux: 7, 8, 9
- Rocky Linux: 8, 9
- AlmaLinux: 8, 9
PostgreSQL versions:
all supported PostgreSQL versions
✅ tested, works fine: PostgreSQL 10, 11, 12, 13, 14, 15, 16
Daily automated cluster-deployment tests are run against the following distributions; the per-distribution test-result badges are published in the repository:
- Debian 10, 11, 12
- Ubuntu 20.04, 22.04
- CentOS Stream 8, 9
- Oracle Linux 8, 9
- Rocky Linux 8, 9
- AlmaLinux 8, 9
Ansible version
Minimum supported Ansible version: 2.11.0
Requirements
This playbook requires root privileges or sudo.
Ansible (What is Ansible?)
If dcs_type: "consul", please install the consul role requirements on the control node:
ansible-galaxy install -r roles/consul/requirements.yml
Port requirements
List of required TCP ports that must be open for the database cluster:
- 5432 (PostgreSQL)
- 6432 (PgBouncer)
- 8008 (Patroni REST API)
- 2379, 2380 (etcd)

For the scheme "[Type A] PostgreSQL High-Availability with Load Balancing":

- 5000 (HAProxy - (read/write) master)
- 5001 (HAProxy - (read only) all replicas)
- 5002 (HAProxy - (read only) synchronous replica only)
- 5003 (HAProxy - (read only) asynchronous replicas only)
- 7000 (optional, HAProxy stats)

For the scheme "[Type C] PostgreSQL High-Availability with Consul Service Discovery (DNS)":

- 8300 (Consul Server RPC)
- 8301 (Consul Serf LAN)
- 8302 (Consul Serf WAN)
- 8500 (Consul HTTP API)
- 8600 (Consul DNS server)
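For illustration, a standalone Ansible task (not part of this playbook) that opens the base cluster ports with firewalld might look like this (host pattern and firewall backend are assumptions; adapt to your environment):

```yaml
---
# open_cluster_ports.yml — a hedged sketch, not part of postgresql_cluster
- name: Open PostgreSQL HA cluster ports
  hosts: all          # limit this to your database/etcd hosts
  become: true
  tasks:
    - name: Allow required TCP ports
      ansible.posix.firewalld:
        port: "{{ item }}/tcp"
        permanent: true
        immediate: true
        state: enabled
      loop:
        - 5432   # PostgreSQL
        - 6432   # PgBouncer
        - 8008   # Patroni REST API
        - 2379   # etcd client
        - 2380   # etcd peer
```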
Recommendations
- Linux (Operating System):
Update your operating system on the target servers before deploying;
Make sure that time synchronization (NTP) is configured. Specify ntp_enabled: 'true' and ntp_servers if you want to install and configure the ntp service.
- DCS (Distributed Consensus Store):
Fast drives and a reliable network are the most important factors for the performance and stability of an etcd (or consul) cluster.
Avoid storing etcd (or consul) data on the same drive as other processes (such as the database) that make intensive use of the disk subsystem! Store the etcd and PostgreSQL data on different disks (see the etcd_data_dir and consul_data_path variables), and use SSD drives if possible. See hardware recommendations and tuning guides.
It is recommended to deploy the DCS cluster on dedicated servers, separate from the database servers.
- Placement of cluster members in different data centers:
If you’d prefer a cross-data center setup, where the replicating databases are located in different data centers, etcd member placement becomes critical.
There are quite a lot of things to consider if you want to create a really robust etcd cluster, but there is one rule: do not place all etcd members in your primary data center. See some examples.
- How to prevent data loss in case of autofailover (synchronous_modes and pg_rewind):
For performance reasons, synchronous replication is disabled by default.
To minimize the risk of losing data on autofailover, you can configure the settings in the following way (a sketch follows this list):
- synchronous_mode: 'true'
- synchronous_mode_strict: 'true'
- synchronous_commit: 'on' (or 'remote_apply')
- use_pg_rewind: 'false' (enabled by default)
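A vars/main.yml sketch of the settings listed above:

```yaml
# vars/main.yml — minimize the risk of data loss on autofailover
synchronous_mode: 'true'
synchronous_mode_strict: 'true'
synchronous_commit: 'on'      # or 'remote_apply'
use_pg_rewind: 'false'
```

Note that strict synchronous mode trades write availability for durability: if no synchronous replica is available, writes will wait.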
Deployment: quick start
- Install Ansible on one control node (which could easily be a laptop)
sudo apt update && sudo apt install -y python3-pip sshpass git
pip3 install ansible
- Download or clone this repository
git clone https://github.com/vitabaks/postgresql_cluster.git
- Go to the playbook directory
cd postgresql_cluster/
- Edit the inventory file
Specify (non-public) IP addresses and connection settings (ansible_user, ansible_ssh_pass or ansible_ssh_private_key_file for your environment):
nano inventory
- Edit the variable file vars/main.yml
nano vars/main.yml
Minimum set of variables:
- proxy_env # if required (for downloading packages)
- cluster_vip # for client access to databases in the cluster (optional)
- patroni_cluster_name
- postgresql_version
- postgresql_data_dir
- with_haproxy_load_balancing # 'true' (Type A) or 'false'/default (Type B)
- dcs_type # "etcd" (default) or "consul" (Type C)
If dcs_type: "consul", please install the consul role requirements on the control node:
ansible-galaxy install -r roles/consul/requirements.yml
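For illustration, a minimal vars/main.yml might look like this (all values are example assumptions; adjust versions and paths to your environment):

```yaml
# vars/main.yml — illustrative example (values are assumptions)
patroni_cluster_name: postgres-cluster
postgresql_version: "16"
postgresql_data_dir: /var/lib/postgresql/16/main   # Debian-family path; differs on RedHat-family
cluster_vip: 10.128.64.145                         # optional
with_haproxy_load_balancing: false                 # Type B (default); 'true' for Type A
dcs_type: etcd                                     # default; "consul" for Type C
```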
- Try to connect to hosts
ansible all -m ping
- Run playbook:
ansible-playbook deploy_pgcluster.yml
Deploy Cluster with TimescaleDB
To deploy a PostgreSQL High-Availability Cluster with the TimescaleDB extension, you just need to add the enable_timescale
variable.
Example:
ansible-playbook deploy_pgcluster.yml -e "enable_timescale=true"
Variables
See the vars/main.yml, system.yml and (Debian.yml or RedHat.yml) files for more details.
Cluster Scaling
After you successfully deployed your PostgreSQL HA cluster, you may need to scale it further.
Use the add_pgnode.yml
playbook for this.
Add new postgresql node to existing cluster
Add new haproxy balancer node
Restore and Cloning
Create new clusters from your existing backups with pgBackRest or WAL-G
Point-In-Time-Recovery
Maintenance
I recommend that you study the following materials for further maintenance of the cluster:
- Tutorial: Management of High-Availability PostgreSQL clusters with Patroni
- Patroni documentation
- etcd operations guide
Changing PostgreSQL configuration parameters
To change the PostgreSQL configuration in a cluster using automation:
- Update the postgresql_parameters variable with the desired parameter changes.
  - Note: Optionally, set pending_restart: true to automatically restart PostgreSQL if a parameter change requires it.
- Execute the config_pgcluster.yml playbook to apply the changes.
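For illustration, a sketch of such a change (parameter names and values are examples; the option/value list format is assumed from vars/main.yml):

```yaml
# vars/main.yml — sketch of a configuration change
postgresql_parameters:
  - { option: "max_connections", value: "500" }
  - { option: "shared_buffers", value: "2GB" }
pending_restart: true   # optional: restart PostgreSQL automatically if the change requires it
```

Then run config_pgcluster.yml to apply the change to the cluster.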
Using Git for cluster configuration management (IaC/GitOps)
Infrastructure as Code (IaC) is the managing and provisioning of infrastructure through code instead of through manual processes.
GitOps automates infrastructure updates using a Git workflow with continuous integration (CI) and continuous delivery (CI/CD). When new code is merged, the CI/CD pipeline enacts the change in the environment. Any configuration drift, such as manual changes or errors, is overwritten by GitOps automation so the environment converges on the desired state defined in Git.
Once the cluster is deployed, you can use the config_pgcluster.yml
playbook to integrate with Git to manage cluster configurations.
For example, GitHub Action (link), GitLab CI/CD (link)
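As a rough illustration, a hypothetical minimal GitHub Actions workflow might look like this (trigger, runner, and credential handling are simplified assumptions; a real pipeline also needs inventory and SSH key management):

```yaml
# .github/workflows/config_pgcluster.yml — hypothetical GitOps sketch
name: Apply PostgreSQL cluster configuration
on:
  push:
    branches: [main]
jobs:
  apply:
    runs-on: self-hosted   # a runner with network access to the cluster nodes
    steps:
      - uses: actions/checkout@v4
      - name: Run config_pgcluster.yml
        run: ansible-playbook config_pgcluster.yml
```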
Update the PostgreSQL HA Cluster
Use the update_pgcluster.yml playbook to update the PostgreSQL HA cluster to a new minor version (for example, 15.1 -> 15.2, etc.).
Update PostgreSQL
Update Patroni
Update all system
More details here
PostgreSQL major upgrade
Use the pg_upgrade.yml playbook to upgrade PostgreSQL to a new major version (for example, 14 -> 15, etc.).
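For example, a sketch of the extra variables passed to the playbook (the variable names pg_old_version and pg_new_version are assumptions; verify them against the upgrade documentation):

```yaml
# Extra variables for pg_upgrade.yml — sketch
pg_old_version: "14"
pg_new_version: "15"
```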
Upgrade PostgreSQL
More details here
Disaster Recovery
A high-availability cluster provides an automatic failover mechanism but does not cover all disaster recovery scenarios. You must take care of backing up your data yourself.
etcd
Patroni nodes dump the state of the DCS options to disk on every configuration change, into the file patroni.dynamic.json located in the PostgreSQL data directory. The master (Patroni leader) is allowed to restore these options from the on-disk dump if they are completely absent from the DCS or if they are invalid.
However, I recommend that you read the disaster recovery guide for the etcd cluster:
PostgreSQL (databases)
I can recommend the following backup and restore tools:
Do not forget to validate your backups (for example pgbackrest auto).
How to start from scratch
Should you need to start from the very beginning, use the remove_cluster.yml playbook.
To prevent this playbook from being run by accident in a production environment, edit remove_cluster.yml and remove the safety pin. Change these variables accordingly:
- remove_postgres: true
- remove_etcd: true (or remove_consul)
Run the playbook and all the data will be gone.
ansible-playbook remove_cluster.yml
A new installation can now be made from scratch.
❗ Be careful not to copy this playbook without the safety pin to the production environment.
License
Licensed under the MIT License. See the LICENSE file for details.
Author
Vitaliy Kukharik (PostgreSQL DBA)
vitabaks@gmail.com
Sponsor this project
Join our sponsorship program to directly contribute to our project's growth and gain exclusive access to personalized support. Your sponsorship is crucial for innovation and progress. Become a sponsor today!
Support our work through GitHub Sponsors
Support our work through Patreon
Support our work through a crypto wallet:
USDT (TRC20): TSTSXZzqDCUDHDjZwCpuBkdukjuDZspwjj