Add playbook: update_pgcluster (#281)
Showing 14 changed files with 873 additions and 1 deletion.
@@ -0,0 +1,129 @@
## Update the PostgreSQL HA Cluster

This role is designed to update the PostgreSQL HA cluster to a new minor version (for example, 15.1 -> 15.2).

By default, only the PostgreSQL packages defined in the `postgresql_packages` variable (vars/Debian.yml or vars/RedHat.yml) are updated. You can also choose to update Patroni or all system packages (see the `target` variable).

#### Usage

Update PostgreSQL:

`ansible-playbook update_pgcluster.yml`

Update Patroni:

`ansible-playbook update_pgcluster.yml -e target=patroni`

Update all system packages:

`ansible-playbook update_pgcluster.yml -e target=system`

#### Variables

- `target`
  - Defines the target for the update.
  - Available values: 'postgres', 'patroni', 'system'
  - Default value: postgres
- `max_replication_lag_bytes`
  - Maximum replication lag, in bytes, above which the update will not be performed.
  - If the lag is high, the playbook stops with a message asking you to try again later.
  - Default value: 10485760 (10 MiB)
- `max_transaction_sec`
  - Maximum transaction duration, in seconds. If transactions running longer than this are present, the update will not be performed.
  - If long-running transactions are present, the playbook stops with a message asking you to try again later.
  - Default value: 15 (seconds)
- `update_extensions`
  - If 'true', an attempt will be made to automatically update all extensions in all databases.
  - Specify 'false' to skip updating extensions.
  - Default value: true
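
As a rough illustration, the variables above could be collected in an extra-vars file instead of being passed one at a time. The file name `update_vars.yml` is hypothetical, and the values shown are simply the documented defaults. Such a file could be passed with `ansible-playbook update_pgcluster.yml -e @update_vars.yml`.

```yaml
# update_vars.yml (hypothetical file name); values are the documented defaults
target: postgres                      # 'postgres', 'patroni' or 'system'
max_replication_lag_bytes: 10485760   # 10 MiB
max_transaction_sec: 15               # seconds
update_extensions: true               # set to false to skip extension updates
```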

---

## Plan:

Note on the expected database downtime during the update:

When using load balancing for read-only traffic (the "Type A" and "Type C" schemes), zero downtime is expected for read traffic, provided there is more than one replica in the cluster. For write traffic (to the Primary), the expected downtime is ~5-10 seconds.

#### 1. PRE-UPDATE: Perform Pre-Checks
- Test PostgreSQL DB Access
- Make sure that physical replication is active
  - Stop, if there are no active replicas
- Make sure there is no high replication lag
  - Note: no more than `max_replication_lag_bytes`
  - Stop, if replication lag is high
- Make sure there are no long-running transactions
  - Note: no more than `max_transaction_sec`
  - Stop, if long-running transactions are detected
#### 2. UPDATE: Secondary (one by one)
- Stop read-only traffic
  - Enable `noloadbalance`, `nosync`, `nofailover` parameters in the patroni.yml
  - Reload patroni service
  - Make sure replica endpoint is unavailable
  - Wait for active transactions to complete
- Stop Services
  - Execute CHECKPOINT before stopping PostgreSQL
  - Stop Patroni service on the Cluster Replica
- Update PostgreSQL
  - if `target` variable is not defined or `target=postgres`
  - Install the latest version of PostgreSQL packages
- Update Patroni
  - if `target=patroni` (or `system`)
  - Install the latest version of Patroni package
- Update all system packages (includes PostgreSQL and Patroni)
  - if `target=system`
  - Update all system packages
- Start Services
  - Start Patroni service
  - Wait for Patroni port to become open on the host
  - Check that Patroni is healthy
  - Check that PostgreSQL is started and accepting connections
- Start read-only traffic
  - Disable `noloadbalance`, `nosync`, `nofailover` parameters in the patroni.yml
  - Reload patroni service
  - Make sure replica endpoint is available
- Perform the same steps for the next replica server.
#### 3. UPDATE: Primary
- Switchover Patroni leader role
  - Perform switchover of the leader for the Patroni cluster
  - Make sure that Patroni is healthy and is a replica
  - Notes:
    - At this stage, the leader becomes a replica
    - The database downtime is ~5 seconds (write traffic)
- Stop read-only traffic
  - Enable `noloadbalance`, `nosync`, `nofailover` parameters in the patroni.yml
  - Reload patroni service
  - Make sure replica endpoint is unavailable
  - Wait for active transactions to complete
- Stop Services
  - Execute CHECKPOINT before stopping PostgreSQL
  - Stop Patroni service on the old Cluster Leader
- Update PostgreSQL
  - if `target` variable is not defined or `target=postgres`
  - Install the latest version of PostgreSQL packages
- Update Patroni
  - if `target=patroni` (or `system`)
  - Install the latest version of Patroni package
- Update all system packages (includes PostgreSQL and Patroni)
  - if `target=system`
  - Update all system packages
- Start Services
  - Start Patroni service
  - Wait for Patroni port to become open on the host
  - Check that Patroni is healthy
  - Check that PostgreSQL is started and accepting connections
- Start read-only traffic
  - Disable `noloadbalance`, `nosync`, `nofailover` parameters in the patroni.yml
  - Reload patroni service
  - Make sure replica endpoint is available
#### 4. POST-UPDATE: Update extensions
- Update extensions
  - Get the current Patroni Cluster Leader Node
  - Get a list of databases
  - Update extensions in each database
    - Get a list of old PostgreSQL extensions
    - Update old PostgreSQL extensions (if an update is required)
- Check the Patroni cluster state
- Check the current PostgreSQL version
- List the Patroni cluster members
- Update completed.
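
The "Stop Services" and "Start Services" steps in the plan could look roughly like the tasks below. This is a minimal sketch, not the tasks added by this commit: the service name `patroni`, the use of Patroni's `/health` REST endpoint, the `patroni_health` variable name, and all timeouts and retry counts are assumptions. `patroni_restapi_port` is the variable used elsewhere in this change.

```yaml
# Sketch only: service name, timeouts and retry counts are assumptions.
- name: Execute CHECKPOINT before stopping PostgreSQL
  command: psql -tAXc "CHECKPOINT"
  changed_when: false

- name: Stop Patroni service
  service:
    name: patroni
    state: stopped

# ... the package update for the selected `target` would run here ...

- name: Start Patroni service
  service:
    name: patroni
    state: started

- name: Wait for Patroni port to become open on the host
  wait_for:
    host: "{{ inventory_hostname }}"
    port: "{{ patroni_restapi_port }}"
    state: started
    timeout: 120
    delay: 10

- name: Check that Patroni is healthy
  uri:
    url: "http://{{ inventory_hostname }}:{{ patroni_restapi_port }}/health"
    status_code: 200
  register: patroni_health   # hypothetical variable name
  until: patroni_health.status == 200
  retries: 30
  delay: 5
```

Running the checkpoint first shortens the shutdown, since PostgreSQL has less dirty data to flush when Patroni stops it.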
@@ -0,0 +1,23 @@
---
- name: 'Get the current Patroni Cluster Leader Node'
  uri:
    url: http://{{ inventory_hostname }}:{{ patroni_restapi_port }}/leader
    status_code: 200
  register: patroni_leader_result
  changed_when: false
  failed_when: false

- name: Get a list of databases
  command: psql -tAXc "select datname from pg_catalog.pg_database where not datistemplate"
  register: databases_list
  changed_when: false
  when:
    - patroni_leader_result.status == 200

- name: Update extensions in each database
  include_tasks: update_extensions.yml
  loop: "{{ databases_list.stdout_lines }}"
  loop_control:
    loop_var: pg_target_dbname
  when: databases_list.stdout_lines is defined
...
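
The included `update_extensions.yml` is not shown in this part of the diff. Based on the plan ("Get a list of old PostgreSQL extensions", "Update old PostgreSQL extensions"), a per-database task file might look something like the sketch below. The queries and the `pg_old_extensions` variable name are assumptions, not the contents of the actual file; `pg_target_dbname` is the loop variable defined above.

```yaml
---
# Hypothetical sketch of a per-database task file; not the file from this commit.
- name: Get a list of old PostgreSQL extensions
  command: >-
    psql -d {{ pg_target_dbname }} -tAXc "select e.extname
    from pg_catalog.pg_extension e
    join pg_catalog.pg_available_extensions ae on ae.name = e.extname
    where e.extversion <> ae.default_version"
  register: pg_old_extensions   # hypothetical variable name
  changed_when: false

- name: Update old PostgreSQL extensions
  command: >-
    psql -d {{ pg_target_dbname }} -tAXc 'ALTER EXTENSION "{{ item }}" UPDATE'
  loop: "{{ pg_old_extensions.stdout_lines }}"
...
```

The comparison relies on `pg_available_extensions.default_version`, which reflects the version shipped by the newly installed packages, so the second task only runs for extensions that actually have a newer version available.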
@@ -0,0 +1,73 @@
---
# patroni_installation_method: "pip"
- block:
    - name: Install the latest version of Patroni
      pip:
        name: patroni
        state: latest
        executable: pip3
        extra_args: "--trusted-host=pypi.python.org --trusted-host=pypi.org --trusted-host=files.pythonhosted.org"
        umask: "0022"
      environment:
        PATH: "{{ ansible_env.PATH }}:/usr/local/bin:/usr/bin"
      when: installation_method == "repo" and patroni_installation_method == "pip"
  environment: "{{ proxy_env | default({}) }}"
  vars:
    ansible_python_interpreter: /usr/bin/python3

# patroni_installation_method: "rpm/deb"
- block:
    # Debian
    - name: Install the latest version of Patroni packages
      package:
        name: "{{ patroni_packages | default('patroni') }}"
        state: latest
      when: ansible_os_family == "Debian" and patroni_deb_package_repo | length < 1

    # RedHat
    - name: Install the latest version of Patroni packages
      package:
        name: "{{ patroni_packages | default('patroni') }}"
        state: latest
      when: ansible_os_family == "RedHat" and patroni_rpm_package_repo | length < 1

    # when patroni_deb_package_repo or patroni_rpm_package_repo URL is defined
    # Debian
    - name: Download Patroni deb package
      get_url:
        url: "{{ item }}"
        dest: /tmp/
        timeout: 60
        validate_certs: false
      loop: "{{ patroni_deb_package_repo | list }}"
      when: ansible_os_family == "Debian" and patroni_deb_package_repo | length > 0

    - name: Install Patroni from deb package
      apt:
        force_apt_get: true
        deb: "/tmp/{{ item }}"
        state: present
      loop: "{{ patroni_deb_package_repo | map('basename') | list }}"
      when: ansible_os_family == "Debian" and patroni_deb_package_repo | length > 0

    # RedHat
    - name: Download Patroni rpm package
      get_url:
        url: "{{ item }}"
        dest: /tmp/
        timeout: 60
        validate_certs: false
      loop: "{{ patroni_rpm_package_repo | list }}"
      when: ansible_os_family == "RedHat" and patroni_rpm_package_repo | length > 0

    - name: Install Patroni from rpm package
      package:
        name: "/tmp/{{ item }}"
        state: present
      loop: "{{ patroni_rpm_package_repo | map('basename') | list }}"
      when: ansible_os_family == "RedHat" and patroni_rpm_package_repo | length > 0
  environment: "{{ proxy_env | default({}) }}"
  when:
    - installation_method == "repo"
    - (patroni_installation_method == "rpm" or patroni_installation_method == "deb")
...
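
Which branch of the tasks above runs depends on a handful of variables. The fragment below is an illustrative sketch of how they might be set for a Debian host; the values are examples only, and the commented-out URL is a placeholder rather than a real package location.

```yaml
# Example values only; the URL below is a placeholder, not a real package location.
installation_method: "repo"
patroni_installation_method: "deb"   # or "rpm" / "pip"

# Empty list: `patroni_packages` (default 'patroni') is updated from the configured repository.
patroni_deb_package_repo: []

# Non-empty list: each package is downloaded to /tmp/ and installed from the local file.
# patroni_deb_package_repo:
#   - "https://example.com/path/to/patroni.deb"
```

Note that with a package URL the install step uses `state: present`, so the version installed is whatever the URL points to, not necessarily the latest release.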
@@ -0,0 +1,25 @@
---
- name: Clean yum cache
  command: yum clean all
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == '7'

- name: Clean dnf cache
  command: dnf clean all
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version is version('8', '>=')

- name: Update apt cache
  apt:
    update_cache: true
    cache_valid_time: 3600
  when: ansible_os_family == "Debian"

- name: Install the latest version of PostgreSQL packages
  package:
    name: "{{ item }}"
    state: latest
  loop: "{{ postgresql_packages }}"
...
@@ -0,0 +1,81 @@
---
- name: '[Pre-Check] (ALL) Test PostgreSQL DB Access'
  command: psql -tAXc 'select 1'
  changed_when: false

- name: '[Pre-Check] Make sure that physical replication is active'
  command: >-
    psql -tAXc "select count(*) from pg_stat_replication
    where application_name != 'pg_basebackup'"
  register: pg_replication_state
  changed_when: false
  when:
    - inventory_hostname in groups['primary']

# Stop, if there are no active replicas
- name: "Pre-Check error. Print physical replication state"
  fail:
    msg: "There are no active replica servers (pg_stat_replication returned 0 entries)."
  when:
    - inventory_hostname in groups['primary']
    - pg_replication_state.stdout | int == 0

- name: '[Pre-Check] Make sure there is no high replication lag (more than {{ max_replication_lag_bytes | human_readable }})'
  command: >-
    psql -tAXc "select pg_wal_lsn_diff(pg_current_wal_lsn(),
    replay_lsn) pg_lag_bytes from pg_stat_replication
    order by pg_lag_bytes desc limit 1"
  register: pg_lag_bytes
  changed_when: false
  failed_when: false
  until: pg_lag_bytes.stdout|int < max_replication_lag_bytes|int
  retries: 30
  delay: 5
  when:
    - inventory_hostname in groups['primary']

# Stop, if replication lag is high
- block:
    - name: "Print replication lag"
      debug:
        msg: "Current replication lag:
          {{ pg_lag_bytes.stdout | int | human_readable }}"

    - name: "Pre-Check error. Please try again later"
      fail:
        msg: High replication lag on the Patroni Cluster, please try again later.
  when:
    - pg_lag_bytes.stdout is defined
    - pg_lag_bytes.stdout|int >= max_replication_lag_bytes|int

- name: '[Pre-Check] Make sure there are no long-running transactions (more than {{ max_transaction_sec }} seconds)'
  command: >-
    psql -tAXc "select pid, usename, client_addr, clock_timestamp() - xact_start as xact_age,
    state, wait_event_type ||':'|| wait_event as wait_events,
    left(regexp_replace(query, E'[ \\t\\n\\r]+', ' ', 'g'),100) as query
    from pg_stat_activity
    where clock_timestamp() - xact_start > '{{ max_transaction_sec }} seconds'::interval
    and backend_type = 'client backend' and pid <> pg_backend_pid()
    order by xact_age desc limit 10"
  register: pg_long_transactions
  changed_when: false
  failed_when: false
  until: pg_long_transactions.stdout | length < 1
  retries: 30
  delay: 2
  when:
    - inventory_hostname in groups['primary']

# Stop, if long-running transactions detected
- block:
    - name: "Print long-running (>{{ max_transaction_sec }}s) transactions"
      debug:
        msg: "{{ pg_long_transactions.stdout_lines }}"

    - name: "Pre-Check error. Please try again later"
      fail:
        msg: long-running transactions detected (more than {{ max_transaction_sec }} seconds), please try again later.
  when:
    - pg_long_transactions.stdout is defined
    - pg_long_transactions.stdout | length > 0
...