Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade: Fix PgBouncer socket dir permissions on RHEL #534

Merged
merged 4 commits into from
Dec 21, 2023

Conversation

vitabaks
Copy link
Owner

Problem:
There was a discovered issue in Red Hat Enterprise Linux (RHEL) environments where the PgBouncer socket directory had incorrect permissions. This problem specifically affected the operation of PAUSE pools in PgBouncer before pg_upgrade.

Solution:
This commit fixes the access rights issue by checking the access rights to the PgBouncer socket directory files before pausing pools. The changes ensure that the directory has appropriate ownership and permissions.

Fixed:

  TASK [upgrade : Perform PAUSE on all pgbouncers servers] ***********************
  included: /home/runner/work/postgresql_cluster/postgresql_cluster/roles/upgrade/tasks/pgbouncer_pause.yml for 10.172.2.20, 10.172.2.21, 10.172.2.22
  fatal: [10.172.2.20]: FAILED! => {"changed": true, "cmd": "set -o pipefail;\n\npg_servers=\"10.172.2.20\\n10.172.2.21\\n10.172.2.22\"\npg_servers_count=\"3\"\npg_slow_active_count_query=\"select count(*) from pg_stat_activity where pid <> pg_backend_pid() and state <> 'idle' and query_start < clock_timestamp() - interval '1000 ms' and backend_type = 'client backend'\"\npg_slow_active_terminate_query=\"select\n  clock_timestamp(),\n  pg_terminate_backend(pid),\n  clock_timestamp() - query_start as query_age,\n  left(regexp_replace(query, E'[ \\\\t\\\\n\\\\r]+', ' ', 'g'),150) as query\nfrom pg_stat_activity where pid <> pg_backend_pid() and state <> 'idle' and query_start < clock_timestamp() - interval '100 ms' and backend_type = 'client backend'\"\n# it is assumed that pgbouncer is installed on database servers\npgb_servers=\"$pg_servers\"\npgb_servers_count=\"$pg_servers_count\"\npgb_count=\"12\"\npgb_pause_command=\"printf '%s\\n' /var/run/pgbouncer /var/run/pgbouncer-2 /var/run/pgbouncer-3 /var/run/pgbouncer-4 | xargs -I {} -P 4 -n 1 psql -h {} -p 6432 -U postgres -d pgbouncer -tAXc 'PAUSE'\"\npgb_resume_command='kill -SIGUSR2 $(pidof pgbouncer)'\n\nstart_time=$(date +%s)\nwhile true; do\n  current_time=$(date +%s)\n  # initialize pgb_paused_count to 0 (we assume that all pgbouncers are not paused)\n  pgb_paused_count=0\n\n  # wait for the active queries to complete on pg_servers\n  IFS=$'\\n' pg_slow_active_counts=($(echo -e \"$pg_servers\" | xargs -I {} -P \"$pg_servers_count\" -n 1 ssh -o StrictHostKeyChecking=no {} \"psql -p 5432 -U postgres -d postgres -tAXc \\\"$pg_slow_active_count_query\\\"\"))\n\n  # sum up all the values in the array\n  total_pg_slow_active_count=0\n  for count in \"${pg_slow_active_counts[@]}\"; do\n    total_pg_slow_active_count=$((total_pg_slow_active_count + count))\n  done\n\n  echo \"$(date): total pg_slow_active_count: $total_pg_slow_active_count\"\n\n  if [[ \"$total_pg_slow_active_count\" == 0 ]]; then\n    # pause pgbouncer on all pgb_servers. We send via ssh to all pgbouncers in parallel and collect results from all (maximum wait time 2 seconds)\n    IFS=$'\\n' pause_results=($(echo -e \"$pgb_servers\" | xargs -I {} -P \"$pgb_servers_count\" -n 1 ssh -o StrictHostKeyChecking=no {} \"timeout 2 $pgb_pause_command 2>&1 || true\"))\n    echo \"${pause_results[*]}\"\n    # analyze the pause_results array to count the number of paused pgbouncers\n    pgb_paused_count=$(echo \"${pause_results[*]}\" | grep -o -e \"PAUSE\" -e \"already suspended/paused\" | wc -l)\n    echo \"$(date): pgb_count: $pgb_count, pgb_paused: $pgb_paused_count\"\n  fi\n\n  # make sure that the pause is performed on all pgbouncer servers, to ensure atomicity\n  if [[ \"$pgb_paused_count\" -eq \"$pgb_count\" ]]; then\n    break # pause is performed on all pgb_servers, exit from the loop\n  elif [[ \"$pgb_paused_count\" -gt 0 && \"$pgb_paused_count\" -ne \"$pgb_count\" ]]; then\n    # pause is not performed on all pgb_servers, perform resume (we do not use timeout because we mast to resume all pgbouncers)\n    IFS=$'\\n' resume_results=($(echo -e \"$pgb_servers\" | xargs -I {} -P \"$pgb_servers_count\" -n 1 ssh -o StrictHostKeyChecking=no {} \"$pgb_resume_command 2>&1 || true\"))\n    echo \"${resume_results[*]}\"\n  fi\n\n  # after 30 seconds of waiting, terminate active sessions on pg_servers and try pausing again\n  if (( current_time - start_time >= 30 )); then\n    echo \"$(date): terminate active queries\"\n    echo -e \"$pg_servers\" | xargs -I {} -P \"$pg_servers_count\" -n 1 ssh -o StrictHostKeyChecking=no {} \"psql -p 5432 -U postgres -d postgres -tAXc \\\"$pg_slow_active_terminate_query\\\"\"\n  fi\n\n  # if it was not possible to pause for 60 seconds, exit with an error\n  if (( current_time - start_time >= 60 )); then\n    echo \"$(date): it was not possible to pause (exit by timeout)\"\n    exit 1\n  fi\ndone > /tmp/pgbouncer_pool_pause_2023-12-16.log\n", "delta": "0:01:00.255280", "end": "2023-12-16 00:32:03.849474", "msg": "non-zero return code", "rc": 1, "start": "2023-12-16 00:31:03.594194", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Fix the problem with the rights to the PgBouncer socket directory when executing PAUSE pools for RHEL.
@vitabaks vitabaks self-assigned this Dec 21, 2023
@vitabaks vitabaks changed the title Fix PgBouncer socket dir permissions on RHEL upgrade: Fix PgBouncer socket dir permissions on RHEL Dec 21, 2023
@vitabaks
Copy link
Owner Author

vitabaks commented Dec 21, 2023

Test: pg_upgrade (RockyLinux 8)

PLAY [pg_upgrade.yml | Upgrade PostgreSQL 14 to the new version 16] ***

TASK [Gathering Facts] *********************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]
[WARNING]: Module remote_tmp /var/lib/pgsql/.ansible/tmp did not exist and was
created with a mode of 0700, this may cause issues when running as another
user. To avoid this, create the remote_tmp dir with the correct permissions
manually

TASK [[Prepare] Get Patroni Cluster Leader Node] *******************************
ok: [10.172.2.22]
ok: [10.172.2.20]
ok: [10.172.2.21]

TASK [[Prepare] Add host to group "primary" (in-memory inventory)] *************
ok: [10.172.2.20] => (item=10.172.2.20)

TASK [[Prepare] Add hosts to group "secondary" (in-memory inventory)] **********
ok: [10.172.2.20] => (item=10.172.2.21)
ok: [10.172.2.20] => (item=10.172.2.22)

TASK [Print Patroni Cluster info] **********************************************
ok: [10.172.2.20] => {
    "msg": [
        "Cluster Name: postgres-cluster",
        "Cluster Leader: pgnode01"
    ]
}

PLAY [(1/7) PRE-UPGRADE: Perform Pre-Checks] ***********************************

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Running Pre-Checks] ******************************************************

TASK [upgrade : [Pre-Check] Make sure the ansible required Python library is installed] ***
ok: [10.172.2.22] => (item=pexpect)
ok: [10.172.2.20] => (item=pexpect)
ok: [10.172.2.21] => (item=pexpect)

TASK [upgrade : [Pre-Check] Test PostgreSQL database access using a unix socket] ***
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : [Pre-Check] Check the current version of PostgreSQL] ***********
ok: [10.172.2.20]

TASK [upgrade : Set variable 'current_pg_version'] *****************************
ok: [10.172.2.20]

TASK [upgrade : [Pre-Check] Ensure new data directory is different from the current one] ***
ok: [10.172.2.20]

TASK [upgrade : [Pre-Check] Make sure that physical replication is active] *****
ok: [10.172.2.20]

TASK [upgrade : [Pre-Check] Make sure there is no high replication lag (more than 10.00 MB)] ***
ok: [10.172.2.20]

TASK [upgrade : [Pre-Check] Make sure there are no long-running transactions (more than 15 seconds)] ***
ok: [10.172.2.21]
ok: [10.172.2.20]
ok: [10.172.2.22]

TASK [upgrade : [Pre-Check] Make sure that SSH key-based authentication is configured between cluster nodes] ***
included: /workspace/postgresql_cluster/roles/upgrade/tasks/ssh-keys.yml for 10.172.2.20, 10.172.2.21, 10.172.2.22

TASK [upgrade : Make sure that the openssh-clients package is installed] *******
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Make sure the SSH key for user "postgres" exists] **************
changed: [10.172.2.21]
changed: [10.172.2.22]
changed: [10.172.2.20]

TASK [upgrade : Fetch public SSH keys from database servers] *******************
ok: [10.172.2.20]
ok: [10.172.2.22]
ok: [10.172.2.21]

TASK [upgrade : Add public SSH keys to authorized_keys] ************************
changed: [10.172.2.22]
changed: [10.172.2.20]
changed: [10.172.2.21]

TASK [upgrade : Remove public SSH keys from localhost] *************************
ok: [10.172.2.20 -> localhost] => (item=10.172.2.20)
ok: [10.172.2.20 -> localhost] => (item=10.172.2.21)
ok: [10.172.2.20 -> localhost] => (item=10.172.2.22)

TASK [upgrade : known_hosts | for each host, scan for its ssh public key] ******
ok: [10.172.2.22] => (item=10.172.2.20)
ok: [10.172.2.21] => (item=10.172.2.20)
ok: [10.172.2.20] => (item=10.172.2.20)
ok: [10.172.2.21] => (item=10.172.2.21)
ok: [10.172.2.20] => (item=10.172.2.21)
ok: [10.172.2.22] => (item=10.172.2.21)
ok: [10.172.2.22] => (item=10.172.2.22)
ok: [10.172.2.21] => (item=10.172.2.22)
ok: [10.172.2.20] => (item=10.172.2.22)

TASK [upgrade : known_hosts | for each host, add/update the public key in the "~postgres/.ssh/known_hosts"] ***
changed: [10.172.2.21] => (item=None)
changed: [10.172.2.20] => (item=None)
changed: [10.172.2.22] => (item=None)
changed: [10.172.2.20] => (item=None)
changed: [10.172.2.21] => (item=None)
changed: [10.172.2.22] => (item=None)
changed: [10.172.2.20] => (item=None)
changed: [10.172.2.21] => (item=None)
changed: [10.172.2.22] => (item=None)
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : [Pre-Check] Make sure that the rsync package are installed] ****
ok: [10.172.2.22]
ok: [10.172.2.20]
ok: [10.172.2.21]

TASK [upgrade : [Pre-Check] Rsync Checks: create testrsync file on Primary] ****
changed: [10.172.2.20]

TASK [upgrade : [Pre-Check] Rsync Checks: test rsync and ssh key access] *******
changed: [10.172.2.20] => (item=10.172.2.21)
changed: [10.172.2.20] => (item=10.172.2.22)

TASK [upgrade : [Pre-Check] Cleanup testrsync file] ****************************
changed: [10.172.2.22]
changed: [10.172.2.21]
changed: [10.172.2.20]

TASK [upgrade : [Pre-Check] Check if PostgreSQL tablespaces exist] *************
ok: [10.172.2.20]

TASK [upgrade : [Pre-Check] Test PgBouncer access via unix socket] *************
ok: [10.172.2.21] => (item=pgbouncer)
ok: [10.172.2.20] => (item=pgbouncer)
ok: [10.172.2.22] => (item=pgbouncer)
ok: [10.172.2.20] => (item=pgbouncer-2)
ok: [10.172.2.22] => (item=pgbouncer-2)
ok: [10.172.2.21] => (item=pgbouncer-2)
ok: [10.172.2.22] => (item=pgbouncer-3)
ok: [10.172.2.21] => (item=pgbouncer-3)
ok: [10.172.2.20] => (item=pgbouncer-3)
ok: [10.172.2.21] => (item=pgbouncer-4)
ok: [10.172.2.22] => (item=pgbouncer-4)
ok: [10.172.2.20] => (item=pgbouncer-4)

PLAY [(2/7) PRE-UPGRADE: Install new PostgreSQL packages] **********************

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Install packages] ********************************************************

TASK [upgrade : Clean dnf cache] ***********************************************
changed: [10.172.2.20]
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Install PostgreSQL 16 packages] ********************************
changed: [10.172.2.22] => (item=postgresql16)
changed: [10.172.2.20] => (item=postgresql16)
changed: [10.172.2.21] => (item=postgresql16)
changed: [10.172.2.22] => (item=postgresql16-server)
changed: [10.172.2.21] => (item=postgresql16-server)
changed: [10.172.2.20] => (item=postgresql16-server)
changed: [10.172.2.22] => (item=postgresql16-contrib)
changed: [10.172.2.20] => (item=postgresql16-contrib)
changed: [10.172.2.21] => (item=postgresql16-contrib)

PLAY [(3/7) PRE-UPGRADE: Initialize new db, schema compatibility check, and pg_upgrade --check] ***

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Create Data directory and initdb] ****************************************

TASK [upgrade : Make sure new PostgreSQL data directory "/pgdata/16/main" exists] ***
changed: [10.172.2.20]
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Make sure new PostgreSQL data directory "/pgdata/16/main" is not initialized] ***
ok: [10.172.2.20]
ok: [10.172.2.22]
ok: [10.172.2.21]

TASK [upgrade : Get the current install user] **********************************
ok: [10.172.2.20]

TASK [upgrade : Get the current encodig and data_checksums settings] ***********
ok: [10.172.2.20] => (item=server_encoding)
ok: [10.172.2.20] => (item=lc_collate)
ok: [10.172.2.20] => (item=lc_ctype)
ok: [10.172.2.20] => (item=lc_messages)
ok: [10.172.2.20] => (item=lc_monetary)
ok: [10.172.2.20] => (item=lc_numeric)
ok: [10.172.2.20] => (item=lc_time)
ok: [10.172.2.20] => (item=data_checksums)

TASK [upgrade : Initialize new PostgreSQL data directory on the Primary] *******
changed: [10.172.2.20]

TASK [Copy files] **************************************************************

TASK [upgrade : Get the current shared_preload_libraries settings] *************
ok: [10.172.2.20]

TASK [upgrade : Check if PostgreSQL is running] ********************************
ok: [10.172.2.20]

TASK [upgrade : Start new PostgreSQL on port 5433 to check the schema compatibility] ***
changed: [10.172.2.20]

TASK [upgrade : Wait for PostgreSQL to start] **********************************
changed: [10.172.2.20]

TASK [upgrade : Check the compatibility of the database schema with the PostgreSQL 16] ***
changed: [10.172.2.20]

TASK [upgrade : Wait for the schema compatibility check to complete.] **********
changed: [10.172.2.20]

TASK [upgrade : Checking the result of the schema compatibility] ***************
ok: [10.172.2.20]

TASK [upgrade : Result of checking the compatibility of the scheme - success] ***
ok: [10.172.2.20] => {
    "msg": "The database schema are compatible with PostgreSQL 16"
}

TASK [upgrade : Stop new PostgreSQL to re-initdb] ******************************
changed: [10.172.2.20]

TASK [upgrade : Reinitialize the database after checking schema compatibility] ***
included: /workspace/postgresql_cluster/roles/upgrade/tasks/initdb.yml for 10.172.2.20

TASK [upgrade : Make sure new PostgreSQL data directory "/pgdata/16/main" exists] ***
ok: [10.172.2.20]

TASK [upgrade : Make sure new PostgreSQL data directory "/pgdata/16/main" is not initialized] ***
ok: [10.172.2.20]

TASK [upgrade : Clear the new PostgreSQL data directory "/pgdata/16/main"] *****
changed: [10.172.2.20] => (item=absent)
changed: [10.172.2.20] => (item=directory)

TASK [upgrade : Get the current install user] **********************************
ok: [10.172.2.20]

TASK [upgrade : Get the current encodig and data_checksums settings] ***********
ok: [10.172.2.20] => (item=server_encoding)
ok: [10.172.2.20] => (item=lc_collate)
ok: [10.172.2.20] => (item=lc_ctype)
ok: [10.172.2.20] => (item=lc_messages)
ok: [10.172.2.20] => (item=lc_monetary)
ok: [10.172.2.20] => (item=lc_numeric)
ok: [10.172.2.20] => (item=lc_time)
ok: [10.172.2.20] => (item=data_checksums)

TASK [upgrade : Initialize new PostgreSQL data directory on the Primary] *******
changed: [10.172.2.20]

TASK [upgrade : Get the current shared_preload_libraries settings] *************
ok: [10.172.2.20]

TASK [upgrade : Set the variable: pg_shared_preload_libraries_value] ***********
ok: [10.172.2.20]

TASK [upgrade : Verify the two clusters are compatible (pg_upgrade --check)] ***
changed: [10.172.2.20]

TASK [upgrade : Print the result of the pg_upgrade check] **********************
ok: [10.172.2.20] => {
    "failed_when_result": false,
    "pg_upgrade_check_result.stdout_lines": [
        "Performing Consistency Checks on Old Live Server",
        "------------------------------------------------",
        "Checking cluster versions                                     ok",
        "Checking database user is the install user                    ok",
        "Checking database connection settings                         ok",
        "Checking for prepared transactions                            ok",
        "Checking for system-defined composite types in user tables    ok",
        "Checking for reg* data types in user tables                   ok",
        "Checking for contrib/isn with bigint-passing mismatch         ok",
        "Checking for incompatible \"aclitem\" data type in user tables  ok",
        "Checking for presence of required libraries                   ok",
        "Checking database user is the install user                    ok",
        "Checking for prepared transactions                            ok",
        "Checking for new cluster tablespace directories               ok",
        "",
        "*Clusters are compatible*"
    ]
}

PLAY [(4/7) PRE-UPGRADE: Prepare the Patroni configuration] ********************

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Patroni config] **********************************************************

TASK [upgrade : Edit patroni.yml | update parameters: data_dir, bin_dir, config_dir] ***
changed: [10.172.2.20] => (item=data_dir: /pgdata/16/main)
changed: [10.172.2.21] => (item=data_dir: /pgdata/16/main)
changed: [10.172.2.22] => (item=data_dir: /pgdata/16/main)
changed: [10.172.2.20] => (item=bin_dir: /usr/pgsql-16/bin)
changed: [10.172.2.22] => (item=bin_dir: /usr/pgsql-16/bin)
changed: [10.172.2.21] => (item=bin_dir: /usr/pgsql-16/bin)
changed: [10.172.2.20] => (item=config_dir: /pgdata/16/main)
changed: [10.172.2.22] => (item=config_dir: /pgdata/16/main)
changed: [10.172.2.21] => (item=config_dir: /pgdata/16/main)

TASK [upgrade : Edit patroni.yml | check if the 'stats_temp_directory' parameter is specified] ***
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : Edit patroni.yml | remove parameter: 'stats_temp_directory'] ***
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Edit patroni.yml | check if the 'force_parallel_mode' parameter is specified] ***
ok: [10.172.2.22]
ok: [10.172.2.20]
ok: [10.172.2.21]

TASK [upgrade : Edit patroni.yml | check if the 'promote_trigger_file' parameter is specified] ***
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : Edit patroni.yml | check if the 'vacuum_defer_cleanup_age' parameter is specified] ***
ok: [10.172.2.22]
ok: [10.172.2.21]
ok: [10.172.2.20]

TASK [upgrade : Copy pg_hba.conf to /pgdata/16/main] ***************************
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

PLAY [(5/7) UPGRADE: Upgrade PostgreSQL] ***************************************

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Enable maintenance mode] *************************************************

TASK [upgrade : Pause Patroni cluster (enable maintenance mode)] ***************
changed: [10.172.2.20]

TASK [upgrade : Update haproxy conf file (disable http-checks)] ****************
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Reload haproxy service] ****************************************
changed: [10.172.2.22]
changed: [10.172.2.21]
changed: [10.172.2.20]

TASK [upgrade : Stop Patroni service] ******************************************
changed: [10.172.2.21]
changed: [10.172.2.20]
changed: [10.172.2.22]
FAILED - RETRYING: [10.172.2.20]: Wait until the Patroni cluster is stopped (30 retries left).
FAILED - RETRYING: [10.172.2.20]: Wait until the Patroni cluster is stopped (29 retries left).
FAILED - RETRYING: [10.172.2.20]: Wait until the Patroni cluster is stopped (28 retries left).

TASK [upgrade : Wait until the Patroni cluster is stopped] *********************
ok: [10.172.2.20]

TASK [Stop Services] ***********************************************************

TASK [upgrade : Execute CHECKPOINT before stopping PostgreSQL] *****************
changed: [10.172.2.20]
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Wait for the CHECKPOINT to complete] ***************************
changed: [10.172.2.20]
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Wait until replication lag is less than 10.00 MB] **************
ok: [10.172.2.20]

TASK [upgrade : Perform PAUSE on all pgbouncers servers] ***********************
included: /workspace/postgresql_cluster/roles/upgrade/tasks/pgbouncer_pause.yml for 10.172.2.20, 10.172.2.21, 10.172.2.22

TASK [upgrade : Check the permissions to the PgBouncer unix socket directory] ***
changed: [10.172.2.20] => (item=pgbouncer)
changed: [10.172.2.21] => (item=pgbouncer)
changed: [10.172.2.22] => (item=pgbouncer)
ok: [10.172.2.20] => (item=pgbouncer-2)
ok: [10.172.2.21] => (item=pgbouncer-2)
ok: [10.172.2.22] => (item=pgbouncer-2)
ok: [10.172.2.21] => (item=pgbouncer-3)
ok: [10.172.2.20] => (item=pgbouncer-3)
ok: [10.172.2.22] => (item=pgbouncer-3)
ok: [10.172.2.20] => (item=pgbouncer-4)
ok: [10.172.2.21] => (item=pgbouncer-4)
ok: [10.172.2.22] => (item=pgbouncer-4)

TASK [upgrade : PAUSE PgBouncer pools] *****************************************
changed: [10.172.2.20]

TASK [upgrade : Stop PostgreSQL on the Leader] *********************************
changed: [10.172.2.20]

TASK [upgrade : Stop PostgreSQL on the Replica] ********************************
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Check if PostgreSQL 14 is stopped] *****************************
ok: [10.172.2.20]
ok: [10.172.2.22]
ok: [10.172.2.21]

TASK [upgrade : Check if PostgreSQL 16 is stopped] *****************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Check 'Latest checkpoint location'] **************************************

TASK [upgrade : Get 'Latest checkpoint location' on the Leader] ****************
ok: [10.172.2.20]

TASK [upgrade : Get 'Latest checkpoint location' on the Replicas] **************
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : Print 'Latest checkpoint location' for the Leader] *************
ok: [10.172.2.20] => {
    "msg": "Leader's latest checkpoint location: 0/6000028"
}

TASK [upgrade : Print 'Latest checkpoint location' for the Replica] ************
ok: [10.172.2.21] => {
    "msg": "Replica: 10.172.2.21, latest checkpoint location: 0/6000028"
}
ok: [10.172.2.22] => {
    "msg": "Replica: 10.172.2.22, latest checkpoint location: 0/6000028"
}

TASK [upgrade : Check if all 'Latest checkpoint location' values match] ********
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : SUCCESS: 'Latest checkpoint location' values match on all cluster nodes] ***
ok: [10.172.2.20] => {
    "msg": "'Latest checkpoint location' is the same on the leader and its standbys"
}

TASK [Upgrade Primary] *********************************************************

TASK [upgrade : Upgrade the PostgreSQL to version 16 on the Primary (using pg_upgrade --link)] ***
changed: [10.172.2.20]

TASK [upgrade : Print the result of the pg_upgrade] ****************************
ok: [10.172.2.20] => {
    "msg": [
        [
            "Performing Consistency Checks",
            "-----------------------------",
            "Checking cluster versions                                     ok",
            "Checking database user is the install user                    ok",
            "Checking database connection settings                         ok",
            "Checking for prepared transactions                            ok",
            "Checking for system-defined composite types in user tables    ok",
            "Checking for reg* data types in user tables                   ok",
            "Checking for contrib/isn with bigint-passing mismatch         ok",
            "Checking for incompatible \"aclitem\" data type in user tables  ok",
            "Creating dump of global objects                               ok",
            "Creating dump of database schemas                             ok",
            "Checking for presence of required libraries                   ok",
            "Checking database user is the install user                    ok",
            "Checking for prepared transactions                            ok",
            "Checking for new cluster tablespace directories               ok",
            "",
            "If pg_upgrade fails after this point, you must re-initdb the",
            "new cluster before continuing.",
            "",
            "Performing Upgrade",
            "------------------",
            "Setting locale and encoding for new cluster                   ok",
            "Analyzing all rows in the new cluster                         ok",
            "Freezing all rows in the new cluster                          ok",
            "Deleting files from new pg_xact                               ok",
            "Copying old pg_xact to new server                             ok",
            "Setting oldest XID for new cluster                            ok",
            "Setting next transaction ID and epoch for new cluster         ok",
            "Deleting files from new pg_multixact/offsets                  ok",
            "Copying old pg_multixact/offsets to new server                ok",
            "Deleting files from new pg_multixact/members                  ok",
            "Copying old pg_multixact/members to new server                ok",
            "Setting next multixact ID and offset for new cluster          ok",
            "Resetting WAL archives                                        ok",
            "Setting frozenxid and minmxid counters in new cluster         ok",
            "Restoring global objects in the new cluster                   ok",
            "Restoring database schemas in the new cluster                 ok",
            "Adding \".old\" suffix to old global/pg_control                 ok",
            "",
            "If you want to start the old cluster, you will need to remove",
            "the \".old\" suffix from /pgdata/14/main/global/pg_control.old.",
            "Because \"link\" mode was used, the old cluster cannot be safely",
            "started once the new cluster has been started.",
            "Linking user relation files                                   ok",
            "Setting next OID for new cluster                              ok",
            "Sync data directory to disk                                   ok",
            "Creating script to delete old cluster                         ok",
            "Checking for extension updates                                ok",
            "",
            "Upgrade Complete",
            "----------------",
            "Optimizer statistics are not transferred by pg_upgrade.",
            "Once you start the new server, consider running:",
            "    /usr/pgsql-16/bin/vacuumdb -U postgres --all --analyze-in-stages",
            "Running this script will delete the old cluster's data files:",
            "    ./delete_old_cluster.sh"
        ]
    ]
}

TASK [Upgrade Secondary] *******************************************************

TASK [upgrade : Make sure that the new data directory "/pgdata/16/main" are empty on the Replica] ***
changed: [10.172.2.22] => (item=absent)
changed: [10.172.2.21] => (item=absent)
changed: [10.172.2.22] => (item=directory)
changed: [10.172.2.21] => (item=directory)

TASK [upgrade : Upgrade the PostgreSQL on the Replica (using rsync --hard-links)] ***
changed: [10.172.2.20]
FAILED - RETRYING: [10.172.2.20]: Wait for the rsync to complete. (1800 retries left).

TASK [upgrade : Wait for the rsync to complete.] *******************************
changed: [10.172.2.20]

TASK [Create WAL dir symlink] **************************************************

TASK [upgrade : Make sure /pgdata/16/main/pg_wal is not symlink] ***************
ok: [10.172.2.22]
ok: [10.172.2.20]
ok: [10.172.2.21]

TASK [upgrade : Make sure the custom WAL directory "/pgwal/16/pg_wal" exists and is empty] ***
ok: [10.172.2.20] => (item=absent)
ok: [10.172.2.22] => (item=absent)
ok: [10.172.2.21] => (item=absent)
changed: [10.172.2.20] => (item=directory)
changed: [10.172.2.21] => (item=directory)
changed: [10.172.2.22] => (item=directory)

TASK [upgrade : Synchronize /pgdata/16/main/pg_wal to /pgwal/16/pg_wal] ********
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Rename pg_wal to pg_wal_old] ***********************************
changed: [10.172.2.20]
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Create symlink /pgdata/16/main/pg_wal -> /pgwal/16/pg_wal] *****
changed: [10.172.2.20]
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Remove pg_wal_old directory] ***********************************
changed: [10.172.2.22]
changed: [10.172.2.20]
changed: [10.172.2.21]

TASK [Remove old cluster from DCS] *********************************************

TASK [upgrade : Remove existing cluster "postgres-cluster" from DCS] ***********
changed: [10.172.2.20]

TASK [Start Services] **********************************************************

TASK [upgrade : Start Patroni service on the Cluster Leader] *******************
changed: [10.172.2.20]

TASK [upgrade : Wait for Patroni port "8008" to become open on the host] *******
ok: [10.172.2.20]

TASK [upgrade : Check Patroni is healthy on the Leader] ************************
ok: [10.172.2.20]

TASK [upgrade : Perform RESUME PgBouncer pools on the Leader] ******************
included: /workspace/postgresql_cluster/roles/upgrade/tasks/pgbouncer_resume.yml for 10.172.2.20

TASK [upgrade : RESUME PgBouncer pools] ****************************************
changed: [10.172.2.20]

TASK [upgrade : Start Patroni service on the Cluster Replica] ******************
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Wait for Patroni port "8008" to become open on the host] *******
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : Check Patroni is healthy on the Replica] ***********************
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : Perform RESUME PgBouncer pools on the Replica] *****************
included: /workspace/postgresql_cluster/roles/upgrade/tasks/pgbouncer_resume.yml for 10.172.2.21, 10.172.2.22

TASK [upgrade : RESUME PgBouncer pools] ****************************************
changed: [10.172.2.22]
changed: [10.172.2.21]

TASK [upgrade : Check PostgreSQL is started and accepting connections] *********
ok: [10.172.2.22]
ok: [10.172.2.21]
ok: [10.172.2.20]

TASK [Disable maintenance mode] ************************************************

TASK [upgrade : Update haproxy conf file (enable http-checks)] *****************
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]

TASK [upgrade : Reload haproxy service] ****************************************
changed: [10.172.2.20]
changed: [10.172.2.21]
changed: [10.172.2.22]
fatal: [10.172.2.21]: FAILED! => {"changed": false, "msg": "Could not find the requested service confd: host"}
fatal: [10.172.2.20]: FAILED! => {"changed": false, "msg": "Could not find the requested service confd: host"}
fatal: [10.172.2.22]: FAILED! => {"changed": false, "msg": "Could not find the requested service confd: host"}

TASK [upgrade : Start confd service] *******************************************
...ignoring
...ignoring
...ignoring

PLAY [(6/7) POST-UPGRADE: Perform Post-Checks and Update extensions] ***********

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Running Post-Checks] *****************************************************
FAILED - RETRYING: [10.172.2.20]: Make sure that physical replication is active (30 retries left).
FAILED - RETRYING: [10.172.2.20]: Make sure that physical replication is active (29 retries left).

TASK [upgrade : Make sure that physical replication is active] *****************
ok: [10.172.2.20]

TASK [upgrade : Create a table "test_replication" with 10000 rows on the Primary] ***
changed: [10.172.2.20]

TASK [upgrade : Wait until the PostgreSQL replica is synchronized] *************
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [upgrade : Drop a table "test_replication"] *******************************
changed: [10.172.2.20]

TASK [upgrade : Print the result of checking the number of records] ************
ok: [10.172.2.21] => {
    "msg": [
        "The PostgreSQL Replication is OK for replica pgnode02",
        "The number of records in the test_replication table the same as the Primary (10000 rows)"
    ]
}
ok: [10.172.2.22] => {
    "msg": [
        "The PostgreSQL Replication is OK for replica pgnode03",
        "The number of records in the test_replication table the same as the Primary (10000 rows)"
    ]
}

TASK [Update extensions] *******************************************************

TASK [upgrade : Get a list of databases] ***************************************
ok: [10.172.2.20]

TASK [upgrade : Update extensions in each database] ****************************
included: /workspace/postgresql_cluster/roles/upgrade/tasks/update_extensions.yml for 10.172.2.20 => (item=template1)
included: /workspace/postgresql_cluster/roles/upgrade/tasks/update_extensions.yml for 10.172.2.20 => (item=postgres)

TASK [upgrade : Get list of installed PostgreSQL extensions (database: template1)] ***
ok: [10.172.2.20]

TASK [upgrade : Get list of old PostgreSQL extensions (database: template1)] ***
ok: [10.172.2.20]

TASK [upgrade : The extensions are up-to-date (database: template1)] ***********
ok: [10.172.2.20] => {
    "msg": [
        "The extension versions are up-to-date for the database template1",
        "No update is required."
    ]
}

TASK [upgrade : Get list of installed PostgreSQL extensions (database: postgres)] ***
ok: [10.172.2.20]

TASK [upgrade : Get list of old PostgreSQL extensions (database: postgres)] ****
ok: [10.172.2.20]

TASK [upgrade : The extensions are up-to-date (database: postgres)] ************
ok: [10.172.2.20] => {
    "msg": [
        "The extension versions are up-to-date for the database postgres",
        "No update is required."
    ]
}

PLAY [(7/7) POST-UPGRADE: Analyze a PostgreSQL database (update optimizer statistics) and Post-Upgrade tasks] ***

TASK [Include main variables] **************************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include OS-specific variables] *******************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Include upgrade variables] ***********************************************
ok: [10.172.2.20]
ok: [10.172.2.21]
ok: [10.172.2.22]

TASK [Analyze database] ********************************************************

TASK [upgrade : Start pg_terminator script: Monitor locks and terminate the 'ANALYZE' blockers] ***
changed: [10.172.2.20]

TASK [upgrade : Run vacuumdb to analyze the PostgreSQL databases] **************
changed: [10.172.2.20]

TASK [upgrade : Collecting statistics in progress. Wait for the analyze to complete.] ***
changed: [10.172.2.20]

TASK [upgrade : Stop pg_terminator script] *************************************
changed: [10.172.2.20]

TASK [Running Post-Upgrade tasks] **********************************************

TASK [upgrade : Ensure the current data directory is the new data directory] ***
ok: [10.172.2.22]
ok: [10.172.2.21]
ok: [10.172.2.20]

TASK [upgrade : Delete the old PostgreSQL data directory] **********************
changed: [10.172.2.21]
changed: [10.172.2.20]
changed: [10.172.2.22]

TASK [upgrade : Delete the old PostgreSQL WAL directory] ***********************
changed: [10.172.2.21]
changed: [10.172.2.22]
changed: [10.172.2.20]

TASK [upgrade : Remove old PostgreSQL packages] ********************************
changed: [10.172.2.22] => (item=postgresql14)
changed: [10.172.2.20] => (item=postgresql14)
changed: [10.172.2.21] => (item=postgresql14)
ok: [10.172.2.20] => (item=postgresql14-server)
ok: [10.172.2.21] => (item=postgresql14-server)
ok: [10.172.2.22] => (item=postgresql14-server)
ok: [10.172.2.21] => (item=postgresql14-contrib)
ok: [10.172.2.22] => (item=postgresql14-contrib)
ok: [10.172.2.20] => (item=postgresql14-contrib)

TASK [upgrade : Check the Patroni cluster state] *******************************
ok: [10.172.2.20]

TASK [upgrade : Check the current PostgreSQL version] **************************
ok: [10.172.2.20]

TASK [upgrade : List the Patroni cluster members] ******************************
ok: [10.172.2.20] => {
    "msg": [
        "+ Cluster: postgres-cluster (7314995127741163698) --+-----------+",
        "| Member   | Host        | Role    | State     | TL | Lag in MB |",
        "+----------+-------------+---------+-----------+----+-----------+",
        "| pgnode01 | 10.172.2.20 | Leader  | running   |  2 |           |",
        "| pgnode02 | 10.172.2.21 | Replica | streaming |  2 |         0 |",
        "| pgnode03 | 10.172.2.22 | Replica | streaming |  2 |         0 |",
        "+----------+-------------+---------+-----------+----+-----------+"
    ]
}

TASK [upgrade : Upgrade completed] *********************************************
ok: [10.172.2.20] => {
    "msg": [
        "PostgreSQL upgrade completed.",
        "Current version: 16.1"
    ]
}

PLAY RECAP *********************************************************************
10.172.2.20                : ok=335  changed=143  unreachable=0    failed=0    skipped=647  rescued=0    ignored=2   
10.172.2.21                : ok=242  changed=110  unreachable=0    failed=0    skipped=634  rescued=0    ignored=2   
10.172.2.22                : ok=242  changed=110  unreachable=0    failed=0    skipped=634  rescued=0    ignored=2   

passed

@vitabaks vitabaks merged commit 32bbceb into master Dec 21, 2023
17 checks passed
@vitabaks vitabaks deleted the pgbouncer-socket-dir-rights branch December 21, 2023 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant