update_pgcluster.yml: Improve error handling #578

Merged
vitabaks merged 1 commit into master from update-role-error-handling
Feb 17, 2024

vitabaks
Owner

@vitabaks vitabaks commented Feb 17, 2024

  1. Ignore errors when updating packages
    • This avoids a situation where the database has been stopped and the playbook then aborts with an error during the package update (for example, because of dependency problems), leaving the database stopped on one of the cluster servers.

Fixed:


  PLAY [update_pgcluster.yml | Update PostgreSQL HA Cluster (based on "Patroni")] ***
  
  TASK [Gathering Facts] *********************************************************
  ok: [10.172.0.22]
  ok: [10.172.0.21]
  ok: [10.172.0.20]
  
  TASK [Include main variables] **************************************************
  ok: [10.172.0.20]
  ok: [10.172.0.21]
  ok: [10.172.0.22]
  
  TASK [[Prepare] Get Patroni Cluster Leader Node] *******************************
  ok: [10.172.0.21]
  ok: [10.172.0.20]
  ok: [10.172.0.22]
  
  TASK [[Prepare] Add host to group "primary" (in-memory inventory)] *************
  ok: [10.172.0.20] => (item=10.172.0.20)
  
  TASK [[Prepare] Add hosts to group "secondary" (in-memory inventory)] **********
  ok: [10.172.0.20] => (item=10.172.0.21)
  ok: [10.172.0.20] => (item=10.172.0.22)
  
  TASK [Print Patroni Cluster info] **********************************************
  ok: [10.172.0.20] => {
      "msg": [
          "Cluster Name: postgres-cluster",
          "Cluster Leader: pgnode01"
      ]
  }
  
  PLAY [(1/4) PRE-UPDATE: Perform Pre-Checks] ************************************
  
  TASK [Include main variables] **************************************************
  ok: [10.172.0.20]
  ok: [10.172.0.21]
  ok: [10.172.0.22]
  
  TASK [Running Pre-Checks] ******************************************************
  
  TASK [update : [Pre-Check] (ALL) Test PostgreSQL DB Access] ********************
  ok: [10.172.0.20]
  ok: [10.172.0.22]
  ok: [10.172.0.21]
  
  TASK [update : [Pre-Check] Make sure that physical replication is active] ******
  ok: [10.172.0.20]
  
  TASK [update : [Pre-Check] Make sure there is no high replication lag (more than 10.00 MB)] ***
  ok: [10.172.0.20]
  
  TASK [update : [Pre-Check] Make sure there are no long-running transactions (more than 15 seconds)] ***
  ok: [10.172.0.21]
  ok: [10.172.0.20]
  ok: [10.172.0.22]
  
  PLAY [(2/4) UPDATE: Secondary] *************************************************
  
  TASK [Include main variables] **************************************************
  ok: [10.172.0.21]
  
  TASK [Include OS-specific variables] *******************************************
  ok: [10.172.0.21]
  
  TASK [Stop read-only traffic] **************************************************
  
  TASK [update : Edit patroni.yml | enable noloadbalance, nosync, nofailover] ****
  changed: [10.172.0.21] => (item=noloadbalance: true)
  changed: [10.172.0.21] => (item=nosync: true)
  changed: [10.172.0.21] => (item=nofailover: true)
  
  TASK [update : Reload patroni service] *****************************************
  changed: [10.172.0.21]
  FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is unavailable (30 retries left).
  FAILED - RETRYING: [10.172.0.21]: Make sure replica endpoint is unavailable (29 retries left).
  
  TASK [update : Make sure replica endpoint is unavailable] **********************
  ok: [10.172.0.21]
  
  TASK [update : Wait for active transactions to complete] ***********************
  ok: [10.172.0.21]
  
  TASK [Stop Services] ***********************************************************
  
  TASK [update : Check PostgreSQL is started and accepting connections] **********
  ok: [10.172.0.21]
  
  TASK [update : Execute CHECKPOINT before stopping PostgreSQL] ******************
  changed: [10.172.0.21]
  
  TASK [update : Stop Patroni service on the Cluster Replica (pgnode02)] *********
  changed: [10.172.0.21]
  
  TASK [Update PostgreSQL] *******************************************************
  
  TASK [update : Update dnf cache] ***********************************************
  changed: [10.172.0.21]
  
  TASK [update : Install the latest version of PostgreSQL packages] **************
  ok: [10.172.0.21] => (item=postgresql16)
  ok: [10.172.0.21] => (item=postgresql16-server)
  ok: [10.172.0.21] => (item=postgresql16-contrib)
  
  TASK [Update Patroni] **********************************************************
  
  TASK [update : Install the latest version of Patroni] **************************
  ok: [10.172.0.21]
  
  TASK [Update all system packages] **********************************************
  
  TASK [update : Update dnf cache] ***********************************************
  changed: [10.172.0.21]
  fatal: [10.172.0.21]: FAILED! => {"attempts": 3, "changed": false, "failures": [], "msg": "Depsolve Error occurred: \n Problem: package iptables-legacy-1.8.8-6.el9.2.x86_64 from @System requires (iptables-libs(x86-64) = 1.8.8-6.el9 or iptables-libs(x86-64) = 1.8.8-6.el9_1), but none of the providers can be installed\n  - cannot install both iptables-libs-1.8.10-2.el9.x86_64 from baseos and iptables-libs-1.8.8-6.el9.x86_64 from @System\n  - cannot install both iptables-libs-1.8.8-6.el9.x86_64 from baseos and iptables-libs-1.8.10-2.el9.x86_64 from baseos\n  - cannot install the best update candidate for package iptables-libs-1.8.8-6.el9.x86_64\n  - cannot install the best update candidate for package iptables-legacy-1.8.8-6.el9.2.x86_64", "rc": 1, "results": []}
  FAILED - RETRYING: [10.172.0.21]: Update all system packages (3 retries left).
  FAILED - RETRYING: [10.172.0.21]: Update all system packages (2 retries left).
  FAILED - RETRYING: [10.172.0.21]: Update all system packages (1 retries left).
  
  TASK [update : Update all system packages] *************************************
  
  NO MORE HOSTS LEFT *************************************************************
  
  PLAY RECAP *********************************************************************
  10.172.0.20                : ok=241  changed=88   unreachable=0    failed=0    skipped=706  rescued=0    ignored=0
  10.172.0.21                : ok=208  changed=89   unreachable=0    failed=1    skipped=679  rescued=0    ignored=0
  10.172.0.22                : ok=195  changed=83   unreachable=0    failed=0    skipped=665  rescued=0    ignored=0

  2. Improve the error handling
    • Update errors are now reported after the playbook completes, instead of aborting it midway.
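
The two changes described above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the tasks from the merged commit: the task names, the `update_system_result` variable, the retry settings, and the `patroni` service name are assumptions made for the example.

```yaml
# Hypothetical sketch: names and values are illustrative, not from the commit.

# 1. Do not abort the play if the package update fails.
- name: Update all system packages
  ansible.builtin.dnf:
    name: "*"
    state: latest
  register: update_system_result
  until: update_system_result is success
  retries: 3
  delay: 5
  ignore_errors: true  # a dependency error must not leave the database stopped

# The node is brought back regardless of the update result.
- name: Start Patroni service
  ansible.builtin.service:
    name: patroni
    state: started

# 2. Report the update error only after the remaining work is done.
- name: Report package update errors
  ansible.builtin.fail:
    msg: >-
      Package update failed on {{ inventory_hostname }}:
      {{ update_system_result.msg | default('unknown error') }}
  when: update_system_result is failed
```

With this layout a depsolve error like the one in the log above would still appear in the output, but only after Patroni has been started again on the affected replica.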

@vitabaks vitabaks self-assigned this Feb 17, 2024
@vitabaks vitabaks merged commit b786100 into master Feb 17, 2024
17 checks passed
@vitabaks vitabaks deleted the update-role-error-handling branch February 17, 2024 14:54