kube-apiserver crashes during pgbackrest backups #1539

Open
iohenkies opened this issue Jul 13, 2023 · 0 comments

Hi all,

I originally posted this at pgbackrest/pgbackrest#2118 but was advised to give it a go here. Hopefully you have some ideas. :)

We've got an 85-node cluster running all sorts of stuff. Control planes and etcd are separated from our workers and from each other, so they all run on separate nodes. Then we have, for instance, Elasticsearch on a separate nodepool, a lot of workers for all kinds of apps, and our Postgres databases on a separate nodepool. The Postgres nodepool has 11 nodes with 8 vCPUs and 32 GB of memory each.

At 2am and 6am, about 60 pgbackrest backups are started. This often, but not always, makes the kube-apiserver containers on our control planes crash. This is very strange to us: why would pgbackrest put such a strain on the apiserver? We've tried to replicate the issue by spawning 300 pods of another app at the same time, all calling the apiserver, and the kube-apiserver kept running. It only seems to happen during these backups.

We have audit logging enabled on the kube-apiserver, and up until right before the crashes we don't see anything unusual. Then it gets too busy and crashes, so we probably can't catch the very end of the logs. The only thing in the pgbackrest logs that stands out is quite a lot of "apiserver was unable to write a JSON response: http: Handler timeout" errors, not only during the crashes but also during the day.
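Since audit logging is enabled, one way to see which clients are busiest in the minutes before a crash is to tally audit events per user agent. The sketch below is only an illustration. It assumes the audit log is written as one compact JSON event per line and that each event carries a `userAgent` field; the function name `top_user_agents` and the log path in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count audit events per userAgent, busiest first.
# Assumes one compact JSON event per line, e.g. {"verb":"get","userAgent":"foo/1.0"}.
# User agents containing escaped quotes would be cut short by this simple pattern.
top_user_agents() {
  local audit_log="$1"
  grep -o '"userAgent":"[^"]*"' "$audit_log" \
    | sed -e 's/^"userAgent":"//' -e 's/"$//' \
    | awk '{ n[$0]++ } END { for (a in n) printf "%d\t%s\n", n[a], a }' \
    | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Usage would look like `top_user_agents /path/to/audit.log | head`, where the path is whatever the apiserver's audit log path is set to on the control plane nodes.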

Now, we are no database experts, and our DBA colleague who led the Postgres setup is on long-term sick leave, so we're hoping to make use of the expertise here! Maybe there are settings that can be tweaked? Or can someone explain whether, and why, pgbackrest makes a lot of calls to the apiserver?

  1. pgBackRest version:
    pgBackRest 2.40

  2. PostgreSQL version:
    postgres (PostgreSQL) 14.5

  3. Operating system/version - if you have more than one server (for example, a database server, a repository host server, one or more standbys), please specify each:
    Kubernetes 1.24.10 on Ubuntu 20.04.5 LTS nodes

  4. Did you install pgBackRest from source or from a package?
    Installed on Kubernetes 1.24.10, running image registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi8-5.2.0-0

  5. Please attach the following as applicable:
    pgbackrest conf

bash-4.4$ cat pgbackrest_instance.conf
# Generated by postgres-operator. DO NOT EDIT.
# Your changes will not be saved.

[global]
buffer-size = 2MiB
compress-type = lz4
log-path = /pgdata/pgbackrest/log
process-max = 2
repo1-path = /pgbackrest/grafana/grafana
repo1-retention-full = 2
repo1-retention-full-type = time
repo1-s3-bucket = npo
repo1-s3-endpoint = storagegrid.s3.ourdomain.com
repo1-s3-port = 443
repo1-s3-region = NL-AER-1
repo1-s3-uri-style = path
repo1-storage-ca-file = /etc/pgbackrest/conf.d/root.pem
repo1-storage-verify-tls = y
repo1-type = s3

[db]
pg1-path = /pgdata/pg14
pg1-port = 5432
pg1-socket-path = /tmp/postgres

Backup command

bash -ceu --
shopt -s globstar
files=(/etc/pgbackrest/conf.d/**)
for i in "${!files[@]}"; do
  [[ -f "${files[$i]}" ]] || unset -v "files[$i]"
done
declare -r hash="$1" local_hash="$(sha1sum "${files[@]}" | sha1sum)"
if [[ "${local_hash}" != "${hash}" ]]; then
  printf >&2 "hash %s does not match local hash %s" "${hash}" "${local_hash}"; exit 1;
else
  pgbackrest backup --stanza=db --repo=1 --type=incr
fi
- 725c12672026deac030f95c75a5abee7186e180a  -
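For readers unfamiliar with the wrapper around the backup: it hashes every regular file under /etc/pgbackrest/conf.d, hashes that listing again, and only runs `pgbackrest backup` if the result equals the expected value passed in as `$1`. A minimal sketch of just that hash step, reusing the same commands. The function name `conf_hash`, its directory argument, and the empty-directory guard are hypothetical additions so the step can be run in isolation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce the config hash computed by the backup wrapper.
conf_hash() {
  local dir="$1" i
  local -a files
  shopt -s globstar                 # let ** descend into nested directories
  files=("${dir}"/**)
  for i in "${!files[@]}"; do
    # Keep regular files only; drop directories and unmatched patterns.
    [[ -f "${files[$i]}" ]] || unset -v "files[$i]"
  done
  # Guard (not in the original command): without files, sha1sum would read stdin.
  (( ${#files[@]} > 0 )) || return 1
  sha1sum "${files[@]}" | sha1sum
}
```

Because the second `sha1sum` reads from stdin, its output has the form `<hash>  -`, which would explain why the expected value at the end of the command above is followed by two spaces and a dash.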

Errors in log

apiserver was unable to write a JSON response: http: Handler timeout
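To see whether these errors cluster around the 2am and 6am backup windows or are spread evenly through the day, they can be counted per hour. A rough sketch, assuming each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (the log format is not shown in this issue, so that is an assumption) and using a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Handler timeout" errors per hour.
# Assumes every line begins with a "YYYY-MM-DD HH:MM:SS" timestamp.
timeouts_per_hour() {
  local log_file="$1"
  grep -F 'http: Handler timeout' "$log_file" \
    | cut -c1-13 \
    | sort | uniq -c \
    | awk '{ print $2 " " $3 ":00", $1 }'   # date, hour, count
}
```

Each output line holds the date, the hour, and the number of matching errors in that hour.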