[bitnami/postgresql-ha] Cloning data from primary node fails due to liveness/readiness probes #3556

Closed
Antiarchitect opened this issue Aug 30, 2020 · 4 comments · Fixed by #29948
Labels
postgresql-ha stale 15 days without activity

Comments

@Antiarchitect
Contributor

Which chart:
bitnami/postgresql-ha 3.5.9

Describe the bug
Upscaling fails at "Cloning data from primary node..." with a large database (18GB), probably due to the liveness/readiness probes.

To Reproduce

  1. Set replicaCount to 1
  2. Restore large db from the dump
  3. Try to upscale to 2 replicas

Expected behavior
Some mechanism to avoid this.

P.S. If I turn off the liveness/readiness probes, everything works and both replicas end their logs with these lines:

postgresql-repmgr 14:33:15.70 INFO  ==> Starting PostgreSQL in background...
postgresql-repmgr 14:33:15.84 INFO  ==> ** Starting repmgrd **
[2020-08-30 14:33:15] [NOTICE] repmgrd (repmgrd 5.1.0) starting up
INFO:  set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid
[2020-08-30 14:33:15] [NOTICE] starting monitoring of node "pg-ha-postgresql-1" (ID: 1001)

But when I turn the liveness/readiness probes back on, the second pod (pg-ha-postgresql-1) has to fully resync for some reason and starts failing again because the probes are active again.
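
For reference, the workaround described above (upscaling with both probes switched off) might be expressed in the chart values roughly as follows. This is a sketch, not the reporter's actual configuration: the key names assume the postgresql.* layout of the bitnami/postgresql-ha values.yaml and should be checked against the chart version in use.

```yaml
# Sketch only: key names are assumed from the bitnami/postgresql-ha
# values.yaml; verify them against your chart version.
postgresql:
  replicaCount: 2        # upscaled from 1, as in the reproduction steps
  livenessProbe:
    enabled: false       # no liveness restarts while the standby clones
  readinessProbe:
    enabled: false       # pod is treated as ready regardless of sync state
```

With both probes disabled, Kubernetes no longer restarts the cloning pod, but it also stops detecting a genuinely unhealthy node, which is why this is a workaround rather than a fix.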

Version of Helm and Kubernetes:

  • Output of helm version:
version.BuildInfo{Version:"v3.3.0", GitCommit:"8a4aeec08d67a7b84472007529e8097ec3742105", GitTreeState:"dirty", GoVersion:"go1.14.7"}
  • Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Additional context
NONE

@Antiarchitect changed the title from "[bitnami/postgresql-ha]" to "[bitnami/postgresql-ha] Cloning data from primary node fails due to liveness/readiness probes" on Aug 30, 2020
@carrodher
Member

Hi, thanks for using this Bitnami chart. Did you try modifying the probe parameters instead of disabling the probes? The clone may simply take longer than the probes allow, in which case you would need to increase the probes' parameters; see https://github.com/bitnami/charts/blob/master/bitnami/postgresql-ha/values.yaml#L189

Apart from that, what error appears in the logs when the issue occurs? What does kubectl describe pod show for the failing pod?
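
As an illustration of tuning the probes instead of disabling them, a values fragment might look like the one below. The numbers are made up for the example and are neither chart defaults nor recommendations; the key names again assume the postgresql.* probe layout of the chart's values.yaml.

```yaml
# Illustrative numbers only, not recommendations or chart defaults.
postgresql:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 600   # wait 10 minutes before the first probe
    periodSeconds: 30          # then probe every 30 seconds
    timeoutSeconds: 5
    failureThreshold: 20       # restart only after 20 consecutive failures
  readinessProbe:
    enabled: true
    initialDelaySeconds: 30
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 20
```

The time a container gets before a liveness restart is roughly initialDelaySeconds plus failureThreshold times periodSeconds, here about 600 + 20 × 30 = 1200 seconds (20 minutes). A failing readiness probe does not restart the container; it only keeps the pod out of service endpoints. The clone has to finish inside the liveness window, so the values need to be sized to the expected sync time.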

@Antiarchitect
Contributor Author

If I tune the readiness/liveness probes, I cannot predict even approximately how long a 1TB database will take to replicate, so it is better to turn them off. My current issue is #3563.

@stale

stale bot commented Sep 17, 2020

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@stale stale bot added the stale 15 days without activity label Sep 17, 2020
@percenuage
Contributor

percenuage commented Oct 15, 2024

Hello, I have exactly the same issue as @Antiarchitect. The primary node has 300Gi of data and I want to add a second replica (1 -> 2), but the liveness probe window is too short for the data sync. Do you have any solution? Changing or disabling the liveness probe seems strange. On the other hand, it is logical that the replica should not be considered live until the data is fully synchronized.
Thanks for your help.

Which chart:
bitnami/postgresql-ha 14.2.8
