
Conversation

@hors (Collaborator) commented Oct 11, 2025

K8SPG-844

CHANGE DESCRIPTION

Problem:

The test disables backups but doesn’t wait for the cluster to become ready. We should wait for readiness to confirm that all pgBackRest containers have been removed and backups are fully disabled.
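Concretely, the fixed test should block after disabling backups instead of asserting immediately. A minimal sketch of that flow, with `disable_backups` and the wait helper stubbed out so the snippet runs standalone (the pod prefix and ready count are made up):

```shell
# Hypothetical sketch: block on readiness after disabling backups.
# disable_backups and this wait_for_ready_containers are stand-ins
# (stubbed so the snippet runs standalone); prefix and count are made up.

disable_backups() { echo "backups disabled"; }    # stub for the real test step

wait_for_ready_containers() {    # stub for the helper this PR adds
	echo "waiting for '$1' pods to report $2 ready containers"
}

disable_backups
# Block until every instance pod has dropped its pgBackRest sidecar,
# i.e. reports the reduced ready-container count, before asserting anything.
wait_for_ready_containers "some-cluster-instance1" 1
```

In the real test the second call would be the `wait_for_ready_containers` helper added by this PR, with the cluster's actual instance prefix and expected container count.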

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

Comment on lines +119 to +173
local pod_prefix="$1"
local target_count="$2"
local namespace="${NAMESPACE}"
local max_wait_seconds=300
local check_interval=5
local elapsed_time=0

if [[ -z "$pod_prefix" || -z "$target_count" || -z "$namespace" ]]; then
    echo "Error: Missing arguments." >&2
    echo "Usage: wait_for_ready_containers <pod_name_prefix> <target_ready_count> <namespace>" >&2
    return 1
fi

echo "Waiting for pods starting with '$pod_prefix' in namespace '$namespace' to have $target_count ready containers (Max ${max_wait_seconds}s)..."

while [[ "$elapsed_time" -lt "$max_wait_seconds" ]]; do
    local target_pods
    # Get pods that match the prefix AND are running
    target_pods=$(kubectl get pods -n "$namespace" --field-selector=status.phase=Running --output=json | \
        jq -r ".items[] | select(.metadata.name | startswith(\"$pod_prefix\")) | .metadata.name")
    # If no running pods match the prefix, something might be wrong, but we'll keep waiting.
    if [[ -z "$target_pods" ]]; then
        echo "No running pods found with prefix '$pod_prefix'. Waiting..."
        sleep "$check_interval"
        elapsed_time=$((elapsed_time + check_interval))
        continue
    fi

    local ready_count=0
    local total_matches=0

    # Check each pod individually
    for pod_name in $target_pods; do
        total_matches=$((total_matches + 1))
        current_ready=$(kubectl get pod "$pod_name" -n "$namespace" -o json 2>/dev/null | \
            jq '.status.containerStatuses | map(select(.ready == true)) | length')

        if [[ "$current_ready" -eq "$target_count" ]]; then
            ready_count=$((ready_count + 1))
        fi
    done

    if [[ "$ready_count" -eq "$total_matches" ]]; then
        echo "Success: All $total_matches pods now have $target_count ready containers."
        return 0
    fi

    echo "Current status: $ready_count of $total_matches pods have $target_count ready containers. Waiting ${check_interval}s..."

    sleep "$check_interval"
    elapsed_time=$((elapsed_time + check_interval))
done

echo "Error: Timeout reached! After ${max_wait_seconds} seconds, not all pods reached $target_count ready containers." >&2
return 1
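The helper's control flow is a plain poll-until-deadline loop. Stripped of kubectl and jq, the same shape can be exercised standalone (`check_ready`, `wait_until_ready`, and the attempt counter below are illustrative stubs, not part of the PR):

```shell
# Minimal, self-contained version of the helper's polling loop, with the
# kubectl/jq readiness check replaced by a stub that succeeds on the
# third attempt. All names here are illustrative.

attempt=0
check_ready() {    # stand-in for the kubectl + jq ready-count check
	attempt=$((attempt + 1))
	[[ "$attempt" -ge 3 ]]
}

wait_until_ready() {
	local max_wait_seconds=10
	local check_interval=1
	local elapsed_time=0
	while [[ "$elapsed_time" -lt "$max_wait_seconds" ]]; do
		if check_ready; then
			echo "ready after ${elapsed_time}s"
			return 0
		fi
		sleep "$check_interval"
		elapsed_time=$((elapsed_time + check_interval))
	done
	echo "Error: timeout after ${max_wait_seconds}s" >&2
	return 1
}

wait_until_ready
```

Bounding the loop by elapsed time rather than attempt count, as the helper does, keeps the timeout meaningful even if the per-iteration check is slow.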
Contributor

[shfmt] reported by reviewdog 🐶

Suggested change
local pod_prefix="$1"
local target_count="$2"
local namespace="${NAMESPACE}"
local max_wait_seconds=300
local check_interval=5
local elapsed_time=0
if [[ -z $pod_prefix || -z $target_count || -z $namespace ]]; then
    echo "Error: Missing arguments." >&2
    echo "Usage: wait_for_ready_containers <pod_name_prefix> <target_ready_count> <namespace>" >&2
    return 1
fi
echo "Waiting for pods starting with '$pod_prefix' in namespace '$namespace' to have $target_count ready containers (Max ${max_wait_seconds}s)..."
while [[ $elapsed_time -lt $max_wait_seconds ]]; do
    local target_pods
    # Get pods that match the prefix AND are running
    target_pods=$(kubectl get pods -n "$namespace" --field-selector=status.phase=Running --output=json \
        | jq -r ".items[] | select(.metadata.name | startswith(\"$pod_prefix\")) | .metadata.name")
    # If no running pods match the prefix, something might be wrong, but we'll keep waiting.
    if [[ -z $target_pods ]]; then
        echo "No running pods found with prefix '$pod_prefix'. Waiting..."
        sleep "$check_interval"
        elapsed_time=$((elapsed_time + check_interval))
        continue
    fi
    local ready_count=0
    local total_matches=0
    # Check each pod individually
    for pod_name in $target_pods; do
        total_matches=$((total_matches + 1))
        current_ready=$(kubectl get pod "$pod_name" -n "$namespace" -o json 2>/dev/null \
            | jq '.status.containerStatuses | map(select(.ready == true)) | length')
        if [[ $current_ready -eq $target_count ]]; then
            ready_count=$((ready_count + 1))
        fi
    done
    if [[ $ready_count -eq $total_matches ]]; then
        echo "Success: All $total_matches pods now have $target_count ready containers."
        return 0
    fi
    echo "Current status: $ready_count of $total_matches pods have $target_count ready containers. Waiting ${check_interval}s..."
    sleep "$check_interval"
    elapsed_time=$((elapsed_time + check_interval))
done
echo "Error: Timeout reached! After ${max_wait_seconds} seconds, not all pods reached $target_count ready containers." >&2
return 1
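The jq filter doing the per-pod work can be sanity-checked in isolation by feeding it canned pod JSON instead of live kubectl output (the pod status below is fabricated):

```shell
# Standalone check of the jq filter used above to count ready containers,
# fed canned pod JSON instead of live kubectl output. Requires jq.
set -euo pipefail
command -v jq >/dev/null 2>&1 || { echo "jq not installed; skipping"; exit 0; }

pod_json='{"status":{"containerStatuses":[
	{"name":"database","ready":true},
	{"name":"pgbackrest","ready":false}]}}'

ready_count=$(echo "$pod_json" \
	| jq '.status.containerStatuses | map(select(.ready == true)) | length')
echo "ready containers: $ready_count"    # only "database" is ready here
```

Note the filter assumes `.status.containerStatuses` is present, which holds for pods in the Running phase that the field selector in the helper already restricts to.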

@JNKPercona (Collaborator) commented:

Test Name Result Time
backup-enable-disable passed 00:00:00
custom-extensions passed 00:00:00
custom-tls passed 00:05:42
database-init-sql passed 00:00:00
demand-backup passed 00:00:00
finalizers passed 00:00:00
init-deploy passed 00:00:00
monitoring passed 00:00:00
monitoring-pmm3 passed 00:00:00
one-pod passed 00:00:00
operator-self-healing passed 00:09:17
pgvector-extension passed 00:00:00
pitr passed 00:00:00
scaling passed 00:00:00
scheduled-backup passed 00:00:00
self-healing passed 00:09:16
sidecars passed 00:00:00
start-from-backup passed 00:00:00
tablespaces passed 00:00:00
telemetry-transfer passed 00:00:00
upgrade-consistency passed 00:00:00
upgrade-minor passed 00:00:00
users passed 00:00:00
We ran 23 out of 23 tests in 00:24:15

commit: 98d4b33
image: perconalab/percona-postgresql-operator:PR-1318-98d4b33eb

@hors hors marked this pull request as ready for review October 14, 2025 08:00
@hors hors changed the title Fix backup-enable-disable test K8SPG-844 fix backup-enable-disable test Oct 14, 2025
@hors hors merged commit 74508f2 into main Oct 14, 2025
18 checks passed
@hors hors deleted the fix_test branch October 14, 2025 08:47
3 participants