Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions test/e2e/tests/weighted_backend.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@ func runWeightedBackendTest(t *testing.T, suite *suite.ConformanceTestSuite, rou
suite.ControllerName,
kubernetes.NewGatewayRef(SameNamespaceGateway), weightEqualRoute)

// Make sure all test resources are ready
kubernetes.NamespacesMustBeReady(t, suite.Client, suite.TimeoutConfig, []string{ConformanceInfraNamespace})

Comment on lines +69 to +71
Copy link
Contributor Author

@jukie jukie May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohuabing @zirain I narrowed this down to being a race condition if NamespacesMustBeReady() executes prior to GatewayAndHTTPRoutesMustBeAccepted() and that's what was changed with the refactor.

The manifest apply action happens before test execution but the client cache seems to still be outdated and as a result the list actions in NamespacesMustBeReady() are empty and the intended wait action never happens.

I was able to reproduce the failure locally ~50% of the time but by moving NamespacesMustBeReady() into runWeightedBackendTest() after GatewayAndHTTPRoutesMustBeAccepted() execution it consistently suceeds.

Copy link
Member

@zirain zirain May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTY, can you explain more about this?
NamespacesMustBeReady() make sure that all the pods in the namespace should be ready, it should be more strict than GatewayAndHTTPRoutesMustBeAccepted(), why move it after solve this?
Need more time to refresh the pod list? if so, prefer to bring back

WaitForPods(t, suite.Client, ns, map[string]string{"app": "zone-aware-backend"}, corev1.PodRunning, podReady)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any real proof about the issue being client cache but that's a theory.
I suspect that by first calling GatewayAndHTTPRoutesMustBeAccepted() that implicitly ensures the proxy pods are also up and running. In the case of zone aware routing tests, this is needed due to the affinity rules.

It seems useful that for all tests we'd want to wait for all test resources to be completely ready before proceeding. Limiting to just Pods allows for other edge cases in the future so a comprehensive function like NamespacesMustBeReady() (tbf it currently just checks gateways and pods but could be expanded further) would be my preference but if you'd like to instead bring back WaitForPods() specifically for zone-aware-routing I can make that update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using same-namespace gateway, so it's always there.

Let us test with, we can bring WaitForPods() back if needed in the future.

expectedResponse := http.ExpectedResponse{
Request: http.Request{
Path: path,
Expand Down
4 changes: 0 additions & 4 deletions test/e2e/tests/zone_aware_routing.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@ package tests
import (
"testing"

"sigs.k8s.io/gateway-api/conformance/utils/kubernetes"
"sigs.k8s.io/gateway-api/conformance/utils/suite"
)

Expand All @@ -24,9 +23,6 @@ var ZoneAwareRoutingTest = suite.ConformanceTest{
Manifests: []string{"testdata/zone-aware-routing.yaml"},
Test: func(t *testing.T, suite *suite.ConformanceTestSuite) {
t.Run("only local zone should get requests", func(t *testing.T) {
// Let's make sure that all the deployments are ready.
kubernetes.NamespacesMustBeReady(t, suite.Client, suite.TimeoutConfig, []string{"gateway-conformance-infra"})

// Pods from the backend-local deployment have affinity
// for the Envoy Proxy pods so should receive all requests.
expected := map[string]int{
Expand Down