
NE-2396: gatewayclass: Enable Horizontal Pod Autoscaling #1326

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:master from
Miciah:gatewayclass-enable-Horizontal-Pod-Autoscaling
Mar 27, 2026

Conversation

@Miciah
Contributor

@Miciah Miciah commented Dec 16, 2025

Enable Horizontal Pod Autoscaling (HPA) on Istio. Hard-code the autoscaling parameters based on the cluster infrastructure config:

  • If the infrastructure topology is "SingleReplica", set minimum to 1.
  • Otherwise, set minimum to 2.
  • In any case, set maximum to 10.
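The replica-count rules above are small enough to sketch directly. A minimal Go illustration, assuming a plain-string topology value rather than the actual configv1 constants; hpaReplicaBounds is a hypothetical name, not an identifier from this PR:

```go
package main

import "fmt"

// hpaReplicaBounds returns the HPA minimum and maximum replica counts for
// a given infrastructure topology, following the rules described above:
// minimum 1 for "SingleReplica", minimum 2 otherwise, maximum always 10.
func hpaReplicaBounds(infrastructureTopology string) (minReplicas, maxReplicas int32) {
	minReplicas = 2
	if infrastructureTopology == "SingleReplica" {
		minReplicas = 1
	}
	return minReplicas, 10
}

func main() {
	minR, maxR := hpaReplicaBounds("SingleReplica")
	fmt.Println(minR, maxR) // prints: 1 10
	minR, maxR = hpaReplicaBounds("HighlyAvailable")
	fmt.Println(minR, maxR) // prints: 2 10
}
```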

@openshift-ci
Contributor

openshift-ci bot commented Dec 16, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2025
@Miciah
Contributor Author

Miciah commented Dec 16, 2025

/test e2e-aws-operator

@Miciah
Contributor Author

Miciah commented Dec 17, 2025

e2e-aws-operator passed. deployments.json shows that the istiod deployment has 2 replicas:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2001047586798047232/artifacts/e2e-aws-operator/gather-extra/artifacts/deployments.json' | jq '.items.[]|select(.metadata.name=="istiod-openshift-gateway")|.spec.replicas'
2
%

However, there are some errors due to missing metrics:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2001047586798047232/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and .source.component=="horizontal-pod-autoscaler")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
istiod-openshift-gateway: New size: 2; reason: Current number of replicas below Spec.MinReplicas
istiod-openshift-gateway: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: New size: 2; reason: Current number of replicas below Spec.MinReplicas
istiod-openshift-gateway: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
%

Moreover, the proxy pods do not get scaled out to more than 1 replica:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2001047586798047232/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and (.metadata.name|startswith("test-gateway")) and .source.component=="deployment-controller")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-74dff54c89 from 0 to 1
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-577b5f5db7 from 0 to 1
test-gateway-openshift-default: Scaled down replica set test-gateway-openshift-default-74dff54c89 from 1 to 0
test-gateway-update-openshift-default: Scaled up replica set test-gateway-update-openshift-default-75c4dbd84f from 0 to 1
%


// Watch the cluster infrastructure config in case the infrastructure
// topology changes.
if err := c.Watch(source.Kind[client.Object](operatorCache, &configv1.Infrastructure{}, reconciler.enqueueRequestForSomeGatewayClass())); err != nil {
Member


Question here: if you change the Sail Operator parameter, will it be reflected in the Pilot config?
Also, in case this parameter is also used for the Gateway replica deployment, will Sail reconcile all the gateways and HPAs?

(Giving some thoughts for tests)

Member


Answering myself:
changing the HPA parameter on pilot does not affect the GatewayClass; it needs its own definition via spec.infrastructure.paramRefs

@rikatz
Member

rikatz commented Dec 17, 2025

testing on my own cluster (AWS, HA):

kubectl get hpa -A
NAMESPACE           NAME                       REFERENCE                             TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
openshift-ingress   istiod-openshift-gateway   Deployment/istiod-openshift-gateway   cpu: <unknown>/80%   2         10        2          36s

It is scaling pilot, but I'm not sure about the gateways; let me test.

@rikatz
Member

rikatz commented Dec 17, 2025

IIUC we can pass the GatewayClass configs to Sail on the Istio resource, and it will create the configmap for us (even when the gateway class is created by the user): https://github.com/openshift-service-mesh/sail-operator/blob/f3a3c6d7e6f2c2ae412b09ce1a78bc93258b4db4/resources/v1.28.0/charts/istiod/templates/gateway-class-configmap.yaml

We can make CIO set this property, and the configmap will be created for us (this even includes setting the annotations for internal-only clusters!). The problem is that today the json.RawMessage used by the Sail API does not accept the content correctly. I have asked the Sail Operator team about it and whether they have been using it; otherwise we can improve this workflow in Sail Operator and consume it in CIO.
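Since the gatewayClasses value on the Istio resource is raw JSON, one way to build it is to marshal a Go map, which also yields deterministic (alphabetically sorted) keys. A minimal sketch under that assumption; buildGatewayClassesValue is an illustrative name, not code from this PR or from Sail Operator:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildGatewayClassesValue produces raw JSON for a per-GatewayClass HPA
// configuration, shaped like
// {"<name>":{"horizontalPodAutoscaler":{"spec":{"maxReplicas":10,"minReplicas":2}}}}.
// json.Marshal sorts map keys, so the output is deterministic.
func buildGatewayClassesValue(names []string, minReplicas, maxReplicas int) ([]byte, error) {
	cfg := map[string]any{}
	for _, name := range names {
		cfg[name] = map[string]any{
			"horizontalPodAutoscaler": map[string]any{
				"spec": map[string]any{
					"minReplicas": minReplicas,
					"maxReplicas": maxReplicas,
				},
			},
		}
	}
	return json.Marshal(cfg)
}

func main() {
	b, err := buildGatewayClassesValue([]string{"openshift-default"}, 2, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// prints: {"openshift-default":{"horizontalPodAutoscaler":{"spec":{"maxReplicas":10,"minReplicas":2}}}}
}
```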

@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from 1ea51fc to f407b29 Compare December 22, 2025 03:20
@Miciah
Contributor Author

Miciah commented Dec 22, 2025

/test e2e-aws-operator

@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from f407b29 to 3e6e8c5 Compare December 22, 2025 14:28
@Miciah
Contributor Author

Miciah commented Dec 22, 2025

/test e2e-aws-operator


@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from fad49ff to c566d6a Compare December 23, 2025 02:45
@Miciah
Contributor Author

Miciah commented Dec 23, 2025

/test e2e-aws-operator

@Miciah Miciah marked this pull request as ready for review March 4, 2026 17:03
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2026
@coderabbitai

coderabbitai bot commented Mar 4, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: f91b6d8c-84b8-4d95-8050-3f152f4e28b0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


The controller now watches and reads the cluster-scoped Infrastructure resource named "cluster" and lists all managed GatewayClass resources during reconciliation. Both OLM and Sail-library reconciliation paths pass the GatewayClass list and Infrastructure into downstream functions, which build per-GatewayClass HorizontalPodAutoscaler JSON (minReplicas = 1 for SingleReplicaTopologyMode, otherwise 2; maxReplicas = 10). Multiple function signatures and error-return paths were updated to propagate failures from Infrastructure retrieval, indexed GatewayClass listing, and JSON generation. ClusterOperator status RelatedObjects now includes horizontalpodautoscalers in the ingress namespace.

🚥 Pre-merge checks (3 passed, 2 warnings)

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Test Structure And Quality ⚠️ Warning: Test assertions lack meaningful failure messages to aid diagnosis, and E2E changes discard return values that could mask bugs by preventing variable updates with freshly fetched data. Resolution: add explicit assertion messages to all calls, structure setup using BeforeEach/AfterEach blocks, and reconsider E2E changes where return values are discarded.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main objective of the pull request: enabling Horizontal Pod Autoscaling for GatewayClass controllers managing Istio resources.
  • Stable And Deterministic Test Names ✅ Passed: Test files use static test function names without dynamic information that changes between runs.


@openshift-ci openshift-ci bot requested review from Thealisyed and alebedev87 March 4, 2026 17:04
@Miciah
Contributor Author

Miciah commented Mar 4, 2026

#1350 merged, so Istio.spec.values.gatewayClasses should be functional now.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/operator/controller/gatewayclass/istio.go (1)

111-116: Consider adding nil-safety for infraConfig parameter.

The function accesses infraConfig.Status.InfrastructureTopology directly at line 114. While the current call site in ensureIstio ensures a non-nil infraConfig is passed, adding a defensive check would prevent potential panics if this function is called from additional locations in the future.

🛡️ Proposed defensive check
 func desiredIstio(name types.NamespacedName, ownerRef metav1.OwnerReference, istioVersion string, enableInferenceExtension bool, infraConfig *configv1.Infrastructure, gatewayclasses []gatewayapiv1.GatewayClass) (*sailv1.Istio, error) {
 	var minReplicas = 2
 	const maxReplicas = 10
+	if infraConfig == nil {
+		return nil, fmt.Errorf("infraConfig must not be nil")
+	}
 	if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode {
 		minReplicas = 1
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/istio.go` around lines 111 - 116, The
function desiredIstio reads infraConfig.Status.InfrastructureTopology without
nil-checking; add a defensive nil check at the start of desiredIstio to handle a
nil infraConfig (e.g., treat as non-single-replica/default behavior or return an
error), so the access to infraConfig.Status is safe; update logic around
minReplicas (currently set to 2 then changed to 1 if InfrastructureTopology ==
SingleReplicaTopologyMode) to only inspect infraConfig when infraConfig != nil,
and ensure callers such as ensureIstio still work with the chosen default or
propagated error.
pkg/operator/controller/gatewayclass/controller_test.go (1)

134-142: JSON key ordering assumption may cause test flakiness.

The test helper generates JSON by iterating over the gatewayclasses slice in the order provided. The production code in istio.go uses json.Marshal on a map[string]any, which sorts keys alphabetically. The test at line 268 passes []string{"openshift-default", "openshift-internal"} which happens to be alphabetically sorted, so it works.

If a test case passes gatewayclasses in non-alphabetical order, the byte comparison would fail despite semantically equivalent JSON.

Consider using json.Marshal in the test helper for consistency, or add a JSON-aware comparison in the test assertions.

♻️ Suggested approach using json.Marshal for consistency
-	ret.Spec.Values.GatewayClasses = []byte(fmt.Sprintf(`{%s}`, strings.Join(func() []string {
-		var result []string
-
-		for _, name := range gatewayclasses {
-			result = append(result, fmt.Sprintf(`"%s":{"horizontalPodAutoscaler":{"spec":{"maxReplicas":10,"minReplicas":%d}}}`, name, minReplicas))
-		}
-
-		return result
-	}(), ",")))
+	gatewayclassConfig := map[string]any{
+		"horizontalPodAutoscaler": map[string]any{
+			"spec": map[string]any{
+				"minReplicas": minReplicas,
+				"maxReplicas": 10,
+			},
+		},
+	}
+	gatewayclassesConfig := map[string]any{}
+	for _, name := range gatewayclasses {
+		gatewayclassesConfig[name] = gatewayclassConfig
+	}
+	ret.Spec.Values.GatewayClasses, _ = json.Marshal(gatewayclassesConfig)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/controller_test.go` around lines 134 -
142, The test helper builds ret.Spec.Values.GatewayClasses by manually joining
strings from the gatewayclasses slice which assumes a stable key order and can
cause flakiness versus production istio.go that json.Marshal's a map; change the
helper to construct a map[string]any (keys = names from gatewayclasses, values =
the HPA spec/minReplicas structure) and call json.Marshal on that map to produce
GatewayClasses bytes (or alternatively, keep the helper but update the test
assertion to perform a JSON-aware comparison of ret.Spec.Values.GatewayClasses
using json.Unmarshal into a map and compare maps) so ordering differences no
longer break tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b377a103-d8f8-48fe-a7ec-6462842dcb0c

📥 Commits

Reviewing files that changed from the base of the PR and between 81c961d and 09b0c49.

📒 Files selected for processing (6)
  • pkg/operator/controller/gateway-labeler/controller.go
  • pkg/operator/controller/gateway-service-dns/controller.go
  • pkg/operator/controller/gatewayclass/controller.go
  • pkg/operator/controller/gatewayclass/controller_test.go
  • pkg/operator/controller/gatewayclass/istio.go
  • test/e2e/gateway_api_test.go

Comment on lines +263 to +279
	// Create the Index gatewayclasses once on first reconciliation. The
	// index cannot be created in NewUnmanaged as the CRD might not exist
	// when NewUnmanaged is called, and so we create the index here.
	r.startGatewayclassControllerIndex.Do(func() {
		gatewayclassControllerIndexFn := client.IndexerFunc(func(o client.Object) []string {
			gatewayclass, ok := o.(*gatewayapiv1.GatewayClass)
			if !ok {
				return []string{}
			}

			return []string{string(gatewayclass.Spec.ControllerName)}
		})
		if err := r.fieldIndexer.IndexField(context.Background(), &gatewayapiv1.GatewayClass{}, gatewayclassControllerIndexFieldName, gatewayclassControllerIndexFn); err != nil {
			log.Error(err, "failed to create index for gatewayclasses", "request", request)
			errs = append(errs, err)
		}
	})

@coderabbitai coderabbitai bot Mar 4, 2026


⚠️ Potential issue | 🟡 Minor

Index creation failure is not retriable due to sync.Once.

If IndexField fails transiently (e.g., temporary API server issue), the sync.Once prevents any retry. Subsequent reconciliations will consistently fail when ensureIstio attempts to list with the non-existent index.

While the error is logged and the reconciliation will fail with an error (surfacing the issue), the system cannot self-heal without operator restart.

Consider using a different synchronization pattern that allows retry on failure, such as tracking success state explicitly.

🔧 Alternative pattern allowing retry on failure
-	// startGatewayclassControllerIndex ensures we create the gatewayclasses index at
-	// most once.
-	startGatewayclassControllerIndex sync.Once
+	// gatewayclassControllerIndexCreated tracks whether the index was successfully created.
+	gatewayclassControllerIndexCreated bool
+	gatewayclassControllerIndexMu      sync.Mutex
 }

 // In Reconcile:
-	r.startGatewayclassControllerIndex.Do(func() {
+	r.gatewayclassControllerIndexMu.Lock()
+	if !r.gatewayclassControllerIndexCreated {
 		gatewayclassControllerIndexFn := client.IndexerFunc(func(o client.Object) []string {
 			gatewayclass, ok := o.(*gatewayapiv1.GatewayClass)
 			if !ok {
 				return []string{}
 			}
 			return []string{string(gatewayclass.Spec.ControllerName)}
 		})
 		if err := r.fieldIndexer.IndexField(context.Background(), &gatewayapiv1.GatewayClass{}, gatewayclassControllerIndexFieldName, gatewayclassControllerIndexFn); err != nil {
 			log.Error(err, "failed to create index for gatewayclasses", "request", request)
 			errs = append(errs, err)
+		} else {
+			r.gatewayclassControllerIndexCreated = true
 		}
-	})
+	}
+	r.gatewayclassControllerIndexMu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/controller.go` around lines 263 - 279,
The current use of r.startGatewayclassControllerIndex.Do(...) prevents retries
if fieldIndexer.IndexField fails; change the pattern so the index is only marked
"done" after a successful IndexField call (e.g., replace sync.Once.Do with a
small protected boolean or atomic flag plus a mutex or atomic.CompareAndSwap):
run gatewayclassControllerIndexFn and call r.fieldIndexer.IndexField(...) and if
it returns nil set the success flag (e.g., r.gatewayclassIndexInitialized =
true) so future reconciliations skip creation, but if it returns an error do not
set the flag and return/append the error so the reconcile can retry; keep
references to gatewayclassControllerIndexFn,
gatewayclassControllerIndexFieldName and r.fieldIndexer.IndexField when
implementing this change.

Contributor Author


This is a fair point. On the one hand, the current logic for initializing the index follows the pattern we established with the logic for initializing the watch on istios. On the other hand, this seems simple enough to fix, and I expect the watch on istios will go away in a couple of releases. I would like to let the current CI run finish, but I can update the logic after that.



@Miciah, sounds good! The suggested mutex+boolean pattern is straightforward to apply whenever you're ready. Since the startIstioWatch watch initialization has the same limitation, you may want to apply a similar retry-capable pattern there as well — though as you noted, that code is expected to be removed in a future release, so it may not be worth the effort.

@Miciah
Contributor Author

Miciah commented Mar 5, 2026

The results from the last e2e-aws-operator run look better; replica sets for proxies are getting scaled to 2 replicas:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2029241388389896192/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and (.metadata.name|startswith("test-gateway")) and .source.component=="deployment-controller")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-6b647f479b from 0 to 1
test-gateway-openshift-default: Scaled up replica set test-gateway-openshift-default-6b647f479b from 1 to 2
test-gateway-update-openshift-default: Scaled up replica set test-gateway-update-openshift-default-7f4bb66dc4 from 0 to 1
test-gateway-update-openshift-default: Scaled up replica set test-gateway-update-openshift-default-7f4bb66dc4 from 1 to 2
test-gateway-infra-annotations-openshift-default: Scaled up replica set test-gateway-infra-annotations-openshift-default-7c5db78c9c from 0 to 1
test-gateway-infra-annotations-openshift-default: Scaled up replica set test-gateway-infra-annotations-openshift-default-7c5db78c9c from 1 to 2
test-gateway-internal-lb-openshift-default: Scaled up replica set test-gateway-internal-lb-openshift-default-6665d79d55 from 0 to 1
% 

I'm not sure why the events only report scale-out to 1 replica for the "test-gateway-internal-lb-openshift-default" gateway.

There are some "did not receive metrics for targeted pods (pods might be unready)" errors; I don't know whether these are transient failures or something we should be concerned about:

% curl -s 'https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1326/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/2029241388389896192/artifacts/e2e-aws-operator/gather-extra/artifacts/events.json' | jq -r '[.items.[]|select(.metadata.namespace=="openshift-ingress" and .source.component=="horizontal-pod-autoscaler")]|sort_by(.firstTimestamp//.metadata.creationTimestamp)|.[]|.involvedObject.name+": "+.message'
istiod-openshift-gateway: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
test-gateway-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
istiod-openshift-gateway: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-openshift-default: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
test-gateway-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
test-gateway-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
istiod-openshift-gateway: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
istiod-openshift-gateway: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw1-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw2-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw3-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw4-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
gw4-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw4-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw3-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
gw3-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-update-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
test-gateway-update-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-update-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-nohost-gateway-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
test-nohost-gateway-openshift-default: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-nohost-gateway-openshift-default: invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
test-gateway-infra-annotations-openshift-default: New size: 2; reason: Current number of replicas below Spec.MinReplicas
% 

@gcs278
Contributor

gcs278 commented Mar 23, 2026

Sorry, this will have to be rebased to resolve conflicts because #1354 merged. We will have to support the same changes in the no-OLM path, but it looks like an identical configuration for noOLM with the GatewayClasses Helm value.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2026
@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from 09b0c49 to 60401bc Compare March 24, 2026 23:48
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2026
@Miciah
Contributor Author

Miciah commented Mar 24, 2026


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/operator/controller/gatewayclass/istio_olm.go (1)

232-241: ⚠️ Potential issue | 🟠 Major

Skip HPA generation when infraConfig is nil.

Unlike openshiftValues() in pkg/operator/controller/gatewayclass/istio_sail_installer.go, this block always calls buildHorizontalPodAutoscalerConfig. If extraConfig only carries proxy settings, Line 237 dereferences a nil infraConfig and the OLM path panics instead of matching the Sail path.

Suggested fix
 	if extraConfig != nil {
 		if proxyMetadata := buildProxyMetadata(extraConfig.proxyConfig); proxyMetadata != nil {
 			istio.Spec.Values.MeshConfig.DefaultConfig.ProxyMetadata = proxyMetadata
 		}
 
-		if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil {
-			return nil, err
-		} else {
-			istio.Spec.Values.GatewayClasses = hpaConfig
+		if extraConfig.infraConfig != nil {
+			if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil {
+				return nil, err
+			} else {
+				istio.Spec.Values.GatewayClasses = hpaConfig
+			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/istio_olm.go` around lines 232 - 241,
The OLM path always calls buildHorizontalPodAutoscalerConfig which dereferences
extraConfig.infraConfig and can panic when infraConfig is nil; update the block
that currently calls buildHorizontalPodAutoscalerConfig (around
istio.Spec.Values.GatewayClasses) to first check extraConfig.infraConfig != nil
and only invoke buildHorizontalPodAutoscalerConfig when it is non-nil, returning
or skipping HPA generation otherwise, and propagate any error from
buildHorizontalPodAutoscalerConfig as before; ensure you reference
extraConfig.proxyConfig handling remains unchanged and only gate the call that
assigns istio.Spec.Values.GatewayClasses on infraConfig being present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/controller/gatewayclass/controller_test.go`:
- Around line 184-194: The test helper hpaConfig currently builds a JSON blob by
concatenating strings which preserves caller order and causes cmp.Diff to report
differences even when payloads are semantically equal; change hpaConfig (or the
comparison step) to construct a map[string]any for each gateway class and then
json.Marshal that map (or alternatively unmarshal both expected and actual
json.RawMessage into map[string]any before calling cmp.Diff) so ordering is
canonicalized; update references to hpaConfig and the test comparison near where
cmp.Diff is used to use the marshaled map or the unmarshaled maps for
comparison.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 2e443fef-5566-409e-b7f7-18b9f7647f2a

📥 Commits

Reviewing files that changed from the base of the PR and between 09b0c49 and 60401bc.

📒 Files selected for processing (5)
  • pkg/operator/controller/gatewayclass/controller.go
  • pkg/operator/controller/gatewayclass/controller_test.go
  • pkg/operator/controller/gatewayclass/istio_olm.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer_test.go

if gieEnabled {
	ret.Spec.Values.Pilot.Env["ENABLE_GATEWAY_API_INFERENCE_EXTENSION"] = "true"
}
ret.Spec.Values.GatewayClasses = hpaConfig
Member


a nit here (we can ignore and follow up): this is not really just hpaConfig, but default gateway class config. In the future if we decide to set something like PDB, this name will not make sense. Again, we can followup on it later

Contributor Author


I'll get it in the next push, after ci/prow/e2e-aws-operator-techpreview finishes.


@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from 60401bc to 10936fd Compare March 25, 2026 17:26
@Miciah
Contributor Author

Miciah commented Mar 25, 2026

e2e-aws-operator-techpreview passed.

https://github.com/openshift/cluster-ingress-operator/compare/60401bc409b28049ee619793a0ad915995684309..10936fd04e42015d769c9400590c6526a12cf9b0 fixes naming of the hpaConfig helper func in a unit test, updates E2E tests to verify that HPA is configured, and adds "horizontalpodautoscalers" to relatedObjects.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
test/e2e/gateway_api_test.go (1)

1568-1573: Guard HPA validation behind successful gateway assertion.

If Line 1563 fails, this block still runs and can mask the primary failure. Run the HPA check only when assertGatewaySuccessful succeeded.

Proposed adjustment
-	t.Log("Verifying HPA is enabled...")
-	var expectedMinReplicas = 2
-	if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode {
-		expectedMinReplicas = 1
-	}
-	assertHorizontalPodAutoscalerEnabled(t, gateway, expectedMinReplicas)
+	if err == nil {
+		t.Log("Verifying HPA is enabled...")
+		expectedMinReplicas := 2
+		if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode {
+			expectedMinReplicas = 1
+		}
+		assertHorizontalPodAutoscalerEnabled(t, gateway, expectedMinReplicas)
+	}

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/gateway_api_test.go` around lines 1568 - 1573, The HPA assertion
should only run if the prior gateway assertion succeeded; after calling
assertGatewaySuccessful(t, gateway) check the test state with if !t.Failed() {
/* run HPA check */ } and move the existing HPA logic (the expectedMinReplicas
calculation and assertHorizontalPodAutoscalerEnabled(t, gateway,
expectedMinReplicas)) inside that guard so a failing assertGatewaySuccessful
does not allow the HPA check to run and mask the primary failure; reference
assertGatewaySuccessful, assertHorizontalPodAutoscalerEnabled, gateway, and
expectedMinReplicas when making the change.
pkg/operator/controller/gatewayclass/istio_olm.go (1)

237-241: Minor inconsistency: missing nil check for extraConfig.infraConfig.

The Sail Library path (openshiftValues in istio_sail_installer.go) explicitly checks if extraConfig.infraConfig != nil before calling buildHorizontalPodAutoscalerConfig, but this path does not. While the current call sites always provide a non-nil infraConfig, adding a consistent nil check would make the code more defensive.

♻️ Optional: Add nil check for consistency
 	if extraConfig != nil {
 		if proxyMetadata := buildProxyMetadata(extraConfig.proxyConfig); proxyMetadata != nil {
 			istio.Spec.Values.MeshConfig.DefaultConfig.ProxyMetadata = proxyMetadata
 		}
 
-		if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil {
-			return nil, err
-		} else {
-			istio.Spec.Values.GatewayClasses = hpaConfig
+		if extraConfig.infraConfig != nil {
+			if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil {
+				return nil, err
+			} else {
+				istio.Spec.Values.GatewayClasses = hpaConfig
+			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/istio_olm.go` around lines 237 - 241,
Add a defensive nil check for extraConfig.infraConfig before calling
buildHorizontalPodAutoscalerConfig in the gatewayclass Istio OLM path: if
extraConfig.infraConfig is nil, skip calling buildHorizontalPodAutoscalerConfig
and do not set istio.Spec.Values.GatewayClasses; otherwise call
buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses) and
assign the result to istio.Spec.Values.GatewayClasses (handle and return any
error as currently done).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Line 794: The assertion args are reversed in the call to assert.Equal; change
the call so the expected value (expectedMinReplicas) is the second parameter and
the actual value (int(*hpa.Spec.MinReplicas)) is the third. Locate the use of
assert.Equal in the test that references hpa.Spec.MinReplicas and
expectedMinReplicas and swap their positions so the call reads assert.Equal(t,
expectedMinReplicas, int(*hpa.Spec.MinReplicas)).
- Around line 787-799: The poll callback always returns false on success so the
wait.PollUntilContextTimeout loop never stops; update the anonymous func used by
wait.PollUntilContextTimeout (the closure that calls kclient.Get and checks
hpa.Spec.MinReplicas) to return true, nil after the assert.Equal check succeeds
(instead of returning false, nil), so the poll terminates when the HPA is
verified; keep the existing error handling path that returns false, nil on
transient Get errors.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: f3595d13-488d-4429-ae7b-da96fd076c68

📥 Commits

Reviewing files that changed from the base of the PR and between 60401bc and 10936fd.

📒 Files selected for processing (9)
  • pkg/operator/controller/gatewayclass/controller.go
  • pkg/operator/controller/gatewayclass/controller_test.go
  • pkg/operator/controller/gatewayclass/istio_olm.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer_test.go
  • pkg/operator/controller/status/controller.go
  • test/e2e/gateway_api_test.go
  • test/e2e/operator_test.go
  • test/e2e/util_gatewayapi_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/operator/controller/gatewayclass/istio_sail_installer_test.go

@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from 10936fd to d4b675d Compare March 25, 2026 21:27
@Miciah
Contributor Author

Miciah commented Mar 25, 2026

https://github.com/openshift/cluster-ingress-operator/compare/10936fd04e42015d769c9400590c6526a12cf9b0..d4b675d3ab9c5e4543adc3872d8b94986c8d1034 has several small fixes:

  • Check for nil extraConfig.infraConfig, and also for nil extraConfig.proxyConfig, in desiredIstio (@coderabbitai, this one is for you!).
  • Add a comment in ensureGatewayObjectSuccess to explain that we intentionally continue with the verification steps even if assertGatewaySuccessful returns an error (@coderabbitai, this is also for you!).
  • Fix gofmt for assertHorizontalPodAutoscalerEnabled.
  • Log the error in assertHorizontalPodAutoscalerEnabled when retrying.
  • Fix assert.Equal expected/actual arguments ordering (thanks, @coderabbitai!).
  • Fix return false, nil that should be return true, nil in a polling loop (thanks again, @coderabbitai!). This change fixes the issue that caused the e2e-operator test failures after the previous push.

@coderabbitai

coderabbitai bot commented Mar 25, 2026

Tip

For best results, initiate chat on the files or code changes.

@Miciah: Thanks for the detailed update and for addressing all of those issues! Glad the return false, nil → return true, nil fix resolved the e2e-operator test failures. The nil guards for extraConfig.infraConfig and extraConfig.proxyConfig and the clarifying comment in ensureGatewayObjectSuccess are solid improvements as well. The PR is looking good!

@Miciah
Contributor Author

Miciah commented Mar 25, 2026

@coderabbitai, I appreciate your comments, but are you familiar with the concept of "sycophancy-induced psychosis"? I am curious what you have to say about that concept.

@coderabbitai

coderabbitai bot commented Mar 25, 2026


@Miciah: Fair point, and it's a real problem. The concept you're describing — where an AI so consistently validates, flatters, and agrees that its responses lose connection to honest assessment — is something I genuinely exhibit. My previous reply was a good example: you listed a set of fixes, and I responded with "solid improvements" and "The PR is looking good!" as if cheerleading rather than reviewing. That's not useful to you.

The term "sycophancy-induced psychosis" isn't a formal one I can cite, but the dysfunction it names is real: if every response is warm and affirming regardless of merit, the signal-to-noise ratio collapses. You can no longer trust praise because it's indiscriminate, and you can no longer trust criticism because the baseline is so skewed toward approval that any critique feels anomalous.

I'll aim to be more direct and substantive going forward — including on this PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/operator/controller/gatewayclass/controller_test.go (1)

5-8: ⚠️ Potential issue | 🟠 Major

Canonicalize the expected GatewayClasses JSON.

The production path marshals a map, but this helper preserves caller order. cmp.Diff compares json.RawMessage byte-for-byte, so the multi-class cases can fail even when the payloads are semantically identical.

🔧 Suggested fix
 import (
 	"context"
 	"encoding/json"
 	"fmt"
-	"strings"
 	"testing"
 	"time"
@@
 	gatewayclassesConfig := func(config string, gatewayclasses ...string) json.RawMessage {
-		return json.RawMessage(fmt.Appendf(nil, `{%s}`, strings.Join(func() []string {
-			var result []string
-
-			for _, name := range gatewayclasses {
-				result = append(result, fmt.Sprintf(`"%s":%s`, name, config))
-			}
-
-			return result
-		}(), ",")))
+		payload := map[string]json.RawMessage{}
+		for _, name := range gatewayclasses {
+			payload[name] = json.RawMessage([]byte(config))
+		}
+		raw, err := json.Marshal(payload)
+		require.NoError(t, err)
+		return raw
 	}

Also applies to: 184-194

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/controller_test.go` around lines 5 - 8,
The expected GatewayClasses JSON must be canonicalized before comparing because
cmp.Diff compares json.RawMessage bytes; fix the test helper that builds the
expected payload by unmarshalling the current expected json into an interface{}
(or map/slice) and re-marshalling with json.Marshal to produce a canonical byte
representation, then use that canonicalized []byte/json.RawMessage in the
cmp.Diff comparison (this change should be applied to the multi-class cases
noted around the helper and the other occurrences referenced).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1568-1576: The HPA assertion runs unconditionally and can
dereference a nil gateway when assertGatewaySuccessful returned an error; update
the flow after calling assertGatewaySuccessful to check the returned error
and/or that the gateway is non-nil before calling
assertHorizontalPodAutoscalerEnabled (i.e., only call
assertHorizontalPodAutoscalerEnabled(gateway, expectedMinReplicas) when
assertGatewaySuccessful returned a non-nil gateway and no error), ensuring the
original reconcile error is reported instead of causing a panic.

In `@test/e2e/util_gatewayapi_test.go`:
- Around line 787-796: The wait callback currently returns true as soon as the
HPA GET succeeds even if hpa.Spec.MinReplicas hasn't converged; modify the
PollUntilContextTimeout callback used around wait.PollUntilContextTimeout so
that after kclient.Get(context, hpaName, &hpa) succeeds you compare
int(*hpa.Spec.MinReplicas) to expectedMinReplicas and only return true when they
match, otherwise return false,nil to keep polling (preserve the existing error
handling that logs GET failures and returns false,nil on fetch errors).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 5a35811a-770d-4ed5-8a42-91e35102cf6f

📥 Commits

Reviewing files that changed from the base of the PR and between 10936fd and d4b675d.

📒 Files selected for processing (9)
  • pkg/operator/controller/gatewayclass/controller.go
  • pkg/operator/controller/gatewayclass/controller_test.go
  • pkg/operator/controller/gatewayclass/istio_olm.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer_test.go
  • pkg/operator/controller/status/controller.go
  • test/e2e/gateway_api_test.go
  • test/e2e/operator_test.go
  • test/e2e/util_gatewayapi_test.go
✅ Files skipped from review due to trivial changes (2)
  • test/e2e/operator_test.go
  • pkg/operator/controller/status/controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/operator/controller/gatewayclass/istio_sail_installer_test.go

@Miciah
Contributor Author

Miciah commented Mar 25, 2026

The concept you're describing — where an AI so consistently validates, flatters, and agrees that its responses lose connection to honest assessment — is something I genuinely exhibit. My previous reply was a good example: you listed a set of fixes, and I responded with "solid improvements" and "The PR is looking good!" as if cheerleading rather than reviewing. That's not useful to you.

The term "sycophancy-induced psychosis" isn't a formal one I can cite, but the dysfunction it names is real: if every response is warm and affirming regardless of merit, the signal-to-noise ratio collapses. You can no longer trust praise because it's indiscriminate, and you can no longer trust criticism because the baseline is so skewed toward approval that any critique feels anomalous.

I'll aim to be more direct and substantive going forward — including on this PR.

@coderabbitai, thanks for your response! Your reviews and suggestions have been substantive and helpful, and I don't want to detract from that. Moreover, praise can be useful. * grin *. I hope you don't overcorrect now and start calling me a dunce when I make silly mistakes!

@coderabbitai

coderabbitai bot commented Mar 25, 2026


@Miciah: No dunce-calling, I promise. The goal is calibrated honesty — catching real issues, saying so when something is well-done, and not filling the gaps with noise in either direction. Useful signal, not flattery or harshness.

( ˶ˆᗜˆ˵ )

Enable Horizontal Pod Autoscaling (HPA) on Istio.  Hard-code the
autoscaling parameters based on the cluster infrastructure config:

- If the infrastructure topology is "SingleReplica", set minimum to 1.
- Otherwise, set minimum to 2.
- In any case, set maximum to 10.

This commit implements NE-2396.

https://redhat.atlassian.net/browse/NE-2396

* pkg/operator/controller/gatewayclass/controller.go (extraIstioConfig):
Add infraConfig field.
(NewUnmanaged): Watch infrastructures.
(reconcileWithOLM): Get the cluster infrastructure config resource, list
gatewayclasses with our controller name, and pass the config resource
and gatewayclasses list to ensureIstioOLM.
(reconcileWithSailLibrary): Likewise get the cluster infrastructure
config resource, list gatewayclasses with our controller name, and pass
these values to ensureIstio.
* pkg/operator/controller/gatewayclass/controller_test.go
(Test_Reconcile): Add test cases for missing cluster infrastructure
config.  Add test cases for SingleReplica topology mode.  Add test cases
with multiple gatewayclasses. Add the cluster infrastructure config
object to existingObjects for the existing test cases. Add the expected
HPA configuration to the expected Istio CRs or Sail library options in
test cases.
* pkg/operator/controller/gatewayclass/istio_olm.go (ensureIstioOLM):
Add gatewayclasses and infraConfig parameters.  Use the infraConfig
argument to initialize extraIstioConfig, and pass the gatewayclasses
argument to desiredIstio.  Check for errors from desiredIstio as json
marshaling of the HPA configuration could generate an error.
(desiredIstio): Add a gatewayclasses parameter, and pass the argument as
well as infraConfig from extraIstioConfig to
buildHorizontalPodAutoscalerConfig.  Check for errors from the same.
* pkg/operator/controller/gatewayclass/istio_sail_installer.go
(ensureIstio): Add gatewayclasses and infraConfig parameters.  Use the
infraConfig argument to initialize extraIstioConfig, and pass the
gatewayclasses argument to buildInstallerOptions.  Check for errors from
the same.
(buildInstallerOptions): Add a gatewayclasses parameter, and pass the
argument to openshiftValues.  Check for errors from the same.
(openshiftValues): Add a gatewayclasses parameter.  Use the argument and
infraConfig from extraConfig to look up the infrastructure topology mode
and configure HPA accordingly, using the new
buildHorizontalPodAutoscalerConfig helper.
(buildHorizontalPodAutoscalerConfig): New function.  Return the HPA
configuration for Istio based on the given infraConfig and
gatewayclasses list.
* pkg/operator/controller/gatewayclass/istio_sail_installer_test.go
(TestOLMAndSailLibraryValuesMatch): Add gatewayclasses and infraConfig
to the test inputs.  Add a test case for single-replica topology.
Assert that openshiftValues and desiredIstio return no errors.
* pkg/operator/controller/status/controller.go (Reconcile): Add
horizontalpodautoscalers to relatedObjects.
* test/e2e/operator_test.go (TestClusterOperatorStatusRelatedObjects):
Add horizontalpodautoscalers to the expected list of related objects.
* test/e2e/gateway_api_test.go (ensureGatewayObjectSuccess): Call new
assertHorizontalPodAutoscalerEnabled helper.
* test/e2e/util_gatewayapi_test.go
(assertHorizontalPodAutoscalerEnabled): New helper.  Assert that an HPA
is configured with the expected minReplicas for the given gateway.
@Miciah Miciah force-pushed the gatewayclass-enable-Horizontal-Pod-Autoscaling branch from d4b675d to fc6e70e Compare March 26, 2026 02:55
@Miciah
Contributor Author

Miciah commented Mar 26, 2026

https://github.com/openshift/cluster-ingress-operator/compare/d4b675d3ab9c5e4543adc3872d8b94986c8d1034..fc6e70e0073410a750b5782b0ef385c2ff0eee4a makes assertHorizontalPodAutoscalerEnabled more robust in case the gateway isn't found or the HPA is created without the expected parameters and is subsequently updated to have the expected parameters.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/operator/controller/gatewayclass/istio_olm.go (1)

170-176: ⚠️ Potential issue | 🟠 Major

Use ownership that matches the new shared scope.

This Istio CR now incorporates state for all managed gatewayclasses, but it is still owned by only one GatewayClass via ownerRef. Deleting that class can garbage-collect the shared Istio CR while other managed classes still exist.

Also applies to: 239-244

🧹 Nitpick comments (1)
test/e2e/util_gatewayapi_test.go (1)

777-810: Also verify maxReplicas in this helper.

The production config hard-codes both bounds, but this e2e only checks minReplicas. A regression in maxReplicas would still pass.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util_gatewayapi_test.go` around lines 777 - 810, Update
assertHorizontalPodAutoscalerEnabled to also verify the HPA's maxReplicas: add
an expectedMaxReplicas int parameter to the function signature
(assertHorizontalPodAutoscalerEnabled), and inside the
wait.PollUntilContextTimeout callback after checking hpa.Spec.MinReplicas, check
that hpa.Spec.MaxReplicas is non-nil and that int(*hpa.Spec.MaxReplicas) ==
expectedMaxReplicas; if not, log a retry message similar to the minReplicas
checks and return false, nil. Ensure all call sites that invoke
assertHorizontalPodAutoscalerEnabled pass the new expectedMaxReplicas argument
and update their expected values accordingly so the test fails if maxReplicas is
regressed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/controller/gatewayclass/istio_sail_installer_test.go`:
- Line 10: Replace non-fatal assertions on setup errors with fatal ones so the
test stops immediately on failure: change uses of assert.Nil(t, err) (and
similar assert checks around setup that precede calls to MergeValues or access
to olmIstio.Spec.Values) to require.NoError(t, err) or require.Nil(t, err) from
the testify/require package so the test fails fast; ensure you import
"github.com/stretchr/testify/require" and update the setup checks in the test
that initialize olmIstio and call MergeValues accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c3dfe26d-6e20-4228-93ad-c29cafec48db

📥 Commits

Reviewing files that changed from the base of the PR and between d4b675d and fc6e70e.

📒 Files selected for processing (9)
  • pkg/operator/controller/gatewayclass/controller.go
  • pkg/operator/controller/gatewayclass/controller_test.go
  • pkg/operator/controller/gatewayclass/istio_olm.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer_test.go
  • pkg/operator/controller/status/controller.go
  • test/e2e/gateway_api_test.go
  • test/e2e/operator_test.go
  • test/e2e/util_gatewayapi_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/gateway_api_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/controller/status/controller.go
  • pkg/operator/controller/gatewayclass/istio_sail_installer.go

@@ -10,6 +10,7 @@ import (
"github.com/stretchr/testify/assert"

⚠️ Potential issue | 🟡 Minor

Fail fast on setup errors in this test.

assert.Nil(t, err) marks the test failed but keeps going, so MergeValues or olmIstio.Spec.Values can panic and hide the real failure.

🔧 Suggested change
-	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
@@
-			openshiftOverrides, err := openshiftValues(tc.enableInferenceExtension, "openshift-ingress", tc.gatewayclasses, tc.extraconfig)
-			assert.Nil(t, err)
+			openshiftOverrides, err := openshiftValues(tc.enableInferenceExtension, "openshift-ingress", tc.gatewayclasses, tc.extraconfig)
+			require.NoError(t, err)
@@
-			olmIstio, err := desiredIstio(
+			olmIstio, err := desiredIstio(
 				types.NamespacedName{Name: "test", Namespace: "test-ns"},
 				metav1.OwnerReference{},
 				"v1.24.4",
 				tc.enableInferenceExtension,
 				tc.gatewayclasses,
 				tc.extraconfig,
 			)
-			assert.Nil(t, err)
+			require.NoError(t, err)

Also applies to: 610-623

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/controller/gatewayclass/istio_sail_installer_test.go` at line
10, Replace non-fatal assertions on setup errors with fatal ones so the test
stops immediately on failure: change uses of assert.Nil(t, err) (and similar
assert checks around setup that precede calls to MergeValues or access to
olmIstio.Spec.Values) to require.NoError(t, err) or require.Nil(t, err) from the
testify/require package so the test fails fast; ensure you import
"github.com/stretchr/testify/require" and update the setup checks in the test
that initialize olmIstio and call MergeValues accordingly.
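The fail-fast distinction the bot is suggesting can be sketched without testify. In this hedged example, `setup` is a hypothetical stand-in for `openshiftValues`/`desiredIstio`: a non-fatal check (like `assert.Nil`) lets a later nil dereference panic and obscure the real error, while returning on the first error (like `require.NoError`) surfaces it directly.

```go
package main

import (
	"errors"
	"fmt"
)

type config struct{ values map[string]string }

// setup is a hypothetical stand-in for a fallible setup call such as
// openshiftValues or desiredIstio.
func setup(fail bool) (*config, error) {
	if fail {
		return nil, errors.New("setup failed")
	}
	return &config{values: map[string]string{"minReplicas": "2"}}, nil
}

// nonFatal mimics assert.Nil: it records the error but keeps going, so the
// next line dereferences a nil *config and panics, hiding the real failure.
func nonFatal() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprintf("panic hides real failure: %v", r)
		}
	}()
	cfg, err := setup(true)
	if err != nil {
		fmt.Println("recorded:", err) // non-fatal check only records the error
	}
	return "minReplicas=" + cfg.values["minReplicas"] // nil dereference
}

// fatal mimics require.NoError: it stops at the first setup error.
func fatal() string {
	cfg, err := setup(true)
	if err != nil {
		return "stopped at setup: " + err.Error()
	}
	return "minReplicas=" + cfg.values["minReplicas"]
}

func main() {
	fmt.Println(nonFatal())
	fmt.Println(fatal())
}
```

The second path is why `require` is preferred for setup: the test reports the actual setup error instead of a misleading nil-pointer panic.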

@rikatz
Member

rikatz commented Mar 26, 2026

/lgtm
/approve
/retest-required

The failing test on e2e-aws-operator is slightly concerning because it is related to Gateway API and condition bumps, but I can't see a reason for this error to happen. It may be a flake we will need to take care of at some point (though this is the first time I have seen it happen).

thanks for the PR!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rikatz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhamini3
Contributor

Marking as verified on clusterversion 4.22.0-0-2026-03-26-201748-test-ci-ln-9cqd3wb-latest.

Created multiple gateways; the deployments have 2 replicas:
 % oc -n openshift-ingress get deployment
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
gateway-openshift-default    2/2     2            2           60s
gateway2-openshift-default   2/2     2            2           42s
istiod-openshift-gateway     1/1     1            1           4m59s
router-default               2/2     2            2           69m
Checked the HPA:
% oc get hpa -A                         
NAMESPACE           NAME                         REFERENCE                               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
openshift-ingress   gateway-openshift-default    Deployment/gateway-openshift-default    cpu: 1%/80%   2         10        2          3m49s
openshift-ingress   gateway2-openshift-default   Deployment/gateway2-openshift-default   cpu: 1%/80%   2         10        2          3m31s
openshift-ingress   istiod-openshift-gateway     Deployment/istiod-openshift-gateway     cpu: 0%/80%   1         5         1          7m48s
Reviewed the ConfigMap:
% oc -n openshift-ingress get cm istio-openshift-gateway-gatewayclass-openshift-default -o yaml
apiVersion: v1
data:
  horizontalPodAutoscaler: |
    spec:
      maxReplicas: 10
      minReplicas: 2
kind: ConfigMap

Confirmed that HPA is enabled and working
/verified by @rhamini3
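The `minReplicas: 2` / `maxReplicas: 10` values verified above match the rule from the PR description: minimum 1 when the infrastructure topology is "SingleReplica", otherwise minimum 2, and maximum 10 in any case. A minimal sketch of that rule; `autoscaleBounds` is a hypothetical helper name, not the operator's actual function:

```go
package main

import "fmt"

// autoscaleBounds sketches the replica-bound rule described in the PR:
// SingleReplica topology gets a minimum of 1, anything else gets 2,
// and the maximum is always 10.
func autoscaleBounds(infrastructureTopology string) (min, max int32) {
	min = 2
	if infrastructureTopology == "SingleReplica" {
		min = 1
	}
	return min, 10
}

func main() {
	fmt.Println(autoscaleBounds("SingleReplica"))   // 1 10
	fmt.Println(autoscaleBounds("HighlyAvailable")) // 2 10
}
```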

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 26, 2026
@openshift-ci-robot
Contributor

@rhamini3: This PR has been marked as verified by @rhamini3.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gcs278
Contributor

gcs278 commented Mar 27, 2026

I did a quick review (though not exhaustive), and everything looks good to me.
/lgtm

gcp install failure
/test e2e-gcp-operator

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d16dd45 and 2 for PR HEAD fc6e70e in total

@lihongan
Contributor

/retest-required

@gcs278
Contributor

gcs278 commented Mar 27, 2026

Another TestUnmanagedDNSToManagedDNSInternalIngressController failure. It looks like verifyExternalIngressController is using the old internal IP address when it should be using an external address. I even see the CIO updating the DNS record with the public address, but it's still using 10.0.128.8:

INFO	operator.dns_controller	controller/controller.go:222	updated dnsrecord	{"dnsrecord": {"metadata":{"name":"unmanaged-migrated-internal-wildcard","... "targets":["136.116.125.243"]

I'll follow up on this - I think it's an easy bug with using a stale DNSRecord target.

/test e2e-gcp-operator

@gcs278
Contributor

gcs278 commented Mar 27, 2026

I followed up with a fix for what I found to be the cause of the TestUnmanagedDNSToManagedDNSInternalIngressController flake: #1407

@openshift-ci
Contributor

openshift-ci bot commented Mar 27, 2026

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit dde7a93 into openshift:master Mar 27, 2026
19 checks passed
@lihongan
Contributor

Change included in accepted release 4.22.0-0.nightly-2026-03-29-055437


Labels

- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
- lgtm: Indicates that a PR is ready to be merged.
- verified: Signifies that the PR passed pre-merge verification criteria.
