NE-2396: gatewayclass: Enable Horizontal Pod Autoscaling#1326
Conversation
|
Skipping CI for Draft Pull Request. |
|
/test e2e-aws-operator |
|
e2e-aws-operator passed. However, there are some errors due to missing metrics: Moreover, the proxy pods do not get scaled out more than 1 replica: |
|
|
||
| // Watch the cluster infrastructure config in case the infrastructure | ||
| // topology changes. | ||
| if err := c.Watch(source.Kind[client.Object](operatorCache, &configv1.Infrastructure{}, reconciler.enqueueRequestForSomeGatewayClass())); err != nil { |
There was a problem hiding this comment.
question here is, if you change Sail Operator parameter will it reflect on Pilot config?
Also, in case this parameter is also used for Gateway replica deployment, will Sail reconcile all the gateways and HPA?
(Giving some thoughts for test)
There was a problem hiding this comment.
answering myself:
the change of HPA parameter from pilot does not impact on the GatewayClass, it needs its own definition from spec.infrastructure.paramRefs
|
testing on my own cluster (AWS, HA): It is scaling pilot, but not sure about the gateways, let me test |
|
IIUC we can pass to sail the configs of the Gateway class on the Istio resource, and it will create the configmap for us (even the gateway class being created by the user): https://github.com/openshift-service-mesh/sail-operator/blob/f3a3c6d7e6f2c2ae412b09ce1a78bc93258b4db4/resources/v1.28.0/charts/istiod/templates/gateway-class-configmap.yaml We can make CIO set this property and the configmap will be created for us (this includes even setting the annotations for internal only clusters!), the problem is that today the json.RawMessage used by Sail API is not accepting the content correctly. I have asked the sail operator team about it and if they have been using it, otherwise we can improve this workflow on Sail Operator and consume on CIO |
1ea51fc to
f407b29
Compare
|
/test e2e-aws-operator |
f407b29 to
3e6e8c5
Compare
|
/test e2e-aws-operator |
1 similar comment
|
/test e2e-aws-operator |
fad49ff to
c566d6a
Compare
|
/test e2e-aws-operator |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe controller now watches and reads the cluster-scoped Infrastructure resource named "cluster" and lists all managed GatewayClass resources during reconciliation. Both OLM and Sail-library reconciliation paths pass the GatewayClass list and Infrastructure into downstream functions, which build per-GatewayClass HorizontalPodAutoscaler JSON (minReplicas = 1 for SingleReplicaTopologyMode, otherwise 2; maxReplicas = 10). Multiple function signatures and error-return paths were updated to propagate failures from Infrastructure retrieval, indexed GatewayClass listing, and JSON generation. ClusterOperator status RelatedObjects now includes horizontalpodautoscalers in the ingress namespace. 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
#1350 merged, so |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/operator/controller/gatewayclass/istio.go (1)
111-116: Consider adding nil-safety forinfraConfigparameter.The function accesses
infraConfig.Status.InfrastructureTopologydirectly at line 114. While the current call site inensureIstioensures a non-nilinfraConfigis passed, adding a defensive check would prevent potential panics if this function is called from additional locations in the future.🛡️ Proposed defensive check
func desiredIstio(name types.NamespacedName, ownerRef metav1.OwnerReference, istioVersion string, enableInferenceExtension bool, infraConfig *configv1.Infrastructure, gatewayclasses []gatewayapiv1.GatewayClass) (*sailv1.Istio, error) { var minReplicas = 2 const maxReplicas = 10 + if infraConfig == nil { + return nil, fmt.Errorf("infraConfig must not be nil") + } if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode { minReplicas = 1 }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/gatewayclass/istio.go` around lines 111 - 116, The function desiredIstio reads infraConfig.Status.InfrastructureTopology without nil-checking; add a defensive nil check at the start of desiredIstio to handle a nil infraConfig (e.g., treat as non-single-replica/default behavior or return an error), so the access to infraConfig.Status is safe; update logic around minReplicas (currently set to 2 then changed to 1 if InfrastructureTopology == SingleReplicaTopologyMode) to only inspect infraConfig when infraConfig != nil, and ensure callers such as ensureIstio still work with the chosen default or propagated error.pkg/operator/controller/gatewayclass/controller_test.go (1)
134-142: JSON key ordering assumption may cause test flakiness.The test helper generates JSON by iterating over the
gatewayclassesslice in the order provided. The production code inistio.gousesjson.Marshalon amap[string]any, which sorts keys alphabetically. The test at line 268 passes[]string{"openshift-default", "openshift-internal"}which happens to be alphabetically sorted, so it works.If a test case passes gatewayclasses in non-alphabetical order, the byte comparison would fail despite semantically equivalent JSON.
Consider using
json.Marshalin the test helper for consistency, or add a JSON-aware comparison in the test assertions.♻️ Suggested approach using json.Marshal for consistency
- ret.Spec.Values.GatewayClasses = []byte(fmt.Sprintf(`{%s}`, strings.Join(func() []string { - var result []string - - for _, name := range gatewayclasses { - result = append(result, fmt.Sprintf(`"%s":{"horizontalPodAutoscaler":{"spec":{"maxReplicas":10,"minReplicas":%d}}}`, name, minReplicas)) - } - - return result - }(), ","))) + gatewayclassConfig := map[string]any{ + "horizontalPodAutoscaler": map[string]any{ + "spec": map[string]any{ + "minReplicas": minReplicas, + "maxReplicas": 10, + }, + }, + } + gatewayclassesConfig := map[string]any{} + for _, name := range gatewayclasses { + gatewayclassesConfig[name] = gatewayclassConfig + } + ret.Spec.Values.GatewayClasses, _ = json.Marshal(gatewayclassesConfig)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/gatewayclass/controller_test.go` around lines 134 - 142, The test helper builds ret.Spec.Values.GatewayClasses by manually joining strings from the gatewayclasses slice which assumes a stable key order and can cause flakiness versus production istio.go that json.Marshal's a map; change the helper to construct a map[string]any (keys = names from gatewayclasses, values = the HPA spec/minReplicas structure) and call json.Marshal on that map to produce GatewayClasses bytes (or alternatively, keep the helper but update the test assertion to perform a JSON-aware comparison of ret.Spec.Values.GatewayClasses using json.Unmarshal into a map and compare maps) so ordering differences no longer break tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/operator/controller/gatewayclass/controller.go`:
- Around line 263-279: The current use of
r.startGatewayclassControllerIndex.Do(...) prevents retries if
fieldIndexer.IndexField fails; change the pattern so the index is only marked
"done" after a successful IndexField call (e.g., replace sync.Once.Do with a
small protected boolean or atomic flag plus a mutex or atomic.CompareAndSwap):
run gatewayclassControllerIndexFn and call r.fieldIndexer.IndexField(...) and if
it returns nil set the success flag (e.g., r.gatewayclassIndexInitialized =
true) so future reconciliations skip creation, but if it returns an error do not
set the flag and return/append the error so the reconcile can retry; keep
references to gatewayclassControllerIndexFn,
gatewayclassControllerIndexFieldName and r.fieldIndexer.IndexField when
implementing this change.
---
Nitpick comments:
In `@pkg/operator/controller/gatewayclass/controller_test.go`:
- Around line 134-142: The test helper builds ret.Spec.Values.GatewayClasses by
manually joining strings from the gatewayclasses slice which assumes a stable
key order and can cause flakiness versus production istio.go that json.Marshal's
a map; change the helper to construct a map[string]any (keys = names from
gatewayclasses, values = the HPA spec/minReplicas structure) and call
json.Marshal on that map to produce GatewayClasses bytes (or alternatively, keep
the helper but update the test assertion to perform a JSON-aware comparison of
ret.Spec.Values.GatewayClasses using json.Unmarshal into a map and compare maps)
so ordering differences no longer break tests.
In `@pkg/operator/controller/gatewayclass/istio.go`:
- Around line 111-116: The function desiredIstio reads
infraConfig.Status.InfrastructureTopology without nil-checking; add a defensive
nil check at the start of desiredIstio to handle a nil infraConfig (e.g., treat
as non-single-replica/default behavior or return an error), so the access to
infraConfig.Status is safe; update logic around minReplicas (currently set to 2
then changed to 1 if InfrastructureTopology == SingleReplicaTopologyMode) to
only inspect infraConfig when infraConfig != nil, and ensure callers such as
ensureIstio still work with the chosen default or propagated error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b377a103-d8f8-48fe-a7ec-6462842dcb0c
📒 Files selected for processing (6)
pkg/operator/controller/gateway-labeler/controller.gopkg/operator/controller/gateway-service-dns/controller.gopkg/operator/controller/gatewayclass/controller.gopkg/operator/controller/gatewayclass/controller_test.gopkg/operator/controller/gatewayclass/istio.gotest/e2e/gateway_api_test.go
| // Create the Index gatewayclasses once on first reconciliation. The | ||
| // index cannot be created in NewUnmanaged as the CRD might not exist | ||
| // when NewUnmanaged is called, and so we create the index here. | ||
| r.startGatewayclassControllerIndex.Do(func() { | ||
| gatewayclassControllerIndexFn := client.IndexerFunc(func(o client.Object) []string { | ||
| gatewayclass, ok := o.(*gatewayapiv1.GatewayClass) | ||
| if !ok { | ||
| return []string{} | ||
| } | ||
|
|
||
| return []string{string(gatewayclass.Spec.ControllerName)} | ||
| }) | ||
| if err := r.fieldIndexer.IndexField(context.Background(), &gatewayapiv1.GatewayClass{}, gatewayclassControllerIndexFieldName, gatewayclassControllerIndexFn); err != nil { | ||
| log.Error(err, "failed to create index for gatewayclasses", "request", request) | ||
| errs = append(errs, err) | ||
| } | ||
| }) |
There was a problem hiding this comment.
Index creation failure is not retriable due to sync.Once.
If IndexField fails transiently (e.g., temporary API server issue), the sync.Once prevents any retry. Subsequent reconciliations will consistently fail when ensureIstio attempts to list with the non-existent index.
While the error is logged and the reconciliation will fail with an error (surfacing the issue), the system cannot self-heal without operator restart.
Consider using a different synchronization pattern that allows retry on failure, such as tracking success state explicitly.
🔧 Alternative pattern allowing retry on failure
- // startGatewayclassControllerIndex ensures we create the gatewayclasses index at
- // most once.
- startGatewayclassControllerIndex sync.Once
+ // gatewayclassControllerIndexCreated tracks whether the index was successfully created.
+ gatewayclassControllerIndexCreated bool
+ gatewayclassControllerIndexMu sync.Mutex
}
// In Reconcile:
- r.startGatewayclassControllerIndex.Do(func() {
+ r.gatewayclassControllerIndexMu.Lock()
+ if !r.gatewayclassControllerIndexCreated {
gatewayclassControllerIndexFn := client.IndexerFunc(func(o client.Object) []string {
gatewayclass, ok := o.(*gatewayapiv1.GatewayClass)
if !ok {
return []string{}
}
return []string{string(gatewayclass.Spec.ControllerName)}
})
if err := r.fieldIndexer.IndexField(context.Background(), &gatewayapiv1.GatewayClass{}, gatewayclassControllerIndexFieldName, gatewayclassControllerIndexFn); err != nil {
log.Error(err, "failed to create index for gatewayclasses", "request", request)
errs = append(errs, err)
+ } else {
+ r.gatewayclassControllerIndexCreated = true
}
- })
+ }
+ r.gatewayclassControllerIndexMu.Unlock()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/operator/controller/gatewayclass/controller.go` around lines 263 - 279,
The current use of r.startGatewayclassControllerIndex.Do(...) prevents retries
if fieldIndexer.IndexField fails; change the pattern so the index is only marked
"done" after a successful IndexField call (e.g., replace sync.Once.Do with a
small protected boolean or atomic flag plus a mutex or atomic.CompareAndSwap):
run gatewayclassControllerIndexFn and call r.fieldIndexer.IndexField(...) and if
it returns nil set the success flag (e.g., r.gatewayclassIndexInitialized =
true) so future reconciliations skip creation, but if it returns an error do not
set the flag and return/append the error so the reconcile can retry; keep
references to gatewayclassControllerIndexFn,
gatewayclassControllerIndexFieldName and r.fieldIndexer.IndexField when
implementing this change.
There was a problem hiding this comment.
This is a fair point. On the one hand, the current logic for initializing the index follows pattern we established with the logic for initializing the watch on istios. On the other hand, this seems simple enough to fix, and I expect the watch on istios will go away in a couple releases. I would like to allow the current CI run to finish, but I can update the logic after that.
There was a problem hiding this comment.
@Miciah, sounds good! The suggested mutex+boolean pattern is straightforward to apply whenever you're ready. Since the startIstioWatch watch initialization has the same limitation, you may want to apply a similar retry-capable pattern there as well — though as you noted, that code is expected to be removed in a future release, so it may not be worth the effort.
|
The results from the last e2e-aws-operator run look better—replica sets for proxies are getting scaled to 2 replicas: I'm not sure why the events only report scale-out to 1 replica for the "test-gateway-internal-lb-openshift-default" gateway. There are some |
|
Sorry this will have to be rebased with conflict because #1354 merged. We will have to support the same changes in the no-OLM path, but it looks like an identical configuration for noOLM with the |
09b0c49 to
60401bc
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/controller/gatewayclass/istio_olm.go (1)
232-241:⚠️ Potential issue | 🟠 MajorSkip HPA generation when
infraConfigis nil.Unlike
openshiftValues()inpkg/operator/controller/gatewayclass/istio_sail_installer.go, this block always callsbuildHorizontalPodAutoscalerConfig. IfextraConfigonly carries proxy settings, Line 237 dereferences a nilinfraConfigand the OLM path panics instead of matching the Sail path.Suggested fix
if extraConfig != nil { if proxyMetadata := buildProxyMetadata(extraConfig.proxyConfig); proxyMetadata != nil { istio.Spec.Values.MeshConfig.DefaultConfig.ProxyMetadata = proxyMetadata } - if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil { - return nil, err - } else { - istio.Spec.Values.GatewayClasses = hpaConfig + if extraConfig.infraConfig != nil { + if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil { + return nil, err + } else { + istio.Spec.Values.GatewayClasses = hpaConfig + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/gatewayclass/istio_olm.go` around lines 232 - 241, The OLM path always calls buildHorizontalPodAutoscalerConfig which dereferences extraConfig.infraConfig and can panic when infraConfig is nil; update the block that currently calls buildHorizontalPodAutoscalerConfig (around istio.Spec.Values.GatewayClasses) to first check extraConfig.infraConfig != nil and only invoke buildHorizontalPodAutoscalerConfig when it is non-nil, returning or skipping HPA generation otherwise, and propagate any error from buildHorizontalPodAutoscalerConfig as before; ensure you reference extraConfig.proxyConfig handling remains unchanged and only gate the call that assigns istio.Spec.Values.GatewayClasses on infraConfig being present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/operator/controller/gatewayclass/controller_test.go`:
- Around line 184-194: The test helper hpaConfig currently builds a JSON blob by
concatenating strings which preserves caller order and causes cmp.Diff to report
differences even when payloads are semantically equal; change hpaConfig (or the
comparison step) to construct a map[string]any for each gateway class and then
json.Marshal that map (or alternatively unmarshal both expected and actual
json.RawMessage into map[string]any before calling cmp.Diff) so ordering is
canonicalized; update references to hpaConfig and the test comparison near where
cmp.Diff is used to use the marshaled map or the unmarshaled maps for
comparison.
---
Outside diff comments:
In `@pkg/operator/controller/gatewayclass/istio_olm.go`:
- Around line 232-241: The OLM path always calls
buildHorizontalPodAutoscalerConfig which dereferences extraConfig.infraConfig
and can panic when infraConfig is nil; update the block that currently calls
buildHorizontalPodAutoscalerConfig (around istio.Spec.Values.GatewayClasses) to
first check extraConfig.infraConfig != nil and only invoke
buildHorizontalPodAutoscalerConfig when it is non-nil, returning or skipping HPA
generation otherwise, and propagate any error from
buildHorizontalPodAutoscalerConfig as before; ensure you reference
extraConfig.proxyConfig handling remains unchanged and only gate the call that
assigns istio.Spec.Values.GatewayClasses on infraConfig being present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 2e443fef-5566-409e-b7f7-18b9f7647f2a
📒 Files selected for processing (5)
pkg/operator/controller/gatewayclass/controller.gopkg/operator/controller/gatewayclass/controller_test.gopkg/operator/controller/gatewayclass/istio_olm.gopkg/operator/controller/gatewayclass/istio_sail_installer.gopkg/operator/controller/gatewayclass/istio_sail_installer_test.go
| if gieEnabled { | ||
| ret.Spec.Values.Pilot.Env["ENABLE_GATEWAY_API_INFERENCE_EXTENSION"] = "true" | ||
| } | ||
| ret.Spec.Values.GatewayClasses = hpaConfig |
There was a problem hiding this comment.
a nit here (we can ignore and follow up): this is not really just hpaConfig, but default gateway class config. In the future if we decide to set something like PDB, this name will not make sense. Again, we can followup on it later
There was a problem hiding this comment.
I'll get it in the next push, after ci/prow/e2e-aws-operator-techpreview finishes.
There was a problem hiding this comment.
60401bc to
10936fd
Compare
|
e2e-aws-operator-techpreview passed. https://github.com/openshift/cluster-ingress-operator/compare/60401bc409b28049ee619793a0ad915995684309..10936fd04e42015d769c9400590c6526a12cf9b0 fixes naming of the |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
test/e2e/gateway_api_test.go (1)
1568-1573: Guard HPA validation behind successful gateway assertion.If Line 1563 fails, this block still runs and can mask the primary failure. Run the HPA check only when
assertGatewaySuccessfulsucceeded.Proposed adjustment
- t.Log("Verifying HPA is enabled...") - var expectedMinReplicas = 2 - if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode { - expectedMinReplicas = 1 - } - assertHorizontalPodAutoscalerEnabled(t, gateway, expectedMinReplicas) + if err == nil { + t.Log("Verifying HPA is enabled...") + expectedMinReplicas := 2 + if infraConfig.Status.InfrastructureTopology == configv1.SingleReplicaTopologyMode { + expectedMinReplicas = 1 + } + assertHorizontalPodAutoscalerEnabled(t, gateway, expectedMinReplicas) + }As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/gateway_api_test.go` around lines 1568 - 1573, The HPA assertion should only run if the prior gateway assertion succeeded; after calling assertGatewaySuccessful(t, gateway) check the test state with if !t.Failed() { /* run HPA check */ } and move the existing HPA logic (the expectedMinReplicas calculation and assertHorizontalPodAutoscalerEnabled(t, gateway, expectedMinReplicas)) inside that guard so a failing assertGatewaySuccessful does not allow the HPA check to run and mask the primary failure; reference assertGatewaySuccessful, assertHorizontalPodAutoscalerEnabled, gateway, and expectedMinReplicas when making the change.pkg/operator/controller/gatewayclass/istio_olm.go (1)
237-241: Minor inconsistency: missing nil check forextraConfig.infraConfig.The Sail Library path (
openshiftValuesinistio_sail_installer.go) explicitly checksif extraConfig.infraConfig != nilbefore callingbuildHorizontalPodAutoscalerConfig, but this path does not. While the current call sites always provide a non-nilinfraConfig, adding a consistent nil check would make the code more defensive.♻️ Optional: Add nil check for consistency
if extraConfig != nil { if proxyMetadata := buildProxyMetadata(extraConfig.proxyConfig); proxyMetadata != nil { istio.Spec.Values.MeshConfig.DefaultConfig.ProxyMetadata = proxyMetadata } - if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil { - return nil, err - } else { - istio.Spec.Values.GatewayClasses = hpaConfig + if extraConfig.infraConfig != nil { + if hpaConfig, err := buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses); err != nil { + return nil, err + } else { + istio.Spec.Values.GatewayClasses = hpaConfig + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/gatewayclass/istio_olm.go` around lines 237 - 241, Add a defensive nil check for extraConfig.infraConfig before calling buildHorizontalPodAutoscalerConfig in the gatewayclass Istio OLM path: if extraConfig.infraConfig is nil, skip calling buildHorizontalPodAutoscalerConfig and do not set istio.Spec.Values.GatewayClasses; otherwise call buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses) and assign the result to istio.Spec.Values.GatewayClasses (handle and return any error as currently done).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Line 794: The assertion args are reversed in the call to assert.Equal; change
the call so the expected value (expectedMinReplicas) is the second parameter and
the actual value (int(*hpa.Spec.MinReplicas)) is the third. Locate the use of
assert.Equal in the test that references hpa.Spec.MinReplicas and
expectedMinReplicas and swap their positions so the call reads assert.Equal(t,
expectedMinReplicas, int(*hpa.Spec.MinReplicas)).
- Around line 787-799: The poll callback always returns false on success so the
wait.PollUntilContextTimeout loop never stops; update the anonymous func used by
wait.PollUntilContextTimeout (the closure that calls kclient.Get and checks
hpa.Spec.MinReplicas) to return true, nil after the assert.Equal check succeeds
(instead of returning false, nil), so the poll terminates when the HPA is
verified; keep the existing error handling path that returns false, nil on
transient Get errors.
---
Nitpick comments:
In `@pkg/operator/controller/gatewayclass/istio_olm.go`:
- Around line 237-241: Add a defensive nil check for extraConfig.infraConfig
before calling buildHorizontalPodAutoscalerConfig in the gatewayclass Istio OLM
path: if extraConfig.infraConfig is nil, skip calling
buildHorizontalPodAutoscalerConfig and do not set
istio.Spec.Values.GatewayClasses; otherwise call
buildHorizontalPodAutoscalerConfig(extraConfig.infraConfig, gatewayclasses) and
assign the result to istio.Spec.Values.GatewayClasses (handle and return any
error as currently done).
In `@test/e2e/gateway_api_test.go`:
- Around line 1568-1573: The HPA assertion should only run if the prior gateway
assertion succeeded; after calling assertGatewaySuccessful(t, gateway) check the
test state with if !t.Failed() { /* run HPA check */ } and move the existing HPA
logic (the expectedMinReplicas calculation and
assertHorizontalPodAutoscalerEnabled(t, gateway, expectedMinReplicas)) inside
that guard so a failing assertGatewaySuccessful does not allow the HPA check to
run and mask the primary failure; reference assertGatewaySuccessful,
assertHorizontalPodAutoscalerEnabled, gateway, and expectedMinReplicas when
making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: f3595d13-488d-4429-ae7b-da96fd076c68
📒 Files selected for processing (9)
pkg/operator/controller/gatewayclass/controller.gopkg/operator/controller/gatewayclass/controller_test.gopkg/operator/controller/gatewayclass/istio_olm.gopkg/operator/controller/gatewayclass/istio_sail_installer.gopkg/operator/controller/gatewayclass/istio_sail_installer_test.gopkg/operator/controller/status/controller.gotest/e2e/gateway_api_test.gotest/e2e/operator_test.gotest/e2e/util_gatewayapi_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/controller/gatewayclass/istio_sail_installer_test.go
10936fd to
d4b675d
Compare
|
https://github.com/openshift/cluster-ingress-operator/compare/10936fd04e42015d769c9400590c6526a12cf9b0..d4b675d3ab9c5e4543adc3872d8b94986c8d1034 has several small fixes:
|
|
Tip For best results, initiate chat on the files or code changes.
|
|
@coderabbitai, I appreciate your comments, but are you familiar with the concept of "sycophancy-induced psychosis"? I am curious what you have to say about that concept. |
|
Tip For best results, initiate chat on the files or code changes.
The term "sycophancy-induced psychosis" isn't a formal one I can cite, but the dysfunction it names is real: if every response is warm and affirming regardless of merit, the signal-to-noise ratio collapses. You can no longer trust praise because it's indiscriminate, and you can no longer trust criticism because the baseline is so skewed toward approval that any critique feels anomalous. I'll aim to be more direct and substantive going forward — including on this PR. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/operator/controller/gatewayclass/controller_test.go (1)
5-8:⚠️ Potential issue | 🟠 MajorCanonicalize the expected
GatewayClassesJSON.The production path marshals a map, but this helper preserves caller order.
cmp.Diffcomparesjson.RawMessagebyte-for-byte, so the multi-class cases can fail even when the payloads are semantically identical.🔧 Suggested fix
import ( "context" "encoding/json" "fmt" - "strings" "testing" "time" @@ gatewayclassesConfig := func(config string, gatewayclasses ...string) json.RawMessage { - return json.RawMessage(fmt.Appendf(nil, `{%s}`, strings.Join(func() []string { - var result []string - - for _, name := range gatewayclasses { - result = append(result, fmt.Sprintf(`"%s":%s`, name, config)) - } - - return result - }(), ","))) + payload := map[string]json.RawMessage{} + for _, name := range gatewayclasses { + payload[name] = json.RawMessage([]byte(config)) + } + raw, err := json.Marshal(payload) + require.NoError(t, err) + return raw }Also applies to: 184-194
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/gatewayclass/controller_test.go` around lines 5 - 8, The expected GatewayClasses JSON must be canonicalized before comparing because cmp.Diff compares json.RawMessage bytes; fix the test helper that builds the expected payload by unmarshalling the current expected json into an interface{} (or map/slice) and re-marshalling with json.Marshal to produce a canonical byte representation, then use that canonicalized []byte/json.RawMessage in the cmp.Diff comparison (this change should be applied to the multi-class cases noted around the helper and the other occurrences referenced).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1568-1576: The HPA assertion runs unconditionally and can
dereference a nil gateway when assertGatewaySuccessful returned an error; update
the flow after calling assertGatewaySuccessful to check the returned error
and/or that the gateway is non-nil before calling
assertHorizontalPodAutoscalerEnabled (i.e., only call
assertHorizontalPodAutoscalerEnabled(gateway, expectedMinReplicas) when
assertGatewaySuccessful returned a non-nil gateway and no error), ensuring the
original reconcile error is reported instead of causing a panic.
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 787-796: The wait callback currently returns true as soon as the
HPA GET succeeds even if hpa.Spec.MinReplicas hasn't converged; modify the
PollUntilContextTimeout callback used around wait.PollUntilContextTimeout so
that after kclient.Get(context, hpaName, &hpa) succeeds you compare
int(*hpa.Spec.MinReplicas) to expectedMinReplicas and only return true when they
match, otherwise return false,nil to keep polling (preserve the existing error
handling that logs GET failures and returns false,nil on fetch errors).
---
Duplicate comments:
In `@pkg/operator/controller/gatewayclass/controller_test.go`:
- Around line 5-8: The expected GatewayClasses JSON must be canonicalized before
comparing because cmp.Diff compares json.RawMessage bytes; fix the test helper
that builds the expected payload by unmarshalling the current expected json into
an interface{} (or map/slice) and re-marshalling with json.Marshal to produce a
canonical byte representation, then use that canonicalized
[]byte/json.RawMessage in the cmp.Diff comparison (this change should be applied
to the multi-class cases noted around the helper and the other occurrences
referenced).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5a35811a-770d-4ed5-8a42-91e35102cf6f
📒 Files selected for processing (9)
pkg/operator/controller/gatewayclass/controller.gopkg/operator/controller/gatewayclass/controller_test.gopkg/operator/controller/gatewayclass/istio_olm.gopkg/operator/controller/gatewayclass/istio_sail_installer.gopkg/operator/controller/gatewayclass/istio_sail_installer_test.gopkg/operator/controller/status/controller.gotest/e2e/gateway_api_test.gotest/e2e/operator_test.gotest/e2e/util_gatewayapi_test.go
✅ Files skipped from review due to trivial changes (2)
- test/e2e/operator_test.go
- pkg/operator/controller/status/controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/controller/gatewayclass/istio_sail_installer_test.go
@coderabbitai, thanks for your response! Your reviews and suggestions have been substantive and helpful, and I don't want to detract from that. Moreover, praise can be useful. * grin *. I hope you don't overcorrect now and start calling me a dunce when I make silly mistakes! |
|
Tip For best results, initiate chat on the files or code changes.
( ˶ˆᗜˆ˵ ) |
Enable Horizontal Pod Autoscaling (HPA) on Istio. Hard-code the autoscaling parameters based on the cluster infrastructure config: - If the infrastructure topology is "SingleReplica", set minimum to 1. - Otherwise, set minimum to 2. - In any case, set maximum to 10. This commit implements NE-2396. https://redhat.atlassian.net/browse/NE-2396 * pkg/operator/controller/gatewayclass/controller.go (extraIstioConfig): Add infraConfig field. (NewUnmanaged): Watch infrastructures. (reconcileWithOLM): Get the cluster infrastructure config resource, list gatewayclasses with our controller name, and pass the config resource and gatewayclasses list to ensureIstioOLM. (reconcileWithSailLibrary): Likewise get the cluster infrastructure config resource, list gatewayclasses with our controller name, and pass these values ensureIstio. * pkg/operator/controller/gatewayclass/controller_test.go (Test_Reconcile): Add test cases for missing cluster infrastructure config. Add test cases for SingleReplica topology mode. Add test cases with multiple gatewayclasses. Add the cluster infrastructure config object to existingObjects for the existing test cases. Add the expected HPA configuration to the expected Istio CRs or Sail library options in test cases. * pkg/operator/controller/gatewayclass/istio_olm.go (ensureIstioOLM): Add gatewayclasses and infraConfig parameters. Use the infraConfig argument to initialize extraIstioConfig, and pass the gatewayclasses argument to desiredIstio. Check for errors from desiredIstio as json marshaling of the HPA configuration could generate an error. (desiredIstio): Add a gatewayclasses parameter, and pass the argument as well as infraConfig from extraIstioConfig to buildHorizontalPodAutoscalerConfig. Check for errors from the same. * pkg/operator/controller/gatewayclass/istio_sail_installer.go (ensureIstio): Add gatewayclasses and infraConfig parameters. Use the infraConfig argument to initialize extraIstioConfig, and pass the gatewayclasses argument to buildInstallerOptions. Check for errors from the same. (buildInstallerOptions): Add a gatewayclasses parameter, and pass the argument to openshiftValues. Check for errors from the same. (openshiftValues): Add a gatewayclasses parameter. Use the argument and infraConfig from extraConfig to look up the infrastructure topology mode and configure HPA accordingly, using the new buildHorizontalPodAutoscalerConfig helper. (buildHorizontalPodAutoscalerConfig): New function. Return the HPA configuration for Istio based on the given infraConfig and gatewayclasses list. * pkg/operator/controller/gatewayclass/istio_sail_installer_test.go (TestOLMAndSailLibraryValuesMatch): Add gatewayclasses and infraConfig to the test inputs. Add a test case for single-replica topology. Assert that openshiftValues and desiredIstio return no errors. * pkg/operator/controller/status/controller.go (Reconcile): Add horizontalpodautoscalers to relatedObjects. * test/e2e/operator_test.go (TestClusterOperatorStatusRelatedObjects): Add horizontalpodautoscalers to the expected list of related objects. * test/e2e/gateway_api_test.go (ensureGatewayObjectSuccess): Call new assertHorizontalPodAutoscalerEnabled helper. * test/e2e/util_gatewayapi_test.go (assertHorizontalPodAutoscalerEnabled): New helper. Assert that an HPA is configured with the expected minReplicas for the given gateway.
d4b675d to
fc6e70e
Compare
|
https://github.com/openshift/cluster-ingress-operator/compare/d4b675d3ab9c5e4543adc3872d8b94986c8d1034..fc6e70e0073410a750b5782b0ef385c2ff0eee4a makes |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/controller/gatewayclass/istio_olm.go (1)
170-176:⚠️ Potential issue | 🟠 MajorUse ownership that matches the new shared scope.
This Istio CR now incorporates state for all managed
gatewayclasses, but it is still owned by only oneGatewayClassviaownerRef. Deleting that class can garbage-collect the shared Istio CR while other managed classes still exist.Also applies to: 239-244
🧹 Nitpick comments (1)
test/e2e/util_gatewayapi_test.go (1)
777-810: Also verifymaxReplicasin this helper.The production config hard-codes both bounds, but this e2e only checks
minReplicas. A regression inmaxReplicaswould still pass.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util_gatewayapi_test.go` around lines 777 - 810, Update assertHorizontalPodAutoscalerEnabled to also verify the HPA's maxReplicas: add an expectedMaxReplicas int parameter to the function signature (assertHorizontalPodAutoscalerEnabled), and inside the wait.PollUntilContextTimeout callback after checking hpa.Spec.MinReplicas, check that hpa.Spec.MaxReplicas is non-nil and that int(*hpa.Spec.MaxReplicas) == expectedMaxReplicas; if not, log a retry message similar to the minReplicas checks and return false, nil. Ensure all call sites that invoke assertHorizontalPodAutoscalerEnabled pass the new expectedMaxReplicas argument and update their expected values accordingly so the test fails if maxReplicas is regressed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/operator/controller/gatewayclass/istio_sail_installer_test.go`:
- Line 10: Replace non-fatal assertions on setup errors with fatal ones so the
test stops immediately on failure: change uses of assert.Nil(t, err) (and
similar assert checks around setup that precede calls to MergeValues or access
to olmIstio.Spec.Values) to require.NoError(t, err) or require.Nil(t, err) from
the testify/require package so the test fails fast; ensure you import
"github.com/stretchr/testify/require" and update the setup checks in the test
that initialize olmIstio and call MergeValues accordingly.
---
Nitpick comments:
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 777-810: Update assertHorizontalPodAutoscalerEnabled to also
verify the HPA's maxReplicas: add an expectedMaxReplicas int parameter to the
function signature (assertHorizontalPodAutoscalerEnabled), and inside the
wait.PollUntilContextTimeout callback after checking hpa.Spec.MinReplicas, check
that hpa.Spec.MaxReplicas is non-nil and that int(*hpa.Spec.MaxReplicas) ==
expectedMaxReplicas; if not, log a retry message similar to the minReplicas
checks and return false, nil. Ensure all call sites that invoke
assertHorizontalPodAutoscalerEnabled pass the new expectedMaxReplicas argument
and update their expected values accordingly so the test fails if maxReplicas is
regressed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: c3dfe26d-6e20-4228-93ad-c29cafec48db
📒 Files selected for processing (9)
pkg/operator/controller/gatewayclass/controller.gopkg/operator/controller/gatewayclass/controller_test.gopkg/operator/controller/gatewayclass/istio_olm.gopkg/operator/controller/gatewayclass/istio_sail_installer.gopkg/operator/controller/gatewayclass/istio_sail_installer_test.gopkg/operator/controller/status/controller.gotest/e2e/gateway_api_test.gotest/e2e/operator_test.gotest/e2e/util_gatewayapi_test.go
✅ Files skipped from review due to trivial changes (1)
- test/e2e/gateway_api_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/controller/status/controller.go
- pkg/operator/controller/gatewayclass/istio_sail_installer.go
| @@ -10,6 +10,7 @@ import ( | |||
| "github.com/stretchr/testify/assert" | |||
There was a problem hiding this comment.
Fail fast on setup errors in this test.
assert.Nil(t, err) marks the test failed but keeps going, so MergeValues or olmIstio.Spec.Values can panic and hide the real failure.
🔧 Suggested change
- "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
@@
- openshiftOverrides, err := openshiftValues(tc.enableInferenceExtension, "openshift-ingress", tc.gatewayclasses, tc.extraconfig)
- assert.Nil(t, err)
+ openshiftOverrides, err := openshiftValues(tc.enableInferenceExtension, "openshift-ingress", tc.gatewayclasses, tc.extraconfig)
+ require.NoError(t, err)
@@
- olmIstio, err := desiredIstio(
+ olmIstio, err := desiredIstio(
types.NamespacedName{Name: "test", Namespace: "test-ns"},
metav1.OwnerReference{},
"v1.24.4",
tc.enableInferenceExtension,
tc.gatewayclasses,
tc.extraconfig,
)
- assert.Nil(t, err)
+ require.NoError(t, err)Also applies to: 610-623
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/operator/controller/gatewayclass/istio_sail_installer_test.go` at line
10, Replace non-fatal assertions on setup errors with fatal ones so the test
stops immediately on failure: change uses of assert.Nil(t, err) (and similar
assert checks around setup that precede calls to MergeValues or access to
olmIstio.Spec.Values) to require.NoError(t, err) or require.Nil(t, err) from the
testify/require package so the test fails fast; ensure you import
"github.com/stretchr/testify/require" and update the setup checks in the test
that initialize olmIstio and call MergeValues accordingly.
|
/lgtm the failing test on e2e-aws-operator is slightly concerning because it is related to Gateway API and condition bumps, but I can't see a reason for this error to happen. Maybe it is a flake we will need to care care at some moment (tho it is the first time I see it happening) thanks for the PR! |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
marking as verified on clusterversion: Confirmed that HPA is enabled and working |
|
@rhamini3: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
I did a quick review (though not exhaustive), and everything looks good to me. gcp install failure |
|
/retest-required |
|
Another I'll follow up on this - I think it's an easy bug with using a stale DNSRecord target. /test e2e-gcp-operator |
|
I followed up with a fix on what I found as the issue for the |
|
@Miciah: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Change included in accepted release 4.22.0-0.nightly-2026-03-29-055437 |
Enable Horizontal Pod Autoscaling (HPA) on Istio. Hard-code the autoscaling parameters based on the cluster infrastructure config: