Conversation

@joseorpa commented Oct 27, 2025

Enhancement: Ingress Operator Resource Configuration via v1alpha1 API
This enhancement proposes adding the ability to configure resource limits
and requests for the ingress-operator deployment containers via a new
v1alpha1 API field in the IngressController custom resource.

This addresses the need for:

  • Setting resource limits for QoS guarantees
  • Compliance requirements for resource constraints
  • Scaling operator resources for large deployments

Relates to: RFE-1476

jortizpa and others added 3 commits October 14, 2025 13:45
openshift-ci bot requested review from Miciah and rfredette October 27, 2025 15:32
openshift-ci bot added the needs-ok-to-test label Oct 27, 2025

openshift-ci bot commented Oct 27, 2025

Hi @joseorpa. Thanks for your PR.

I'm waiting for a member of the openshift org to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment on lines 574 to 588
### Alternative 2: Modify v1 API directly

Add `operatorResourceRequirements` field directly to stable v1 API.

**Pros**:
- No need for v1alpha1 version
- Simpler for users (one API version)

**Cons**:
- Changes stable API (breaking compatibility promise)
- Cannot iterate on design easily
- Difficult to remove if issues found
- Against OpenShift API stability guarantees

**Decision**: Rejected - Use v1alpha1 for new features as per OpenShift conventions

Contributor

Where is this v1alpha1 convention coming from? Can we introduce v1alpha1 when we already have v1?

The usual approach is to add the field directly to the existing v1 API:

  1. Define a new featuregate, initially in the TPNU feature set (but not Default).
  2. Add a field to the v1 API, using the new featuregate (as you've done using the // +openshift:enable:FeatureGate marker).
  3. Implement the feature and write tests.
  4. Add the featuregate to the Default feature set when it's ready.
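
For illustration, a minimal Go sketch of what step 2 above could look like on the existing v1 type, assuming a hypothetical featuregate name and field name; this is not the enhancement's settled design.

```go
// Hypothetical sketch only: the field name and the featuregate name are
// illustrative, not part of openshift/api today.
package v1

import corev1 "k8s.io/api/core/v1"

// IngressControllerSpec (abridged): a new, gated field added to the existing
// stable v1 API rather than introducing a new API version.
type IngressControllerSpec struct {
	// ... existing v1 fields elided ...

	// operatorResourceRequirements overrides the default resource requests
	// and limits; the field is served only when the featuregate is enabled.
	//
	// +openshift:enable:FeatureGate=IngressOperatorResourceRequirements
	// +optional
	OperatorResourceRequirements *corev1.ResourceRequirements `json:"operatorResourceRequirements,omitempty"`
}
```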

@joseorpa (Author) Oct 28, 2025

This v1alpha1 convention comes from openshift/api#2485 (review)

Author

@JoelSpeed can you help here?

Contributor

There is a difference between adding a field to an already stable API (which Miciah has pointed out) and adding a completely new API.

The PR I reviewed, and left feedback on, was introducing a completely new API type, and as such, starting as alpha is correct per our latest guidelines.

If you think this should just be a field on an existing v1 API, then that's a different discussion.

Comment on lines 105 to 107
Create a new v1alpha1 API version for IngressController in the
`operator.openshift.io` group, following the pattern established by, for example,
[cluster monitoring v1alpha1 configuration](https://github.com/openshift/api/blob/94481d71bb6f3ce6019717ea7900e6f88f42fa2c/config/v1alpha1/types_cluster_monitoring.go#L172-L193).

Contributor

Can we use a shared type for all operators?

Author

You mean core Kubernetes corev1.ResourceRequirements? I've seen there are a lot of types in the operator.openshift.io group.
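
To make the shared-type question concrete, a hypothetical sketch of how the proposed v1alpha1 field could reuse the core corev1.ResourceRequirements type, keyed by container name since the proposal covers more than one container; all names here are illustrative.

```go
// Hypothetical sketch: reuse the shared corev1.ResourceRequirements type
// rather than defining an operator-specific resources type.
package v1alpha1

import corev1 "k8s.io/api/core/v1"

// ContainerResourceOverride pairs a container name with standard Kubernetes
// resource requirements.
type ContainerResourceOverride struct {
	// name of the container to configure, e.g. "ingress-operator".
	Name string `json:"name"`

	// resources uses the shared core type, so requests and limits are
	// expressed exactly as they are elsewhere in Kubernetes.
	Resources corev1.ResourceRequirements `json:"resources"`
}
```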

- Maintain backward compatibility with existing IngressController v1 API
- Use v1alpha1 API version for this Tech Preview feature
- Provide sensible defaults that work for most deployments
- Support both the ingress-operator and kube-rbac-proxy containers

Contributor

Why kube-rbac-proxy? Is that only for QoS?

Author

I'm correcting this as well; it is for QoS, but I agree it is not directly related to the router pods.

Comment on lines 258 to 266
A new controller (`operator-deployment-controller`) in the cluster-ingress-operator
watches the default IngressController CR and reconciles the operator's own deployment
when `operatorResourceRequirements` is specified.

**Controller responsibilities:**
1. Watch IngressController resources (v1alpha1)
2. Reconcile `ingress-operator` Deployment in `openshift-ingress-operator` namespace
3. Update container resource specifications
4. Handle error cases gracefully (invalid values, conflicts, etc.)

Contributor

This won't work; CVO manages the ingress-operator deployment. You can't have cluster-ingress-operator update its own deployment.

Author

I'm updating this as well; the ingress operator would control just the deployment of the router pods.
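
As a rough illustration of that corrected scope, a sketch of how the operator might apply configured requirements to the router Deployment it already manages; the helper name and the assumption that the container is called "router" are mine, not from the enhancement.

```go
// Hypothetical sketch: copy IngressController-specified resources onto the
// router Deployment that cluster-ingress-operator already reconciles.
package ingress

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// setRouterResources sets the desired requirements on the router container;
// when nothing is configured, the operator's defaults are left in place.
func setRouterResources(deployment *appsv1.Deployment, desired *corev1.ResourceRequirements) {
	if desired == nil {
		return
	}
	containers := deployment.Spec.Template.Spec.Containers
	for i := range containers {
		if containers[i].Name == "router" { // assumed container name
			containers[i].Resources = *desired
		}
	}
}
```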

**Mitigation**:
- Controller reconciliation loop detects and corrects drift
- Document that configuration should be via IngressController CR, not direct deployment edits
- Admission webhooks prevent direct deployment modifications

Contributor

Are you proposing adding an admission webhook to block updates to the ingress-operator deployment?

@joseorpa (Author) Oct 28, 2025

I'm correcting this and changing it to a conversion webhook for the different API versions.


This enhancement proposes adding the ability to configure resource limits and
requests for the ingress-operator deployment containers via a new v1alpha1 API
field in the IngressController custom resource.

Contributor

If this is for the ingress-operator deployment, it doesn't make sense to put this in the IngressController CRD, which describes configuration for router pods.

Author

I agree; I will update the content of this part of the enhancement as well.

Comment on lines 351 to 352
1. **Q**: Should we support auto-scaling (VPA) in the future?
- **A**: Out of scope for initial implementation, but API should not preclude it

Contributor

Autoscaling the operator?

Author

This should be the router pod for sure; updating this too.

Comment on lines 357 to 358
3. **Q**: Should this apply to all IngressControllers or only the default?
- **A**: Initial implementation only default, but API supports any IngressController

Contributor

Does the configuration apply to IngressControllers (router) pods at all, or only to the ingress-operator pod?

If you mean it applies only to the ingress-operator pod, are you saying that resource requests and limits for the ingress-operator pod are read from the "default" IngressController, and resource request and limits specified on other IngressController CRs are ignored? Putting configuration for the operator in the IngressController CRD is confusing (see #1877 (comment)).

If you actually mean resource requests and limits for router pods, then it seems to me that it is simplest and least surprising to respect the configuration for all IngressControllers, not only for the default. Does respecting configuration for other IngressControllers pose some problem?

Author

It will be for all router pods.

Comment on lines 360 to 361
4. **Q**: How do we handle the operator modifying its own deployment safely?
- **A**: Use owner references carefully, reconcile loop with backoff

Contributor

Can you elaborate on this point? How do you avoid conflicts with CVO?

Author

Changed it to the router pods controlled by the ingress controller.

Comment on lines 426 to 427
- [ ] Sufficient field testing (2+ minor releases in Tech Preview)
- [ ] No major bugs reported for 2 consecutive releases

Contributor

This is an unusual requirement for OpenShift. For a feature like this, we would usually introduce it as Tech Preview and graduate it to GA in the same release development cycle.

- [ ] No major bugs reported for 2 consecutive releases
- [ ] Performance impact assessed and documented
- [ ] API design validated by diverse user scenarios
- [ ] At least 10 production users providing positive feedback

Contributor

Do you believe you will be able to find 10 production users of this feature?

Comment on lines +562 to +564
- Simpler to implement
- No API version changes needed
- Easy to update without CRD changes

Contributor

You would need a CRD change to add a reference to the ConfigMap... unless you would have the operator just check for a ConfigMap in openshift-config with some hard-coded name?
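
For the hard-coded-name variant mentioned here, a minimal sketch of what the lookup could look like; the ConfigMap name and data keys are invented for illustration, not part of the enhancement.

```go
// Hypothetical sketch: the operator checks a well-known ConfigMap in
// openshift-config; the ConfigMap name and data keys are invented here.
package ingress

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// loadOperatorResources reads requests from the hard-coded ConfigMap and
// returns nil (keep operator defaults) when no override is configured.
func loadOperatorResources(ctx context.Context, client kubernetes.Interface) (*corev1.ResourceRequirements, error) {
	cm, err := client.CoreV1().ConfigMaps("openshift-config").Get(ctx, "ingress-operator-resources", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	cpu, err := resource.ParseQuantity(cm.Data["requests.cpu"])
	if err != nil {
		return nil, fmt.Errorf("invalid requests.cpu: %w", err)
	}
	mem, err := resource.ParseQuantity(cm.Data["requests.memory"])
	if err != nil {
		return nil, fmt.Errorf("invalid requests.memory: %w", err)
	}
	return &corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    cpu,
			corev1.ResourceMemory: mem,
		},
	}, nil
}
```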

Comment on lines 590 to 604
### Alternative 3: Separate CRD for operator configuration

Create a new OperatorConfiguration CRD (similar to how cluster monitoring works).

**Pros**:
- Separation of concerns
- Can configure multiple operators uniformly

**Cons**:
- Increases API surface unnecessarily
- IngressController is the logical place for ingress-operator configuration
- More CRDs to manage
- Inconsistent with how other operators handle self-configuration

**Decision**: Rejected - IngressController CR is the appropriate configuration location

Contributor

If you really do mean for this EP to be specifically for the ingress-operator pod (and not router pods), then I really like this alternative. Have you considered a variant: adding configuration for resource requests and limits to the ClusterVersion CRD (alongside the existing component overrides)? This makes a lot of sense for a few reasons:

  • CVO is the thing that manages the deployment right now; trying to have cluster-ingress-operator update the deployment that CVO manages is asking for trouble.
  • The resource requests and limits configuration logically fits under CVO configuration, not the IngressController API.
  • The configuration logically fits in with component overrides.
  • The resource requests and limits configuration could apply to any operator, not just cluster-ingress-operator; putting the configuration under the ClusterVersion CRD would provide a centralized, consistent way to configure it for multiple operators.
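
To make that variant concrete, a purely hypothetical sketch of what such an override could look like next to the existing spec.overrides entries; no field like this exists in the ClusterVersion API today.

```go
// Hypothetical sketch only: none of these fields exist in the ClusterVersion
// API today; this just illustrates the suggested variant.
package clusterversionsketch

import corev1 "k8s.io/api/core/v1"

// ComponentResourceOverride mirrors the addressing used by the existing
// spec.overrides entries (kind/group/namespace/name) but carries resource
// requirements for CVO to apply to the targeted component, instead of an
// "unmanaged" flag.
type ComponentResourceOverride struct {
	Kind      string `json:"kind"`
	Group     string `json:"group"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`

	// resources would be applied by CVO, which already owns the manifests
	// for operator deployments such as ingress-operator.
	Resources corev1.ResourceRequirements `json:"resources"`
}
```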

Comment on lines 614 to 620
**Cons**:
- Not GitOps friendly
- Requires direct deployment modification
- Not discoverable via API
- Doesn't follow OpenShift declarative configuration patterns
- Difficult to audit and version control

Contributor

Also, it would require a CVO override.


## Design Details

### Open Questions

Contributor

Can you address this point from https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits?

We do not want cluster components to be restarted based on their resource consumption (for example, being killed due to an out-of-memory condition). We need to detect and handle those cases more gracefully, without degrading cluster performance.

Miciah (Contributor) commented Oct 28, 2025

/assign

openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from miciah. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@joseorpa changed the title from "Ingress operator resource config" to "Ingress router resource config" Oct 29, 2025

rikatz (Member) commented Nov 5, 2025

/cc @alebedev87

openshift-ci bot requested a review from alebedev87 November 5, 2025 15:37