SPLAT-2137: Support Security Group on NLB for Default router on AWS#1802
mtulio wants to merge 8 commits into openshift:master
|
@mtulio: This pull request references SPLAT-2137 which is a valid jira issue. |
bchandra-ocp
left a comment
There was a problem hiding this comment.
Thanks for sharing this enhancement - it's great to see this progress ahead.
I'm just starting to review but had basic questions on the summary so want to wait before I proceed.
elmiko
left a comment
There was a problem hiding this comment.
this is generally making sense to me, i've left some comments and questions.
| - Configure Ingress rules in the Security Group to allow traffic on the ports defined in the Service's `spec.ports`. The source for these rules will be determined by the `service.beta.kubernetes.io/load-balancer-source-ranges` annotation on the Service (if present, otherwise default to allowing from all IPs). | ||
| - Configure Egress rules in the Security Group to allow traffic to the backend pods on the targetPort specified in the Service's `spec.ports` and the health check port. Initially, this should be restricted to the cluster's VPC CIDR or the specific CIDRs of the worker nodes. | ||
| - When creating the NLB using the AWS ELBv2 API, the CCM will include the ID of the newly created Security Group in the `SecurityGroups` parameter of the `CreateLoadBalancerInput.` | ||
| - When the Service is deleted, the CCM will also delete the associated Security Group, ensuring proper cleanup. |
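For illustration, the ingress part of the rules above could be sketched as follows. This is a minimal sketch, not the CCM implementation: `ingressRule`, `servicePort`, and `buildIngressRules` are hypothetical names, and the fallback to `0.0.0.0/0` is an assumption standing in for "allow from all IPs".

```go
package main

import "fmt"

// ingressRule is a hypothetical, simplified stand-in for an EC2 ingress permission.
type ingressRule struct {
	Protocol string
	Port     int32
	CIDR     string
}

// servicePort mirrors only the fields of a Service's spec.ports entry used here.
type servicePort struct {
	Protocol string
	Port     int32
}

// buildIngressRules returns one rule per (port, source range) pair.
// With no source ranges it falls back to allowing all IPv4 addresses (assumption).
func buildIngressRules(ports []servicePort, sourceRanges []string) []ingressRule {
	if len(sourceRanges) == 0 {
		sourceRanges = []string{"0.0.0.0/0"}
	}
	rules := make([]ingressRule, 0, len(ports)*len(sourceRanges))
	for _, p := range ports {
		for _, cidr := range sourceRanges {
			rules = append(rules, ingressRule{Protocol: p.Protocol, Port: p.Port, CIDR: cidr})
		}
	}
	return rules
}

func main() {
	ports := []servicePort{{Protocol: "tcp", Port: 80}, {Protocol: "tcp", Port: 443}}
	for _, r := range buildIngressRules(ports, nil) {
		fmt.Printf("%s/%d from %s\n", r.Protocol, r.Port, r.CIDR)
	}
}
```

Egress rules (targetPort and health check port, restricted to the VPC or node CIDRs) would follow the same shape with a different CIDR source.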
There was a problem hiding this comment.
what happens if this annotation is added after the Service is created? (ie what happens on update)
There was a problem hiding this comment.
I am working on it, ensuring I will follow the current state of CCM along side the ALBC to correctly document it. Thanks for raising that question.
There was a problem hiding this comment.
We need to be able to answer these questions for upstream, but downstream we could prevent those transitions with VAP
|
|
||
| - The CCM's service controller will watch for Service creations and updates. | ||
| - When it encounters a Service with the annotation `service.beta.kubernetes.io/aws-load-balancer-managed-security-group: "true"` and `service.beta.kubernetes.io/aws-load-balancer-type: nlb`, the CCM will: | ||
| - Create a new AWS Security Group for the NLB. The name should follow a convention like `k8s-elb-a<generated-name-from-service-uid>`. |
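As a rough sketch of the naming convention quoted above: the helpers below assume the generated name is "a" plus the Service UID with dashes removed, truncated to 32 characters, prefixed with `k8s-elb-` for the Security Group. That derivation is my reading of how the CCM names Classic LB resources, not something this EP specifies; `loadBalancerName` and `securityGroupName` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// loadBalancerName derives a name from the Service UID: "a" + UID without
// dashes, truncated to 32 characters (assumed convention, see lead-in).
func loadBalancerName(serviceUID string) string {
	name := "a" + strings.ReplaceAll(serviceUID, "-", "")
	if len(name) > 32 {
		name = name[:32]
	}
	return name
}

// securityGroupName prefixes the generated name with "k8s-elb-".
func securityGroupName(serviceUID string) string {
	return "k8s-elb-" + loadBalancerName(serviceUID)
}

func main() {
	// Made-up UID, used only for illustration.
	uid := "3f2a1b4c-5d6e-7f80-9a1b-2c3d4e5f6a7b"
	fmt.Println(securityGroupName(uid))
}
```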
There was a problem hiding this comment.
The name should follow a convention like `k8s-elb-a
This is an interesting point: the convention the CCM uses to name NLBs created from Services differs from the ALBC's, which follows the pattern k8s-<namespace>-<service_name>-<id>
There was a problem hiding this comment.
Furthermore, I see that NLB tags aren't standardized either:
CCM:
kubernetes.io/cluster/clusterID: owned
kubernetes.io/service-name: namespace/service-name
ALBC:
elbv2.k8s.aws/cluster: clusterID
service.k8s.aws/resource: LoadBalancer
service.k8s.aws/stack: namespace/service-name
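To make the difference concrete, here is a small sketch that builds both tag sets for the same Service. The tag keys and values are the ones listed above; the function names and the example cluster ID are hypothetical.

```go
package main

import "fmt"

// ccmTags returns the tags listed above for NLBs provisioned by the CCM.
func ccmTags(clusterID, namespace, name string) map[string]string {
	return map[string]string{
		"kubernetes.io/cluster/" + clusterID: "owned",
		"kubernetes.io/service-name":         namespace + "/" + name,
	}
}

// albcTags returns the tags listed above for NLBs provisioned by the ALBC.
func albcTags(clusterID, namespace, name string) map[string]string {
	return map[string]string{
		"elbv2.k8s.aws/cluster":    clusterID,
		"service.k8s.aws/resource": "LoadBalancer",
		"service.k8s.aws/stack":    namespace + "/" + name,
	}
}

func main() {
	// Example values only.
	fmt.Println(ccmTags("mycluster", "openshift-ingress", "router-default"))
	fmt.Println(albcTags("mycluster", "openshift-ingress", "router-default"))
}
```

Note that the two sets share no keys, so any tooling that discovers cluster-owned load balancers by tag would need to know both schemes.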
There was a problem hiding this comment.
Question to @JoelSpeed @elmiko - do we want to standardize the NLB Tags between controllers too?
IIUC kubernetes.io/cluster/clusterID: owned was not added in my ALBC exploration because the service was created by ALBO/ALBC which seems not to enforce cluster tags.
| region: us-east-1 | ||
| lbType: NLB <-- deprecate by platform.aws.ingressController.loadBalancerType? | ||
| ingressController: <-- proposing to aggregate CIO configurations | ||
| securityGroupEnabled: True <-- new field |
There was a problem hiding this comment.
What if I want to have different security groups for ingress vs the rest of the cluster? Is that possible?
Do we need the option for this to be automatic (use the same as you'd expect for default) but also a BYO option where users can specify specific SG IDs to be used?
There was a problem hiding this comment.
What if I want to have different security groups for ingress vs the rest of the cluster? Is that possible?
Would you mind elaborating? I am not sure I followed correctly, as the proposal already adds a dedicated SG to the NLB, separate from the rest of the cluster.
Do we need the option for this to be automatic (use the same as you'd expect for default) but also a BYO option where users can specify specific SG IDs to be used?
That's a fair point, but I am not sure we have a customer use case for BYO SG on CIO, and I wonder whether supporting BYO SG would diverge from the main focus of this EP: enabling NLB with a security group.
BYO SG would increase the implementation scope a bit, especially in the CCM. IIUC, by definition, when SG IDs are added (BYO SG) through annotations, the CCM (Classic LB) or the ALBC won't manage those SGs' lifecycle. The ALBC also provides an extra annotation (manage-backend-security-group-rules) to allow managing node rules:
If you specify this annotation, you need to configure the security groups on your Node/Pod to allow inbound traffic from the load balancer. You could also set the manage-backend-security-group-rules if you want the controller to manage the access rule
So what we are targeting is the initial ability to enable SG on NLB, similar to how CLB is deployed by default, as requested by managed services. I am wondering whether any additional feature or parity with ALBC would fall into the long-term planning we've been discussing with PMs. Do you think we could phase it? Thoughts?
There was a problem hiding this comment.
in the latest version I added the BYO SG workflow as a later phase as opt-in to the Service object, removing the installer/CIO option/API.
| annotations: | ||
| service.beta.kubernetes.io/aws-load-balancer-type: nlb | ||
| service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing | ||
| service.beta.kubernetes.io/aws-load-balancer-managed-security-group: "true" <-- new annotation |
There was a problem hiding this comment.
What does the annotation scheme look like in the AWS LBC? I thought it just allowed you to specify IDs
I think the upstream change to the CCM wants to mimic the behaviour described in https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/annotations/#security-groups
Is our described behaviour here compatible with that, if not, have we deliberately deviated from that pattern?
There was a problem hiding this comment.
Is our described behaviour here compatible with that
It is not. The proposed annotation service.beta.kubernetes.io/aws-load-balancer-managed-security-group is not the same as the BYO SG annotation. To recap the BYO SG annotations:
on ALBC:
- service.beta.kubernetes.io/aws-load-balancer-security-groups
- alb.ingress.kubernetes.io/frontend-nlb-security-groups (I really didn't understand how it differs from the previous one)
on CCM:
- service.beta.kubernetes.io/aws-load-balancer-security-groups
- service.beta.kubernetes.io/aws-load-balancer-extra-security-groups
, if not, have we deliberately deviated from that pattern?
Yes, it intentionally proposes a new annotation to signal the CCM to manage the SG for NLBs (allowing users to opt in to this config). It was added mainly to avoid changing the default behavior of the CCM when provisioning NLBs.
AFAICT the ALBC does not provide this option as it defaults to SG since v2.6.0 (Aug 10, 2023), and it's not possible to disable it (?).
Alternatively, I can see:
- Changing explicitly the default behavior of NLB to always create SGs (do we want that?)
I believe we can converge on the thread https://github.com/openshift/enhancements/pull/1802/files#r2111532244 where you mentioned the transition and suggested configuration changes.
There was a problem hiding this comment.
In the latest version of this EP we are moving to a global configuration (cloud-config) for CCM, enforced in OpenShift by CCCMO, instead of a "managed" annotation as described above.
The BYO SG flow is also covered in a later phase of this EP, ensuring customers can opt out of the enforced managed SG on NLBs, following the existing ALBC flow.
| // ServiceAnnotationLoadBalancerManagedSecurityGroup is the annotation used | ||
| // on the service to instruct the CCM to manage the security group when creating a Network Load Balancer. When enabled, | ||
| // the CCM creates the security group and its rules. This option cannot be used with annotations | ||
| // "service.beta.kubernetes.io/aws-load-balancer-security-groups" and "service.beta.kubernetes.io/aws-load-balancer-extra-security-groups". | ||
| const ServiceAnnotationLoadBalancerManagedSecurityGroup = "service.beta.kubernetes.io/aws-load-balancer-managed-security-group" |
There was a problem hiding this comment.
So this doesn't exist in LBC right? Is this being introduced to allow a transition from a CCM where it does not currently create a security group, to enabling users to opt-in to creating security groups?
Have you considered if it might be better to make this a CCM configuration that an admin would set for the cluster, rather than setting it for each service?
I could see in the future OpenShift changing the default to say that all new NLBs should have a security group created automatically for them
There was a problem hiding this comment.
So this doesn't exist in LBC right? Is this being introduced to allow a transition from a CCM where it does not currently create a security group, to enabling users to opt-in to creating security groups?
yes and yes. The idea was to avoid disrupting the existing flow when creating services with NLB.
Have you considered if it might be better to make this a CCM configuration that an admin would set for the cluster, rather than setting it for each service?
I didn't, but this is an excellent idea. It would greatly reduce the API changes proposed in this EP, and it would help us in the future if we transition to ALBC.
@elmiko mentioned requiring the CCM changes to be under a feature gate. What if we introduce an FG that enables SG by default when provisioning NLBs on CCM, so we can enable it on OCP and remove most of the API proposals and annotations in this EP?
It would also decrease the UX overhead and keep a laser focus on the initial problem.
Would the workflow look like the following options (superficially)?
openshift-install:
- user sets `platform.aws.lbType` to `NLB` value (currently opt-in)
- CCM config is added on OCP deployments (do we need/expose it through installer manifest?)
- CCM creates SG when gate is enabled when provisioning NLB
ROSA Classic or HCP:
- ensure CCM config is updated (or will it be enabled by default in KCM when API FG is set?)
- (same CCM flow)
No changes in CIO.
Does that make sense?
There was a problem hiding this comment.
I just finished the exploration, and this is the main idea (tl;dr):
- Create a new configuration in the cloud config (upstream CCM). Example
- Enforce the configuration in the 3CMO. Example
Once a new Service type-LoadBalancer NLB is created, the controller will manage a Security Group, attaching it to the new LB.
There was a problem hiding this comment.
I think we want to follow the pattern set out in LBC (https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/deploy/security_groups/#security-groups-for-load-balancers)
Which means:
- service.beta.kubernetes.io/aws-load-balancer-security-groups on the service allows a user to specify a pre-existing set of security groups to attach to the front-end of the LB
- If the annotation is not set, create and manage a front-end security group for each LB automatically
We don't want to just enable this create and manage front-end LB by default, since that would be a major change.
So, this is where the CCM config option would come in, and allow users to opt-in/out of having a default security group created for each service.
I think that mostly aligns with your suggestions above in this thread, but I think we still want to have the annotation to allow the user to override the behaviour?
Do we need to also account for the shared backend SG behaviour of LBC?
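To make the precedence described in this comment concrete, a minimal sketch follows. It assumes the BYO annotation carries a comma-separated list of security group IDs and that the cluster-wide cloud-config option is surfaced as a boolean; `decideSecurityGroups`, `sgDecision`, and the `managedByDefault` parameter are hypothetical names, not existing CCM API.

```go
package main

import (
	"fmt"
	"strings"
)

type sgDecision int

const (
	noSecurityGroup sgDecision = iota // current CCM behaviour for NLB
	bringYourOwn                      // user-supplied SG IDs from the annotation
	managed                           // controller creates and manages a front-end SG
)

const annotationSecurityGroups = "service.beta.kubernetes.io/aws-load-balancer-security-groups"

// decideSecurityGroups applies the precedence discussed in this thread:
// the per-Service BYO annotation wins; otherwise the cluster-wide option
// (hypothetical managedByDefault flag) decides whether a managed SG is created.
func decideSecurityGroups(annotations map[string]string, managedByDefault bool) (sgDecision, []string) {
	if raw, ok := annotations[annotationSecurityGroups]; ok {
		var ids []string
		for _, id := range strings.Split(raw, ",") {
			if id = strings.TrimSpace(id); id != "" {
				ids = append(ids, id)
			}
		}
		if len(ids) > 0 {
			return bringYourOwn, ids
		}
	}
	if managedByDefault {
		return managed, nil
	}
	return noSecurityGroup, nil
}

func main() {
	// Example SG IDs are made up.
	byo := map[string]string{annotationSecurityGroups: "sg-0123, sg-0456"}
	d, ids := decideSecurityGroups(byo, true)
	fmt.Println(d == bringYourOwn, ids)
}
```

How an empty annotation value should be treated is an open design choice; the sketch simply falls through to the cluster-wide setting.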
There was a problem hiding this comment.
but I think we still want to have the annotation to allow the user to override the behaviour?
Are you referring to opting out of/overriding the global config that manages frontend SG creation (the proposal of this EP) without a BYO SG approach? Do we have a strong reason or use case to do so, considering the best practice/recommendation is to assign an SG to an NLB? I also wonder whether we would be going against the ALBC strategy used in v2.6.0+ (I really didn't find a configuration to opt out of SG on NLBs in recent ALBC versions).
Do we need to also account for the shared backend SG behaviour of LBC?
I think it would benefit clusters with a high number of services, but if we don't have a strong use case for it in the short term, I would not increase the number of features to incorporate into the CCM in this EP, as the long-term approach on OCP is TBD.
LMK WDYT +@elmiko
There was a problem hiding this comment.
once enabled, will the ccm try to create SGs for older LB services?
It is important not to change the old services, they should remain without the SG
is this saying we don't want to autocreate SGs once the feature is enabled?
We should not just blanketly change this in the upstream CCM, it needs to be introduced slowly and opt-in at first, later we may change the default though
There was a problem hiding this comment.
Joel and Mike - Thanks for your thoughts.
it needs to be introduced slowly and opt-in at first
ACK. My understanding is that the cloud-config flag covers that expectation upstream, and on OCP we can gate it until it looks ready to default to SG, enforced by 3CMO (current proposal).
Looks like we have a plan/scope defined for this EP. My takeaways from this thread and the Slack conversation are:
- we are introducing a global cloud-config on CCM that allows opting in to the managed SG by default across all Service type-LoadBalancer NLBs
- in later phase (still in this EP) we are introducing/enabling a BYO SG annotation on NLB, and this one will be available in the Service level (not planning to change CIO/Installer)
- We don't need an additional annotation to opt out of SG at the Service level
- We are not introducing backend/shared SG, as it would be covered in long-term research, and it is not a use case we are working on in this EP
LMK if I missed something to wrap up this thread. Thanks!
There was a problem hiding this comment.
in later phase (still in this EP) we are introducing/enabling a BYO SG annotation on NLB, and this one will be available in the Service level (not planning to change CIO/Installer)
I would expect users to want to be able to configure this through CIO eventually, cc @Miciah @alebedev87 who might have opinions
Otherwise all agreed
There was a problem hiding this comment.
in later phase (still in this EP) we are introducing/enabling a BYO SG annotation on NLB, and this one will be available in the Service level (not planning to change CIO/Installer)
It seems to me that this is primarily a question of the EP’s scope. From a quick review, I understand the intent of the EP is to support SG for the load balancer that sits in front of the OCP router. If that’s the case, then the cluster ingress operator should be able to determine when to apply the new annotation (which adds the BYO frontend SG) to the publishing service - similar to what we did for the subnet configuration.
However, if the EP’s scope is more generic and aims to enable frontend SG support for NLB services in CCM, then we likely don’t need to configure the router during installation (as part of this EP, can be done as a follow-up EP).
There was a problem hiding this comment.
the intent of the EP is to support SG for the load balancer that sits in front of the OCP router.[...] If that’s the case, then the cluster ingress operator should be able to determine when to apply the new annotation
However, if the EP’s scope is more generic and aims to enable frontend SG support for NLB services in CCM, then we likely don’t need to configure the router during installation
@alebedev87 these are the goals of this EP: https://github.com/openshift/enhancements/pull/1802/files#diff-84882e6fc6fb023742b0ac09960b79620cfea983c45def4739a89fd404cdc05aR70-R91
- (Phase 1, 2) Enable opt-in configuration to CCM, and enforced to OCP, to provision NLB with SG by default on all new services including new routers, without CIO intervention (requirement for ROSA HCP)
- (Phase 3) Introduce BYO SG annotation to CCM when provisioning NLB services, so CIO would be able to expose it to users when it is prioritized (follow up EP).
|
|
||
| > WIP/TBReviewed | ||
|
|
||
| - The implementation in CCM should handle the case where the `service.beta.kubernetes.io/aws-load-balancer-managed-security-group` annotation is set to `true` but the service type is not `NLB` (`aws-load-balancer-type: nlb`). In this scenario, the CCM should likely log a warning mentioning the annotation is supported only on NLB. |
There was a problem hiding this comment.
What does the CCM do today for annotations that don't apply? I suspect it ignores them
We can use VAP downstream to prevent this
There was a problem hiding this comment.
I suspect so too, so we don't need to warn/log. I will confirm the existing behavior and update this thread. Thanks
|
|
||
| Customers deploying OpenShift on AWS using Network Load Balancers (NLBs) for the default router have expressed the need for a similar security configuration as provided by Classic Load Balancers (CLBs), where a security group is created by CCM and associated with the load balancer. This allows for more granular control over inbound and outbound traffic at the load balancer level, aligning with AWS security best practices and addressing security findings that flag the lack of security groups on NLBs provisioned by the default CCM. | ||
|
|
||
| The default router in OpenShift, an IngressController object managed by Cluster Ingress Controller Operator (CIO), can be created with a Service type Load Balancer NLB instead of default Classic Load Balancer (CLB) during installation by enabling it in the `install-config.yaml`. Currently, the Cloud Controller Manager (CCM), which satisfies Service resources, provisions an AWS Load Balancer of type NLB without a Security Group (SG) directly attached to it. Instead, security rules are managed on the worker nodes' security groups. |
There was a problem hiding this comment.
Instead, security rules are managed on the worker nodes' security groups.
What are the benefits of relying on LB security groups over the node sg? Do we get more fine-grained rules that are managed corresponding to the services? Can we reduce the current rules on compute nodes?
There was a problem hiding this comment.
What are the benefits of relying on LB security groups over the node sg?
Do we get more fine-grained rules that are managed corresponding to the services?
Users can tighten security rules targeting the LB only, instead of opening rules on the nodes' SG. It is also a best practice to associate an SG with an NLB (least-privilege approach):
"We recommend that you associate a security group with your Network Load Balancer when you create it."
Can we reduce the current rules on compute nodes?
I don't think this would be a primary goal, but we can review whether there would be duplicated/unused rules on the nodes' SG.
Action item: I will keep this thread open to make sure this is reflected in the EP.
mtulio
left a comment
There was a problem hiding this comment.
Thanks @patrickdillon and @JoelSpeed for the review/suggestions. Hopefully I've addressed your questions.
Perhaps we could focus on the thread where it is suggested to change the CCM configuration to enable SG by default on NLBs? If this is the path forward for this EP (I personally think it is an excellent idea), we could decrease the scope of changes in many components here.
Please let me know your thoughts.
Thank you all for the feedback. The EP has been revised based on the comments: the proposal is now limited to CCM changes, introducing a cloud-config (global configuration) option to opt in to the managed front-end security group when creating a Service type-LoadBalancer NLB, allowing CCCMO to enforce the default on OpenShift. The proposal also introduces an optional Service annotation for BYO SG, which opts out of the managed SG. This PR is ready for review. |
elmiko
left a comment
There was a problem hiding this comment.
this is reading well to me, we probably need to chat about the TBD items but i have a couple suggestions/questions.
|
|
||
| AWS [announced support for Security Groups when deploying an NLB in August 2023][nlb-supports-sg], but the CCM for AWS (within kubernetes/cloud-provider-aws) does not currently implement the feature of automatically creating and managing security groups for `Service` resources type-LoadBalancer using NLBs. While the [AWS Load Balancer Controller (ALBC/LBC)][aws-lbc] project already supports deploying security groups for NLBs, this enhancement focuses on adding minimal, opt-in support to the existing CCM to address immediate customer needs without a full migration to the LBC. This approach aims to provide the necessary functionality without requiring significant changes in other OpenShift components like the Ingress Controller, installer, ROSA, etc. | ||
|
|
||
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. |
There was a problem hiding this comment.
the beginning of this sentence is a little confusing:
Using a Network Load Balancer is a recommended network-based Load Balancer by AWS,
is this saying that NLB is the recommended way to do load balancing?
There was a problem hiding this comment.
It's the recommended way for network-based LBs. Currently AWS offers two LBs replacing ELB/Classic (the CCM default): NLB (network-based) and ALB (application-based). So the idea is to mention that the NLB is the recommended one. Do you think I need to state that replacement to improve readability?
There was a problem hiding this comment.
that makes sense, perhaps to make the sentence clearer you could say:
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. | |
| Using a Network Load Balancer, as opposed to an Application Load Balancer, is the recommended way to do network-based load balancing by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. |
is that accurate?
There was a problem hiding this comment.
Hey @elmiko , what about this?
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. | |
| Using a Network Load Balancer, as opposed to a Classic Load Balancer, is the recommended way to do network-based load balancing by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. |
We can compare NLB with CLB.
There was a problem hiding this comment.
Thanks. Applying the suggestion in the next commit.
| - a) decreases the amount of provider-specific changes on CIO; | ||
| - b) decreases the amount of maintained code/projects by the team (e.g., ALBC); | ||
| - c) enhances new configurations to the Ingress Controller when using NLB; | ||
| - d) decreases the amount of images in the core payload; |
There was a problem hiding this comment.
is this decrease in reference to the ALBC?
There was a problem hiding this comment.
Correct, ALBC + ALBO would be required if CIO defaults to ALBC
There was a problem hiding this comment.
i might say this as "does not increase the amount of images in the core payload"
|
|
||
| ## Alternatives (Not Implemented) | ||
|
|
||
| > TODO/TBD |
There was a problem hiding this comment.
i think it's worth mentioning the idea of making the ALBC functionality into a module that can be imported into the CCM as something we should investigate for the future.
There was a problem hiding this comment.
Good idea, Added! Thanks!
| ## Graduation Criteria | ||
|
|
||
| > TODO/TBD | ||
|
|
||
| ### Dev Preview -> Tech Preview | ||
|
|
||
| N/A. This feature will be introduced as Tech Preview (TBReviewed). | ||
|
|
||
| ### Tech Preview -> GA | ||
|
|
||
| The E2E tests should be consistently passing, and a PR will be created to enable the feature gate by default. |
There was a problem hiding this comment.
Expand the FG added here openshift/api#2354
Initially we've been asked to go directly to TP, but considering the impact of this change (default to SG) we are considering starting from DP. We are evaluating the velocity in upstream and how fast we can move it.
|
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
|
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
|
/reopen |
|
@mtulio: Reopened this PR. |
|
@mtulio: This pull request references SPLAT-2137 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.22." or "openshift-4.22.", but it targets "openshift-4.20" instead. |
|
Interim update:
|
|
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
|
@openshift-bot: Closed this PR. |
|
|
||
| AWS [announced support for Security Groups when deploying an NLB in August 2023][nlb-supports-sg], but the CCM for AWS (within kubernetes/cloud-provider-aws) does not currently implement the feature of automatically creating and managing security groups for `Service` resources type-LoadBalancer using NLBs. While the [AWS Load Balancer Controller (ALBC/LBC)][aws-lbc] project already supports deploying security groups for NLBs, this enhancement focuses on adding minimal, opt-in support to the existing CCM to address immediate customer needs without a full migration to the LBC. This approach aims to provide the necessary functionality without requiring significant changes in other OpenShift components like the Ingress Controller, installer, ROSA, etc. | ||
|
|
||
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. |
There was a problem hiding this comment.
(nit) I would slightly clarify the last portion of the sentence as:
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. | |
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs initially created without an associated Security Group do not support Security Group association after creation. |
The reason is that if an NLB was initially provisioned with a Security Group, one can associate new SGs with it after creation; the limitation holds only if the NLB was originally provisioned without an SG.
There was a problem hiding this comment.
Great, good suggestion. I am updating in a batch commit.
|
|
||
| The CCM, the controller which manages the `Service` resource, will have a global configuration on cloud-config to signalize the controller to manage the Security Group by default when creating a Service type-LoadBalancer NLB - annotation `service.beta.kubernetes.io/aws-load-balancer-type` set to `nlb`. This change paves the path to default the controller to managed security groups, following the same path AWS LBC defaults to since version v2.6.0. | ||
|
|
||
| The controller must create and manage the entire lifecycle of the Security Group resource when the load balancer is created, update the SG ingress rules according to the NLB Listeners configurations, and the Egress Rules according to the Target Group configurations. |
There was a problem hiding this comment.
I believe we need to be explicit that the current proposal won't allow users using the "managed" (not BYO) Security Group to customize the ingress rules of the security group, essentially only allowing in all internet traffic or blocking it entirely.
This is an important limitation in my opinion, since the ability to selectively limit inbound traffic by source IP CIDR ranges is one of the core security capabilities of Security Groups. Furthermore, this gap will increase the importance of the BYO Security Group feature for customers who need this capability.
Additionally, we could assess the feasibility of adding this capability, e.g. as a "Phase 4" feature after the BYO SG implementation, maybe using the ingresscontroller.spec.endpointPublishingStrategy.loadBalancer.allowedSourceRanges property of an Ingress Controller and/or the standard K8s loadBalancerSourceRanges property of a Service to read and configure the Security Group rules.
There was a problem hiding this comment.
That's a great call.
Good point, @mfbonfigli . I've added this information:
https://github.com/openshift/enhancements/pull/1802/changes#diff-84882e6fc6fb023742b0ac09960b79620cfea983c45def4739a89fd404cdc05aR339
Allowing custom sources is a really good idea to enhance security while keeping the automation. I can see two paths here:
- Users can use the BYO SG approach, so they manage the SG on their end, including all SG rules.
- (your suggestion) CCM supports a custom annotation with custom rules.
As this was not requested by the Epic, I would defer to component SMEs @JoelSpeed and @alebedev87 to share whether we need to plan adding this feature (not sure if we will be able to implement it soon, cc @rvanderp3 ) or defer it for later.
Thanks!
There was a problem hiding this comment.
Replying here to my own comment, I did some digging and it seems that the upstream AWS CCM actually already implements support for NLB SGs and source IP range restrictions via both the loadBalancerSourceRanges property of the Service and also via the service.beta.kubernetes.io/load-balancer-source-ranges annotation.
The annotation support in particular does not seem to be explicitly documented in AWS CCM, but it works nonetheless because AWS CCM derives the source ranges through a helper function in the upstream K8s Cloud Provider library, which checks both the Service definition and the annotation content, in this order.
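To illustrate the lookup order described here, a minimal stdlib-only sketch follows. It is not the upstream code: `resolveSourceRanges` is a hypothetical name, and the fallback to `0.0.0.0/0` when neither the field nor the annotation is set is an assumption about the upstream default that should be checked against the library before relying on it.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

const (
	// Annotation key quoted in this thread.
	sourceRangesAnnotation = "service.beta.kubernetes.io/load-balancer-source-ranges"
	// Assumed default when no source range is configured (allow all IPv4).
	defaultSourceRange = "0.0.0.0/0"
)

// resolveSourceRanges is a hypothetical stand-in for the upstream helper.
// It prefers the Service's spec.loadBalancerSourceRanges values, falls back
// to the annotation, and finally to the assumed default.
func resolveSourceRanges(specRanges []string, annotations map[string]string) ([]string, error) {
	raw := specRanges
	if len(raw) == 0 {
		// The spec field is empty, so the annotation (if any) is consulted.
		val := strings.TrimSpace(annotations[sourceRangesAnnotation])
		if val == "" {
			val = defaultSourceRange
		}
		raw = strings.Split(val, ",")
	}

	cidrs := make([]string, 0, len(raw))
	for _, r := range raw {
		// Validate and normalize each entry as a CIDR block.
		_, ipnet, err := net.ParseCIDR(strings.TrimSpace(r))
		if err != nil {
			return nil, fmt.Errorf("invalid source range %q: %w", r, err)
		}
		cidrs = append(cidrs, ipnet.String())
	}
	return cidrs, nil
}
```

With this ordering, a Service that sets both the field and the annotation ends up with the field's ranges only, which is the behavior worth covering in the Phase 2 tests.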
There was a problem hiding this comment.
Good point @mfbonfigli , thanks for digging into this. So I think my last comment can be ignored. We just need to make sure this functionality, inherited automatically from the cloud-provider library, is tested and documented on OpenShift as part of Phase 2. I will update the EP accordingly. 👍🏽
|
Reviewing the planning of the HyperShift phase, as well as addressing recent PR updates. /reopen |
|
@mtulio: Reopened this PR. |
|
@mtulio: The DetailsIn response to this:
|
This commit addresses unresolved feedback from PR openshift#1802, particularly around Phase 2b HyperShift implementation and other technical gaps:

Phase 2b - HyperShift/ROSA HCP Implementation:
- Documented that HyperShift does NOT use CCCMO (critical architectural difference)
- Explained Control Plane Operator's role in managing cloud-config ConfigMap
- Detailed cluster-scoped feature gate evaluation from HostedControlPlane.Spec.Configuration.FeatureGate
- Provided implementation approach for CPO's AWS CCM config adapter
- Removed blocking TODO from Hypershift section

Phase 2 Restructuring:
- Split Phase 2 into clearly defined sub-phases (2a: Self-Managed/ROSA Classic, 2b: ROSA HCP)
- Added specific implementation goals for each architecture
- Clarified CCCMO vs CPO usage patterns

Workflow Descriptions:
- Separated ROSA Classic and ROSA HCP workflows to show architectural differences
- Added detailed step-by-step flows for both deployment models
- Clarified component roles (CPO, CCCMO, CIO, CCM)

Implementation Details Enhancements:
- Documented limitation: managed SG does not support custom ingress CIDR filtering (addresses comment #3017312641)
- Added explicit IAM permissions list required for CCM service account
- Converted TODO items into concrete Security Group naming convention details
- Noted that custom CIDR filtering requires BYO SG (Phase 3)

Phase 3 Clarifications:
- Resolved TBD about backend security group rule management annotation
- Clarified it is deferred to future phase
- Specified exact ALBC annotation names for consistency

Other Improvements:
- Enhanced ROSA Classic section with specific CCCMO enforcement details
- Expanded Single-Node Deployments section with clear guidance
- Fixed grammar: "an Classic" → "a Classic"
- Multiple wording improvements for technical clarity

These changes leverage knowledge from recent HyperShift implementation work where cluster-scoped feature gate evaluation was implemented for AWS CCM configuration.
Grammar reviewed by Claude Code. Hypershift implementation coverage created by Claude Code.
Signed-off-by: Marco Braga <mrbraga@redhat.com>
Assisted-by: Claude Sonnet 4.5 (via Cursor)
mtulio
left a comment
There was a problem hiding this comment.
Thank you all for the review and feedback.
I just addressed the pending comments, as well as the phase that was still uncertain: ROSA HCP/Hypershift.
|
|
||
| AWS [announced support for Security Groups when deploying an NLB in August 2023][nlb-supports-sg], but the CCM for AWS (within kubernetes/cloud-provider-aws) does not currently implement the feature of automatically creating and managing security groups for `Service` resources type-LoadBalancer using NLBs. While the [AWS Load Balancer Controller (ALBC/LBC)][aws-lbc] project already supports deploying security groups for NLBs, this enhancement focuses on adding minimal, opt-in support to the existing CCM to address immediate customer needs without a full migration to the LBC. This approach aims to provide the necessary functionality without requiring significant changes in other OpenShift components like the Ingress Controller, installer, ROSA, etc. | ||
|
|
||
| Using a Network Load Balancer is a recommended network-based Load Balancer by AWS, and attaching a Security Group to an NLB is a security best practice. NLBs also do not support attaching security groups after they are created. |
There was a problem hiding this comment.
Thanks. Applying the suggestion in the next commit.
|
|
||
| - Introduce Annotations to CCM to allow BYO SG to Service type-LoadBalancer NLB to opt-out the global `Managed` security group configuration. | ||
| - The annotation must follow the same standard as ALBC. Must be optional. | ||
| - (TBD if it is required) An annotation to allow managing backend rules must be added to prevent manual changes by the user. Must be opt-out by default |
There was a problem hiding this comment.
Not yet. I think it will be out of the scope of this deliverable, unless @alebedev87 thinks this is something required by NE-1792. cc @mfbonfigli
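For context on the opt-out behavior in the quoted bullets, a minimal sketch of how the per-Service decision could look is below. Everything in it is an assumption for illustration: `resolveSecurityGroupMode`, the mode names, and the `globalManaged` flag are hypothetical, and the BYO annotation key is the one ALBC uses for Services, which the proposal says the CCM annotation should follow.

```go
package main

import "strings"

const (
	// Annotation that selects an NLB for a Service type-LoadBalancer.
	lbTypeAnnotation = "service.beta.kubernetes.io/aws-load-balancer-type"
	// BYO SG annotation as named by ALBC; assumed to be reused by the CCM.
	byoSecurityGroupsAnnotation = "service.beta.kubernetes.io/aws-load-balancer-security-groups"
)

// securityGroupMode is a hypothetical enum describing who owns the NLB's SGs.
type securityGroupMode string

const (
	modeNone    securityGroupMode = "None"    // NLB SG handling does not apply
	modeManaged securityGroupMode = "Managed" // CCM creates and manages the SG
	modeBYO     securityGroupMode = "BYO"     // user-provided SGs are attached
)

// resolveSecurityGroupMode sketches the decision: the BYO annotation opts a
// Service out of the global Managed configuration; otherwise the global
// setting from cloud-config (modeled here as a bool) applies to NLB Services.
func resolveSecurityGroupMode(globalManaged bool, annotations map[string]string) (securityGroupMode, []string) {
	if annotations[lbTypeAnnotation] != "nlb" {
		// Not an NLB Service, so this feature is not involved.
		return modeNone, nil
	}

	// Collect non-empty, trimmed SG IDs from the comma-separated annotation.
	var ids []string
	for _, id := range strings.Split(annotations[byoSecurityGroupsAnnotation], ",") {
		if id = strings.TrimSpace(id); id != "" {
			ids = append(ids, id)
		}
	}
	if len(ids) > 0 {
		return modeBYO, ids
	}

	if globalManaged {
		return modeManaged, nil
	}
	return modeNone, nil
}
```

Checking the BYO annotation before the global setting keeps the annotation strictly optional: Services without it behave exactly as the cloud-config dictates.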
|
|
@mtulio: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Enhancement proposal to introduce Security Group support for Service type-LoadBalancer NLBs in the AWS Cloud Controller Manager (CCM), ensuring OpenShift sets the default configuration that instructs the CCM to manage Security Groups for new NLBs.
https://issues.redhat.com/browse/OCPSTRAT-1553
https://issues.redhat.com/browse/SPLAT-2137