
mzazrivec
Contributor

@mzazrivec mzazrivec commented Apr 9, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This pull request implements a CRD and a controller for provisioning the complete networking infrastructure required to install a ROSA-HCP cluster in AWS. The proposal for this implementation is described in #5381.

Under the hood, the implementation uses a CloudFormation stack and a static (i.e. non-customizable) CloudFormation template from rosa-cli.

This pull request depends on openshift/rosa#2904 (now merged).

Quick howto:

$ export ROSA_NETWORK_NAME=rosa-net-01
$ export AWS_REGION=us-west-2
$ export AVAILABILITY_ZONE_COUNT=2
$ export CIDR_BLOCK=10.0.0.0/16
$ clusterctl generate yaml --from templates/rosa-network.yaml > rosa-net-01.yaml
$ kubectl apply -f rosa-net-01.yaml
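`clusterctl generate yaml` fills the `${VAR}` placeholders in the template from the exported environment variables. The sketch below approximates that substitution step with the standard library's `os.Expand`; it is only an illustration, since clusterctl uses its own (stricter) variable processor, and the template fragment is hypothetical rather than the real contents of `templates/rosa-network.yaml`.

```go
package main

import (
	"fmt"
	"os"
)

// hypotheticalTemplate is illustrative only: the real templates/rosa-network.yaml
// is not shown here, and these field names are assumptions.
const hypotheticalTemplate = `metadata:
  name: ${ROSA_NETWORK_NAME}
spec:
  region: ${AWS_REGION}
  availabilityZoneCount: ${AVAILABILITY_ZONE_COUNT}
  cidrBlock: ${CIDR_BLOCK}
`

// render replaces ${VAR} placeholders with values from vars. os.Expand
// silently expands unknown variables to "" (clusterctl is stricter).
func render(tmpl string, vars map[string]string) string {
	return os.Expand(tmpl, func(key string) string { return vars[key] })
}

func main() {
	vars := map[string]string{
		"ROSA_NETWORK_NAME":       "rosa-net-01",
		"AWS_REGION":              "us-west-2",
		"AVAILABILITY_ZONE_COUNT": "2",
		"CIDR_BLOCK":              "10.0.0.0/16",
	}
	fmt.Print(render(hypotheticalTemplate, vars))
}
```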

To use the ROSANetwork from ROSA control plane:

apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: ROSAControlPlane
metadata:
  name: rosa-hcp01-control-plane
  namespace: default
spec:
  rosaNetworkRef:
    name: rosa-net-01

and omit the subnets and availability zones from the control plane spec.
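With `rosaNetworkRef` set, the subnets and availability zones dropped from the control plane spec have to be taken from the referenced ROSANetwork's status instead. A minimal sketch of that flattening, assuming a status shape like the `ROSANetworkSubnet` struct discussed in the review (an availability zone plus one public and one private subnet ID); `flattenSubnets` is a hypothetical helper, not CAPA's code.

```go
package main

import (
	"fmt"
	"sort"
)

// ROSANetworkSubnet mirrors the per-AZ subnet pair discussed in this PR (assumed shape).
type ROSANetworkSubnet struct {
	AvailabilityZone string
	PublicSubnet     string
	PrivateSubnet    string
}

// flattenSubnets (hypothetical) turns per-AZ subnet pairs into the flat subnet
// and availability-zone lists a control plane spec would otherwise carry.
// The input is copied and sorted by AZ so repeated calls give the same order.
func flattenSubnets(pairs []ROSANetworkSubnet) (subnets, zones []string) {
	sorted := append([]ROSANetworkSubnet(nil), pairs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].AvailabilityZone < sorted[j].AvailabilityZone
	})
	for _, p := range sorted {
		zones = append(zones, p.AvailabilityZone)
		if p.PublicSubnet != "" {
			subnets = append(subnets, p.PublicSubnet)
		}
		if p.PrivateSubnet != "" {
			subnets = append(subnets, p.PrivateSubnet)
		}
	}
	return subnets, zones
}

func main() {
	subnets, zones := flattenSubnets([]ROSANetworkSubnet{
		{AvailabilityZone: "us-west-2b", PublicSubnet: "subnet-pub-b", PrivateSubnet: "subnet-priv-b"},
		{AvailabilityZone: "us-west-2a", PublicSubnet: "subnet-pub-a", PrivateSubnet: "subnet-priv-a"},
	})
	fmt.Println(subnets)
	fmt.Println(zones)
}
```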

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

New API for provisioning network infrastructure for ROSA clusters

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority labels Apr 9, 2025
@k8s-ci-robot k8s-ci-robot requested review from faiq and serngawy April 9, 2025 19:27
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 9, 2025
@k8s-ci-robot
Contributor

Hi @mzazrivec. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

webhookClientConfig:
  # this is "\n" used as a placeholder, otherwise it will be rejected by the apiserver for being blank,
  # but we're going to set it later using the cert-manager (or potentially a patch if not using cert-manager)
  caBundle: Cg==
Contributor

No need to add the caBundle.
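For context on the placeholder under discussion: `caBundle` is a `[]byte` field, so in YAML/JSON it is carried as base64, and `Cg==` is simply the base64 encoding of a single newline byte, which is what the comment means by "\n" used as a placeholder. A quick standard-library check:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeCABundle decodes a base64-encoded caBundle value.
func decodeCABundle(s string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(s)
}

func main() {
	b, err := decodeCABundle("Cg==")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", b) // prints "\n"
}
```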

Resource string `json:"resource"`

// Identifier of the created resource. Will be filled in once the resource is created and ready
ID string `json:"ID"`
Contributor

Suggested change
ID string `json:"ID"`
Id string `json:"id"`

Or resourceId

// CFResource groups information pertaining to a resource created as a part of a cloudformation stack
type CFResource struct {
// Name of the created resource: NATGateway1, VPC, SecurityGroup, ...
Resource string `json:"resource"`
Contributor

Suggested change
Resource string `json:"resource"`
Name string `json:"name"`

Or resourceName

Status string `json:"status"`

// Message pertaining to the status of the resource
Reason string `json:"reason"`
Contributor

Message is better, I guess?

Suggested change
Reason string `json:"reason"`
Message string `json:"message"`
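The renames suggested above change the JSON keys the status is serialized with; Kubernetes API conventions favor lowerCamelCase keys. A small sketch of the resulting wire format, assuming the reviewers' `name`/`id`/`status`/`message` keys (the field set here is illustrative, and the names in the merged API may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CFResource uses the JSON keys suggested in review; Go field names follow
// the usual Go initialism style (ID rather than Id).
type CFResource struct {
	Name    string `json:"name"`
	ID      string `json:"id"`
	Status  string `json:"status"`
	Message string `json:"message"`
}

// encode returns the JSON form the status entry would be stored with.
func encode(r CFResource) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	s, err := encode(CFResource{Name: "VPC", ID: "vpc-0123", Status: "CREATE_COMPLETE"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"name":"VPC","id":"vpc-0123","status":"CREATE_COMPLETE","message":""}
}
```

Without `omitempty`, an empty message is still emitted as `"message":""`.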

// Availability zone of the subnet pair
AvailabilityZone string `json:"availabilityZone"`

// ID of the public subnet
Contributor

Suggested change
// ID of the public subnet
// Public subnet ID, e.g. subnet-xxxxxxxxxx

main.go Outdated
}
}

// TODO: feature gates?
Contributor

I don't think we need a new feature gate; we can put it under the ROSA feature gate.

Contributor Author

Yes. I did not mean a new feature gate here, just the existing rosa FG.
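The agreed approach (register the new controller only when the existing ROSA gate is on, rather than adding a gate) can be sketched as below. This is a stdlib illustration with a plain map standing in for CAPA's feature gate machinery; `rosaGate` and `reconcilersToStart` are hypothetical names.

```go
package main

import "fmt"

// rosaGate is an illustrative name; the real gate is defined in CAPA's
// feature package.
const rosaGate = "ROSA"

// reconcilersToStart (hypothetical) registers the ROSANetwork reconciler only
// when the existing ROSA gate is enabled; no new gate is introduced.
func reconcilersToStart(gates map[string]bool) []string {
	var out []string
	if gates[rosaGate] {
		out = append(out, "ROSANetworkReconciler")
	}
	return out
}

func main() {
	fmt.Println(reconcilersToStart(map[string]bool{rosaGate: true}))
	fmt.Println(reconcilersToStart(map[string]bool{}))
}
```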

@serngawy
Contributor

You also need to update the ValidatingWebhookConfiguration and MutatingWebhookConfiguration here.

@mzazrivec mzazrivec force-pushed the rosa_network branch 4 times, most recently from 5907fb1 to 24a5950 Compare April 24, 2025 13:20
@mzazrivec mzazrivec force-pushed the rosa_network branch 3 times, most recently from a947563 to a255790 Compare May 19, 2025 13:43
Contributor

@serngawy serngawy left a comment

/ok-to-test

// If no identity is specified, the default identity for this controller will be used.
//
// +optional
IdentityRef *infrav1.AWSIdentityReference `json:"identityRef,omitempty"`
Contributor

Not sure if we want to provide this option to the end user. We don't do that with ROSAControlPlane, which uses only the default AWS identity. However, we should provide an OCM identityRef.

Contributor Author

Why shouldn't we provide this option to the end user? We need to specify the ref to the aws secret somehow. Here I'm just reusing existing structures & code.

What do you mean by OCM identity ref? OCM will not be involved here in any way.

Contributor

Well, to use openshift/rosa and establish an OCM client, you need OCM authentication. Is this not the case with the ROSANetwork CF stack creation?

Contributor Author

No. No OCM credentials are needed for rosanet, just AWS credentials.

Contributor

@serngawy Are you satisfied with the answers here?

Contributor

@serngawy serngawy Aug 18, 2025

@mzazrivec I do remember we discussed that, but after checking the ROSANetwork CloudFormation stack template, I see tags such as rosa_hcp_policy and rosa service added here.
Are those tags used to check for privileges?
I think we have to authenticate the OCM credential: even if OCM is not needed to create the CF stack, the end user must be a valid OCM user.

Contributor Author

@serngawy What does creating VPC with certain tags and checking OCM credentials have to do with each other?

Contributor

Okay, as we discussed, there is no need for OCM authentication.
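The rule stated in the `identityRef` doc comment (if no identity is specified, the controller's default identity is used) amounts to a simple fallback that involves only AWS credentials. A stdlib sketch with a simplified, hypothetical stand-in for `infrav1.AWSIdentityReference`; the default kind/name values are assumptions for illustration.

```go
package main

import "fmt"

// AWSIdentityReference is a simplified, hypothetical stand-in for
// infrav1.AWSIdentityReference.
type AWSIdentityReference struct {
	Kind string
	Name string
}

// defaultIdentity stands in for "the default identity for this controller";
// the concrete kind/name here are assumptions.
var defaultIdentity = AWSIdentityReference{Kind: "AWSClusterControllerIdentity", Name: "default"}

// resolveIdentity applies the documented rule for the optional field:
// a nil identityRef falls back to the controller's default identity.
// Only AWS credentials are resolved; no OCM token is involved.
func resolveIdentity(ref *AWSIdentityReference) AWSIdentityReference {
	if ref == nil {
		return defaultIdentity
	}
	return *ref
}

func main() {
	fmt.Println(resolveIdentity(nil))
	fmt.Println(resolveIdentity(&AWSIdentityReference{Kind: "AWSClusterRoleIdentity", Name: "rosa-net-role"}))
}
```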

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 28, 2025
@mzazrivec mzazrivec force-pushed the rosa_network branch 4 times, most recently from d2534a7 to dcc599d Compare June 9, 2025 08:19
@mzazrivec mzazrivec force-pushed the rosa_network branch 2 times, most recently from 023f99b to 1729dff Compare June 27, 2025 12:31
@mzazrivec mzazrivec force-pushed the rosa_network branch 10 times, most recently from 8d886df to c6b826d Compare September 3, 2025 15:25
- patches/webhook_in_awsmanagedcontrolplanetemplates.yaml
- patches/webhook_in_eksconfigs.yaml
- patches/webhook_in_eksconfigtemplates.yaml
#- patches/webhook_in_rosanetworks.yaml
Contributor

Remove the commented-out line.

Contributor Author

Removed.

Comment on lines 274 to 302
subnets := make(map[string]*expinfrav1.ROSANetworkSubnet)

for _, resource := range rosaNet.Status.Resources {
	if resource.ResourceType != "AWS::EC2::Subnet" { // Skip all non subnets
		continue
	}

	az, err := r.awsClient.GetSubnetAvailabilityZone(resource.PhysicalID)
	if err != nil {
		return err
	}

	if subnets[az] == nil {
		subnets[az] = &expinfrav1.ROSANetworkSubnet{
			AvailabilityZone: az,
			PublicSubnet:     "",
			PrivateSubnet:    "",
		}
	}

	if strings.HasPrefix(resource.LogicalID, "SubnetPrivate") {
		subnets[az].PrivateSubnet = resource.PhysicalID
	} else {
		subnets[az].PublicSubnet = resource.PhysicalID
	}
}

rosaNet.Status.Subnets = make([]expinfrav1.ROSANetworkSubnet, len(subnets))
for i, v := range slices.Collect(maps.Values(subnets)) {
	rosaNet.Status.Subnets[i] = *v
}

return nil
Contributor

Better Go style, and it avoids the extra loop.

Suggested change
subnets := make(map[string]*expinfrav1.ROSANetworkSubnet)
for _, resource := range rosaNet.Status.Resources {
	if resource.ResourceType != "AWS::EC2::Subnet" { // Skip all non subnets
		continue
	}
	az, err := r.awsClient.GetSubnetAvailabilityZone(resource.PhysicalID)
	if err != nil {
		return err
	}
	if subnets[az] == nil {
		subnets[az] = &expinfrav1.ROSANetworkSubnet{
			AvailabilityZone: az,
			PublicSubnet:     "",
			PrivateSubnet:    "",
		}
	}
	if strings.HasPrefix(resource.LogicalID, "SubnetPrivate") {
		subnets[az].PrivateSubnet = resource.PhysicalID
	} else {
		subnets[az].PublicSubnet = resource.PhysicalID
	}
}
rosaNet.Status.Subnets = make([]expinfrav1.ROSANetworkSubnet, len(subnets))
for i, v := range slices.Collect(maps.Values(subnets)) {
	rosaNet.Status.Subnets[i] = *v
}
return nil

subnets := make(map[string]expinfrav1.ROSANetworkSubnet)
for _, resource := range rosaNet.Status.Resources {
	if resource.ResourceType != "AWS::EC2::Subnet" {
		continue
	}
	az, err := r.awsClient.GetSubnetAvailabilityZone(resource.PhysicalID)
	if err != nil {
		return fmt.Errorf("failed to get AZ for subnet %s: %w", resource.PhysicalID, err)
	}
	subnet := subnets[az]
	subnet.AvailabilityZone = az
	if strings.HasPrefix(resource.LogicalID, "SubnetPrivate") {
		subnet.PrivateSubnet = resource.PhysicalID
	} else {
		subnet.PublicSubnet = resource.PhysicalID
	}
	subnets[az] = subnet
}
rosaNet.Status.Subnets = slices.Collect(maps.Values(subnets))
return nil

Contributor Author

Added.
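One detail both versions leave open: Go map iteration order is unspecified (and deliberately randomized), so `slices.Collect(maps.Values(subnets))` can list the subnets in a different order on each reconcile, which may cause needless status updates. The sketch below does the same grouping but sorts by availability zone. The types are simplified stand-ins and `lookupAZ` replaces `r.awsClient.GetSubnetAvailabilityZone`, so treat it as an illustration under those assumptions, not the merged code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// CFResource and ROSANetworkSubnet are simplified stand-ins for the API types.
type CFResource struct {
	ResourceType string
	LogicalID    string
	PhysicalID   string
}

type ROSANetworkSubnet struct {
	AvailabilityZone string
	PublicSubnet     string
	PrivateSubnet    string
}

// groupSubnets pairs public and private subnets by AZ and returns them sorted
// by AZ so the resulting status is stable across reconciles.
func groupSubnets(resources []CFResource, lookupAZ func(subnetID string) (string, error)) ([]ROSANetworkSubnet, error) {
	byAZ := make(map[string]ROSANetworkSubnet)
	for _, res := range resources {
		if res.ResourceType != "AWS::EC2::Subnet" {
			continue // skip VPCs, gateways, route tables, ...
		}
		az, err := lookupAZ(res.PhysicalID)
		if err != nil {
			return nil, fmt.Errorf("failed to get AZ for subnet %s: %w", res.PhysicalID, err)
		}
		subnet := byAZ[az] // zero value if this AZ is new
		subnet.AvailabilityZone = az
		if strings.HasPrefix(res.LogicalID, "SubnetPrivate") {
			subnet.PrivateSubnet = res.PhysicalID
		} else {
			subnet.PublicSubnet = res.PhysicalID
		}
		byAZ[az] = subnet
	}
	out := make([]ROSANetworkSubnet, 0, len(byAZ))
	for _, s := range byAZ {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].AvailabilityZone < out[j].AvailabilityZone })
	return out, nil
}

// staticLookup is a hypothetical test double for the AZ lookup.
func staticLookup(azByID map[string]string) func(string) (string, error) {
	return func(id string) (string, error) {
		az, ok := azByID[id]
		if !ok {
			return "", fmt.Errorf("unknown subnet %s", id)
		}
		return az, nil
	}
}

func main() {
	lookup := staticLookup(map[string]string{"subnet-a1": "us-west-2a", "subnet-a2": "us-west-2a", "subnet-b1": "us-west-2b"})
	subnets, err := groupSubnets([]CFResource{
		{ResourceType: "AWS::EC2::VPC", LogicalID: "VPC", PhysicalID: "vpc-1"},
		{ResourceType: "AWS::EC2::Subnet", LogicalID: "SubnetPublic2", PhysicalID: "subnet-b1"},
		{ResourceType: "AWS::EC2::Subnet", LogicalID: "SubnetPrivate1", PhysicalID: "subnet-a2"},
		{ResourceType: "AWS::EC2::Subnet", LogicalID: "SubnetPublic1", PhysicalID: "subnet-a1"},
	}, lookup)
	if err != nil {
		panic(err)
	}
	fmt.Println(subnets)
}
```

The logical IDs (`SubnetPublic1`, `SubnetPrivate1`, ...) are made up; only the `SubnetPrivate` prefix check comes from the PR.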

main.go Outdated

setupLog.Debug("enabling ROSA network controller")
if err = (&expcontrollers.ROSANetworkReconciler{
Client: mgr.GetClient(),
Contributor

Please add the watchFilterValue string.

Suggested change
Client: mgr.GetClient(),
Client: mgr.GetClient(),
WatchFilterValue: watchFilterValue,

Contributor Author

Added

// ROSANetworkReconciler reconciles a ROSANetwork object.
type ROSANetworkReconciler struct {
client.Client
Log logr.Logger
Contributor

Please add the WatchFilterValue to the struct

Suggested change
Log logr.Logger
Log logr.Logger
WatchFilterValue string

Contributor Author

Added
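For context on `WatchFilterValue`: in Cluster API controllers it is typically used to skip objects whose `cluster.x-k8s.io/watch-filter` label does not match the value the manager was started with, so that several controller instances can share one management cluster. The real predicate lives in Cluster API's `util/predicates` package; the stdlib sketch below (with the hypothetical name `matchesWatchFilter`) only illustrates the matching rule.

```go
package main

import "fmt"

// watchFilterLabel is the label key Cluster API uses for watch filtering.
const watchFilterLabel = "cluster.x-k8s.io/watch-filter"

// matchesWatchFilter reports whether an object with the given labels should be
// reconciled by a controller started with watchFilterValue.
func matchesWatchFilter(objLabels map[string]string, watchFilterValue string) bool {
	if watchFilterValue == "" {
		return true // no filter configured: reconcile every object
	}
	return objLabels[watchFilterLabel] == watchFilterValue
}

func main() {
	labels := map[string]string{watchFilterLabel: "team-a"}
	fmt.Println(matchesWatchFilter(labels, ""), matchesWatchFilter(labels, "team-a"), matchesWatchFilter(labels, "team-b"))
}
```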

Contributor

There is no need for the webhook patch file, as it is not included in kustomization.yaml.

Contributor Author

Removed

@mzazrivec mzazrivec force-pushed the rosa_network branch 2 times, most recently from 95b12df to bd3b722 Compare September 19, 2025 19:43
@damdo
Member

damdo commented Sep 23, 2025

/assign @nrb @damdo @richardcase

@damdo
Member

damdo commented Sep 30, 2025

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 30, 2025
Comment on lines 158 to 160
func getSessionName(region string, clusterScoper cloud.SessionMetadata) string {
	return fmt.Sprintf("%s-%s-%s", region, clusterScoper.InfraClusterName(), clusterScoper.Namespace())
	return fmt.Sprintf("%s-%s-%s-%s", region, clusterScoper.ControllerName(), clusterScoper.InfraClusterName(), clusterScoper.Namespace())
}
Member

This changes the session name everywhere it is already used (not only in the ROSA code). Is this a backward-compatible change? (cc @richardcase @punkwalker)

Contributor

I guess it should be okay: when the capa-manager is updated, all cached sessions will be re-created with the new session name.
Is the cache persisted somehow across pod re-creation? If so, we need to fail safe when loading a session fails.
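To make the compatibility question concrete: the change inserts the controller name into every session name, so any cache keyed by the old name is never hit by the new one. The sketch uses made-up inputs (`rosanetwork`, `my-cluster`, `default` are illustrative values), and the plain map only stands in for whatever session cache CAPA actually keeps.

```go
package main

import "fmt"

// oldSessionName reproduces the previous format: region-cluster-namespace.
func oldSessionName(region, cluster, namespace string) string {
	return fmt.Sprintf("%s-%s-%s", region, cluster, namespace)
}

// newSessionName reproduces the new format: region-controller-cluster-namespace.
func newSessionName(region, controller, cluster, namespace string) string {
	return fmt.Sprintf("%s-%s-%s-%s", region, controller, cluster, namespace)
}

func main() {
	cache := map[string]string{oldSessionName("us-west-2", "my-cluster", "default"): "cached session"}
	key := newSessionName("us-west-2", "rosanetwork", "my-cluster", "default")
	_, hit := cache[key]
	fmt.Println(key, hit) // us-west-2-rosanetwork-my-cluster-default false
}
```

If the cache only lives in process memory, a miss after the upgrade just means a fresh session is created, which matches the expectation in the comment above.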

// Is the referenced ROSANetwork ready yet?
if !conditions.IsTrue(rosaNet, expinfrav1.ROSANetworkReadyCondition) {
rosaScope.Info(fmt.Sprintf("referenced ROSANetwork %s is not ready", rosaNet.Name))
return ctrl.Result{RequeueAfter: time.Minute}, nil
Member

A minute seems quite a lot here, no?

Contributor Author

It takes about 5 minutes to create the CloudFormation stack, i.e. approximately 5 cycles through the reconciliation loop. I'm fine with making it shorter (suggestions welcome), but I'm not quite sure how that would help or improve the situation.

Member

OK, and do we need to requeue after a delay, or are we watching the ROSANetwork, in which case we should get a reconciliation event anyway?
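On the requeue question: an alternative to polling every minute is for the control plane controller to watch ROSANetwork objects and enqueue every ROSAControlPlane that references the one that changed. In controller-runtime this is usually wired with `Watches` and `handler.EnqueueRequestsFromMapFunc`; the stdlib sketch below only illustrates the mapping step, with simplified hypothetical types, and is not the PR's implementation. It assumes `rosaNetworkRef` is a name-only reference resolved in the control plane's own namespace.

```go
package main

import "fmt"

// NamespacedName identifies an object to reconcile.
type NamespacedName struct {
	Namespace string
	Name      string
}

// ROSAControlPlane keeps only the fields this sketch needs (hypothetical).
type ROSAControlPlane struct {
	Namespace      string
	Name           string
	RosaNetworkRef string // name of the referenced ROSANetwork; "" if unset
}

// controlPlanesForNetwork maps a changed ROSANetwork to reconcile requests for
// every control plane in the same namespace that references it.
func controlPlanesForNetwork(network NamespacedName, cps []ROSAControlPlane) []NamespacedName {
	var reqs []NamespacedName
	for _, cp := range cps {
		if cp.Namespace == network.Namespace && cp.RosaNetworkRef == network.Name {
			reqs = append(reqs, NamespacedName{Namespace: cp.Namespace, Name: cp.Name})
		}
	}
	return reqs
}

func main() {
	cps := []ROSAControlPlane{
		{Namespace: "default", Name: "rosa-hcp01-control-plane", RosaNetworkRef: "rosa-net-01"},
		{Namespace: "other", Name: "cp-elsewhere", RosaNetworkRef: "rosa-net-01"},
		{Namespace: "default", Name: "cp-without-ref"},
	}
	fmt.Println(controlPlanesForNetwork(NamespacedName{Namespace: "default", Name: "rosa-net-01"}, cps))
}
```

With such a watch in place, the `RequeueAfter` could stay as a safety net rather than being the main trigger.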

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from nrb. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@damdo damdo left a comment

/lgtm

Thanks for addressing my comments

/assign @nrb @richardcase

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 2, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5f64d3fa3341e2ef3426645208e69a234b4147e3

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
7 participants