Skip to content

WIP: CAPI AWS implementation#7879

Closed
patrickdillon wants to merge 23 commits intoopenshift:masterfrom
patrickdillon:capi-aws-implementation
Closed

WIP: CAPI AWS implementation#7879
patrickdillon wants to merge 23 commits intoopenshift:masterfrom
patrickdillon:capi-aws-implementation

Conversation

@patrickdillon
Copy link
Contributor

This is a WIP implementation of installing AWS using CAPI providers and the interface from #7824. The intent is to provide a reference for installing a complete, production-level cluster.

Depends on #7824
(the first four commits are from that PR)

Moves tfvars to its own package to decouple it from the cluster asset.
Updates the infrastructure provider interface to accept Parent assets--
rather than a list of files. This allows for easier handling of assets
by the infrastructure provider.
Moves LoadMetadata to a separate package to allow implementers of
the infrastructure provider interface to utilize the function. The
cluster asset/package does not use LoadMetadata but does depend on
the implementers of the interface. Moving to a separate package
breaks the dependency loop.
Implements the infrastructure provider interface with the CAPI system.
This encapsulates the CAPI implementation similar to Terraform. It also
maintains pkg/infrastructure/platform.go (and build variants) as the
canonical source of truth for choosing an infrastructure provider.

This also adds an interface that cloud platforms utilizing the CAPI
provisioning should implement to provision additional resources.

Add installconfig to capi preprovision input

infrastructure/clusterapi: execute control plane available hook

SQUASH: CAPI bootstrap destroy
Adds an initial implementation of the CAPI infrastructure provider
interface for AWS.
Create IAM roles to be used by control plane and compute nodes.
Add creation of dns records when control plane endpoint becomes
available.
Adds filters to the machine api machinesets so they will recognize
CAPI-created security groups and subnets. Adds these filters in
addition to the Terraform-specific filters, which should be
subsequently removed.

fixup sg filter
Export load balancer function for use with CAPI provisioning workflow.
This can be merged if we refactor the function to allow
target type to be set, but at this stage it is not worth
refactoring.
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2024
@openshift-ci openshift-ci bot requested review from mtulio and rwsu January 8, 2024 16:47
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 8, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from patrickdillon. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

"path/filepath"

"github.com/openshift/installer/pkg/types"
"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not introduce this lib in new code.


var metadata *types.ClusterMetadata
if err = json.Unmarshal(raw, &metadata); err != nil {
return nil, errors.Wrapf(err, "failed to Unmarshal data from %q to types.ClusterMetadata", path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, errors.Wrapf(err, "failed to Unmarshal data from %q to types.ClusterMetadata", path)
return nil, fmt.Errorf("failed to Unmarshal data from %q to types.ClusterMetadata: %w", path, err)

"github.com/aws/aws-sdk-go/service/ec2"
"github.com/aws/aws-sdk-go/service/elbv2"
"github.com/aws/aws-sdk-go/service/route53"
"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/service/iam"
"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

// https://docs.aws.amazon.com/sdk-for-go/api/service/iam/#IAM.CreateRole
session, err := ic.AWS.Session(context.TODO())
if err != nil {
return errors.Wrap(err, "failed to load AWS session")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrap(err, "failed to load AWS session")
return fmt.Errorf("failed to load AWS session: %w", err)

}
time.Sleep(10 * time.Second)
if err := svc.WaitUntilInstanceProfileExists(&iam.GetInstanceProfileInput{InstanceProfileName: profileName}); err != nil {
return errors.Wrapf(err, "failed to wait for %s role to exist", role)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrapf(err, "failed to wait for %s role to exist", role)
return fmt.Errorf("failed to wait for %s role to exist: %w", role, err)

InstanceProfileName: profileName,
RoleName: roleName,
}); err != nil {
return errors.Wrapf(err, "failed to add %s role to instance profile", role)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrapf(err, "failed to add %s role to instance profile", role)
return fmt.Errorf("failed to add %s role to instance profile: %w", role, err)

"fmt"
"time"

"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

masterIgn := string(masterIgnAsset.Files()[0].Data)
bootstrapIgn, err := injectInstallInfo(bootstrapIgnAsset.Files()[0].Data)
if err != nil {
return fileList, errors.Wrap(err, "unable to inject installation info")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fileList, errors.Wrap(err, "unable to inject installation info")
return fileList, fmt.Errorf("unable to inject installation info: %w", err)

Comment on lines +242 to +243
errMsg := fmt.Sprintf("failed to create infrastructure manifest %s from InstallConfig", fileName)
return fileList, errors.Wrapf(err, errMsg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
errMsg := fmt.Sprintf("failed to create infrastructure manifest %s from InstallConfig", fileName)
return fileList, errors.Wrapf(err, errMsg)
return fileList, fmt.Errorf("failed to create infrastructure manifest %s from InstallConfig: %w", fileName, err)

Switches the config so that CAPI creates the internal LB and the
external LB is created with direct SDK calls.

In the opposite configuration, one control plane would always be
NOTREADY. This seems preferred anyway because we will ALWAYS need
an internal LB, but an external LB is not needed with private clusters.

Ultimately this logic will be moved to the CAPA provider.
@patrickdillon
Copy link
Contributor Author

Switched the config so that CAPI creates the int LB (and use SDK to create ext) in ecb4551

Now installs are completing!

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 12, 2024
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Embeds the default CAPI provider.
Adds an ignition function to the CAPI provider interface and
moves the default ignition secret generation to the default CAPI
provider implementation.
@bfournie
Copy link
Contributor

/cc @bfournie

@openshift-ci openshift-ci bot requested a review from bfournie January 30, 2024 20:51
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2024

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/altinfra-images 6815b69 link true /test altinfra-images
ci/prow/altinfra-e2e-aws-ovn-fips 6815b69 link false /test altinfra-e2e-aws-ovn-fips
ci/prow/e2e-aws-ovn 6815b69 link true /test e2e-aws-ovn
ci/prow/altinfra-e2e-aws-custom-security-groups 6815b69 link false /test altinfra-e2e-aws-custom-security-groups
ci/prow/golint 6815b69 link true /test golint
ci/prow/gofmt 6815b69 link true /test gofmt
ci/prow/altinfra-e2e-aws-ovn-single-node 6815b69 link false /test altinfra-e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-edge-zones-manifest-validation 6815b69 link true /test e2e-aws-ovn-edge-zones-manifest-validation

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants