🐛 Make VPC creation idempotent to avoid indefinite creation of new VPCs if storage of the ID fails #4723
/test pull-cluster-api-provider-aws-e2e
pkg/cloud/services/network/vpc.go
Outdated
if !vpc.Tags.HasOwned(s.scope.Name()) {
	return errors.Errorf("an unmanaged VPC %s already exists, refusing to create another managed VPC", vpc.ID)
}
The VPC today is created with tags within the same call, so the error message is confusing to users: if there are no tags, there isn't another managed VPC.
- if !vpc.Tags.HasOwned(s.scope.Name()) {
- 	return errors.Errorf("an unmanaged VPC %s already exists, refusing to create another managed VPC", vpc.ID)
- }
+ if !vpc.Tags.HasOwned(s.scope.Name()) {
+ 	return errors.Errorf("found VPC named %s which cannot be managed by CAPA due to lack of tags, either tag the VPC manually [add tag needed here], or provide the `vpc.id` field instead if you wish to bring your own VPC (link to the doc?)", vpc.ID)
+ }
done
pkg/cloud/services/network/vpc.go
Outdated
}

if s.scope.VPC().IsIPv6Enabled() != (vpc.IPv6 != nil) {
Split this into two lines for readability?
done in both locations
Force-pushed from 2b579fc to 811ef3e
pkg/cloud/services/network/vpc.go
Outdated
actualIPv6Enabled := (vpc.IPv6 != nil)
if s.scope.VPC().IsIPv6Enabled() != actualIPv6Enabled {
	return errors.Errorf("IPv6 support of found unmanaged VPC %s differs from desired spec. Changing IP family is currently not supported.", vpc.ID)
}
This seems to be a net-new check (repeated in the other function as well); should we remove it for now and add it in a different PR? Generally, these checks would be best in a webhook at Update time.
The webhook already checks that IPv6 can't be toggled later:
cluster-api-provider-aws/controlplane/eks/api/v1beta2/awsmanagedcontrolplane_webhook.go
Lines 162 to 165 in a1918aa
if oldAWSManagedControlplane.Spec.NetworkSpec.VPC.IsIPv6Enabled() != r.Spec.NetworkSpec.VPC.IsIPv6Enabled() {
	allErrs = append(allErrs,
		field.Invalid(field.NewPath("spec", "networkSpec", "vpc", "enableIPv6"), r.Spec.NetworkSpec.VPC.IsIPv6Enabled(), "changing IP family is not allowed after it has been set"))
}
IIRC, my change was to get a clear error message in unit tests, since they can simulate mismatching situations such as someone having manually turned on IPv6 via the AWS Console. In such scenarios, the webhook wouldn't be called, but we still want an error instead of being silent about not reconciling the difference. I don't think this would be a user-facing breakage since normally, the reconciled state (here: IPv6 support on/off in the VPC) matches the spec.
Agree with @vincepri here, it's better to do it in a separate PR.
I removed the 2 occurrences of the check, and the new test case for that. I'll take note to re-add this in a separate PR, potentially targeting a later milestone.
/milestone v2.4.0
Force-pushed from 811ef3e to 0e78bb0
E2E tests didn't work some weeks ago, it seems. Rebased onto
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
… if storage of the ID fails
Force-pushed from 0e78bb0 to af9ffa5
/test pull-cluster-api-provider-aws-e2e
/lgtm
/assign @Ankitasw
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Ankitasw
The full list of commands accepted by this bot can be found here. The pull request process is described here.
What type of PR is this?
/kind bug
What this PR does / why we need it:
In case of a repeating storage error, for example
which was caused by something totally different, CAPA would create an indefinite number of VPCs (and potentially other resources, depending on how far reconciliation succeeds!), filling up the AWS account to its limits and requiring manual cleanup of the whole mess (the one case I had wasn't fun 😆).
This is terrifying but based on a very simple, horrible bug: VPC creation wasn't idempotent. This PR introduces the typical look-up-else-create logic.
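As a rough illustration of the look-up-else-create idea, here is a minimal, self-contained Go sketch. It is not CAPA's actual code: `fakeCloud`, `ensureVPC`, the `VPC` type, the `ownedTag` key, and the `<cluster>-vpc` naming are all hypothetical stand-ins, and an in-memory slice replaces the AWS API. The loop in `main` mimics reconciliations that never manage to persist the VPC ID.

```go
package main

import (
	"errors"
	"fmt"
)

// VPC is a simplified, hypothetical stand-in for a cloud VPC record.
type VPC struct {
	ID   string
	Name string
	Tags map[string]string
}

// fakeCloud is a hypothetical in-memory replacement for the AWS API.
type fakeCloud struct {
	vpcs    []VPC
	created int // number of create calls, to observe idempotency
}

// findByName returns all VPCs carrying the given name.
func (c *fakeCloud) findByName(name string) []VPC {
	var out []VPC
	for _, v := range c.vpcs {
		if v.Name == name {
			out = append(out, v)
		}
	}
	return out
}

// create always makes a new VPC; calling it blindly is the non-idempotent path.
func (c *fakeCloud) create(name string, tags map[string]string) VPC {
	c.created++
	v := VPC{ID: fmt.Sprintf("vpc-%d", c.created), Name: name, Tags: tags}
	c.vpcs = append(c.vpcs, v)
	return v
}

// ownedTag is a hypothetical tag key marking which cluster owns a VPC.
const ownedTag = "example.com/owned-by-cluster"

// ensureVPC looks up an existing VPC first and creates one only if none exists.
func ensureVPC(c *fakeCloud, clusterName string) (VPC, error) {
	name := clusterName + "-vpc"
	found := c.findByName(name)
	switch {
	case len(found) > 1:
		return VPC{}, errors.New("multiple VPCs match; refusing to guess")
	case len(found) == 1:
		if found[0].Tags[ownedTag] != clusterName {
			return VPC{}, fmt.Errorf("found VPC %s which cannot be managed due to lack of tags", found[0].ID)
		}
		return found[0], nil // reuse instead of creating another one
	}
	return c.create(name, map[string]string{ownedTag: clusterName}), nil
}

func main() {
	cloud := &fakeCloud{}
	// Each iteration starts without a stored ID, as if persisting it had failed.
	for i := 0; i < 3; i++ {
		vpc, err := ensureVPC(cloud, "demo")
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(vpc.ID, "create calls so far:", cloud.created)
	}
}
```

With a blind create in place of `ensureVPC`, every failed reconciliation in this sketch would add another VPC; with the lookup, the create count stays at one no matter how often the loop runs.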
Unfortunately, VPC creation tests were faulty because the mockCtrl object was shared across all test cases, so the expected mock calls of test case A could be "used" by an unrelated test case B, potentially making a test case pass that normally should fail. Or in short, we weren't testing what we thought we were. Also, some test cases had a description differing from the test and made no sense overall, so I removed those. The tests now describe that VPC creation should happen only once. On top, since some tests were related to IPv6, I added a check to see if the AWS VPC IPv6 support matches our spec, or fail on mismatch. This made all tests green again. To ensure that we test the right thing in the future, test cases now check for specific errors.
Checklist:
Release note: