pkg: AWS shared-subnet handling in Go #2477
This seems like the wrong place to make this assumption. It should be done as validation, not as part of this function's behavior.
https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#DescribeSubnetsOutput is paged, so we might be missing some results.
Also, bumping to at least v1.19.30 should be fine. Any reason not to?
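To make the paging concern concrete, here is a minimal sketch of collecting every page before acting on the result. It uses only the standard library: `subnetPage`, `pagedClient`, and `fakeClient` are hypothetical stand-ins for illustration, not the aws-sdk-go API. The callback shape (return false to stop) only loosely mirrors the SDK's `...Pages` helpers.

```go
package main

import "fmt"

// subnetPage is a hypothetical stand-in for one page of a paged
// DescribeSubnets-style response.
type subnetPage struct {
	SubnetIDs []string
}

// pagedClient is a hypothetical interface. The callback returns false to
// stop iterating early.
type pagedClient interface {
	DescribePages(fn func(page subnetPage, lastPage bool) bool) error
}

// fakeClient serves canned pages so the sketch runs without AWS access.
type fakeClient struct {
	pages []subnetPage
}

func (c fakeClient) DescribePages(fn func(subnetPage, bool) bool) error {
	for i, p := range c.pages {
		if !fn(p, i == len(c.pages)-1) {
			break
		}
	}
	return nil
}

// allSubnetIDs accumulates IDs from every page instead of reading only
// the first response.
func allSubnetIDs(c pagedClient) ([]string, error) {
	var ids []string
	err := c.DescribePages(func(p subnetPage, _ bool) bool {
		ids = append(ids, p.SubnetIDs...)
		return true // keep going until the last page
	})
	return ids, err
}

func main() {
	c := fakeClient{pages: []subnetPage{
		{SubnetIDs: []string{"subnet-a", "subnet-b"}},
		{SubnetIDs: []string{"subnet-c"}},
	}}
	ids, err := allSubnetIDs(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```

A caller that reads only the first response would see two of the three subnets here, which is the failure mode the comment warns about.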
> Also bumping to atleast v1.19.30 should be fine.. any reason why not?
I'm lazy? Minimal viable implementation? Something in that space ;).
It would be great if we could keep d472beed9f98313ccbe3594c4760f76f312db5e1 in its own PR and not create dependencies between the PRs...
This PR depends on #2467; just land that first and I'll rebase onto master once it lands.
Almost all consumers have to iterate to split the public subnets from the private ones. Could we return two lists?
Also, we could keep the deduplication internal to the function and return lists, maybe.
> Almost all consumers are having to iterate to split the public from private, could we returns 2 lists??
Done since the reroll to build on #2512.
> also we can keep the uniquefication internal to the function and return lists maybe.
It's still returning maps, which allows me to keep the IDs out of the Subnet metadata type. Is there a benefit to using slices instead of maps?
/approve
@wking, if you have been testing this with a setup, it would be great if you could provide that example in a comment here; otherwise I'll try to test this out.
This allows users to feed in preexisting subnets. I've also added classification logic copied from the upstream Kubernetes AWS cloud provider for categorizing private vs. public subnets.
There's no explicit install-config field for the VPC; we'll extract that from the given subnets. populateSubnets is a bit heavy if all you need is the VPC, but Abhinav prefers it [1], it would save future public/private subnet lookup by warming those caches, and we'll only call Metadata.AWS late in pkg/asset/cluster/tfvars.go (via a future commit), so the VPC cache-warming logic doesn't get called at the moment anyway.
There's no verification yet; we'll get to that in follow-up work.
More on the DescribeSubnetsPagesWithContext FIXME in 37a7f49 (pkg/destroy/aws: Delete subnets by VPC, 2019-08-13, openshift#2214).
[1]: openshift#2477 (comment)
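As a rough illustration of the classification and VPC extraction described above, the following sketch treats a subnet as public when its route table has a route to an internet gateway (a gateway ID starting with `igw`), falling back to the VPC's main route table when the subnet has no explicit association. That is my reading of the upstream heuristic, not a copy of it. The `subnet` and `routeTable` types, the function names, and the VPC-mismatch error are all assumptions made for this example; none of them are the installer's real code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for the EC2 data involved.
type subnet struct {
	ID  string
	VPC string
}

type routeTable struct {
	Main       bool     // true for the VPC's main route table
	SubnetIDs  []string // explicitly associated subnets
	GatewayIDs []string // gateway IDs of the table's routes
}

// hasInternetGateway reports whether any route targets an internet gateway.
func hasInternetGateway(rt routeTable) bool {
	for _, gw := range rt.GatewayIDs {
		if strings.HasPrefix(gw, "igw") {
			return true
		}
	}
	return false
}

// isPublic uses the subnet's explicitly associated route table, falling
// back to the main table when there is no explicit association.
func isPublic(id string, tables []routeTable) (bool, error) {
	var mainTable *routeTable
	for i := range tables {
		rt := &tables[i]
		for _, s := range rt.SubnetIDs {
			if s == id {
				return hasInternetGateway(*rt), nil
			}
		}
		if rt.Main {
			mainTable = rt
		}
	}
	if mainTable == nil {
		return false, errors.New("no route table found for " + id)
	}
	return hasInternetGateway(*mainTable), nil
}

// classify splits subnets into private and public maps keyed by ID and
// returns the single VPC they share. The mismatch error is an assumption
// of this sketch.
func classify(subnets []subnet, tables []routeTable) (string, map[string]subnet, map[string]subnet, error) {
	vpc := ""
	private, public := map[string]subnet{}, map[string]subnet{}
	for _, s := range subnets {
		if vpc == "" {
			vpc = s.VPC
		} else if s.VPC != vpc {
			return "", nil, nil, fmt.Errorf("subnet %s is in %s, not %s", s.ID, s.VPC, vpc)
		}
		pub, err := isPublic(s.ID, tables)
		if err != nil {
			return "", nil, nil, err
		}
		if pub {
			public[s.ID] = s
		} else {
			private[s.ID] = s
		}
	}
	return vpc, private, public, nil
}

func main() {
	subnets := []subnet{{ID: "subnet-1", VPC: "vpc-1"}, {ID: "subnet-2", VPC: "vpc-1"}}
	tables := []routeTable{
		{Main: true, GatewayIDs: []string{"local"}},
		{SubnetIDs: []string{"subnet-2"}, GatewayIDs: []string{"local", "igw-123"}},
	}
	vpc, private, public, err := classify(subnets, tables)
	if err != nil {
		panic(err)
	}
	fmt.Println(vpc, len(private), len(public))
}
```

Returning maps keyed by subnet ID keeps the ID out of the per-subnet value, which matches the trade-off discussed in the review thread.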
User-provided VPCs may not have our convenient
{clusterID}-private-{zone} tag to search on. We can get close with:
$ aws ec2 describe-subnets --filters Name=tag-key,Values=kubernetes.io/cluster/${clusterID} Name=availability-zone,Values=us-west-2a
or similar, but that still returns both the private and public subnets
in that zone. I couldn't figure out a generic way to find
user-provided subnets by tags, so with this commit I'm switching on
"are our subnets user-provided?", and if they are I'm attaching
MachineSets to them by subnet ID.
While I was adjusting the call signatures, I also exploded the types
in the Machines, MachineSets, and provider (e.g. to take a region
string, etc. instead of an InstallConfig type). This more clearly
separates information that is being applied at a higher level
(e.g. AMI lookup for osImage, vs. InstallConfig.Platform.AWS.AMIID)
and decouples folks who are calling MachineSets externally (like Hive
[1]) from some of our internal data types.
[1]: https://github.com/openshift/hive/blob/8ec56c103aba24ed216b432fa3d3e592d677d4e2/pkg/controller/remotemachineset/remotemachineset_controller.go#L374
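The switch on "are our subnets user-provided?" can be sketched roughly as follows. The `filter` and `subnetRef` types and the `subnetFor` function are hypothetical, written for this example rather than taken from the installer or the machine API, and the `tag:Name` filter key is an assumption about how the `{clusterID}-private-{zone}` tag is looked up.

```go
package main

import "fmt"

// filter and subnetRef are hypothetical types for this sketch.
type filter struct {
	Name   string
	Values []string
}

type subnetRef struct {
	ID      string   // set for user-provided subnets
	Filters []filter // set for installer-created subnets
}

// subnetFor picks how a MachineSet in the given zone should find its
// subnet. userSubnets maps availability zone to subnet ID and is empty
// when the installer creates the subnets itself.
func subnetFor(clusterID, zone string, userSubnets map[string]string) (subnetRef, error) {
	if len(userSubnets) == 0 {
		// Installer-created subnets carry a predictable name tag.
		name := fmt.Sprintf("%s-private-%s", clusterID, zone)
		return subnetRef{Filters: []filter{{Name: "tag:Name", Values: []string{name}}}}, nil
	}
	// User-provided subnets may have arbitrary tags, so attach by ID.
	id, ok := userSubnets[zone]
	if !ok {
		return subnetRef{}, fmt.Errorf("no user-provided subnet in zone %s", zone)
	}
	return subnetRef{ID: id}, nil
}

func main() {
	byTag, _ := subnetFor("example-123", "us-west-2a", nil)
	fmt.Printf("%+v\n", byTag)

	byID, err := subnetFor("example-123", "us-west-2a", map[string]string{"us-west-2a": "subnet-0abc"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", byID)
}
```

Taking plain strings and a map instead of an InstallConfig follows the same decoupling idea as the exploded call signatures mentioned above.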
So the kubelet cloud-provider can place load balancers appropriately.
This also helps make it clear to other consumers that we are using
these resources for the cluster ("hmm, I wonder if it would be a bad
idea to delete this subnet? Ah, yeah, I don't want to break cluster
example-123"). But because AWS only allows 50 tags per resource [1],
we're not tagging shared resources like the VPC or Route 53 zones
because those don't need to be tagged to get a working cluster. Users
are free to add 'shared' tags to resources like that as they see fit.
I also considered using the EC2-specific API [2], but went with the
generic tag API in case we wanted to tag Route 53 zones or other
non-EC2 resources as well in the future.
Tag before entering Terraform to claim the space before we actually
put cluster resources inside the subnets. That keeps folks from
unknowingly reaping them before we start adding resources, and it
ensures we have space for the new tags.
The limit of 20 ARNs per request that leads to the loop is from [3].
We're unlikely to exceed 20 with just the VPC and subnets, but it
seemed reasonable to plan for the future to avoid surprises if we grow
this list going forward.
[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions
[2]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateTags.html
[3]: https://docs.aws.amazon.com/sdk-for-go/api/service/resourcegroupstaggingapi/#TagResourcesInput
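The batching loop mentioned above could look roughly like this sketch, which splits the resource list into groups of at most 20 per request. `tagFunc` and `tagShared` are hypothetical names for this example, and the callback stands in for the real tagging call so the code runs without AWS access; the tag key and `shared` value follow the commit message.

```go
package main

import "fmt"

// maxARNsPerCall is the per-request cap on resources described in [3].
const maxARNsPerCall = 20

// tagFunc is a hypothetical stand-in for one tagging API call.
type tagFunc func(arns []string, tags map[string]string) error

// tagShared applies the cluster's "shared" tag to every ARN, sending at
// most maxARNsPerCall ARNs per call.
func tagShared(arns []string, clusterID string, tag tagFunc) error {
	tags := map[string]string{"kubernetes.io/cluster/" + clusterID: "shared"}
	for start := 0; start < len(arns); start += maxARNsPerCall {
		end := start + maxARNsPerCall
		if end > len(arns) {
			end = len(arns)
		}
		if err := tag(arns[start:end], tags); err != nil {
			return fmt.Errorf("tagging batch starting at %d: %w", start, err)
		}
	}
	return nil
}

func main() {
	// 45 placeholder ARNs, more than fit in two requests.
	arns := make([]string, 45)
	for i := range arns {
		arns[i] = fmt.Sprintf("arn:aws:ec2:us-west-2:000000000000:subnet/subnet-%02d", i)
	}
	var sizes []int
	err := tagShared(arns, "example-123", func(batch []string, _ map[string]string) error {
		sizes = append(sizes, len(batch))
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(sizes) // prints [20 20 5]
}
```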
No sense in carrying these around in Go and then creating new subnets in Terraform anyway ;).
Verifying along the lines of my earlier Terraform verification:
$ hack/build.sh
$ openshift-install version
openshift-install unreleased-master-1953-gb27d6333e72cd37f52cf14c7b67053c26e3c2ec3
built from commit b27d6333e72cd37f52cf14c7b67053c26e3c2ec3
release image registry.svc.ci.openshift.org/origin/release:4.3
$ aws cloudformation create-stack --stack-name wking-vpc --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters ParameterKey=AvailabilityZoneCount,ParameterValue=3
$ aws cloudformation wait stack-create-complete --stack-name wking-vpc
$ SUBNETS="$(aws cloudformation describe-stacks --stack-name wking-vpc --query 'Stacks[].Outputs[]' --output json | jq -c '[.[] | select(.OutputKey | contains("SubnetIds")).OutputValue | split(",")[]]')"
$ mkdir wking
$ cat <<EOF >wking/install-config.yaml
> apiVersion: v1
> baseDomain: devcluster.openshift.com
> metadata:
>   name: wking
> platform:
>   aws:
>     region: us-west-2
>     subnets: ${SUBNETS}
> pullSecret: REDACTED
> EOF
$ openshift-install --dir wking create cluster
...
INFO Install complete!
...
$ cat wking/.openshift_install.log
...
time="2019-10-16T18:11:26-07:00" level=info msg="Creating infrastructure resources..."
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-04b524467fcbebb3b with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0b74b7ab3e29783db with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-049539be7d117cae1 with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-04b3ea91f964858cf with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0b84d153a475515da with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0fb9f7769baee50e8 with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:27-07:00" level=debug msg="Symlinking plugin terraform-provider-random src: \"/home/trking/.local/lib/go/src/github.com/openshift/installer/bin/openshift-install\" dst: \"/tmp/openshift-install-205760094/plugins/terraform-provider-random\""
...
time="2019-10-16T18:11:28-07:00" level=debug msg="If you ever set or change modules or backend configuration for Terraform,"
time="2019-10-16T18:11:28-07:00" level=debug msg="rerun this command to reinitialize your working directory. If you forget, other"
time="2019-10-16T18:11:28-07:00" level=debug msg="commands will detect it and remind you to do so if necessary."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.dns.data.aws_route53_zone.public: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.private[0]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_vpc.cluster_vpc: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.private[2]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.public[0]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.private[1]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.public[2]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.public[1]: Refreshing state..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.bootstrap.aws_iam_role.bootstrap: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.iam.aws_iam_role.worker_role: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.masters.aws_iam_role.master_role: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_security_group.worker: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_security_group.master: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_lb_target_group.services: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_lb.api_internal: Creating..."
...
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: abhinavdahiya, wking
/retest
Please review the full test history for this PR and help us cut down flakes.
Catching up with 32356dd (pkg/types/aws/platform: Add Subnets property, 2019-10-07, openshift#2477).
Building on #2438 (which has landed) and #2467 (which I'm building on top of in this PR; review it first), this PR adds subnet handling to install-config and the Go asset handling. Fiddly details in the commit messages ;).