Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Oct 8, 2019

Building on #2438 (which has landed) and #2467 (which I'm building on top of in this PR; review it first), this PR adds subnet handling to install-config and the Go asset handling. Fiddly details in the commit messages ;).

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 8, 2019
@wking wking force-pushed the aws-bring-your-own-vpc branch from 971bc76 to 9da9738 Compare October 8, 2019 23:52
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2019
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like incorrect place to make this assumptions. This should be done as validation and not part of the behaviour of this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#DescribeSubnetsOutput is paged. so we might be missing some results.

Also bumping to atleast v1.19.30 should be fine.. any reason why not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also bumping to atleast v1.19.30 should be fine.. any reason why not?

I'm lazy? Minimal viable implementation? Something in that space ;).

@abhinavdahiya
Copy link
Contributor

it would be great if we can keep d472beed9f98313ccbe3594c4760f76f312db5e1 in it's PR and not create dependencies between the PRs...

@wking
Copy link
Member Author

wking commented Oct 9, 2019

it would be great if we can keep d472bee in it's PR and not create dependencies between the PRs.

This PR depends on #2467, just land that first and I'll rebase on on master once it lands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all consumers are having to iterate to split the public from private, could we returns 2 lists??

also we can keep the uniquefication internal to the function and return lists maybe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all consumers are having to iterate to split the public from private, could we returns 2 lists??

Done since the reroll to build on #2512.

also we can keep the uniquefication internal to the function and return lists maybe.

It's still returning maps, which allows me to keep the IDs out of the Subnet metadata type. Is there a benefit to using slices instead of maps?

@wking wking force-pushed the aws-bring-your-own-vpc branch 4 times, most recently from eb6a6db to 167050a Compare October 14, 2019 23:04
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 14, 2019
@wking wking force-pushed the aws-bring-your-own-vpc branch from 167050a to 72c533b Compare October 14, 2019 23:07
@wking wking force-pushed the aws-bring-your-own-vpc branch from 72c533b to b98a00d Compare October 15, 2019 03:38
@openshift-ci-robot openshift-ci-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 15, 2019
@wking wking force-pushed the aws-bring-your-own-vpc branch from b98a00d to 80a2521 Compare October 15, 2019 03:40
@wking wking force-pushed the aws-bring-your-own-vpc branch from 80a2521 to 06817a1 Compare October 15, 2019 23:54
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 15, 2019
@abhinavdahiya
Copy link
Contributor

/approve

@wking if you have been testing this with a setup, it would be great if you can provide that example in a comment here, otherwise I'll try to test this out

This allows users to feed in prexisting subnets.  I've also added
classification logic copied from the upstream Kubernetes AWS cloud
provider for categorizing private vs. public subnets.  There's no
explicit install-config field for the VPC; we'll extract that from the
given subnets.

populateSubnets is a bit heavy if all you need is the VPC, but Abhinav
prefers it [1], it would save future public/private subnet lookup by
warming those caches, and we'll only call Metadata.AWS late in
pkg/asset/cluster/tfvars.go (via a future commit), so the VPC
cache-warming logic doesn't get called at the moment anyway.

There's no verification yet; we'll get to that in follow-up work.

More on the DescribeSubnetsPagesWithContext FIXME in 37a7f49
(pkg/destroy/aws: Delete subnets by VPC, 2019-08-13, openshift#2214).

[1]: openshift#2477 (comment)
@wking wking force-pushed the aws-bring-your-own-vpc branch from 06817a1 to a8b7a9d Compare October 17, 2019 01:05
wking added 3 commits October 16, 2019 18:08
User-provided VPCs may not have our convenient
{clusterID}-private-{zone} tag to search on.  We can get close with:

  $ aws ec2 describe-subnets --filters Name=tag-key,Values=kubernetes.io/cluster/${clusterID} Name=availability-zone,Values=us-west-2a

or similar, but that still returns both the private and public subnets
in that zone.  I couldn't figure out a generic way to find
user-provided subnets by tags, so with this commit I'm switching on
"are our subnets user-provided?", and if they are I'm attaching
MachineSets to them by subnet ID.

While I was adjusting the call signatures, I also exploded the types
in the Machines, MachineSets, and provider (e.g. to take a region
string, etc. instead of an InstallConfig type).  This more clearly
separates information that is being applied at a higher level
(e.g. AMI lookup for osImage, vs. InstallConfig.Platform.AWS.AMIID)
and decouples folks who are calling MachineSets externally (like Hive
[1]) from some of our internal data types.

[1]: https://github.com/openshift/hive/blob/8ec56c103aba24ed216b432fa3d3e592d677d4e2/pkg/controller/remotemachineset/remotemachineset_controller.go#L374
So the kubelet cloud-provider can place load balancers appropriately.

This also helps make it clear to other consumers that we are using
these resources for the cluster ("hmm, I wonder if it would be a bad
idea to delete this subnet?  Ah, yeah, I don't want to break cluster
example-123").  But because AWS only allows 50 tags per resource [1],
we're not tagging shared resources like the VPC or Route 53 zones
because those don't need to be tagged to get a working cluster.  Users
are free to add 'shared' tags to resources like that as they see fit.
I also considered using the EC2-specific API [2], but went with the
generic tag API in case we wanted to tag Route 53 zones or other
non-EC2 resources as well in the future.

Tag before entering Terraform to claim the space before we actually
put cluster resources inside the subnets.  That keeps folks from
unknowingly reaping them before we start adding resources, and it
ensures we have space for the new tags.

The 20-tag limit that leads to the loop is from [3].  We're unlikely
to exceed 20 with just the VPC and subnets, but it seemed reasonable
to plan for the future to avoid surprises if we grow this list going
forward.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions
[2]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateTags.html
[3]: https://docs.aws.amazon.com/sdk-for-go/api/service/resourcegroupstaggingapi/#TagResourcesInput
No sense in carrying these around in Go and then creating new subnets
in Terraform anyway ;).
@wking wking force-pushed the aws-bring-your-own-vpc branch from a8b7a9d to b27d633 Compare October 17, 2019 01:08
@wking
Copy link
Member Author

wking commented Oct 17, 2019

Verifying along the lines of my earlier Terraform verification:

$ hack/build.sh
$ openshift-install version
openshift-install unreleased-master-1953-gb27d6333e72cd37f52cf14c7b67053c26e3c2ec3
built from commit b27d6333e72cd37f52cf14c7b67053c26e3c2ec3
release image registry.svc.ci.openshift.org/origin/release:4.3
$ aws cloudformation create-stack  --stack-name wking-vpc --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters ParameterKey=AvailabilityZoneCount,ParameterValue=3
$ aws cloudformation wait stack-create-complete --stack-name wking-vpc
$ SUBNETS="$(aws cloudformation describe-stacks --stack-name wking-vpc --query 'Stacks[].Outputs[]' --output json | jq -c '[.[] | select(.OutputKey | contains("SubnetIds")).OutputValue | split(",")[]]')"
$ mkdir wking
$ cat <<EOF >wking/install-config.yaml
> apiVersion: v1
> baseDomain: devcluster.openshift.com
> metadata:
>   name: wking
> platform:
>   aws:
>     region: us-west-2
>     subnets: ${SUBNETS}
> pullSecret: REDACTED
> EOF
$ openshift-install --dir wking create cluster
...
INFO Install complete!
...
$ cat wking/.openshift_install.log
...
time="2019-10-16T18:11:26-07:00" level=info msg="Creating infrastructure resources..."
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-04b524467fcbebb3b with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0b74b7ab3e29783db with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-049539be7d117cae1 with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-04b3ea91f964858cf with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0b84d153a475515da with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:26-07:00" level=debug msg="Tagging arn:aws:ec2:us-west-2:269733383066:subnet/subnet-0fb9f7769baee50e8 with kubernetes.io/cluster/wking-nk9x9: shared"
time="2019-10-16T18:11:27-07:00" level=debug msg="Symlinking plugin terraform-provider-random src: \"/home/trking/.local/lib/go/src/github.com/openshift/installer/bin/openshift-install\" dst: \"/tmp/openshift-install-205760094/plugins/terraform-provider-random\""
...
time="2019-10-16T18:11:28-07:00" level=debug msg="If you ever set or change modules or backend configuration for Terraform,"
time="2019-10-16T18:11:28-07:00" level=debug msg="rerun this command to reinitialize your working directory. If you forget, other"
time="2019-10-16T18:11:28-07:00" level=debug msg="commands will detect it and remind you to do so if necessary."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.dns.data.aws_route53_zone.public: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.private[0]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_vpc.cluster_vpc: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.private[2]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.public[0]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.private[1]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.public[2]: Refreshing state..."
time="2019-10-16T18:11:32-07:00" level=debug msg="module.vpc.data.aws_subnet.public[1]: Refreshing state..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.bootstrap.aws_iam_role.bootstrap: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.iam.aws_iam_role.worker_role: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.masters.aws_iam_role.master_role: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_security_group.worker: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_security_group.master: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_lb_target_group.services: Creating..."
time="2019-10-16T18:11:37-07:00" level=debug msg="module.vpc.aws_lb.api_internal: Creating..."
...

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 b27d633 link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-openstack b27d633 link /test e2e-openstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 3e2f62c into openshift:master Oct 17, 2019
@wking wking deleted the aws-bring-your-own-vpc branch October 17, 2019 17:46
wking added a commit to wking/openshift-installer that referenced this pull request Oct 18, 2019
Catching up with 32356dd (pkg/types/aws/platform: Add Subnets
property, 2019-10-07, openshift#2477).
jhixson74 pushed a commit to jhixson74/installer that referenced this pull request Dec 6, 2019
This allows users to feed in prexisting subnets.  I've also added
classification logic copied from the upstream Kubernetes AWS cloud
provider for categorizing private vs. public subnets.  There's no
explicit install-config field for the VPC; we'll extract that from the
given subnets.

populateSubnets is a bit heavy if all you need is the VPC, but Abhinav
prefers it [1], it would save future public/private subnet lookup by
warming those caches, and we'll only call Metadata.AWS late in
pkg/asset/cluster/tfvars.go (via a future commit), so the VPC
cache-warming logic doesn't get called at the moment anyway.

There's no verification yet; we'll get to that in follow-up work.

More on the DescribeSubnetsPagesWithContext FIXME in 37a7f49
(pkg/destroy/aws: Delete subnets by VPC, 2019-08-13, openshift#2214).

[1]: openshift#2477 (comment)
jhixson74 pushed a commit to jhixson74/installer that referenced this pull request Dec 6, 2019
Catching up with 32356dd (pkg/types/aws/platform: Add Subnets
property, 2019-10-07, openshift#2477).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants