-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add shared vpc CI job for installer repo and informing periodics #5550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add shared vpc CI job for installer repo and informing periodics #5550
Conversation
|
/test ci/rehearse/openshift/installer/master/e2e-aws-shared-vpc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton hopefully this is okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really want whatever new hotness the test-platform folks are cooking up to allow us to mix in assorted flavors in ci-operator. Until then, I think regexp/substring matching like this is fine.
|
/retest |
|
/test pj-rehearse |
|
/retest |
d9767ae to
323cca7
Compare
|
/assign @smarterclayton @wking @stevekuznetsov |
93a1cac to
bdab26d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have committed code somewhere to set these up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, these just exist in the CI account using CF templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want a doc somewhere (maybe just the commit message) for "someone blew away our long-lived subnets, how do we recover?".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, added to the commit message 85ff0d8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Should the explicit subnets: [] be an error? I dunno if it's worth distinguishing between nil/unset and explicitly-an-empty-array. But except for the convenience here, I wouldn't expect folks to set subnets at all unless they were using shared subnets. On the other hand, composing the install-config from a bunch of chunks so we could insert subnets: ... if and only if ${subnets} was not empty seems like a pain, so maybe the convenience is worth it ;). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The install process not-set/empty the same. So it's not worth it.
Do we need a backstop for untagging these? Or will we just lean in manually and blow the tags off periodically? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use do-not-delete-shared-vnet-centralus and:
vnetname="do-not-delete-shared-vnet-${AZURE_REGION}"instead of the case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done with commit 85ff0d8
|
Looks ok to me for all the stuff outside of the actual fields in the install config. |
the backstop whatever we have for other IPI jobs, not sure how this is different. |
bdab26d to
85ff0d8
Compare
|
ping @wking @smarterclayton |
Not fundamentally different. But the current backstop looks for old VPCs. Things after or orthogonal to VPC removal which do not support direct tagging can leak. For example, we leak records in the public Route 53 zone now. That's not a big deal, because there's space for thousands of stale records. I'm worried that if/when we leak |
@wking |
|
/retest |
|
/lgtm Pull the hold when you're satisfied with the rehearsal results :) |
* `==` vs `=~` the regex match allows the cluster_variant to be a mix like `compact,shared-vpc` * Azure vritual network (an other resources) need to be in the same location as the various cluster resources like virtual machines, load balancers etc. So, The job picks the right resources based on the regions picked. The resources were created using the work-in-progress UPI Azure docs with some modifications [4]. * AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource [1] The max tags allowed per resource is 50, and since we need atleast one tag (kubernetes.io/cluster/<>: shared) on each subnet, and also internal-lb on some subnets, 25% load sharing on 4 sets allows ~200, but since these subnets were created by CloudFormation, there are already 3 tags on each subnet, so its definitely not 200. There is no regions switch on AWS currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those. The resources where created using the AWS CF templates for creating VPC [3] ``` aws cloudformation create-stack --stack-policy-body --stack-name do-not-delete-shared-vpc-1 --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters ParameterKey=AvailabilityZoneCount,ParameterValue=3 ``` * GCP has one set of network resources There is no regions switch on GCP currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those. The resources where created using the GCP UPI template for creating a VPC [2] ``` gcloud --project openshift-gce-devel-ci deployment-manager deployments create do-no-delete-shared-vpc --config 01_vpc.yaml ``` [1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions [2]: https://github.com/openshift/installer/blob/release-4.3/upi/gcp/01_vpc.py [3]: https://github.com/openshift/installer/blob/release-4.3/upi/aws/cloudformation/01_vpc.yaml [4]: https://github.com/glennswest/ocpupi4azure
85ff0d8 to
424a04a
Compare
|
because the previous subnets were pruned again, also sent out an email to dp team to whiltelist these resources. |
|
/hold cancel |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abhinavdahiya, stevekuznetsov, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@abhinavdahiya: Updated the following 6 configmaps:
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@abhinavdahiya: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
…ions Catching up with b717933 (ci-operator/templates/openshift/installer/cluster-launch-installer-*: Random AWS regions for IPI, 2020-01-23, openshift#6833), add shares subnets for the new regions. This is basically as described in 424a04a (Add shared vpc CI job for installer repo and informing periodics, 2019-08-22, openshift#5550), although I've dropped the --stack-policy-body mentioned in that commit message (I dunno what it was about, since the commit message didn't give it the argument that --stack-policy-body expects). Generated using: $ export AWS_PROFILE=ci # or whatever you call it locally $ git fetch origin $ date --iso=m --utc 2020-01-23T20:41+0000 $ git checkout origin/release-4.3 $ git --no-pager log --oneline -1 2055609f9 (HEAD, origin/release-4.3) Merge pull request openshift#2928 from ashcrow/4.3-signed-rhcos-bump with: for REGION in us-east-1 us-west-1 us-west-2 do COUNT=3 if test us-west-1 = "${REGION}" then COUNT=2 fi for INDEX in 1 2 3 4 do NAME="do-not-delete-shared-vpc-${INDEX}" aws --region "${REGION}" cloudformation create-stack --stack-name "${NAME}" --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters "ParameterKey=AvailabilityZoneCount,ParameterValue=${COUNT}" >/dev/null aws --region "${REGION}" cloudformation wait stack-create-complete --stack-name "${NAME}" SUBNETS="$(aws --region "${REGION}" cloudformation describe-stacks --stack-name "${NAME}" | jq -c '[.Stacks[].Outputs[] | select(.OutputKey | endswith("SubnetIds")).OutputValue | split(",")[]]' | sed "s/\"/'/g")" echo "${REGION}_$((INDEX - 1))) subnets=\"${SUBNETS}\";;" done done which spits out: us-east-2_0) subnets="['subnet-0faf6d16c378ee7a7','subnet-0e104572db1b7d092','subnet-014ca96c04f36adec','subnet-0ea06057dceadfe8e','subnet-0689efe5e1f9f4212','subnet-0d36bb8edbcb3d916']";; us-east-2_1) subnets="['subnet-085787cc4b80b84b2','subnet-09dfbf66e8f6e5b50','subnet-0db5d90ff3087444e','subnet-047f15f2a0210fbe0','subnet-0bf13f041c4233849','subnet-0e2a5320549e289d8']";; ... The COUNT bit for us-west-1 is because our CI account only has access to two zones there: $ aws --region us-west-1 ec2 describe-availability-zones --output text AVAILABILITYZONES us-west-1 available usw1-az3 us-west-1a AVAILABILITYZONES us-west-1 available usw1-az1 us-west-1b
==vs=~the regex match allows the cluster_variant to be a mix like
compact,shared-vpcAzure vritual network (an other resources) need to be in the same location as the various cluster resources like virtual machines, load balancers etc.
So, The job picks the right resources based on the regions picked.
AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource 1
The max tags allowed per resource is 50, and since we need atleast one tag (kubernetes.io/cluster/<>: shared) on each subnet, and also internal-lb on some subnets, 25% load sharing on 4 sets
allows ~200, but since these subnets were created by CloudFormation, there are already 3 tags on each subnet, so its definitely not 200.
There is no regions switch on AWS currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.
GCP has one set of network resources
There is no regions switch on GCP currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.