Skip to content

Conversation

@abhinavdahiya
Copy link
Contributor

@abhinavdahiya abhinavdahiya commented Oct 22, 2019

  • == vs =~
    the regex match allows the cluster_variant to be a mix like compact,shared-vpc

  • Azure vritual network (an other resources) need to be in the same location as the various cluster resources like virtual machines, load balancers etc.
    So, The job picks the right resources based on the regions picked.

  • AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource 1
    The max tags allowed per resource is 50, and since we need atleast one tag (kubernetes.io/cluster/<>: shared) on each subnet, and also internal-lb on some subnets, 25% load sharing on 4 sets
    allows ~200, but since these subnets were created by CloudFormation, there are already 3 tags on each subnet, so its definitely not 200.

    There is no regions switch on AWS currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.

  • GCP has one set of network resources

    There is no regions switch on GCP currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 22, 2019
@abhinavdahiya
Copy link
Contributor Author

/test ci/rehearse/openshift/installer/master/e2e-aws-shared-vpc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton hopefully this is okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want whatever new hotness the test-platform folks are cooking up to allow us to mix in assorted flavors in ci-operator. Until then, I think regexp/substring matching like this is fine.

@abhinavdahiya
Copy link
Contributor Author

/retest

@abhinavdahiya
Copy link
Contributor Author

@abhinavdahiya
Copy link
Contributor Author

/test pj-rehearse

@abhinavdahiya
Copy link
Contributor Author

/retest

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 24, 2019
@abhinavdahiya
Copy link
Contributor Author

/assign @smarterclayton @wking @stevekuznetsov

@abhinavdahiya abhinavdahiya changed the title WIP: add shared-vpc cluster variant Add shared vpc CI job for installer repo and informing periodics Oct 24, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 24, 2019
@abhinavdahiya abhinavdahiya force-pushed the shared_network branch 2 times, most recently from 93a1cac to bdab26d Compare October 24, 2019 20:27
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have committed code somewhere to set these up?

Copy link
Contributor Author

@abhinavdahiya abhinavdahiya Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, these just exist in the CI account using CF templates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want a doc somewhere (maybe just the commit message) for "someone blew away our long-lived subnets, how do we recover?".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added to the commit message 85ff0d8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Should the explicit subnets: [] be an error? I dunno if it's worth distinguishing between nil/unset and explicitly-an-empty-array. But except for the convenience here, I wouldn't expect folks to set subnets at all unless they were using shared subnets. On the other hand, composing the install-config from a bunch of chunks so we could insert subnets: ... if and only if ${subnets} was not empty seems like a pain, so maybe the convenience is worth it ;). Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The install process not-set/empty the same. So it's not worth it.

@wking
Copy link
Member

wking commented Oct 24, 2019

AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource...

Do we need a backstop for untagging these? Or will we just lean in manually and blow the tags off periodically?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use do-not-delete-shared-vnet-centralus and:

vnetname="do-not-delete-shared-vnet-${AZURE_REGION}"

instead of the case here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with commit 85ff0d8

@smarterclayton
Copy link
Contributor

Looks ok to me for all the stuff outside of the actual fields in the install config.

@abhinavdahiya
Copy link
Contributor Author

AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource...

Do we need a backstop for untagging these? Or will we just lean in manually and blow the tags off periodically?

the backstop whatever we have for other IPI jobs, not sure how this is different.

@abhinavdahiya
Copy link
Contributor Author

ping @wking @smarterclayton

@wking
Copy link
Member

wking commented Oct 26, 2019

the backstop whatever we have for other IPI jobs, not sure how this is different.

Not fundamentally different. But the current backstop looks for old VPCs. Things after or orthogonal to VPC removal which do not support direct tagging can leak. For example, we leak records in the public Route 53 zone now. That's not a big deal, because there's space for thousands of stale records. I'm worried that if/when we leak shared tags there won't be as much headroom.

@abhinavdahiya
Copy link
Contributor Author

the backstop whatever we have for other IPI jobs, not sure how this is different.

Not fundamentally different. But the current backstop looks for old VPCs. Things after or orthogonal to VPC removal which do not support direct tagging can leak. For example, we leak records in the public Route 53 zone now. That's not a big deal, because there's space for thousands of stale records. I'm worried that if/when we leak shared tags there won't be as much headroom.

@wking
I can definitely make sure the installer team works with platform team to setup pruners for these tags.. but this sounds not something this PR can fix?

@abhinavdahiya
Copy link
Contributor Author

/retest

@wking
Copy link
Member

wking commented Oct 29, 2019

/lgtm
/hold

Pull the hold when you're satisfied with the rehearsal results :)

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 29, 2019
* `==` vs `=~`
  the regex match allows the cluster_variant to be a mix like `compact,shared-vpc`

* Azure vritual network (an other resources) need to be in the same location as the various cluster resources like virtual machines, load balancers etc.
  So, The job picks the right resources based on the regions picked.

  The resources were created using the work-in-progress UPI Azure docs with some modifications [4].

* AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource [1]
  The max tags allowed per resource is 50, and since we need atleast one tag (kubernetes.io/cluster/<>: shared) on each subnet, and also internal-lb on some subnets, 25% load sharing on 4 sets
  allows ~200, but since these subnets were created by CloudFormation, there are already 3 tags on each subnet, so its definitely not 200.

  There is no regions switch on AWS currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.

  The resources where created using the AWS CF templates for creating VPC [3]
  ```
  aws cloudformation create-stack --stack-policy-body --stack-name do-not-delete-shared-vpc-1 --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters ParameterKey=AvailabilityZoneCount,ParameterValue=3
  ```

* GCP has one set of network resources

  There is no regions switch on GCP currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.

  The resources where created using the GCP UPI template for creating a VPC [2]
  ```
  gcloud --project  openshift-gce-devel-ci  deployment-manager deployments create do-no-delete-shared-vpc --config 01_vpc.yaml
  ```

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions
[2]: https://github.com/openshift/installer/blob/release-4.3/upi/gcp/01_vpc.py
[3]: https://github.com/openshift/installer/blob/release-4.3/upi/aws/cloudformation/01_vpc.yaml
[4]: https://github.com/glennswest/ocpupi4azure
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2019
@abhinavdahiya
Copy link
Contributor Author

pushed a change https://github.com/openshift/release/compare/85ff0d87bde1f0bed3d63bf6dc5d90af24d6c05a..424a04ac2db6ea268ab7de73b09d7b86bdc37f04

because the previous subnets were pruned again, also sent out an email to dp team to whiltelist these resources.

@abhinavdahiya
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 29, 2019
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, stevekuznetsov, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2019
@openshift-merge-robot openshift-merge-robot merged commit 59c62b9 into openshift:master Oct 29, 2019
@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: Updated the following 6 configmaps:

  • job-config-4.3 configmap in namespace ci using the following files:
    • key openshift-release-release-4.3-periodics.yaml using file ci-operator/jobs/openshift/release/openshift-release-release-4.3-periodics.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-src configmap in namespace ci using the following files:
    • key cluster-launch-installer-src.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml
  • prow-job-cluster-launch-installer-src configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-src.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml
  • job-config-master configmap in namespace ci using the following files:
    • key openshift-installer-master-presubmits.yaml using file ci-operator/jobs/openshift/installer/openshift-installer-master-presubmits.yaml
Details

In response to this:

  • == vs =~
    the regex match allows the cluster_variant to be a mix like compact,shared-vpc

  • Azure vritual network (an other resources) need to be in the same location as the various cluster resources like virtual machines, load balancers etc.
    So, The job picks the right resources based on the regions picked.

  • AWS needs to picks set of subnets from 4 groups because of the limitation on the number of tags on the each resource 1
    The max tags allowed per resource is 50, and since we need atleast one tag (kubernetes.io/cluster/<>: shared) on each subnet, and also internal-lb on some subnets, 25% load sharing on 4 sets
    allows ~200, but since these subnets were created by CloudFormation, there are already 3 tags on each subnet, so its definitely not 200.

There is no regions switch on AWS currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.

  • GCP has one set of network resources

There is no regions switch on GCP currently, therefore there was no need to crate subnets per region, but whenever that happends we might have to add those.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/rehearse/release-openshift-origin-installer-e2e-azure-shared-vpc-4.3 424a04a link /test pj-rehearse
ci/rehearse/openshift/installer/master/e2e-aws-shared-vpc 424a04a link /test pj-rehearse
ci/prow/pj-rehearse 424a04a link /test pj-rehearse

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wking added a commit to wking/openshift-release that referenced this pull request Jan 23, 2020
…ions

Catching up with b717933
(ci-operator/templates/openshift/installer/cluster-launch-installer-*:
Random AWS regions for IPI, 2020-01-23, openshift#6833), add shares subnets for
the new regions.  This is basically as described in 424a04a (Add
shared vpc CI job for installer repo and informing periodics,
2019-08-22, openshift#5550), although I've dropped the --stack-policy-body
mentioned in that commit message (I dunno what it was about, since the
commit message didn't give it the argument that --stack-policy-body
expects).  Generated using:

  $ export AWS_PROFILE=ci  # or whatever you call it locally
  $ git fetch origin
  $ date --iso=m --utc
  2020-01-23T20:41+0000
  $ git checkout origin/release-4.3
  $ git --no-pager log --oneline -1
  2055609f9 (HEAD, origin/release-4.3) Merge pull request openshift#2928 from ashcrow/4.3-signed-rhcos-bump

with:

  for REGION in us-east-1 us-west-1 us-west-2
  do
    COUNT=3
    if test us-west-1 = "${REGION}"
    then
      COUNT=2
    fi
    for INDEX in 1 2 3 4
    do
      NAME="do-not-delete-shared-vpc-${INDEX}"
      aws --region "${REGION}" cloudformation create-stack --stack-name "${NAME}" --template-body "$(cat upi/aws/cloudformation/01_vpc.yaml)" --parameters "ParameterKey=AvailabilityZoneCount,ParameterValue=${COUNT}" >/dev/null
      aws --region "${REGION}" cloudformation wait stack-create-complete --stack-name "${NAME}"
      SUBNETS="$(aws --region "${REGION}" cloudformation describe-stacks --stack-name "${NAME}" | jq -c '[.Stacks[].Outputs[] | select(.OutputKey | endswith("SubnetIds")).OutputValue | split(",")[]]' | sed "s/\"/'/g")"
      echo "${REGION}_$((INDEX - 1))) subnets=\"${SUBNETS}\";;"
    done
  done

which spits out:

  us-east-2_0) subnets="['subnet-0faf6d16c378ee7a7','subnet-0e104572db1b7d092','subnet-014ca96c04f36adec','subnet-0ea06057dceadfe8e','subnet-0689efe5e1f9f4212','subnet-0d36bb8edbcb3d916']";;
  us-east-2_1) subnets="['subnet-085787cc4b80b84b2','subnet-09dfbf66e8f6e5b50','subnet-0db5d90ff3087444e','subnet-047f15f2a0210fbe0','subnet-0bf13f041c4233849','subnet-0e2a5320549e289d8']";;
  ...

The COUNT bit for us-west-1 is because our CI account only has access
to two zones there:

  $ aws --region us-west-1 ec2 describe-availability-zones --output text
  AVAILABILITYZONES		 us-west-1					available												usw1-az3 us-west-1a
  AVAILABILITYZONES		 us-west-1					available												usw1-az1 us-west-1b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants