@wking
Member

@wking wking commented Oct 15, 2020

@coverprice recently bumped the Azure limits to:

  • Central US:

    • 2000 standardDSv3Family
    • 400 LowPriorityCores (Total Regional Spot vCPUs) rhbz#1888380
    • 100 Public IPs
  • Our three other regions:

    • 400 standardDSv3Family
    • 200 LowPriorityCores (Total Regional Spot vCPUs)
    • 100 Public IPs

Surpassing the DSv3 limits leads to errors like:

Code=OperationNotAllowed
Message=Operation could not be completed as it results in exceeding approved standardDSv3Family Cores quota. Additional details - Deployment Model: Resource Manager, Location: centralus, Current Limit: 1000, Current Usage: 1000, Additional Required: 4, (Minimum) New Limit Required: 1004. Submit a request for Quota increase at...

Surpassing LowPriorityCores limits leads to errors like:

Code=OperationNotAllowed
Message=Operation could not be completed as it results in exceeding approved LowPriorityCores quota. Additional details - Deployment Model: Resource Manager, Location: eastus2, Current Limit: 10, Current Usage: 8, Additional Required: 4, (Minimum) New Limit Required: 12. Submit a request for Quota increase at ...

Surpassing public IP limits leads to errors like:

Code=PublicIPCountLimitReached
Message=Cannot create more than 50 public IP addresses for this subscription in this region.

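Current docs recommend 40 vCPU per 3-compute cluster with 3 public IP addresses, which makes for the following limits (18 vCPUs per spot test is from Joel Speed):

  • Central US:

    • 2000 standardDSv3Family / 40 = 50 clusters
    • 400 LowPriorityCores / 18 vCPUs per spot test = 22 clusters
    • 100 Public IPs / 3 per cluster = 33 clusters
  • Our three other regions:

    • 400 standardDSv3Family / 40 = 10 clusters
    • 200 LowPriorityCores / 18 vCPUs per spot test = 11 clusters
    • 100 Public IPs / 3 per cluster = 33 clusters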

I dunno what our current private IP quota is. I guess we'll see when we bump into it. Anyhow, the limit is 33 clusters for central US (most of our tests do not involve spot instances) and 10 in the other regions. Next thing to bump would be standardDSv3Family in the other regions, followed by public IPs.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2020
@wking wking force-pushed the expand-azure-quota branch from 78d99b5 to 86f5480 on October 15, 2020 03:14
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2020
@petr-muller
Member

LGTM but has conflicts

@wking
Member Author

wking commented Oct 22, 2020

Conflicts are with #12842. We can't move forward here until we figure out why the transition to static names didn't work.

In October, James Russell bumped the Azure limits to:

Central US:
  2000 standardDSv3Family
   400 LowPriorityCores (Total Regional Spot vCPUs) [1]
   100 Public IPs

Our three other regions:
  400 standardDSv3Family
  200 LowPriorityCores (Total Regional Spot vCPUs)
  100 Public IPs

Surpassing the DSv3 limits leads to errors like:

  Code=OperationNotAllowed
  Message=Operation could not be completed as it results in exceeding
    approved standardDSv3Family Cores quota. Additional details -
    Deployment Model: Resource Manager, Location: centralus, Current
    Limit: 1000, Current Usage: 1000, Additional Required: 4,
    (Minimum) New Limit Required: 1004. Submit a request for Quota
    increase at...

Surpassing LowPriorityCores limits leads to errors like:

  Code=OperationNotAllowed
  Message=Operation could not be completed as it results in exceeding
    approved LowPriorityCores quota. Additional details - Deployment
    Model: Resource Manager, Location: eastus2, Current Limit: 10,
    Current Usage: 8, Additional Required: 4, (Minimum) New Limit
    Required: 12. Submit a request for Quota increase at ...
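When triaging CI failures, the numbers in these OperationNotAllowed quota messages can be pulled out mechanically. Below is a minimal Python sketch, assuming the message wording stays as quoted above; `QUOTA_RE` and `parse_quota_error` are hypothetical names, not part of any Azure SDK. It also checks that the reported minimum new limit equals current usage plus the additional request (1000 + 4 = 1004, 8 + 4 = 12).

```python
import re

# Matches the fields in an Azure "OperationNotAllowed" quota message.
# Whitespace is matched with \s+ so line-wrapped copies (as in the commit
# message) parse the same way as single-line ones.
QUOTA_RE = re.compile(
    r"exceeding\s+approved\s+(?P<quota>\S+)(?:\s+Cores)?\s+quota\..*?"
    r"Location:\s*(?P<location>\w+),\s*"
    r"Current\s+Limit:\s*(?P<limit>\d+),\s*"
    r"Current\s+Usage:\s*(?P<usage>\d+),\s*"
    r"Additional\s+Required:\s*(?P<additional>\d+),\s*"
    r"\(Minimum\)\s+New\s+Limit\s+Required:\s*(?P<new_limit>\d+)",
    re.DOTALL,
)


def parse_quota_error(message):
    """Return the quota name, location, and counts from a quota error message."""
    match = QUOTA_RE.search(message)
    if match is None:
        raise ValueError("not a recognized quota message")
    result = {"quota": match["quota"], "location": match["location"]}
    for key in ("limit", "usage", "additional", "new_limit"):
        result[key] = int(match[key])
    # The reported minimum new limit should be current usage plus the request.
    result["consistent"] = result["usage"] + result["additional"] == result["new_limit"]
    return result


DSV3_MSG = (
    "Operation could not be completed as it results in exceeding approved "
    "standardDSv3Family Cores quota. Additional details - Deployment Model: "
    "Resource Manager, Location: centralus, Current Limit: 1000, Current "
    "Usage: 1000, Additional Required: 4, (Minimum) New Limit Required: 1004."
)

SPOT_MSG = """Operation could not be completed as it results in exceeding
    approved LowPriorityCores quota. Additional details - Deployment
    Model: Resource Manager, Location: eastus2, Current Limit: 10,
    Current Usage: 8, Additional Required: 4, (Minimum) New Limit
    Required: 12."""

if __name__ == "__main__":
    for msg in (DSV3_MSG, SPOT_MSG):
        info = parse_quota_error(msg)
        headroom = info["limit"] - info["usage"]
        print(f"{info['quota']} in {info['location']}: headroom {headroom}, "
              f"needed {info['additional']}, minimum new limit {info['new_limit']}")
```

Matching on `\s+` rather than single spaces is a deliberate choice, since copies pasted into commit messages or logs tend to get rewrapped.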

Surpassing public IP limits leads to errors like:

  Code=PublicIPCountLimitReached
  Message=Cannot create more than 50 public IP addresses for this
    subscription in this region.

There are also "Standard Sku Public IP Addresses" and "Static Public
IP Addresses".  The former leads to errors like:

  Code=StandardSkuPublicIPCountLimitReached
  Message=Cannot create more than 50 standard sku publicIpAddresses
    for this subscription in this region.

I don't think I've ever seen the latter in CI, but the error is supposed to look like:

  Code=StaticPublicIPCountLimitReached
  Message=Cannot create more than 20 public IP addresses with static
    allocation method for this subscription in this region.
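A sketch for sorting the three public IP error codes above into the quota each one names, assuming the codes and the "Cannot create more than N" wording stay as quoted. `classify_public_ip_error` and the quota labels are hypothetical helpers for CI triage, not an Azure API.

```python
import re
from typing import Optional, Tuple

# Public IP error codes quoted above, mapped to the quota each one names.
# This mapping is an illustrative assumption, not an Azure API.
PUBLIC_IP_QUOTAS = {
    "PublicIPCountLimitReached": "Public IPs",
    "StandardSkuPublicIPCountLimitReached": "Standard Sku Public IP Addresses",
    "StaticPublicIPCountLimitReached": "Static Public IP Addresses",
}

# All three messages start with "Cannot create more than N ...".
LIMIT_RE = re.compile(r"Cannot create more than (\d+)")


def classify_public_ip_error(code: str, message: str) -> Tuple[str, Optional[int]]:
    """Return (quota name, limit reported in the message) for a public IP error."""
    quota = PUBLIC_IP_QUOTAS.get(code, "unrecognized")
    match = LIMIT_RE.search(message)
    return quota, int(match.group(1)) if match else None


if __name__ == "__main__":
    samples = [
        ("PublicIPCountLimitReached",
         "Cannot create more than 50 public IP addresses for this subscription in this region."),
        ("StandardSkuPublicIPCountLimitReached",
         "Cannot create more than 50 standard sku publicIpAddresses for this subscription in this region."),
        ("StaticPublicIPCountLimitReached",
         "Cannot create more than 20 public IP addresses with static allocation method "
         "for this subscription in this region."),
    ]
    for code, message in samples:
        print(code, "->", classify_public_ip_error(code, message))
```

The limit pulled from a message can differ from the quota table, as the 50 in the PublicIPCountLimitReached example does from the current 100.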

Current docs recommend 40 vCPU per 3-compute cluster [2] with 3 public
IP addresses [3], which makes for the following limits:

Central US:
  2000 standardDSv3Family / 40 = 50 clusters
   400 LowPriorityCores (Total Regional Spot vCPUs) [1] / 18 vCPUs per spot test = 22 clusters
   100 Public IPs / 3 per cluster = 33 clusters

Our three other regions:
  400 standardDSv3Family / 40 = 10 clusters
  200 LowPriorityCores (Total Regional Spot vCPUs) / 18 vCPUs per spot test = 11 clusters
  100 Public IPs / 3 per cluster = 33 clusters

Our default limits:
   1000 VNets / 1 per cluster [4] = 1000 clusters
  65536 network interfaces / 6+ per cluster [5] = 10+k clusters
   5000 network security groups / 2+ per cluster [6] = 2+k clusters
   1000 network load balancers / 3+ per cluster [7] = 300+ clusters
     ?? private IP addresses / 7 per cluster [8] = ?? clusters
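The default-limit arithmetic above, as a Python sketch. It uses floor division because a partial cluster can't be provisioned, and treats the documented per-cluster minimums ("6+", "2+", "3+") as exact, so clusters that consume more would lower these counts. The private IP quota is unknown, so it is left out. `DEFAULT_LIMITS` and `default_capacity` are hypothetical names.

```python
# Default Azure limits and documented per-cluster consumption from the
# commit message above, as (quota, per-cluster minimum) pairs.
# Private IP addresses are omitted because the quota is unknown.
DEFAULT_LIMITS = {
    "VNets": (1000, 1),
    "network interfaces": (65536, 6),
    "network security groups": (5000, 2),
    "network load balancers": (1000, 3),
}


def default_capacity(limits):
    """Clusters each default limit supports; partial clusters don't count."""
    return {name: quota // per_cluster for name, (quota, per_cluster) in limits.items()}


if __name__ == "__main__":
    for name, clusters in default_capacity(DEFAULT_LIMITS).items():
        print(f"{name}: {clusters} clusters")
```

With these inputs the tightest default is network load balancers at 333 clusters, well above the 33- and 10-cluster limits that the regional quotas impose.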

The figure of 18 vCPUs per spot test is from Joel Speed.

I dunno what our current private IP quota is.  I guess we'll see when
we bump into it.  Anyhow, the limit is 33 clusters for central US (most of
our tests do not involve spot instances) and 10 in the other regions.
Next thing to bump would be standardDSv3Family in the other regions,
followed by public IPs.
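The per-region cluster limits and the bump order can be reproduced by taking the tightest non-spot quota in each region. Here is a minimal Python sketch using the quotas and per-cluster figures quoted above. It assumes the three non-Central regions share identical quotas and, like the text, ignores LowPriorityCores for non-spot tests. `capacities` and `non_spot_limit` are hypothetical helpers.

```python
# Per-cluster consumption from the text: 40 DSv3 vCPUs and 3 public IPs per
# cluster, plus 18 spot vCPUs per spot test.
PER_CLUSTER = {"standardDSv3Family": 40, "LowPriorityCores": 18, "Public IPs": 3}

REGIONS = {
    "centralus": {"standardDSv3Family": 2000, "LowPriorityCores": 400, "Public IPs": 100},
    # Each of the three other regions has these same quotas.
    "other": {"standardDSv3Family": 400, "LowPriorityCores": 200, "Public IPs": 100},
}


def capacities(quotas):
    """Clusters each quota supports on its own; partial clusters don't count."""
    return {name: limit // PER_CLUSTER[name] for name, limit in quotas.items()}


def non_spot_limit(quotas):
    """Return (cluster limit, quotas ordered from most to least constraining)."""
    caps = capacities(quotas)
    caps.pop("LowPriorityCores")  # most tests do not involve spot instances
    order = sorted(caps, key=caps.get)
    return caps[order[0]], order


if __name__ == "__main__":
    for region, quotas in REGIONS.items():
        limit, order = non_spot_limit(quotas)
        print(f"{region}: {capacities(quotas)} -> {limit} clusters, bump order {order}")
```

For the other regions, this sorts standardDSv3Family (10 clusters) ahead of Public IPs (33 clusters), which matches the suggested bump order.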

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1888380
[2]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L33
[3]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L105-L110
[4]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L66-L68
[5]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L72-L75
[6]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L79-L83
[7]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L92-L102
[8]: https://github.com/openshift/openshift-docs/blame/1338581a9d0c8e44aecf0a415f8d7a2a61d48df2/modules/installation-azure-limits.adoc#L113-L116
@wking wking force-pushed the expand-azure-quota branch from 86f5480 to 1b664cc on December 15, 2020 23:19
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 15, 2020
@wking
Member Author

wking commented Dec 15, 2020

Rebased around #14285 with 86f548040d -> 1b664cc. That also adds some handwaving around StandardSkuPublicIPCountLimitReached and StaticPublicIPCountLimitReached to the commit message, although I'm still not clear on how OCP clusters' consumption of those resources differs from the Public IPs quota behind PublicIPCountLimitReached.

Member

@petr-muller petr-muller left a comment

🤞

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 17, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

@openshift-merge-robot openshift-merge-robot merged commit 6af8d89 into openshift:master Dec 17, 2020
@openshift-ci-robot
Contributor

@wking: Updated the following 2 configmaps:

  • resources configmap in namespace ci at cluster app.ci using the following files:
    • key boskos.yaml using file core-services/prow/02_config/_boskos.yaml
  • resources configmap in namespace ci at cluster api.ci using the following files:
    • key boskos.yaml using file core-services/prow/02_config/_boskos.yaml

@wking wking deleted the expand-azure-quota branch December 17, 2020 17:07

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

4 participants