Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Feb 28, 2019

Based on @sferich888's approach in #1314.

$ aws-check-limits.sh | expand --tabs=7,20,31,43
count  limit        region     code        name
61     700          us-east-1  0Xc6LMYG8P  On-Demand instances - m4.large
3      6            us-east-1  0Xc6LMYG8P  On-Demand instances - i3.xlarge
...
0      5            -          dx8afcdfMr  Route 53 Traffic Policy Instances
131    ?            us-west-2  -           S3 buckets
61     20?          us-west-2  -           Gateway VPC endpoints
375    350?         us-west-2  -           EC2 network interfaces
72     5-per-zone?  us-west-2  -           EC2 NAT gateways
130    2500?        us-west-2  -           EC2 security groups
44     500          us-west-2  -           Network load balancers

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 28, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2019
@wking wking force-pushed the aws-check-limits-script branch from 24e161d to 328bbb0 Compare February 28, 2019 20:30
@abhinavdahiya
Copy link
Contributor

IMO this script is too bashy and hard-coded to support for very long. Why do we need this again? keeping these just in docs means they are guidelines.

  $ aws-check-limits.sh | expand --tabs=7,20,31,43
  count  limit        region     code        name
  61     700          us-east-1  0Xc6LMYG8P  On-Demand instances - m4.large
  3      6            us-east-1  0Xc6LMYG8P  On-Demand instances - i3.xlarge
  ...
  0      5            -          dx8afcdfMr  Route 53 Traffic Policy Instances
  131    ?            us-west-2  -           S3 buckets
  61     20?          us-west-2  -           Gateway VPC endpoints
  375    350?         us-west-2  -           EC2 network interfaces
  72     5-per-zone?  us-west-2  -           EC2 NAT gateways
  130    2500?        us-west-2  -           EC2 security groups
  44     500          us-west-2  -           Network load balancers
@wking wking force-pushed the aws-check-limits-script branch from 328bbb0 to ccbdd89 Compare February 28, 2019 21:30
@wking
Copy link
Member Author

wking commented Feb 28, 2019

IMO this script is too bashy...

I'm fine translating to Go or Python or whatever.

... and hard-coded to support for very long.

What hard-coded parts are you concerned about? The trusted-advisor loop seems fairly robust, and the per-resource entries after that are the only way (to my knowledge) to recover limit/usage information for those important resource types which trusted-advisor does not (yet?) support.

Why do we need this again? keeping these just in docs means they are guidelines.

If we're going to keep them anyway, it doesn't seem like much more of a maintenance burden to collect them in a script, and that makes it easier for users to invoke them without multiple copy/pastes. I'm fine adding a stderr echo disclaimer to make it clear that our maintenance will just be best-effort.

@sferich888
Copy link
Contributor

Why not add it directly to the installer as a subcommand[s] openshift-install check limits

@sferich888
Copy link
Contributor

This prints a lot of data. While running the script is simple and prints all the known limits (now) I am worried about maintainability and extensibility of this.

In reality what we care about, is are we having a problem!

$ command | awk '{if($1 >= $2) {print $0}}

Or your going to care about what your close to hitting a problem!

  • replace N, with some number or threshold you want to search for where you're close to a known limit.
$ command | awk '{if($1 >= $2-N) {print $0}}

@wking
Copy link
Member Author

wking commented Mar 8, 2019

Or your going to care about what your close to hitting a problem!

This (if you're hitting a limit, you probably don't need this script to tell you). And this is where I start to worry about maintainability, as resource types and per-cluster creation counts are somewhat amorphous ;). How about a compromise where I remove resources that we expect the cluster to not need, but do not hard-code the numbers we expect the cluster to create? E.g. instance requirements would depend on cluster configuration.

Copy link
Contributor

@sferich888 sferich888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking I'm interested in your way of defining a script/checking if your close to a limit.
I am also interested in this script as a whole as I want to know how we connect/use it with https://access.redhat.com/labs/ocplimitscalculator/

fi

printf 'count\tlimit\tregion\tcode\tname\n' || die 'failed to write header'
aws --region us-east-1 support describe-trusted-advisor-checks --language en --query "checks[? category == 'service_limits'].{id: @.id, name: @.name}" --output text | while read -r CHECK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

us-east-1 needs to be ${REGION} here.

do
CHECK_ID="$(echo "${CHECK}" | cut -d ' ' -f 1)" || die 'failed to extract ID from %s\n' "${CHECK}"
CHECK_NAME="$(echo "${CHECK}" | cut -d ' ' -f 2)" || die 'failed to extract name from %s\n' "${CHECK}"
RESULT="$(aws --region us-east-1 support describe-trusted-advisor-check-result --check-id "${CHECK_ID}" --query "join(\`\\n\`, result.flaggedResources[].join(\`\\t\`, [@.metadata[4] || '0', @.metadata[3], @.region || '-', '${CHECK_ID}', @.metadata[2]]))" --output text)" || die 'failed to check %s (%s)\n' "${CHECK_ID}" "${CHECK_NAME}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

us-east-1 needs to be ${REGION} here.

SECURITY_GROUPS_LIMIT='2500?' # per region
printf '%d\t%s\t%s\t-\tEC2 security groups\n' "$(echo "${SECURITY_GROUPS}" | wc -l)" "${SECURITY_GROUPS_LIMIT}" "${REGION}" || die 'failed to write result for EC2 security groups\n'

NETWORK_LOAD_BALANCERS="$(aws elbv2 describe-load-balancers --query "join(\`\\n\`, @.LoadBalancers[].LoadBalancerArn)" --output text)" || die 'failed to list network load balancers\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to scope this to a REGION?

printf '%d\t%s\t%s\t-\tEC2 security groups\n' "$(echo "${SECURITY_GROUPS}" | wc -l)" "${SECURITY_GROUPS_LIMIT}" "${REGION}" || die 'failed to write result for EC2 security groups\n'

NETWORK_LOAD_BALANCERS="$(aws elbv2 describe-load-balancers --query "join(\`\\n\`, @.LoadBalancers[].LoadBalancerArn)" --output text)" || die 'failed to list network load balancers\n'
NETWORK_LOAD_BALANCER_LIMIT="$(aws elbv2 describe-account-limits --query "Limits[? @.Name == 'network-load-balancers'].Max" --output text)" || die 'failed to get network load balancer limit\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to scope this to a REGION?

@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-upgrade ccbdd89 link /test e2e-aws-upgrade
ci/prow/e2e-aws ccbdd89 link /test e2e-aws
ci/prow/e2e-aws-disruptive ccbdd89 link /test e2e-aws-disruptive

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya
Copy link
Contributor

/close

closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and bring consensus on the fix, or if you think this is still important in current state feel free to reopen.

@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: Closed this PR.

Details

In response to this:

/close

closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and bring consensus on the fix, or if you think this is still important in current state feel free to reopen.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants