scripts/aws-check-limits: Automatically check AWS limits in your account #1343

wking · 2019-02-28T20:23:01Z

Based on @sferich888's approach in #1314.

$ aws-check-limits.sh | expand --tabs=7,20,31,43
count  limit        region     code        name
61     700          us-east-1  0Xc6LMYG8P  On-Demand instances - m4.large
3      6            us-east-1  0Xc6LMYG8P  On-Demand instances - i3.xlarge
...
0      5            -          dx8afcdfMr  Route 53 Traffic Policy Instances
131    ?            us-west-2  -           S3 buckets
61     20?          us-west-2  -           Gateway VPC endpoints
375    350?         us-west-2  -           EC2 network interfaces
72     5-per-zone?  us-west-2  -           EC2 NAT gateways
130    2500?        us-west-2  -           EC2 security groups
44     500          us-west-2  -           Network load balancers

openshift-ci-robot · 2019-02-28T20:25:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

abhinavdahiya · 2019-02-28T21:21:55Z

IMO this script is too bashy and hard-coded to support for very long. Why do we need this again? keeping these just in docs means they are guidelines.

$ aws-check-limits.sh | expand --tabs=7,20,31,43 count limit region code name 61 700 us-east-1 0Xc6LMYG8P On-Demand instances - m4.large 3 6 us-east-1 0Xc6LMYG8P On-Demand instances - i3.xlarge ... 0 5 - dx8afcdfMr Route 53 Traffic Policy Instances 131 ? us-west-2 - S3 buckets 61 20? us-west-2 - Gateway VPC endpoints 375 350? us-west-2 - EC2 network interfaces 72 5-per-zone? us-west-2 - EC2 NAT gateways 130 2500? us-west-2 - EC2 security groups 44 500 us-west-2 - Network load balancers

wking · 2019-02-28T21:54:46Z

IMO this script is too bashy...

I'm fine translating to Go or Python or whatever.

... and hard-coded to support for very long.

What hard-coded parts are you concerned about? The trusted-advisor loop seems fairly robust, and the per-resource entries after that are the only way (to my knowledge) to recover limit/usage information for those important resource types which trusted-advisor does not (yet?) support.

Why do we need this again? keeping these just in docs means they are guidelines.

If we're going to keep them anyway, it doesn't seem like much more of a maintenance burden to collect them in a script, and that makes it easier for users to invoke them without multiple copy/pastes. I'm fine adding a stderr echo disclaimer to make it clear that our maintenance will just be best-effort.

sferich888 · 2019-03-02T21:42:39Z

Why not add it directly to the installer as a subcommand[s] openshift-install check limits

sferich888 · 2019-03-08T21:38:25Z

This prints a lot of data. While running the script is simple and prints all the known limits (now) I am worried about maintainability and extensibility of this.

In reality what we care about, is are we having a problem!

$ command | awk '{if($1 >= $2) {print $0}}

Or your going to care about what your close to hitting a problem!

replace N, with some number or threshold you want to search for where you're close to a known limit.

$ command | awk '{if($1 >= $2-N) {print $0}}

wking · 2019-03-08T21:59:38Z

Or your going to care about what your close to hitting a problem!

This (if you're hitting a limit, you probably don't need this script to tell you). And this is where I start to worry about maintainability, as resource types and per-cluster creation counts are somewhat amorphous ;). How about a compromise where I remove resources that we expect the cluster to not need, but do not hard-code the numbers we expect the cluster to create? E.g. instance requirements would depend on cluster configuration.

sferich888

@wking I'm interested in your way of defining a script/checking if your close to a limit.
I am also interested in this script as a whole as I want to know how we connect/use it with https://access.redhat.com/labs/ocplimitscalculator/

sferich888 · 2019-03-14T14:43:34Z

scripts/aws-check-limits.sh

+fi
+
+printf 'count\tlimit\tregion\tcode\tname\n' || die 'failed to write header'
+aws --region us-east-1 support describe-trusted-advisor-checks --language en --query "checks[? category == 'service_limits'].{id: @.id, name: @.name}" --output text | while read -r CHECK


us-east-1 needs to be ${REGION} here.

sferich888 · 2019-03-14T14:43:40Z

scripts/aws-check-limits.sh

+do
+	CHECK_ID="$(echo "${CHECK}" | cut -d '	' -f 1)" || die 'failed to extract ID from %s\n' "${CHECK}"
+	CHECK_NAME="$(echo "${CHECK}" | cut -d '	' -f 2)" || die 'failed to extract name from %s\n' "${CHECK}"
+	RESULT="$(aws --region us-east-1 support describe-trusted-advisor-check-result --check-id "${CHECK_ID}" --query "join(\`\\n\`, result.flaggedResources[].join(\`\\t\`, [@.metadata[4] || '0', @.metadata[3], @.region || '-', '${CHECK_ID}', @.metadata[2]]))" --output text)" || die 'failed to check %s (%s)\n' "${CHECK_ID}" "${CHECK_NAME}"


us-east-1 needs to be ${REGION} here.

sferich888 · 2019-03-14T14:44:21Z

scripts/aws-check-limits.sh

+SECURITY_GROUPS_LIMIT='2500?'  # per region
+printf '%d\t%s\t%s\t-\tEC2 security groups\n' "$(echo "${SECURITY_GROUPS}" | wc -l)" "${SECURITY_GROUPS_LIMIT}" "${REGION}" || die 'failed to write result for EC2 security groups\n'
+
+NETWORK_LOAD_BALANCERS="$(aws elbv2 describe-load-balancers --query "join(\`\\n\`, @.LoadBalancers[].LoadBalancerArn)" --output text)" || die 'failed to list network load balancers\n'


Do we need to scope this to a REGION?

sferich888 · 2019-03-14T14:45:13Z

scripts/aws-check-limits.sh

+printf '%d\t%s\t%s\t-\tEC2 security groups\n' "$(echo "${SECURITY_GROUPS}" | wc -l)" "${SECURITY_GROUPS_LIMIT}" "${REGION}" || die 'failed to write result for EC2 security groups\n'
+
+NETWORK_LOAD_BALANCERS="$(aws elbv2 describe-load-balancers --query "join(\`\\n\`, @.LoadBalancers[].LoadBalancerArn)" --output text)" || die 'failed to list network load balancers\n'
+NETWORK_LOAD_BALANCER_LIMIT="$(aws elbv2 describe-account-limits --query "Limits[? @.Name == 'network-load-balancers'].Max" --output text)" || die 'failed to get network load balancer limit\n'


Do we need to scope this to a REGION?

openshift-ci-robot · 2019-10-04T16:23:24Z

@wking: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/e2e-aws-upgrade	`ccbdd89`	link	`/test e2e-aws-upgrade`
ci/prow/e2e-aws	`ccbdd89`	link	`/test e2e-aws`
ci/prow/e2e-aws-disruptive	`ccbdd89`	link	`/test e2e-aws-disruptive`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

abhinavdahiya · 2019-10-11T20:58:55Z

/close

closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and bring consensus on the fix, or if you think this is still important in current state feel free to reopen.

openshift-ci-robot · 2019-10-11T20:58:56Z

@abhinavdahiya: Closed this PR.

Details

In response to this:

/close

closing without prejudice due to age. Please consider opening an issue or enhancement to discuss and bring consensus on the fix, or if you think this is still important in current state feel free to reopen.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 28, 2019

openshift-ci-robot requested review from smarterclayton and steveej February 28, 2019 20:25

wking mentioned this pull request Feb 28, 2019

Providing a way for users to see limits/usage #1314

Closed

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2019

wking force-pushed the aws-check-limits-script branch from 24e161d to 328bbb0 Compare February 28, 2019 20:30

wking force-pushed the aws-check-limits-script branch from 328bbb0 to ccbdd89 Compare February 28, 2019 21:30

sferich888 suggested changes Mar 14, 2019

View reviewed changes

openshift-ci-robot closed this Oct 11, 2019

scripts/aws-check-limits: Automatically check AWS limits in your account #1343

scripts/aws-check-limits: Automatically check AWS limits in your account #1343

Uh oh!

Conversation

wking commented Feb 28, 2019

Uh oh!

openshift-ci-robot commented Feb 28, 2019

Uh oh!

abhinavdahiya commented Feb 28, 2019

Uh oh!

wking commented Feb 28, 2019

Uh oh!

sferich888 commented Mar 2, 2019

Uh oh!

sferich888 commented Mar 8, 2019

Uh oh!

wking commented Mar 8, 2019

Uh oh!

sferich888 left a comment

Choose a reason for hiding this comment

Uh oh!

sferich888 Mar 14, 2019

Choose a reason for hiding this comment

Uh oh!

sferich888 Mar 14, 2019

Choose a reason for hiding this comment

Uh oh!

sferich888 Mar 14, 2019

Choose a reason for hiding this comment

Uh oh!

sferich888 Mar 14, 2019

Choose a reason for hiding this comment

Uh oh!

openshift-ci-robot commented Oct 4, 2019

Uh oh!

abhinavdahiya commented Oct 11, 2019

Uh oh!

openshift-ci-robot commented Oct 11, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants