Revert "Revert "data/aws: Switch to m4.large"" #882

crawford · 2018-12-12T18:44:18Z

Reverts #858

Even though we got our AWS quota increased, it looks like AWS just doesn't have the physical capacity for these machines in us-east-1. We are seeing a lot of failures in CI:

Error: Error applying plan:

1 error occurred:
    * module.masters.aws_instance.master[0]: 1 error occurred:
    * aws_instance.master.0: Error launching source instance: timeout while waiting for state to become 'success' (timeout: 30s)

Looking at CloudTrail, I see the following error, which corresponds to that failure:

We currently do not have sufficient t3.medium capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get t3.medium capacity by not specifying an Availability Zone in your request or choosing us-east-1d, us-east-1b, us-east-1c, us-east-1f.

crawford · 2018-12-12T18:44:51Z

/cc @wking @abhinavdahiya @eparis

abhinavdahiya · 2018-12-12T18:46:03Z

/lgtm

ashcrow · 2018-12-12T19:45:43Z

👍 Thanks for putting this in @crawford!

cgwalters · 2018-12-12T19:52:22Z

Can Terraform express "give me one from a list of instance types ordered by preference"?

crawford · 2018-12-12T19:54:42Z

@cgwalters Even if it could, I don't believe there is an API to check AWS's capacity in specific regions and availability zones. And if there were such an API, it would be racy: by the time we had verified the capacity and tried to create instances, it might have been consumed by someone else.
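
One way to picture the ordered-preference idea without a separate capacity check is to attempt the launch itself and fall back when the provider reports insufficient capacity, which sidesteps the check-then-create race described above. The following is only a minimal Python sketch under stated assumptions: `launch_with_fallback`, `InsufficientCapacity`, `EXHAUSTED`, and `fake_launch` are hypothetical names, the launcher is a simulated stand-in rather than a real AWS or Terraform call, and nothing here reflects how the installer actually provisions machines.

```python
from typing import Callable, Sequence, Tuple


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for a provider's 'no capacity' launch error."""


def launch_with_fallback(
    launch: Callable[[str, str], str],
    instance_types: Sequence[str],
    zones: Sequence[str],
) -> Tuple[str, str, str]:
    """Try each (instance type, zone) pair in preference order.

    Returns (instance_id, instance_type, zone) for the first launch that
    succeeds. There is no up-front capacity query: the launch attempt
    itself is the check, so there is no window for a check/create race.
    """
    for instance_type in instance_types:
        for zone in zones:
            try:
                instance_id = launch(instance_type, zone)
            except InsufficientCapacity:
                continue  # Move on to the next candidate pair.
            return instance_id, instance_type, zone
    raise InsufficientCapacity("no capacity for any requested type/zone pair")


# Simulated provider state (an assumption for illustration only):
# t3.medium is exhausted in us-east-1a, as in the CloudTrail message.
EXHAUSTED = {("t3.medium", "us-east-1a")}


def fake_launch(instance_type: str, zone: str) -> str:
    """Simulated launcher; a real one would call the cloud provider."""
    if (instance_type, zone) in EXHAUSTED:
        raise InsufficientCapacity(f"no {instance_type} capacity in {zone}")
    return f"i-fake-{instance_type}-{zone}"
```

With `t3.medium` simulated as exhausted in `us-east-1a`, calling `launch_with_fallback(fake_launch, ["t3.medium", "m4.large"], ["us-east-1a", "us-east-1b"])` lands on `t3.medium` in `us-east-1b`. Whether instance types or zones form the outer loop is a policy choice; this sketch prefers keeping the instance type and changing the zone first.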

wking · 2018-12-12T20:46:54Z

/lgtm

What's going on in CI? I see lots of "Pending - Job triggered", even for tests that should turn around in minutes.

abhinavdahiya · 2018-12-12T20:54:48Z

/lgtm

wking · 2018-12-12T20:54:49Z

/lgtm

openshift-ci-robot · 2018-12-12T20:54:57Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [abhinavdahiya,crawford,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

abhinavdahiya · 2018-12-12T21:06:34Z

/retest
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/882/pull-ci-openshift-installer-master-e2e-aws/2239

info: Manifests will be extracted to /tmp/release-image-0.0.1-2018-12-12-210459800060254
error: unable to connect to image repository registry.svc.ci.openshift.org/ci-op-f54sng97/stable@sha256:58b79dec7b54b6ade89615e2afc9cfdefb2f03bd612f6f27a4eff2763a342443: Get https://registry.svc.ci.openshift.org/v2/: net/http: TLS handshake timeout
2018/12/12 21:05:30 Container release in pod release-latest failed, exit code 1, reason Error

ashcrow · 2018-12-12T21:06:59Z

/retest

Registry timeout:

error: unable to connect to image repository

wking · 2018-12-12T21:21:56Z

> error: unable to connect to image repository

Added a mention in openshift/release#2070 in case that helps bump further investigation there ;).

openshift-ci-robot added the size/XS label Dec 12, 2018

openshift-ci-robot requested review from rajatchopra and russellb December 12, 2018 18:44

openshift-ci-robot added the approved label Dec 12, 2018

openshift-ci-robot requested review from abhinavdahiya, eparis and wking December 12, 2018 18:44

openshift-ci-robot assigned abhinavdahiya Dec 12, 2018

openshift-ci-robot added the lgtm label Dec 12, 2018

crawford mentioned this pull request Dec 12, 2018

controller: Prefix generated configs with pool type openshift/machine-config-operator#218

Merged

ashcrow approved these changes Dec 12, 2018

ashcrow mentioned this pull request Dec 12, 2018

hack: Add cluster-push-*.sh scripts openshift/machine-config-operator#231

Merged

openshift-ci-robot assigned wking Dec 12, 2018

crawford force-pushed the revert-858-back-to-t3.medium branch from 0ea6d35 to 9f34143 Compare December 12, 2018 20:54

openshift-ci-robot removed the lgtm label Dec 12, 2018

openshift-ci-robot added the lgtm label Dec 12, 2018

openshift-merge-robot merged commit 7a46568 into master Dec 12, 2018

crawford deleted the revert-858-back-to-t3.medium branch December 12, 2018 22:07
