@crawford
Contributor

Reverts #858

Even though we got our AWS quota increased, it looks like AWS just doesn't have the physical capacity for these machines in us-east-1. We are seeing a lot of failures in CI:

Error: Error applying plan:

1 error occurred:
    * module.masters.aws_instance.master[0]: 1 error occurred:
    * aws_instance.master.0: Error launching source instance: timeout while waiting for state to become 'success' (timeout: 30s)

Looking at CloudTrail, I see the following error, which corresponds to that failure:

We currently do not have sufficient t3.medium capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get t3.medium capacity by not specifying an Availability Zone in your request or choosing us-east-1d, us-east-1b, us-east-1c, us-east-1f.

@openshift-ci-robot added the size/XS label Dec 12, 2018
@openshift-ci-robot added the approved label Dec 12, 2018
@crawford
Contributor Author

/cc @wking @abhinavdahiya @eparis

@abhinavdahiya
Contributor

/lgtm

@ashcrow
Member

ashcrow commented Dec 12, 2018

👍 Thanks for putting this in @crawford!

@cgwalters
Member

Can Terraform express "give me one from a list of instance types ordered by preference"?

@crawford
Contributor Author

@cgwalters even if it could, I don't believe there is an API to check AWS's capacity in a given region or availability zone. And if there were such an API, it would be racy: by the time we had verified the capacity and tried to create instances, that capacity might have been consumed by someone else.
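
To illustrate the race described above, here is a minimal Python sketch. It is not part of the installer or of Terraform, and it calls no real AWS API: `FakeCloud`, `has_capacity`, `launch`, `CapacityError`, and `launch_with_fallback` are all hypothetical names invented for this example. It assumes only that a failed launch is reported as an error, as in the CloudTrail message quoted earlier, and shows why attempting each candidate in preference order is more reliable than checking capacity first.

```python
from dataclasses import dataclass, field


class CapacityError(Exception):
    """Hypothetical stand-in for an insufficient-capacity launch failure."""


@dataclass
class FakeCloud:
    """Toy capacity model: (instance_type, zone) -> instances still available."""
    capacity: dict = field(default_factory=dict)

    def has_capacity(self, instance_type, zone):
        # A hypothetical "check" call; its answer can be stale immediately.
        return self.capacity.get((instance_type, zone), 0) > 0

    def launch(self, instance_type, zone):
        # The launch attempt itself is the only authoritative capacity test.
        key = (instance_type, zone)
        if self.capacity.get(key, 0) <= 0:
            raise CapacityError(f"no {instance_type} capacity in {zone}")
        self.capacity[key] -= 1
        return f"{instance_type}@{zone}"


def launch_with_fallback(cloud, candidates):
    """Try each (instance_type, zone) in preference order; return the first success."""
    errors = []
    for instance_type, zone in candidates:
        try:
            return cloud.launch(instance_type, zone)
        except CapacityError as err:
            errors.append(str(err))
    # Every candidate failed; report all of the individual failures.
    raise CapacityError("; ".join(errors))


if __name__ == "__main__":
    cloud = FakeCloud({
        ("t3.medium", "us-east-1a"): 1,
        ("t3.medium", "us-east-1b"): 1,
    })
    # Check-then-launch is racy: the check passes...
    assert cloud.has_capacity("t3.medium", "us-east-1a")
    # ...another tenant takes the last slot in that zone...
    cloud.launch("t3.medium", "us-east-1a")
    # ...so only an attempt-based loop can recover by moving to the next candidate.
    preferences = [("t3.medium", "us-east-1a"), ("t3.medium", "us-east-1b")]
    print(launch_with_fallback(cloud, preferences))
```

The candidate list could just as well vary the instance type instead of the zone. Either way, the ordering lives in the caller, which is the part that a single declarative resource definition does not obviously express.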

@wking
Member

wking commented Dec 12, 2018

/lgtm

What's going on in CI? I see lots of "Pending - Job triggered", even for tests that should turn around in minutes.

@crawford force-pushed the revert-858-back-to-t3.medium branch from 0ea6d35 to 9f34143 December 12, 2018 20:54
@openshift-ci-robot removed the lgtm label Dec 12, 2018
@abhinavdahiya
Contributor

/lgtm

@openshift-ci-robot added the lgtm label Dec 12, 2018
@wking
Member

wking commented Dec 12, 2018

/lgtm

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhinavdahiya
Contributor

abhinavdahiya commented Dec 12, 2018

/retest
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/882/pull-ci-openshift-installer-master-e2e-aws/2239

info: Manifests will be extracted to /tmp/release-image-0.0.1-2018-12-12-210459800060254
error: unable to connect to image repository registry.svc.ci.openshift.org/ci-op-f54sng97/stable@sha256:58b79dec7b54b6ade89615e2afc9cfdefb2f03bd612f6f27a4eff2763a342443: Get https://registry.svc.ci.openshift.org/v2/: net/http: TLS handshake timeout
2018/12/12 21:05:30 Container release in pod release-latest failed, exit code 1, reason Error

@ashcrow
Member

ashcrow commented Dec 12, 2018

/retest

Registry timeout:

error: unable to connect to image repository 

@wking
Member

wking commented Dec 12, 2018

error: unable to connect to image repository 

Added a mention in openshift/release#2070 in case that helps bump further investigation there ;).

@openshift-merge-robot merged commit 7a46568 into master Dec 12, 2018
@crawford deleted the revert-858-back-to-t3.medium branch December 12, 2018 22:07