@wking
Member

@wking wking commented Dec 2, 2019

Add some echos to the pokes that initially landed in 0ec2cd9 (#5720) to make it easier to rule out that code when debugging mysterious failures like:

Container setup exited with code 6, reason Error
---
Lease acquired, installing...
Installing from release registry.svc.ci.openshift.org/ci-op-r6dy480t/release@sha256:284ff92845dbfc3ca1be73159acc58b36cbfe03aed05d0f79582ea4207035da9
---

@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 2, 2019
@wking wking force-pushed the gcp-token-poke-debug branch from b141f98 to 4759620 Compare December 2, 2019 23:46
@wking wking changed the title ci-operator/templates/openshift/installer/cluster-launch-installer-e2e: Debug logging for Google OAuth pokes ci-operator/templates/openshift/installer/cluster-launch-installer-e2e: Error-catching for Google OAuth pokes Dec 2, 2019
@smarterclayton
Contributor

I’m not sure what you’re trying to achieve with this change. Describe why you think it’s related?

@smarterclayton
Contributor

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2019
@wking
Member Author

wking commented Dec 3, 2019

#6190 (comment) explains why DNS failures are giving us "Container setup exited with code 6, reason Error". The echos would make the relationship very obvious, because the last line in the logs would mention the OAuth poke and the non-400 response code.

@smarterclayton
Contributor

/hold cancel

But want some changes

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2019
@wking wking force-pushed the gcp-token-poke-debug branch from 4759620 to 3ef3905 Compare December 3, 2019 00:16
@wking wking force-pushed the gcp-token-poke-debug branch from 3ef3905 to d612a99 Compare December 3, 2019 00:23
@wking wking force-pushed the gcp-token-poke-debug branch from d612a99 to 2336b88 Compare December 3, 2019 00:31
…e: Error-catching for Google OAuth pokes

Catch non-zero exit codes in the poke that initially landed in 0ec2cd9
(template: Try to poke the GCP auth endpoint in the container,
2019-10-31, openshift#5720) to make it easier to rule out that code when
debugging mysterious failures like [1]:

  Container setup exited with code 6, reason Error
  ---
  Lease acquired, installing...
  Installing from release registry.svc.ci.openshift.org/ci-op-r6dy480t/release@sha256:284ff92845dbfc3ca1be73159acc58b36cbfe03aed05d0f79582ea4207035da9
  ---

From curl(1), exit 6 is:

  Couldn't resolve host. The given remote host was not resolved.

Clayton suggested including the exit code in the non-zero exit log
entry [2].  Testing locally:

  $ echo $BASH_VERSION
  4.2.46(2)-release
  $ code="$( curl -s -o /dev/null -w "%{http_code}" https://does-not-exist.example.com -X POST -d '' || echo "Failed to POST https://oauth2.googleapis.com/token with $?" 1>&2)"
  Failed to POST https://oauth2.googleapis.com/token with 6

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2724/pull-ci-openshift-installer-release-4.3-e2e-gcp/8
[2]: openshift#6190 (comment)
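
The pattern tested in the commit message above can be tried without network access. What follows is a minimal sketch, not the template's actual code: `poke` and `fake_curl` are hypothetical names introduced here, and `fake_curl` merely stands in for a curl call that fails to resolve its host (curl exit 6). It shows that `$?` inside the `||` branch still holds the failed command's exit status, so the log line can name it.

```shell
#!/bin/bash

# Hypothetical stand-in for curl failing with "Couldn't resolve host".
# It prints nothing and returns exit status 6, like curl(1) would.
fake_curl() {
  return 6
}

# Hypothetical helper: run a command, pass its stdout through, and on a
# non-zero exit report the exit status on stderr.
poke() {
  local url="$1"
  shift
  # "$?" in the || branch is the exit status of the command that just failed.
  "$@" || echo "Failed to POST ${url} with $?" 1>&2
}

# Command substitution captures stdout only, so the error message still
# reaches the job log through stderr.
code="$(poke 'https://oauth2.googleapis.com/token' fake_curl)"
echo "captured response code: '${code}'"
```

Because the `echo` in the `||` branch succeeds, the compound command as a whole exits 0. A caller that wants the job to stop on a failed poke would need to propagate the status explicitly.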
@wking wking force-pushed the gcp-token-poke-debug branch from 2336b88 to 8cbef5e Compare December 3, 2019 00:35
@smarterclayton
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, wking


@openshift-merge-robot openshift-merge-robot merged commit 782eb7d into openshift:master Dec 3, 2019
@openshift-ci-robot
Contributor

@wking: Updated the following 3 configmaps:

  • prow-job-cluster-launch-installer-e2e configmap in namespace ci-stg at cluster default using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci at cluster ci/api-build01-ci-devcluster-openshift-com:6443 using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci at cluster default using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

@wking wking deleted the gcp-token-poke-debug branch December 3, 2019 01:06
@openshift-ci-robot
Contributor

@wking: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/rehearse/openshift/cloud-credential-operator/master/e2e-gcp | 8cbef5e | link | /test pj-rehearse |
| ci/rehearse/openshift/cloud-credential-operator/master/e2e-azure | 8cbef5e | link | /test pj-rehearse |
| ci/prow/pj-rehearse | 8cbef5e | link | /test pj-rehearse |


@wking
Member Author

wking commented Dec 3, 2019

Hrm, a recent, mysterious exit 6 here. Did I miss something?

wking added a commit to wking/ci-tools that referenced this pull request Dec 20, 2019
Bringing over a number of changes which have landed in
ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
as of openshift/release@016eb4ed27 (Merge pull request
openshift/release#6505 from hongkailiu/clusterReaders, 2019-12-19).
One series was improved kill logic:

* openshift/release@9cd158adf3 (template: Use a more correct kill
  command, 2019-12-03, openshift/release#6223).
* openshift/release@d0744e520d (exit with 0 even if kill failed,
  2019-12-09, openshift/release#6295)

Another series was around AWS instance console logs:

* openshift/release@e102a16d89
  (ci-operator/templates/openshift/installer/cluster-launch-installer-e2e:
  Gather node console logs on AWS, 2019-12-02,
  openshift/release#6189).
* openshift/release@26fde70045
  (ci-operator/templates/openshift/installer/cluster-launch-installer-e2e:
  Set AWS_DEFAULT_REGION, 2019-12-04, openshift/release#6249).

And there was also:

* openshift/release@cdf97164aa (templates: Add large and xlarge
  variants, 2019-11-25, openshift/release#6081).
* openshift/release@8cbef5e4a7
  (ci-operator/templates/openshift/installer/cluster-launch-installer-e2e:
  Error-catching for Google OAuth pokes, 2019-12-02,
  openshift/release#6190).
* openshift/release@ad29eda8dd (template: Gather the prometheus target
  metadata during teardown, 2019-12-12, openshift/release#6379).