Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Aug 24, 2018

This pull request builds on #171; I'll rebase after that lands, and reviewers can feel free to wait on that before reviewing this PR.

I'm not authorized to assume roles in the teamcoreservices account, and the assume-role call errors out with:

An error occurred (AccessDenied) when calling the AssumeRole operation: Not authorized to perform sts:AssumeRole

That's fine though; I can still launch the cluster with my usual access. This pull request makes the role assumption optional. I've made setting up the AWS_* access variables conditional on successful role assumption, because setting them based on an empty $RES wouldn't work ;).

Using the && chain with a terminal || keeps the script from dying on this assume-role failure. From the set -e docs:

The shell does not exit if the command that fails is ... part of any command executed in a && or || list except the command following the final && or ||...

I'm also only setting iamRoleName configs if assume-role succeeded. We've been setting iamRoleName since the script landed in a2405e4 (coreos/tectonic-installer#3284) and possibly before that since 82daae1 (coreos/tectonic-installer#3074). I don't see anything in those commits or PRs to motivate the iamRoleName entries, but I'd guess they, like the tf-tectonic-installer role, are specific to the Jenkins setup. I've tied them together with CONFIGURE_AWS_ROLES based on that similarity, although in theory you may be able to toggle the iamRoleName settings independently of assume-role success.

Even though the &&/|| chain sets CONFIGURE_AWS_ROLES=False when assume-role failes, I'm using ${CONFIGURE_AWS_ROLES:-False} in the Python script. That way, future versions of this script that support libvirt (or other backends) won't need to bother setting CONFIGURE_AWS_ROLES and will still get valid Python here. The :- syntax is specified in POSIX, and my expansion defaults to False if CONFIGURE_AWS_ROLES is unset or empty.

I've also included an unrelated commit to move AWS_REGION and adjust its default fallbacks. Details in that commit message. I'm happy to spin this out into its own PR if it would make review easier.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 24, 2018
@wking wking force-pushed the smoke-run-allow-failed-assume-role branch from 5f17cca to bcf2cde Compare August 24, 2018 19:35
@wking
Copy link
Member Author

wking commented Aug 24, 2018

retest this please

@abhinavdahiya
Copy link
Contributor

This script is supposed to be used by CI. Does it make sense to make is generic enough for users ???

@wking
Copy link
Member Author

wking commented Aug 24, 2018

Does it make sense to make is generic enough for users ???

It's easier for me to debug CI failures like I'm seeing in #94 if I can run the tests locally. These changes make it easier for me to adjust the script so I can launch a cluster in the smoke-test-approved way, run the smoke tests multiple times against that cluster (as I iterate on the test Go), and then finally tear the cluster back down. I could accomplish pretty much the same thing by walking through the README steps to create my cluster; but it's convenient to use the almost-works-for-me-out-of-the-box script ;).

@wking
Copy link
Member Author

wking commented Aug 24, 2018

There's a conflict with the just-landed #169. I'll wait until #171 lands and then rebase this one.

@wking wking force-pushed the smoke-run-allow-failed-assume-role branch from bcf2cde to 1121c0b Compare August 24, 2018 21:33
@wking
Copy link
Member Author

wking commented Aug 24, 2018

Rebased around #169 with bcf2cde -> 1121c0b.

wking added 2 commits August 24, 2018 15:06
I'm not authorized to assume roles in the teamcoreservices account,
and the assume-role call errors out with:

  An error occurred (AccessDenied) when calling the AssumeRole
  operation: Not authorized to perform sts:AssumeRole

That's fine though; I can still launch the cluster with my usual
access.  This commit makes the role assumption optional.  I've made
setting up the AWS_* access variables conditional on successful role
assumption, because setting them based on an empty $RES wouldn't work
;).

Using the && chain with a terminal || keeps the script from dying on
this assume-role failure.  From the 'set -e' docs [1]:

  The shell does not exit if the command that fails is ... part of any
  command executed in a && or || list except the command following the
  final && or ||...

I'm also only setting iamRoleName configs if assume-role succeeded.
We've been setting iamRoleName since the script landed in a2405e4
(run smoke tests with bash script, 2018-06-18,
coreos/tectonic-installer#3284) and possibly before that since
82daae1 (tests: add etcd role, 2018-03-14,
coreos/tectonic-installer#3074).  I don't see anything in those
commits or PRs to motivate the iamRoleName entries, but I'd guess
they, like the tf-tectonic-installer role, are specific to the Jenkins
setup.  I've tied them together with CONFIGURE_AWS_ROLES based on that
similarity, although in theory you may be able to toggle the
iamRoleName settings independently of assume-role success.

Even though the &&/|| chain sets CONFIGURE_AWS_ROLES=False when
assume-role failes, I'm using ${CONFIGURE_AWS_ROLES:-False} in the
Python script.  That way, future versions of this script that support
libvirt (or other backends) won't need to bother setting
CONFIGURE_AWS_ROLES and will still get valid Python here.  The :-
syntax is specified in [2], and my expansion defaults to False if
CONFIGURE_AWS_ROLES is unset or empty.

[1]: https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
This collects our AWS configuration to make it easier to toggle it
on/off all of our AWS-specific stuff, which we'll need as we add
support for libvirt (and potentially other smoke-test backends) in the
future.

I've also added an `aws configure get region` fallback, so the logic
is:

1. If the user has set AWS_REGION, use what they set.  Otherwise...
2. If we can extract a region from the user's config (with
   `aws configure get region`), use that.  Otherwise...
3. Fall back to eu-west-1.

Step 2 is new in this commit, and makes life a bit more convenient for
callers who have a preferred region (because they have their
preference respected without having to bother setting `AWS_REGION`).

The ${parameter:-[word]} syntax is specified in [1].

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
@wking wking force-pushed the smoke-run-allow-failed-assume-role branch from 1121c0b to 0ad03ed Compare August 24, 2018 22:08
@wking
Copy link
Member Author

wking commented Aug 24, 2018

The smoke-test error was:

 Unpacking artifacts...
 Exiting... Destroying Tectonic...
tectonic: error: path 'ci-pr-172-7fbe2' does not exist, try --help

I'm guessing that's because aws configure get region exited non-zero. I've pushed 1121c0b -> 0ad03ed which should address that.

@wking
Copy link
Member Author

wking commented Aug 24, 2018

All green, but see above for @abhinavdahiya's "do we want to do this?".

@wking
Copy link
Member Author

wking commented Aug 29, 2018

/retest all

^ exercising the just-landed openshift/release#1289.

@wking
Copy link
Member Author

wking commented Aug 29, 2018

Hmm, maybe /retest all isn't the right incantation.

/test all

@wking
Copy link
Member Author

wking commented Aug 30, 2018

The e2e-aws error was:

Waiting for API at https://ci-op-wj52qf1g-68485-api.origin-ci-int-aws.dev.rhcloud.com:6443 to respond ...
Interrupted

/retest

Copy link
Contributor

@yifan-gu yifan-gu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 30, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking, yifan-gu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit d94f87a into openshift:master Aug 31, 2018
@wking wking deleted the smoke-run-allow-failed-assume-role branch August 31, 2018 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants