Skip to content

OCPBUGS-2197: OCPBUGS-2122: update: Inject proxy data for firstboot#3370

Merged
openshift-merge-robot merged 2 commits intoopenshift:masterfrom
cgwalters:firstboot-proxy-unit
Oct 13, 2022
Merged

OCPBUGS-2197: OCPBUGS-2122: update: Inject proxy data for firstboot#3370
openshift-merge-robot merged 2 commits intoopenshift:masterfrom
cgwalters:firstboot-proxy-unit

Conversation

@cgwalters
Copy link
Member

The least invasive fix for ensuring we get the proxy setup in the unit.

Closes: https://issues.redhat.com/browse/OCPBUGS-2197

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 12, 2022
@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references Jira Issue OCPBUGS-2197, which is invalid:

  • expected the bug to target the "4.12.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

The least invasive fix for ensuring we get the proxy setup in the unit.

Closes: https://issues.redhat.com/browse/OCPBUGS-2197

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2022
@cgwalters
Copy link
Member Author

Not tested manually, but this seems sufficiently obvious to just toss up directly and let CI cover the non-proxy paths. If I correctly read the incantation from the cluster-bot spellbook (i.e. "test upgrade #3370 quay.io/openshift-release-dev/ocp-release:4.12.0-ec.4-x86_64 proxy"), then https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1580175797228933120 will test an upgrade with a proxy enabled from this.

@cgwalters
Copy link
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 12, 2022
@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references Jira Issue OCPBUGS-2197, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from rioliu-rh October 12, 2022 12:38
@sinnykumari
Copy link
Contributor

This should also fix https://issues.redhat.com/browse/OCPBUGS-2122 and https://issues.redhat.com/browse/OCPBUGS-2245 . Running proxy job that we have in payload to get ci coverage.
/payload-job periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 12, 2022

@sinnykumari: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d36a5440-4a30-11ed-8a6d-c80c3c04d9f0-0

@sinnykumari sinnykumari changed the title OCPBUGS-2197: update: Inject proxy data for firstboot OCPBUGS-2197: OCPBUGS-2122: update: Inject proxy data for firstboot Oct 12, 2022
@sinnykumari
Copy link
Contributor

/jira refresh

@openshift-ci-robot
Copy link
Contributor

@sinnykumari: This pull request references Jira Issue OCPBUGS-2197, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member Author

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 12, 2022
@sinnykumari
Copy link
Contributor

another attempt to also include OCPBUGS-2122 (after fixing target version)
/jira refresh

@openshift-ci-robot
Copy link
Contributor

@sinnykumari: This pull request references Jira Issue OCPBUGS-2197, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @rioliu-rh

Details

In response to this:

another attempt to also include OCPBUGS-2122 (after fixing target version)
/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@stbenjam
Copy link
Member

/payload-job periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 12, 2022

@stbenjam: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cc239600-4a40-11ed-9565-03f2d9f9e442-0

@cgwalters
Copy link
Member Author

Ah no, we actually need to inject the proxy environment into the container, not the systemd unit.

The least invasive fix for ensuring we get the proxy setup
in the unit.

Closes: https://issues.redhat.com/browse/OCPBUGS-2197
@cgwalters cgwalters force-pushed the firstboot-proxy-unit branch from 1d7cf67 to 9f87ef8 Compare October 12, 2022 15:41
@cgwalters
Copy link
Member Author

/payload-job periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 12, 2022

@cgwalters: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/73e432c0-4a44-11ed-8d4a-c499e8ef7fb4-0

@sinnykumari
Copy link
Contributor

unit test failed, periodic job failure could be related to that.

@stbenjam
Copy link
Member

stbenjam commented Oct 12, 2022

Logs are in the installer log-bundle, it shows this:

Oct 12 16:36:57 ip-10-0-57-66 machine-config-daemon[2132]: W1012 16:36:57.096500    2132 
firstboot_complete_machineconfig.go:46] error: failed to update OS to registry.build01.ci.openshift.org/ci-op-km61h217
/stable@sha256:67e996c9886db89d3b8b66fed715806ebd91a7d598564bc0d33b75be2d5f9d75 : error running
 rpm-ostree rebase --experimental ostree-unverified-registry:registry.build01.ci.openshift.org/ci-op-km61h217
/stable@sha256:67e996c9886db89d3b8b66fed715806ebd91a7d598564bc0d33b75be2d5f9d75: error: remote error: pinging 
container registry registry.build01.ci.openshift.org: Get "https://registry.build01.ci.openshift.org/v2/": dial tcp 54.243.147.229:443: i/o timeout

Is that the right place? Looks like rpm-ostree rebase is run in pkg/daemon/update.go - is that getting the proxy settings?

@cgwalters
Copy link
Member Author

Is that the right place? Looks like rpm-ostree rebase is run in pkg/daemon/update.go - is that getting the proxy settings?

We actually now have three places this needs to be handled confusingly:

  • old machine-os-content flow (all works)
  • firstboot (and 4.11 upgrade hack) machine-config-daemon-update-rpmostree-via-container.service 🆕
  • Default (happy path) rpm-ostreed.service 🆕

Both the latter two should be fixed now, let's see.

/payload-job periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 12, 2022

@cgwalters: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7e6a16a0-4a59-11ed-8c70-d303a279425c-0

@sinnykumari
Copy link
Contributor

latest proxy job failed during m-c-d-firstboot:

Oct 12 18:45:27 ip-10-0-48-255 machine-config-daemon[2134]: W1012 18:45:27.469507    2134 firstboot_complete_machineconfig.go:46] error: failed to update OS to registry.build01.ci.openshift.org/ci-op-pf8311f2/stable@sha256:67e996c9886db89d3b8b66fed715806ebd91a7d598564bc0d33b75be2d5f9d75 : error running rpm-ostree rebase --experimental ostree-unverified-registry:registry.build01.ci.openshift.org/ci-op-pf8311f2/stable@sha256:67e996c9886db89d3b8b66fed715806ebd91a7d598564bc0d33b75be2d5f9d75: error: remote error: pinging container registry registry.build01.ci.openshift.org: Get "https://registry.build01.ci.openshift.org/v2/": dial tcp 54.243.147.229:443: i/o timeout
Oct 12 18:45:27 ip-10-0-48-255 machine-config-daemon[2134]: : exit status 1

@yuqi-zhang
Copy link
Contributor

/test e2e-hypershift

@cgwalters
Copy link
Member Author

/payload-job periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy
I don't know why this isn't working; been fighting trying to get a debug shell in a reproducer environment. Added some debug information in the hope that will help. But I think what may work better is to spawn a cluster in the proxy env, then do an upgrade from it. Will look at that tomorrow.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2022

@cgwalters: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2752b7d0-4a8d-11ed-869b-464ef0df4424-0

@cgwalters
Copy link
Member Author

OK, hopping into a 4.11 cluster's node that is set up with a proxy, and manually overriding rpm-ostree+ostree+skopeo, I can verify that injecting the EnvironmentFile=/etc/mco/proxy.env is indeed sufficient to let us pull containers via the skopeo proxy (as expected!).

@cgwalters
Copy link
Member Author

Ohh...I think I see the problem, somehow we're not getting the new drop-in. It may have to do with the fact that we have two separate drop-ins for rpm-ostreed? Digging

In the new OS update happy path, it's rpm-ostreed.service which is
pulling the container image, not the MCD.  We need to inject the
proxy env into that service, in the same way we were doing in
the past for `pivot.service` etc.
@cgwalters cgwalters force-pushed the firstboot-proxy-unit branch from 708a6f3 to 53da993 Compare October 13, 2022 12:56
@cgwalters
Copy link
Member Author

/payload-job periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2022

@cgwalters: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-ovn-proxy

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7f8a8c70-4af6-11ed-881f-c35a64ef7d02-0

@MaysaMacedo
Copy link
Contributor

Thanks for working on this! It also helps openstack installations with proxy. I tested it manually

@cgwalters
Copy link
Member Author

Yep, looking at the payload job, we're now past the install phase! 🎉

Man...this not working on the first try because that obscure template inheritance issue just completely undermined my confidence, and I'm glad the problem was just something silly and not anything fundamental.

@cgwalters
Copy link
Member Author

/hold cancel
since the install phase covers what we care about here

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 13, 2022
@sinnykumari
Copy link
Contributor

yep, looking at install log,

time="2022-10-13T13:48:10Z" level=info msg="Install complete!"
time="2022-10-13T13:48:10Z" level=info msg="To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/tmp/installer/auth/kubeconfig'"

This was tricky, thanks for getting this fixed!
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 13, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters
Copy link
Member Author

/override ci/prow/e2e-aws
Failed during deprovisioning, not related to this change, and we have previous passes with basically the same code.
/override ci/prow/e2e-agnostic-upgrade
Ditto on deprovisioner

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 73c5c86 and 2 for PR HEAD 53da993 in total

@cgwalters
Copy link
Member Author

/override ci/prow/e2e-aws
/override ci/prow/e2e-agnostic-upgrade

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2022

@cgwalters: Overrode contexts on behalf of cgwalters: ci/prow/e2e-agnostic-upgrade, ci/prow/e2e-aws

Details

In response to this:

/override ci/prow/e2e-aws
/override ci/prow/e2e-agnostic-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit e46a003 into openshift:master Oct 13, 2022
@openshift-ci-robot
Copy link
Contributor

@cgwalters: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-2197 has been moved to the MODIFIED state.

Details

In response to this:

The least invasive fix for ensuring we get the proxy setup in the unit.

Closes: https://issues.redhat.com/browse/OCPBUGS-2197

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 13, 2022

@cgwalters: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws 53da993 link false /test okd-scos-e2e-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants