@stbenjam
Member

@stbenjam stbenjam commented Jan 16, 2020

The installer creates a manifest for proxy configuration, automatically
adding specific addresses to NO_PROXY depending on the platform. One of
those addresses is the metadata service, hosted at 169.254.169.254. The
installer assumes this must be done for all platforms other than None or
vSphere, whereas the cluster-network-operator has an explicit list of
platforms:

https://github.com/openshift/cluster-network-operator/blob/adaf257b4d63661726443ab2b059a9b4209a02d1/pkg/util/proxyconfig/no_proxy.go#L67-L69

When using a proxy with baremetal IPI, the installer adds this address,
but the CNO does not when it comes up. The rendered machine configs
therefore differ and installation fails, with MCO reporting errors like:

pool master has not progressed to latest configuration: configuration
status for pool master is empty: pool is degraded because nodes fail
with "3 nodes are reporting degraded status on sync": "Node master-1 is
reporting: \"machineconfig.machineconfiguration.openshift.io
\\\"rendered-master-982b8698753da7e31b5f902aa4dc135e\\\" not found\""

This needs a better, longer-term solution to ensure the installer and
CNO do not create conflicting proxy objects. As a short-term fix that
is easily backportable to 4.3 and gets proxies working on baremetal,
this syncs the two lists between the installer and CNO.
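
As a rough illustration of the "explicit list of platforms" idea described above, the following Go sketch adds the metadata address to NO_PROXY only for platforms on an allow-list. It is not the installer's or the CNO's actual code: the platform identifiers, the contents of the allow-list, the base NO_PROXY entries, and the names `platformsWithMetadataService` and `noProxyFor` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metadataServiceIP is the metadata service address named in the description.
const metadataServiceIP = "169.254.169.254"

// platformsWithMetadataService is a hypothetical allow-list. The real list
// lives in the CNO source linked above; these entries are illustrative only.
var platformsWithMetadataService = map[string]bool{
	"aws":   true,
	"azure": true,
	"gcp":   true,
}

// noProxyFor returns a sorted, de-duplicated, comma-separated NO_PROXY value.
// The metadata address is added only when the platform is on the allow-list,
// so an unlisted platform such as "baremetal" never receives it.
func noProxyFor(platform string, base []string) string {
	set := make(map[string]struct{}, len(base)+1)
	for _, entry := range base {
		set[entry] = struct{}{}
	}
	if platformsWithMetadataService[platform] {
		set[metadataServiceIP] = struct{}{}
	}
	entries := make([]string, 0, len(set))
	for entry := range set {
		entries = append(entries, entry)
	}
	sort.Strings(entries)
	return strings.Join(entries, ",")
}

func main() {
	// Hypothetical base entries, chosen only to make the output readable.
	base := []string{"localhost", "127.0.0.1"}
	for _, platform := range []string{"aws", "baremetal"} {
		fmt.Printf("%s: %s\n", platform, noProxyFor(platform, base))
	}
}
```

With an allow-list, a platform that neither side knows about gets no metadata address on either side, so two components holding the same list should render the same value.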

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 16, 2020
@cgwalters
Member

Ahh, that's the bug behind openshift/machine-config-operator#1376 ?

It feels like the cleanest fix here is to add the CNO into the bootstrap phase explicitly, rather than duplicate what it's doing in Terraform?

@cgwalters
Member

Excellent commit message (as usual) BTW!

And just for completeness here's the exact code in the CNO: https://github.com/openshift/cluster-network-operator/blob/adaf257b4d63661726443ab2b059a9b4209a02d1/pkg/util/proxyconfig/no_proxy.go#L67

@stbenjam
Member Author

> Ahh, that's the bug behind openshift/machine-config-operator#1376 ?
>
> It feels like the cleanest fix here is to add the CNO into the bootstrap phase explicitly, rather than duplicate what it's doing in Terraform?

That sounds like the right solution, however given that this affects 4.3, and we have some users affected by this, do you think there's a chance we could sync the two lists of platforms for now, and then work on a longer term fix?

@cgwalters
Member

/approve

@stbenjam
Member Author

/cherry-pick release-4.3

@openshift-cherrypick-robot

@stbenjam: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Member

> do you think there's a chance we could sync the two lists of platforms for now, and then work on a longer term fix?

Yep, I 100% agree with landing this fix as is now.

@stbenjam stbenjam changed the title from "proxy: use explicit list of platforms for metadata addresses" to "Bug 1791993: proxy: use explicit list of platforms for metadata addresses" Jan 16, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jan 16, 2020
@openshift-ci-robot
Contributor

@stbenjam: This pull request references Bugzilla bug 1791993, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

@stbenjam
Member Author

4.3 BZ is 1791995

@stbenjam
Member Author

/bugzilla refresh

@openshift-ci-robot
Contributor

@stbenjam: This pull request references Bugzilla bug 1791993, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

@stbenjam
Member Author

I didn't hit save 🤦‍♂️

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 16, 2020
@openshift-ci-robot
Contributor

@stbenjam: This pull request references Bugzilla bug 1791993, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jan 16, 2020
@wking
Member

wking commented Jan 16, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 16, 2020
@wking
Member

wking commented Jan 16, 2020

/approve

Maybe we need explicit approval now?

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 16, 2020
@stbenjam
Member Author

/test e2e-aws-upgrade

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@abhinavdahiya
Contributor

Hmm, there are more clouds than none/baremetal, so adding the metadata IP by default and having none/baremetal opt out seems like a maintainable solution, IMO.

@stbenjam
Member Author

> Hmm, there are more clouds than none/baremetal, so adding the metadata IP by default and having none/baremetal opt out seems like a maintainable solution, IMO.

I am fine with either approach, but this change has the benefit of only requiring 1 change instead of 2.

What did you think of @cgwalters' suggestion? IMHO the best approach is to not do the same thing in two different codebases, which avoids this risk of conflict entirely. #2939 (comment)
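
To make the trade-off in this exchange concrete, here is a small Go sketch comparing an opt-out rule (add the metadata address everywhere except None and vSphere, as the PR description says the installer did) with an opt-in rule (an explicit list). The platform identifiers, the contents of the opt-in list, and the names `optOut`, `optIn`, and `disagreements` are hypothetical and not taken from either codebase.

```go
package main

import "fmt"

// optOut models the rule attributed to the installer before this change:
// every platform gets the metadata address except None and vSphere.
func optOut(platform string) bool {
	switch platform {
	case "none", "vsphere":
		return false
	default:
		return true
	}
}

// optIn models an explicit allow-list. The entries are illustrative
// assumptions, not the CNO's actual list.
func optIn(platform string) bool {
	switch platform {
	case "aws", "azure", "gcp":
		return true
	default:
		return false
	}
}

// disagreements returns the platforms on which the two rules give different
// answers, i.e. where the two components would render different NO_PROXY values.
func disagreements(platforms []string) []string {
	var out []string
	for _, p := range platforms {
		if optOut(p) != optIn(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	platforms := []string{"aws", "azure", "gcp", "baremetal", "none", "vsphere"}
	fmt.Println(disagreements(platforms)) // prints [baremetal]
}
```

Under these assumptions the two rules only diverge on platforms that appear in neither list, which is the baremetal case reported here. Whichever rule is chosen, a check like this only helps if both codebases are kept on the same rule.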

@stbenjam
Member Author

/test e2e-aws

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 17, 2020

@stbenjam: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-ovirt | f92b14d | link | /test e2e-ovirt |
| ci/prow/e2e-openstack | f92b14d | link | /test e2e-openstack |
| ci/prow/e2e-libvirt | f92b14d | link | /test e2e-libvirt |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@openshift-merge-robot openshift-merge-robot merged commit ae1de17 into openshift:master Jan 17, 2020
@openshift-ci-robot
Contributor

@stbenjam: All pull requests linked via external trackers have merged. Bugzilla bug 1791993 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@stbenjam: new pull request created: #2944

@stbenjam stbenjam deleted the proxy branch January 17, 2020 21:21
@openshift-ci-robot
Contributor

@stbenjam: Bugzilla bug 1791993 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.
