-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Bug 1791993: proxy: use explicit list of platforms for metadata addresses #2939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The installer creates a manifest for proxy configuration, automatically adding specific addresses to NO_PROXY depending on the platform. One of those addresses is the metadata service, hosted at 169.254.169.254. The installer assumes this must be done for all platforms other than None of vSphere, whereas the cluster-network-operator has an explicit list of platforms: https://github.com/openshift/cluster-network-operator/blob/adaf257b4d63661726443ab2b059a9b4209a02d1/pkg/util/proxyconfig/no_proxy.go#L67-L69 When using a proxy with baremetal IPI, the installer adds this address, however when the CNO comes up, it does not, causing the rendered machine configs to differ, and installation to fail, with MCO reporting errors like: ``` pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with "3 nodes are reporting degraded status on sync": "Node master-1 is reporting: \"machineconfig.machineconfiguration.openshift.io \\\"rendered-master-982b8698753da7e31b5f902aa4dc135e\\\" not found\"" ``` This needs a better, longer term solution to ensure the installer and CNO are not creating conflicting proxy objects, however as a short-term fix that is easily backportable to 4.3 to ensure proxies work on baremetal, this syncs the two lists between the installer and CNO.
|
Ahh, that's the bug behind openshift/machine-config-operator#1376 ? It feels like the cleanest fix here is to add the CNO into the bootstrap phase explicitly, rather than duplicate what it's doing in Terratform? |
|
Excellent commit message (as usual) BTW! And just for completeness here's the exact code in the CNO: https://github.com/openshift/cluster-network-operator/blob/adaf257b4d63661726443ab2b059a9b4209a02d1/pkg/util/proxyconfig/no_proxy.go#L67 |
That sounds like the right solution, however given that this affects 4.3, and we have some users affected by this, do you think there's a chance we could sync the two lists of platforms for now, and then work on a longer term fix? |
|
/approve |
|
/cherry-pick release-4.3 |
|
@stbenjam: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Yep, I 100% agree with landing this fix as is now. |
|
@stbenjam: This pull request references Bugzilla bug 1791993, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
4.3 BZ is 1791995 |
|
/bugzilla refresh |
|
@stbenjam: This pull request references Bugzilla bug 1791993, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
I didn't hit save 🤦♂️ /bugzilla refresh |
|
@stbenjam: This pull request references Bugzilla bug 1791993, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/lgtm |
|
/approve Maybe we need explicit approval now? |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-aws-upgrade |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
9 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
Hmm there are more clouds than none/baremetal so this metadata ip being added to default and none/baremetal opting out seems like a maintainable solution imo. |
I am fine with either approach, but this change has the benefit of only requiring 1 change instead of 2. What did you think of @cgwalters suggestion? IMHO I agree the best approach is not having the same thing done in two different codebases and avoid introducing this risk of conflict at all. #2939 (comment) |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/test e2e-aws |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@stbenjam: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
@stbenjam: All pull requests linked via external trackers have merged. Bugzilla bug 1791993 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@stbenjam: new pull request created: #2944 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@stbenjam: Bugzilla bug 1791993 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The installer creates a manifest for proxy configuration, automatically
adding specific addresses to NO_PROXY depending on the platform. One of
those addresses is the metadata service, hosted at 169.254.169.254. The
installer assumes this must be done for all platforms other than None or
vSphere, whereas the cluster-network-operator has an explicit list of
platforms:
https://github.com/openshift/cluster-network-operator/blob/adaf257b4d63661726443ab2b059a9b4209a02d1/pkg/util/proxyconfig/no_proxy.go#L67-L69
When using a proxy with baremetal IPI, the installer adds this address,
however when the CNO comes up, it does not, causing the rendered
machine configs to differ, and installation to fail, with MCO reporting
errors like:
This needs a better, longer term solution to ensure the installer and
CNO are not creating conflicting proxy objects, however as a short-term
fix that is easily backportable to 4.3 to ensure proxies work on
baremetal, this syncs the two lists between the installer and CNO.