OCPBUGS-60772: Reuse instance groups#86
OCPBUGS-60772: Reuse instance groups#86openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
8661974 to
796793d
Compare
|
/test unit |
|
@theobarberbany , this does resolves the internal lb creation issue we were getting for both xpn and non-xpn. |
|
Regression looks good too . /test unit |
|
@miyadav Woop! Amazing news! Now just to clean the code up, and document it thoroughly so the next poor soul who comes across it isn't as confused :) |
5557bc7 to
04ed5c6
Compare
|
/test regression-clusterinfra-cucushift-rehearse-gcp-ipi |
|
@theobarberbany: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test regression-clusterinfra-gcp-ipi-ccm |
116fed0 to
7e86daa
Compare
|
/test regression-clusterinfra-cucushift-rehearse-gcp-ipi |
|
/test unit |
|
/test security |
|
have ignored the security failures in snyk as these were all test files & Other files reported was due to use of _token as username - test should pass now. |
|
The unit test failure is fixed by PR when tested locally , but failing still on main ci edit - passed |
|
Other than ccm cases on xpn cluster looks good ( failure ones are due to being spot instances , validated manually looks good ) Non-xpn non ccm cases result are good ( failed cases due to spot instances , which passed when tested manually) |
nrb
left a comment
There was a problem hiding this comment.
I don't think I see the behavior from https://github.com/openshift/cloud-provider-gcp/pull/66/files captured here, though the pre-merge tests are passing.
Thanks for writing the test for this, and for cleaning up the logic code!
|
|
||
| // If all instances in this external instance group are also in our zone's node list, | ||
| // we can reuse this instance group instead of creating our own internal instance group | ||
| shouldReuse = gceHostNamesInZone.HasAll(instanceNames.UnsortedList()...) |
There was a problem hiding this comment.
I think the logic here is relevant to https://github.com/openshift/cloud-provider-gcp/pull/66/files; the condition that was added was, in English, "or all instanceNames start with the set prefix."
If I'm reading this correctly, that "or" condition isn't here, or perhaps it's somewhere else in the PR?
There was a problem hiding this comment.
Ah yeah, I don't understand the motivation for patrick's change 😢
I think, given regression is passing and QE has not found problems we're fine? Maybe?
Understanding in general, motivation for why things are this way / decisions is hard with this.
Given we're installing the e2es / regression with CAPG, shouldn't this fail?
There was a problem hiding this comment.
What would the additional check buy us?
Here, gceHostNamesInZone is every hostname of every node in the zone. instanceNames is... oh, weird. Maybe that should be simplified?
But still, why would "all instanceNames start with the set prefix" be relevant in addition to this?
There was a problem hiding this comment.
Per Patrick's comment, the bootstrap node on install wasn't joining the proper instance groups, resulting in an attempt to use two IGs for one node.
@patrickdillon are you able to give this a look?
There was a problem hiding this comment.
Ah yeah, I don't understand the motivation for patrick's change 😢
I think, given regression is passing and QE has not found problems we're fine?
The relevant bug is OCPBUGS-35256 and it only occurs on private clusters, so you would want to make sure to test a private install to be confident.
Here, gceHostNamesInZone is every hostname of every node in the zone. instanceNames is... oh, weird. Maybe that should be simplified?
But still, why would "all instanceNames start with the set prefix" be relevant in addition to this?
Because the bootstrap instance is not a node in the cluster, if we only look at node hostnames, the bootstrap node will not get joined to the correct instance group.
There was a problem hiding this comment.
you would want to make sure to test a private install to be confident.
Private clusters without XPN, correct?
There was a problem hiding this comment.
Correct. It may also affect private clusters with XPN--I'm not sure.
|
|
||
| // If all instances in this external instance group are also in our zone's node list, | ||
| // we can reuse this instance group instead of creating our own internal instance group | ||
| shouldReuse = gceHostNamesInZone.HasAll(instanceNames.UnsortedList()...) |
There was a problem hiding this comment.
What would the additional check buy us?
Here, gceHostNamesInZone is every hostname of every node in the zone. instanceNames is... oh, weird. Maybe that should be simplified?
But still, why would "all instanceNames start with the set prefix" be relevant in addition to this?
|
/test regression-clusterinfra-cucushift-rehearse-gcp-ipi |
|
/test unit |
1 similar comment
|
/test unit |
|
The units are very flakey, and should be helped by #88. /override ci/prow/unit |
|
@theobarberbany: Overrode contexts on behalf of theobarberbany: ci/prow/unit DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| UseMetadataServer bool | ||
| AlphaFeatureGate *AlphaFeatureGate | ||
| StackType string | ||
| ExternalInstanceGroupsPrefix string |
There was a problem hiding this comment.
Nit, if you left a blank line between this and the upstream code, wouldn't it prevent the indentation change and make future merges easier?
| metricsCollector: newLoadBalancerMetrics(), | ||
| projectsBasePath: getProjectsBasePath(service.BasePath), | ||
| stackType: StackType(config.StackType), | ||
| externalInstanceGroupsPrefix: config.ExternalInstanceGroupsPrefix, |
There was a problem hiding this comment.
Same nit, what happens if you leave a blank line intentionally to minimise indentation changes
|
/lgtm I think Joel's nits are relevant, but I also don't think we should hold up builds for it right now. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nrb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/label acknowledge-critical-fixes |
|
@nrb: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/label acknowledge-critical-fixes-only |
|
/retest |
|
/override e2e-gcp-ovn we failed on deprovision. |
|
@theobarberbany: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override ci/prow/e2e-gcp-ovn |
|
@theobarberbany: Overrode contexts on behalf of theobarberbany: ci/prow/e2e-gcp-ovn DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@theobarberbany: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/tide refresh |
f940e72
into
openshift:main
|
@theobarberbany: Jira Issue OCPBUGS-60772: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-60772 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This PR rewrites 49f5389.
Work around GCP internal load balancer restrictions for multi-subnet clusters.
GCP internal load balancers have specific restrictions that prevent
straightforward load balancing across multiple subnets:
For clusters with nodes across multiple subnets, the previous implementation
would fail to create internal load balancers. This change implements a
two-pass approach:
that contain ONLY cluster nodes and reuse them for the backend service
external groups
This ensures compliance with GCP restrictions while enabling multi-subnet
load balancing for Kubernetes clusters.
References: