Bug 1961925: UPSTREAM: <carry>: Does not prevent pod creation because of no nodes reason when it runs under the regular cluster #756
Conversation
@cynepco3hahue: This pull request references Bugzilla bug 1961925, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
Requesting review from QA contact.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@cynepco3hahue: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
dhellmann left a comment:
The logic looks right. I'm not familiar with the informer pattern, so I'll leave that for the API team to comment on. I have one suggestion about the wording for the warning annotation, but that's not a blocker to approving this.
Thanks!
Suggested change:
-	pod.Annotations[workloadAdmissionWarning] = "only single node clusters are supported"
+	pod.Annotations[workloadAdmissionWarning] = "only single-node clusters support workload partitioning"
LGTM
I'd like a measure of how long it takes until the first pod is created, before and after this change. Our install is likely to be sensitive to this. It should be fast, but I'd like to know how quickly this resource is available.
Will it be enough to check how long the installation takes with and without the PR?
@deads2k I checked the install time on GCP for two jobs that include the PR: the first was ~29m and the second ~31m. I also checked the install time on GCP for jobs without the PR (ten jobs), and it was always within the 28-32 minute interval.
/approve
Structure looks ok. I'd like this measurement to be sure we aren't trading one problem for another (#756 (comment)) before the hold is released. If it is more than one minute, let's discuss before merge. If it's less than one minute, simply recording here how much longer it is when you release the hold is sufficient.
/retest
Force-pushed ae1f6d7 to 59a40f5
/retest

1 similar comment

/retest
/hold cancel
Don't mix vendor changes with code changes. It makes rebases harder. Make it two commits.
Force-pushed 59a40f5 to 093d51d
What does nil mean? When can it be nil?
This races. You can't write unprotected state.
Why do you store this value at all? It comes from a lister, i.e. it has O(1) lookup since there is only one object, so there is no need to cache the value. Get rid of it, including the race.
…reason when it runs under the regular cluster

Check the `cluster` infrastructure resource status to be sure that we run on top of an SNO cluster; if the pod runs on top of a regular cluster, exit before the node existence check.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Force-pushed 093d51d to 727f445
Code under review:

	}
	nodes, err := a.nodeLister.List(labels.Everything())
	clusterInfra, err := a.infraConfigLister.Get(infraClusterName)
Can one delete the infra resource? Would that brick the cluster?
@dhellmann @browsell Is it expected that an admin can delete the infrastructure object? What additional components in the cluster rely on it?
@sttts if it's deletable, it's a bug. Admission should be coded to prevent deletion of config.openshift.io objects.
Lots of things rely on that resource. It's entirely possible that someone could delete it. There is also a period of time during bootstrapping where it won't exist yet. So, yes, we need to cope with it not existing and assume a default. Unfortunately, the default won't work for single node because it won't enable partitioning.
That race condition makes me think we need something other than an API resource to turn the feature on, since we need all partitioning annotations processed the same way from the beginning of the life of the cluster. I'm not sure what options we have. Elsewhere I would say use an environment variable or a config file. Are those options in the API server, @sttts & @deads2k?
Nevermind, I misunderstood some things about the ordering during bootstrapping. It should be safe to assume the infrastructure resource exists when it's safe to create regular pods through the API.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Force-pushed 727f445 to 374f6f0
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: cynepco3hahue, deads2k, sttts. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/retest
Please review the full test history for this PR and help us cut down flakes.

1 similar comment

/retest
Please review the full test history for this PR and help us cut down flakes.
/retest
@cynepco3hahue: The following test failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@cynepco3hahue: All pull requests linked via external trackers have merged: Bugzilla bug 1961925 has been moved to the MODIFIED state.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Check the `cluster` infrastructure resource status to be sure that we run on top of an SNO cluster; if the pod runs on top of a regular cluster, exit before the node existence check.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>