Bug 1961925: UPSTREAM: <carry>: Does not prevent pod creation because of no nodes reason when it runs under the regular cluster#756

Merged
openshift-merge-robot merged 2 commits into openshift:master from cynepco3hahue:admission_plugin_sno_imrovement
May 27, 2021

Conversation

@cynepco3hahue

Check the status of the `cluster` infrastructure resource to confirm that we are running on a single-node OpenShift (SNO) cluster; if the pod is being created on a regular cluster, exit before the node existence check.
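A minimal Go sketch of the ordering described above. It assumes a simplified `Infrastructure` type whose `ControlPlaneTopology` field takes the values `SingleReplica` or `HighlyAvailable` (modelled on the OpenShift config API, but trimmed down here). `admitWorkload` and its messages are hypothetical; the only point is that the topology check runs first, so a regular cluster never reaches the node existence check.

```go
package main

import (
	"errors"
	"fmt"
)

// TopologyMode is a simplified stand-in for the topology values reported in
// the infrastructure status (assumed: "SingleReplica" and "HighlyAvailable").
type TopologyMode string

const (
	SingleReplicaTopologyMode   TopologyMode = "SingleReplica"
	HighlyAvailableTopologyMode TopologyMode = "HighlyAvailable"
)

// Infrastructure is a hypothetical, trimmed-down view of the `cluster`
// infrastructure resource.
type Infrastructure struct {
	ControlPlaneTopology TopologyMode
}

// admitWorkload is a hypothetical helper: it inspects the cluster topology
// first and only performs the node existence check on single-node clusters.
func admitWorkload(infra *Infrastructure, nodeCount int) (string, error) {
	if infra.ControlPlaneTopology != SingleReplicaTopologyMode {
		// Regular cluster: exit early, so a missing node never blocks the pod.
		return "skipped: not a single-node cluster", nil
	}
	if nodeCount == 0 {
		return "", errors.New("no nodes are registered yet")
	}
	return "workload partitioning applied", nil
}

func main() {
	ha := &Infrastructure{ControlPlaneTopology: HighlyAvailableTopologyMode}
	sno := &Infrastructure{ControlPlaneTopology: SingleReplicaTopologyMode}

	fmt.Println(admitWorkload(ha, 0))  // regular cluster, no nodes: still admitted
	fmt.Println(admitWorkload(sno, 0)) // SNO without nodes: rejected
	fmt.Println(admitWorkload(sno, 1)) // SNO with a node: partitioning applied
}
```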

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>

@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label May 19, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels May 19, 2021
@openshift-ci

openshift-ci bot commented May 19, 2021

@cynepco3hahue: This pull request references Bugzilla bug 1961925, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @wangke19

Details

In response to this:

Bug 1961925: UPSTREAM: <carry>: Does not prevent pod creation because of no nodes reason when it runs under the regular cluster

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from wangke19 May 19, 2021 18:01
@openshift-ci-robot

@cynepco3hahue: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@openshift-ci openshift-ci bot requested review from deads2k and sttts May 19, 2021 18:01

@dhellmann dhellmann left a comment

The logic looks right. I'm not familiar with the informer pattern, so I'll leave that for the API team to comment on. I have one suggestion about the wording for the warning annotation, but that's not a blocker to approving this.

Thanks!

Suggested change:
- pod.Annotations[workloadAdmissionWarning] = "only single node clusters are supported"
+ pod.Annotations[workloadAdmissionWarning] = "only single-node clusters support workload partitioning"
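The suggested wording ends up in the pod's annotations. A hedged sketch of writing it safely, assuming a trimmed-down `Pod` type: the map is allocated first, because assigning into a nil Go map panics. The annotation key below is a placeholder, since the real value of `workloadAdmissionWarning` is not shown in this thread, and `warnNotSingleNode` is a hypothetical helper.

```go
package main

import "fmt"

// workloadAdmissionWarning is a placeholder key; the real constant's value is
// not part of this discussion.
const workloadAdmissionWarning = "example.invalid/workload-admission-warning"

// Pod is a hypothetical, trimmed-down pod with only its annotations.
type Pod struct {
	Annotations map[string]string
}

// warnNotSingleNode records the suggested warning on the pod, allocating the
// annotation map first because writing to a nil map panics.
func warnNotSingleNode(pod *Pod) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations[workloadAdmissionWarning] = "only single-node clusters support workload partitioning"
}

func main() {
	pod := &Pod{} // Annotations starts out nil
	warnNotSingleNode(pod)
	fmt.Println(pod.Annotations[workloadAdmissionWarning])
}
```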

@sjenning

LGTM

I'd like a measure of how long it takes until the first pod is created, before and after this change. Our install is likely to be sensitive to this. It should be fast, but I'd like to know how quickly this resource becomes available.

Author

Would it be enough to check how long the installation takes with and without the PR?

Author

@cynepco3hahue cynepco3hahue May 20, 2021

@deads2k I checked the installer time on GCP for two jobs that include the PR: the first took ~29 minutes and the second ~31 minutes. I also checked the installer time on GCP for 10 jobs without the PR, and it was always within the 28-32 minute interval.

@deads2k

deads2k commented May 19, 2021

/approve
/hold

structure looks ok. I'd like this measurement to be sure we aren't trading one problem for another (#756 (comment)) before the hold is released. If it is more than one minute, let's discuss before merge. If it's less than one minute, simply recording how much longer it is here when you release the hold is sufficient.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 19, 2021
@cynepco3hahue
Author

/retest

@cynepco3hahue cynepco3hahue force-pushed the admission_plugin_sno_imrovement branch from ae1f6d7 to 59a40f5 Compare May 20, 2021 09:21
@openshift-ci-robot

@cynepco3hahue: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@cynepco3hahue
Author

/retest

1 similar comment
@cynepco3hahue
Author

/retest

@deads2k

deads2k commented May 24, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2021
@sttts

sttts commented May 25, 2021

Don't mix vendor changes with code changes. This makes rebases hard. Make it two commits.

@cynepco3hahue cynepco3hahue force-pushed the admission_plugin_sno_imrovement branch from 59a40f5 to 093d51d Compare May 25, 2021 11:25
@openshift-ci-robot

@cynepco3hahue: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

What does nil mean? When can it be nil?

This races. You can't write unprotected state.

Why do you store this value at all? It comes from a lister, i.e., it has an O(1) lookup because there is only one object, so there is no need to cache the value. Get rid of it, including the race.
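A sketch of that suggestion, assuming a hypothetical `infraLister` interface standing in for an informer-backed lister and a hypothetical `plugin` type: the plugin keeps no mutable copy of the infrastructure object and asks the lister on every call, so concurrent admission requests only read shared state.

```go
package main

import (
	"fmt"
	"sync"
)

// Infrastructure is a hypothetical, trimmed-down infrastructure object.
type Infrastructure struct {
	ControlPlaneTopology string
}

// infraLister stands in for a lister backed by an informer cache; Get is a
// cheap in-memory lookup, so there is nothing to gain from caching its result.
type infraLister interface {
	Get(name string) (*Infrastructure, error)
}

// staticLister is a test double that always returns the same object.
type staticLister struct{ infra *Infrastructure }

func (l staticLister) Get(name string) (*Infrastructure, error) { return l.infra, nil }

// plugin holds no cached cluster state: nothing is written after construction.
type plugin struct {
	lister infraLister
}

// isSingleNode looks the object up on every call instead of storing it.
func (p *plugin) isSingleNode() (bool, error) {
	infra, err := p.lister.Get("cluster")
	if err != nil {
		return false, err
	}
	return infra.ControlPlaneTopology == "SingleReplica", nil
}

func main() {
	p := &plugin{lister: staticLister{infra: &Infrastructure{ControlPlaneTopology: "SingleReplica"}}}

	// Simulate concurrent admission calls; each goroutine writes only its own slot.
	results := make([]bool, 8)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = p.isSingleNode()
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```

Because the goroutines in `main` write to distinct slice elements and only read `p`, the example should run cleanly under `go run -race`.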

Author

Good point! Thanks!

…reason when it runs under the regular cluster

Check the status of the `cluster` infrastructure resource to be sure that we run on top of an SNO cluster,
and if the pod runs on top of a regular cluster, exit before the node existence check.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
@cynepco3hahue cynepco3hahue force-pushed the admission_plugin_sno_imrovement branch from 093d51d to 727f445 Compare May 25, 2021 11:46
@openshift-ci-robot

@cynepco3hahue: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

}

nodes, err := a.nodeLister.List(labels.Everything())
clusterInfra, err := a.infraConfigLister.Get(infraClusterName)

can one delete the infra resource? Would that brick the cluster?

Author

@dhellmann @browsell Is it expected that an admin can delete the infrastructure object? What other components in the cluster rely on it?

@sttts if it's deletable, it's a bug. Admission should be coded to prevent deletion of config.openshift.io objects.
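A minimal sketch of such a guard, assuming a hypothetical `request` struct that exposes only the operation, API group, resource, and name; a real admission plugin would read these from its admission attributes instead. `validateDelete` is likewise hypothetical.

```go
package main

import "fmt"

// request is a hypothetical, minimal view of an admission request.
type request struct {
	Operation string // e.g. "CREATE", "UPDATE", "DELETE"
	Group     string // API group of the target resource
	Resource  string
	Name      string
}

// validateDelete rejects deletion of any object in the config.openshift.io
// group and lets every other request through.
func validateDelete(r request) error {
	if r.Operation == "DELETE" && r.Group == "config.openshift.io" {
		return fmt.Errorf("deleting %s.%s/%s is not allowed", r.Resource, r.Group, r.Name)
	}
	return nil
}

func main() {
	fmt.Println(validateDelete(request{Operation: "DELETE", Group: "config.openshift.io", Resource: "infrastructures", Name: "cluster"}))
	fmt.Println(validateDelete(request{Operation: "DELETE", Group: "", Resource: "pods", Name: "web-0"}))
}
```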

Lots of things rely on that resource. It's entirely possible that someone could delete it. There is also a period of time during bootstrapping where it won't exist yet. So, yes, we need to cope with it not existing and assume a default. Unfortunately, the default won't work for single node because it won't enable partitioning.

That race condition makes me think we need something other than an API resource to turn the feature on, since all partitioning annotations must be processed the same way from the beginning of the cluster's life. I'm not sure what options we have. Elsewhere I would suggest an environment variable or a config file. Are those options available in the API server, @sttts and @deads2k?

Never mind, I misunderstood some things about the ordering during bootstrapping. It should be safe to assume that the infrastructure resource exists by the time regular pods can be created through the API.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
@cynepco3hahue cynepco3hahue force-pushed the admission_plugin_sno_imrovement branch from 727f445 to 374f6f0 Compare May 25, 2021 12:28
@openshift-ci-robot

@cynepco3hahue: the contents of this pull request could not be automatically validated.

The following commits could not be validated and must be approved by a top-level approver:

@cynepco3hahue
Author

/retest

@sttts

sttts commented May 26, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 26, 2021
@openshift-ci

openshift-ci bot commented May 26, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cynepco3hahue, deads2k, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sttts sttts removed the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label May 27, 2021
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@cynepco3hahue
Author

/retest

@openshift-ci

openshift-ci bot commented May 27, 2021

@cynepco3hahue: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| ci/prow/e2e-aws-csi | 374f6f0 | link | /test e2e-aws-csi |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 4b2b6ff into openshift:master May 27, 2021
@openshift-ci

openshift-ci bot commented May 27, 2021

@cynepco3hahue: All pull requests linked via external trackers have merged:

Bugzilla bug 1961925 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1961925: UPSTREAM: <carry>: Does not prevent pod creation because of no nodes reason when it runs under the regular cluster

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files

8 participants