OCPBUGS-58313: Admit sysctls based on the worker node kernel version instead of the current node kernel version #151

Open

jubittajohn wants to merge 1 commit into openshift:master from jubittajohn:fix-sysctl-kernel-mismatch

Conversation

@jubittajohn
Contributor

The sysctls should be admitted based on the worker node kernel version instead of the current node kernel version (the kernel version of the machine running the API server), to avoid the SCC admission plugin admitting a pod with a sysctl parameter that is unsafe on the worker's kernel.
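
For illustration, the intended gate looks roughly like this minimal sketch; the isSafeForKernel helper and the gated-sysctl table are hypothetical, and the net.ipv4.ip_local_reserved_ports entry only mirrors the kubelet's safe-sysctl list:

package sysctl

import (
    utilversion "k8s.io/apimachinery/pkg/util/version"
)

// kernelGatedSysctls maps a sysctl to the minimum kernel version that makes it
// safe to allow; the single entry is illustrative only.
var kernelGatedSysctls = map[string]*utilversion.Version{
    "net.ipv4.ip_local_reserved_ports": utilversion.MustParseGeneric("3.16"),
}

// isSafeForKernel reports whether a sysctl is safe on the given kernel version.
// clusterKernel should be derived from the nodes' status.nodeInfo.kernelVersion,
// not from the kernel of the machine running the API server.
func isSafeForKernel(name string, clusterKernel *utilversion.Version) bool {
    minVer, gated := kernelGatedSysctls[name]
    if !gated {
        return true // not kernel-gated, always considered safe
    }
    return clusterKernel.AtLeast(minVer)
}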

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 29, 2025
@openshift-ci
Contributor

openshift-ci bot commented Sep 29, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci bot commented Sep 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jubittajohn
Once this PR has been reviewed and has the lgtm label, please assign ibihim for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 88 to 89
// The kernel version here refers to that of the worker nodes, since sysctl
// settings apply to pods running on worker nodes rather than control plane nodes

What guarantees are there that a pod admitted by SCC will only be scheduled to a worker node?

Contributor Author

I have changed it to compute the minimum kernel version across all the nodes in the cluster. Is that what was intended?

Contributor Author

I want to confirm what was intended in the original comment:
Was the goal to compute the minimum kernel version across all nodes in the cluster rather than just the worker nodes, or was the question about how SCC-admitted pods are guaranteed to run only on worker nodes?

My understanding is that SCC does not control scheduling, and that placement is determined by schedulers using taints/tolerations. Could you clarify which interpretation is correct?
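
For reference, a rough sketch of what "minimum kernel version across all nodes" could look like via the node lister; the function name and error message are hypothetical, not the exact code in this PR:

import (
    "fmt"

    "k8s.io/apimachinery/pkg/labels"
    utilversion "k8s.io/apimachinery/pkg/util/version"
    corev1listers "k8s.io/client-go/listers/core/v1"
)

// minimumKernelVersion returns the lowest kernel version reported by any node,
// so that sysctl admission is as conservative as the oldest kernel in the cluster.
func minimumKernelVersion(nodeLister corev1listers.NodeLister) (*utilversion.Version, error) {
    nodes, err := nodeLister.List(labels.Everything())
    if err != nil {
        return nil, err
    }
    var minVersion *utilversion.Version
    for _, node := range nodes {
        v, err := utilversion.ParseGeneric(node.Status.NodeInfo.KernelVersion)
        if err != nil {
            continue // skip nodes whose kernel version cannot be parsed
        }
        if minVersion == nil || v.LessThan(minVersion) {
            minVersion = v
        }
    }
    if minVersion == nil {
        return nil, fmt.Errorf("no node kernel versions could be determined")
    }
    return minVersion, nil
}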

@jubittajohn jubittajohn force-pushed the fix-sysctl-kernel-mismatch branch from 64fa496 to 0805e19 on November 5, 2025 at 18:43
@jubittajohn jubittajohn changed the title Admit sysctls based on the worker node kernel version instead of the current node kernel version OCPBUGS-58313: Admit sysctls based on the worker node kernel version instead of the current node kernel version Nov 5, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 5, 2025
@openshift-ci-robot

@jubittajohn: This pull request references Jira Issue OCPBUGS-58313, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is Verified instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


In response to this:

The sysctls should be admitted based on the worker node kernel version instead of the current node kernel version (the kernel version of the machine running the API server), to avoid the SCC admission plugin admitting a pod with a sysctl parameter that is unsafe on the worker's kernel.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jubittajohn
Contributor Author

jubittajohn commented Nov 5, 2025

@benluddy
I have a question regarding the kernel check. The check was added only for the newly added sysctls in apiserver-library-go (to avoid regressions; related comment: #148 (comment)), but the kubelet performs kernel checks for the older sysctls as well (https://github.com/openshift/kubernetes/blob/master/pkg/kubelet/sysctl/safe_sysctls.go#L35-L72). Could this be a problem, since we are allowing the older sysctls here irrespective of the supported kernel version?

@jubittajohn jubittajohn requested a review from benluddy November 7, 2025 18:28
@jubittajohn jubittajohn marked this pull request as ready for review November 18, 2025 17:08
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2025
@openshift-ci openshift-ci bot requested review from ibihim and liouk November 18, 2025 17:16
@jubittajohn jubittajohn force-pushed the fix-sysctl-kernel-mismatch branch from 0805e19 to b605621 on November 20, 2025 at 21:01
func (c *constraint) SetExternalKubeInformerFactory(informers informers.SharedInformerFactory) {
c.namespaceLister = informers.Core().V1().Namespaces().Lister()
c.nodeLister = informers.Core().V1().Nodes().Lister()
c.listersSynced = append(c.listersSynced, informers.Core().V1().Namespaces().Informer().HasSynced)
Contributor

Suggested change
c.listersSynced = append(c.listersSynced, informers.Core().V1().Namespaces().Informer().HasSynced)
c.listersSynced = append(
    c.listersSynced,
    informers.Core().V1().Namespaces().Informer().HasSynced,
    informers.Core().V1().Nodes().Informer().HasSynced,
)

Contributor

You must add the node lister to the list of listers we need to wait for before computing admission.
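
For context, a hedged sketch of the waiting step, assuming listersSynced is a []cache.InformerSynced and using a hypothetical helper name; without the node informer's HasSynced in that slice, admission could run against an empty node cache:

import (
    "fmt"

    "k8s.io/client-go/tools/cache"
)

// waitForListersToSync blocks until every registered lister (namespaces and,
// with this change, nodes) has synced, so the kernel-version check does not
// run against an empty node cache.
func (c *constraint) waitForListersToSync(stopCh <-chan struct{}) error {
    if !cache.WaitForCacheSync(stopCh, c.listersSynced...) {
        return fmt.Errorf("informer caches failed to sync")
    }
    return nil
}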

}

providers, errs := sccmatching.CreateProvidersFromConstraints(ctx, a.GetNamespace(), constraints, c.namespaceLister)
providers, errs := sccmatching.CreateProvidersFromConstraints(ctx, a.GetNamespace(), constraints, c.namespaceLister, c.nodeLister)
Contributor

@ibihim ibihim Nov 26, 2025

From a code / maintenance perspective, I must say that handing the nodeLister 6 levels down doesn't look good. We are coupling all those functions with the nodeLister. E.g.:

func NewSimpleProvider(
  scc *securityv1.SecurityContextConstraints,
  nodeLister corev1listers.NodeLister,
) (SecurityContextConstraintsProvider, error)

It reads well if you transform a scc into a provider, but now you have a scc and a nodeLister?!

Couldn't we check the legit sysctls and extend the constraints.AllowedUnsafeSysctls, or adjust the SimpleProvider to hold those specifically, so that we don't pass c.nodeLister down? It would read better like so:

func NewSimpleProvider(
  scc *securityv1.SecurityContextConstraints,
  availableSysCtls []string,
) (SecurityContextConstraintsProvider, error)

or

// Though this taints the clear meaning of what is defined by the user and what is possible by the system.
NewSimpleProvider(
  updateSysctlsBasedOnSafeWhitelist(scc),
)

Everything else might be more effort, like some Factory or so.

WDYT?

Contributor Author

Thank you for the suggestion.

I've refactored NewSimpleProvider to take availableSysCtls []string instead of nodeLister.

@jubittajohn jubittajohn force-pushed the fix-sysctl-kernel-mismatch branch 2 times, most recently from 929529d to fedc70e, on December 1, 2025 at 20:44
…s instead of the current node kernel version

Signed-off-by: jubittajohn <jujohn@redhat.com>
@jubittajohn jubittajohn force-pushed the fix-sysctl-kernel-mismatch branch from fedc70e to a39660d on December 1, 2025 at 21:07
@openshift-ci
Contributor

openshift-ci bot commented Dec 1, 2025

@jubittajohn: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jubittajohn jubittajohn requested a review from ibihim December 2, 2025 06:44

// Create the provider
provider, err := NewSimpleProvider(constraint)
provider, err := NewSimpleProvider(constraint, sysctl.SafeSysctlAllowlist(nodeLister))
Contributor

@ibihim ibihim Jan 6, 2026

This calculation doesn't change per request, right? So we could move it out of the for-loop, right?

Because CreateProviderFromConstraint is called in a for-loop that iterates over the SCCs, we have:

O(SCCs * Nodes)

But we could have

O(SCCs) + O(Nodes)

If we move the check out of the for-loop.

It would be good to figure out how many nodes big clusters have, as we could improve performance drastically on clusters with plenty of nodes.
If we cache the result and update it via an event handler registered for the nodeInformer's Add, Update, and Delete events, this would create an O(SCCs) + O(1) situation per request.

With 10k Nodes and 10 SCCs those numbers change drastically:

  1. Current: 100k
  2. Outside the for-loop: 10,010
  3. Cache: 10 + one recomputation per node change, with "per node change" happening far less often than "per pod admission"
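
A rough sketch of the caching idea, with hypothetical type and function names rather than a concrete proposal for this PR: compute the allowlist once and invalidate it from node informer events, so each admission request stays O(SCCs).

import (
    "sync"

    "k8s.io/client-go/tools/cache"
)

// cachedAllowlist holds the computed safe-sysctl allowlist and a staleness flag.
type cachedAllowlist struct {
    mu        sync.Mutex
    allowlist []string
    stale     bool
}

func (c *cachedAllowlist) markStale() {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.stale = true
}

// get returns the cached allowlist, recomputing it only when a node event has
// marked it stale.
func (c *cachedAllowlist) get(compute func() []string) []string {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.allowlist == nil || c.stale {
        c.allowlist = compute()
        c.stale = false
    }
    return c.allowlist
}

// registerNodeEventHandlers invalidates the cache on any node Add/Update/Delete.
func registerNodeEventHandlers(nodeInformer cache.SharedIndexInformer, cached *cachedAllowlist) {
    nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    func(obj interface{}) { cached.markStale() },
        UpdateFunc: func(oldObj, newObj interface{}) { cached.markStale() },
        DeleteFunc: func(obj interface{}) { cached.markStale() },
    })
}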

}

if minVersion == nil {
return nil, fmt.Errorf("no worker nodes found")
Contributor

This could happen if parsing the kernel version for all nodes fails, right?
We also check all nodes, not just "worker nodes".

So we could add a check for len(nodes) != 0 in the error handling, to distinguish between "no nodes found" and "couldn't parse kernel version(s)".
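
For illustration, building on the hypothetical minimumKernelVersion sketch earlier in this conversation, the error handling could distinguish the two cases along these lines (a fragment, not a concrete suggestion):

if minVersion == nil {
    if len(nodes) == 0 {
        return nil, fmt.Errorf("no nodes found")
    }
    return nil, fmt.Errorf("could not parse the kernel version of any node")
}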
