
Change/add params to support CIS scan #1295

Merged · 3 commits · Apr 14, 2021

Conversation

felipeac (Contributor)

Summary

Issue number #1294

This PR updates the configuration to achieve good results on a CIS scan using kube-bench.

These were my results:

After updating the kubelet location in kube-bench
Issue opened on kube-bench: aquasecurity/kube-bench#808

== Summary node ==
12 checks PASS
2 checks FAIL
1 checks WARN
0 checks INFO

== Summary total ==
12 checks PASS
2 checks FAIL
1 checks WARN
0 checks INFO

After building a new Bottlerocket AMI with the changes requested in this PR

== Summary node ==
14 checks PASS
0 checks FAIL
1 checks WARN
0 checks INFO

== Summary total ==
14 checks PASS
0 checks FAIL
1 checks WARN
0 checks INFO

Testing done:
I've only tested it by deploying a new image and running the CIS scan, but I'm more than happy to run any required tests.

Important: I've only validated those changes in an EKS cluster running Kubernetes 1.18. Eventually, if this PR gets approved, I'll apply and validate the same change for different versions.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

PS: Thanks to @gregdek for the support

@jhaynes jhaynes added the status/needs-triage Pending triage or re-evaluation label Jan 28, 2021
@tjkirch tjkirch requested a review from bcressey January 28, 2021 17:35
kernel.panic_on_oops = 1

# Overcommit handling mode - 1: Always overcommit
vm.overcommit_memory = 1
Contributor:

Do you have any information on why the overcommit mode would need to be changed from the default of 0 to 1?

I don't know if this would be appropriate for all variants. In particular, this would be an unexpected change for the aws-ecs-1 variant, as the AL2-based ECS-optimized AMIs inherit AL2's default setting of 0.

Contributor:

Do you have any information on why the overcommit mode would need to be changed from the default of 0 to 1?

Per kubernetes/kubernetes#14935 it's done to disable the heuristics that could let a "best effort" container cause malloc to fail for a "guaranteed" container.

I don't know if this would be appropriate for all variants.

If ECS doesn't have different QoS classes then it may not apply. I tend to want consistent behavior across variants here, though. The kernel's heuristics can introduce some surprising behaviors - running containers as root or not can give different results; applications can see unexpected malloc failures even if they're within their limits.

Allowing containers to exceed their reservation and killing them when they do is often much easier for operators to reason about.

Contributor:

If ECS doesn't have different QoS classes then it may not apply.

ECS does not have different QoS classes; there are no "best effort" or "guaranteed" containers. Instead, ECS controls memory limits on containers via cgroups; every container has at least a memory limit or a memory reservation.
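The cgroup-based controls described above correspond to the `memory` (hard limit) and `memoryReservation` (soft limit) fields in an ECS task definition. A minimal illustrative fragment, with hypothetical container name, image, and values:

```json
{
  "containerDefinitions": [
    {
      "name": "web",
      "image": "nginx:latest",
      "memory": 512,
      "memoryReservation": 256
    }
  ]
}
```

A container may exceed its `memoryReservation` when memory is available, but is killed if it exceeds its `memory` hard limit.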

Contributor:

@bcressey and I dug into this a bit more together. The kernel behavior for OVERCOMMIT_GUESS (0, the default) changed in Linux 5.2. The new behavior is that allocations only fail if the requested virtual memory allocation exceeds the physical ram + swap space rather than using more complicated heuristics to guess. This brings the behavior closer to OVERCOMMIT_ALWAYS (1), except that "wild allocations" will still fail.

Bottlerocket's kernel is 5.4, meaning that it includes this change. With that said, my opinion is that we should leave the kernel configured with the default behavior (OVERCOMMIT_GUESS) in all variants; the QoS issues described in kubernetes/kubernetes#14935 shouldn't happen in Bottlerocket.

Contributor:

@samuelkarp found that my understanding is out-of-date, and the kernel documentation is no longer accurate since torvalds/linux@8c7829b. There's no special treatment of root or processes with CAP_SYS_ADMIN since kernel 5.2+.

The difference is now:

  • overcommit_memory = 0 only blocks requests for more memory than is physically present in ram + swap
  • overcommit_memory = 1 allows all requests
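The current mode can be inspected directly from procfs. A minimal sketch (the mode descriptions summarize the kernel's overcommit-accounting documentation; the helper name is ours):

```python
# Report the kernel's current overcommit policy from procfs.
MODES = {
    0: "heuristic (OVERCOMMIT_GUESS): on 5.2+ kernels, refuse only wild allocations",
    1: "always (OVERCOMMIT_ALWAYS): every allocation request succeeds",
    2: "never (OVERCOMMIT_NEVER): enforce CommitLimit strictly",
}

def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the integer overcommit mode read from the given file."""
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    mode = overcommit_mode()
    print(f"vm.overcommit_memory = {mode}: {MODES[mode]}")
```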

I doubt the Kubernetes check is still relevant for newer kernels. It could be relaxed upstream to allow either 0 or 1, but the application code would then need to account for the running kernel version, which is not recommended as a general practice.

The two paths I see are to patch our kubelets so they expect 0 instead of 1, or ship a sysctl snippet for kubelet so that we can apply the expected value, and take advantage of the other checks for protected settings. I prefer the second approach.
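The sysctl-snippet approach might look like the following. This is an illustrative sketch, not the exact file the PR shipped: the path and filename are assumptions, and the values mirror what the kubelet's `--protect-kernel-defaults` check expects for its protected sysctls.

```
# /etc/sysctl.d/90-kubelet.conf (illustrative path)
# Values the kubelet expects when protectKernelDefaults is enabled.
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
```

With the values applied at boot, `protectKernelDefaults: true` can then be set safely, since the kubelet refuses to start if the running sysctls differ from its expectations.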

bcressey (Contributor) commented Feb 2, 2021

@felipeac - can you split this into a few commits based on discussion?

  1. "release: update kernel panic settings" (for the panic tuning)
  2. "kubernetes: adjust kernel overcommit settings" (for the overcommit tuning)
  3. "kubernetes: set readOnlyPort and protectKernelDefaults" (for the kubelet config changes)

The second one would add packages/kubernetes-*/kubelet-sysctl.conf and include it in all the spec files, and the third one would modify packages/kubernetes-*/kubelet-config with the new values.
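The kubelet-config change described in the third commit amounts to something like the fragment below. This is an illustrative `KubeletConfiguration` sketch; in Bottlerocket the real source is the template in packages/kubernetes-*/kubelet-config, so field placement may differ.

```yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
readOnlyPort: 0              # disable the unauthenticated read-only port
protectKernelDefaults: true  # fail startup if sysctls differ from kubelet's expectations
```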

@bcressey bcressey removed the status/needs-triage Pending triage or re-evaluation label Feb 6, 2021
aidy added a commit to cookpad/terraform-aws-eks that referenced this pull request Feb 9, 2021
This should be fixed properly with
aquasecurity/kube-bench#808
and
bottlerocket-os/bottlerocket#1295

But, as these are known failures, this should not be a failure in our
CI.

Workaround by patching kube-bench config for bottlerocket config paths,
and allowing for a known number of failures.
gregdek (Contributor) commented Feb 9, 2021

Hi @felipeac just checking in to see if you need any help with the split.

felipeac (Contributor, Author):
Hi @felipeac just checking in to see if you need any help with the split.
Thanks, @gregdek . I've been busy at work. I'll split this PR later today.
CC: @bcressey

Review comments (outdated, resolved) on:

  • packages/kubernetes-1.15/kubelet-sysctl.conf
  • packages/kubernetes-1.15/kubernetes-1.15.spec (two threads)
bcressey (Contributor) commented Apr 2, 2021
@felipeac - Thanks again for your contribution here!

You've had the bad luck of touching some files - the various Kubernetes specs - that have seen more churn over the last few weeks as we've added new features.

If it helps, I'm happy to do the rebase to solve the merge conflicts, and make that last tweak from %{_cross_templatedir} to %{_cross_sysctldir}. I'd open a new pull request for it to avoid pushing to your fork, but you'd still get the credit.

Let me know if you have any objections, otherwise I'll plan to pick this up next week.

felipeac (Contributor, Author) commented Apr 5, 2021

@bcressey - Sorry about the delay. No worries. I'll push those changes tomorrow. Thanks!

bcressey (Contributor) commented Apr 9, 2021

@felipeac - thanks! This looks good to me.

My lingering concern is around kernel.panic_on_oops=1 in combination with the known oops in #1435, where we might convert a soft, "harmless" failure into a more permanent one.

I need to do some more testing to see if this is going to be a problem. Worst case, this would end up blocked on the backport of the newer BPF helpers to the 5.4 kernel, which is currently in progress.

bcressey (Contributor):

My lingering concern is around kernel.panic_on_oops=1 in combination with the known oops in #1435, where we might convert a soft, "harmless" failure into a more permanent one.

Ah, that issue is a kernel "warning" so it'd be controlled by panic_on_warn instead. I've tested locally and confirmed that the warning doesn't cause a panic.

@jhaynes jhaynes added this to the next milestone Apr 12, 2021
@jhaynes jhaynes added the area/kubernetes K8s including EKS, EKS-A, and including VMW label Apr 12, 2021
@jhaynes jhaynes added priority/p0 type/enhancement New feature or request labels Apr 12, 2021
@bcressey bcressey merged commit fa9b3ef into bottlerocket-os:develop Apr 14, 2021