Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Quickstart for Go-based Operators, kind=Memcached fails #4364

Closed
bentito opened this issue Jan 5, 2021 · 9 comments · Fixed by #4535
Closed

docs: Quickstart for Go-based Operators, kind=Memcached fails #4364

bentito opened this issue Jan 5, 2021 · 9 comments · Fixed by #4535
Assignees
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.
Milestone

Comments

@bentito
Copy link
Contributor

bentito commented Jan 5, 2021

https://sdk.operatorframework.io/docs/building-operators/golang/quickstart/

The underlying operatorsdk --kind=Memcached called out in the Quickstart docs, Memcached, fails to deploy with an error like this:

│   Warning  FailedCreate  18s (x13 over 39s)  replicaset-controller  Error creating: pods "memcached-operator-controller-manager-74cd5fb996-" is forbidden: unable to validate │
│  against any security context constraint: [spec.containers[0].securityContext.runAsUser: Invalid value: 65532: must be in the ranges: [1000600000, 1000609999] spec.cont │
│ ainers[1].securityContext.runAsUser: Invalid value: 65532: must be in the ranges: [1000600000, 1000609999]]

This is likely due to changes in the gcr.io/distroless/static:nonroot image. Was able to fix the instantiated file config/manager/manager.yaml by removing runAsUser and adding runAsNonRoot: true to the spec.securityContext.

I'm not sure this is really a docs issue so much as at this point I'm researching where operator-sdk goes for "kinds" such that I might file an issue on the Memcached kind there instead. But it is true that the Quickstart doc currently doesn't produce a working example due to the problem.

@camilamacedo86
Copy link
Contributor

camilamacedo86 commented Jan 6, 2021

Hi @bentito,

The error faced "Invalid value: 65532: must be in the ranges: [1000600000, 1000609999]]" is specific to the OCP environment.

Then, by applying your solution "by removing runAsUser" you are raising security concerns.

Could you please try to change the user id in the Dockerfile and manager manifests for a value in the range [1000600000, 1000609999]] and let us know?

If it works well as expected, we might need to push this change to upstream https://github.com/kubernetes-sigs/kubebuilder/search?q=65532.

c/c @estroz @jmrodri

@bentito
Copy link
Contributor Author

bentito commented Jan 6, 2021

I tried changing the UID, and I agree on security concern of my proposed workaround, but the UID range changes each time, it's a moving target, you can't guess it correctly every time. I think the problem must lie in the gcr.io/distroless/static:nonroot image assuming this memcached example used to work.

@camilamacedo86
Copy link
Contributor

camilamacedo86 commented Jan 6, 2021

Hi @bentito,

Sorry. I missed that you are using runAsNonRoot: true which shows enough to solve the security concern over it. So, your workaround shows OK.

The reason for the specific used ID be set shows to be because of the K8s security police issue "container has runAsNonRoot and the image has non-numeric user (nonroot), cannot verify user is non-root" raised by the k8s code implementation. More info: "https://stackoverflow.com/questions/49720308/kubernetes-podsecuritypolicy-set-to-runasnonroot-container-has-runasnonroot-and

It was introduced in upstream in the PR: kubernetes-sigs/kubebuilder#1635. However, as @joelanford spoke with me shows that "specifying a specific UID doesn't work in OCP (and possibly other non-upstream k8s distros)".

By checking the K8S code implementation is the master branch the check still in place. Because of this, I do not think that we should change the upstream/kubebuilder implementation. However, I think that we can:

  • To help others that might face the same issue we can add your workaround in the FAQ. See https://sdk.operatorframework.io/docs/faqs/. Would you like to contribute with and push a PR?
  • Check why OCP does not work in the same way and what would be the recommendation or if its a bug for then check what would be the best approach to dealing with.

WDYT @jmrodri @estroz ?

@bentito
Copy link
Contributor Author

bentito commented Jan 7, 2021

PR adding specific workaround to the FAQ is: #4368

@estroz
Copy link
Member

estroz commented Jan 11, 2021

Before adding this to the FAQ I'd like to see this addressed upstream, i.e. is there another workaround that will work for both OCP and vanilla k8s.

/triage needs-information

@openshift-ci-robot openshift-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jan 11, 2021
@estroz estroz added this to the Backlog milestone Jan 11, 2021
@estroz estroz modified the milestones: Backlog, v1.5.0 Jan 11, 2021
@jmrodri
Copy link
Member

jmrodri commented Jan 27, 2021

Before adding this to the FAQ I'd like to see this addressed upstream, i.e. is there another workaround that will work for both OCP and vanilla k8s.

My understanding is if you specify a specific user on OpenShift it will attempt to honor that setting. So a runAsUser set to 65532 will cause OpenShift to try and use user id 65532 which does not exist. So we have to NOT set it so that OpenShift will automatically use a random user id.

My understanding was for upstream we needed to add runAsUser to avoid running as root upstream. The best solution would be an OpenShift plugin that would fix this when targeting OpenShift.

@estroz
Copy link
Member

estroz commented Jan 27, 2021

Yeah that sounds correct. My comment is referring to runAsUser: 65532 being redundant, since the default value is taken from container metadata, which sets this value to 65532 already upstream. Therefore we should be able to replace runAsUser: 65532 with runAsNonRoot: true upstream and not have to add this workaround.

@estroz
Copy link
Member

estroz commented Jan 27, 2021

@estroz
Copy link
Member

estroz commented Feb 3, 2021

The migration guide for #4402 should include a section for changing this value, since it is fixed upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants