Write audit policy file for GCE/GKE configuration #46897
Thanks for doing this. It'll be great for other deployment tools to reference when they go to turn this on.
cluster/gce/gci/configure-helper.sh
Outdated
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
system:unsecured is high-volume? Isn't that the insecure port?
Perhaps 'high-volume or low-risk'?
Added a comment and filed #46983. If you'd prefer I remove this until the issue is fixed I can do that.
BTW - This specific request is very high volume. I left an empty cluster sitting over the weekend, and this was the 5th highest request category:
$ sqlite3 -header ~/logs/audit-policy/super-audit-2.db "SELECT SUM(count) AS count, user, verb, apigroup, namespace, resource, name, subresource FROM audit WHERE response != '403' GROUP BY user, verb, apigroup, resource, name, subresource ORDER BY count DESC LIMIT 5;" | column -tn -s "|"
count  user                            verb    apigroup  namespace    resource    name                     subresource
89069  system:kube-controller-manager  update  core      kube-system  endpoints   kube-controller-manager
89067  system:kube-controller-manager  get     core      kube-system  endpoints   kube-controller-manager
89063  system:kube-scheduler           update  core      kube-system  endpoints   kube-scheduler
89061  system:kube-scheduler           get     core      kube-system  endpoints   kube-scheduler
71446  system:unsecured                get     core      kube-system  configmaps  ingress-uid
(Using https://github.com/timstclair/kube-contrib/blob/devel/devel/scripts/auditdb.go for the database)
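For anyone without the auditdb tool, the same kind of aggregation can be approximated with Python's built-in sqlite3 module. This is only a sketch: the `audit` table schema is assumed from the column names in the query above, and the rows are made-up sample data, not numbers from the cluster.

```python
import sqlite3

# Hypothetical schema, inferred from the column names in the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit (count INTEGER, user TEXT, verb TEXT, apigroup TEXT, "
    "namespace TEXT, resource TEXT, name TEXT, subresource TEXT, response TEXT)"
)

# Made-up sample rows, for illustration only.
rows = [
    (3, "system:kube-scheduler", "get", "core", "kube-system", "endpoints", "kube-scheduler", "", "200"),
    (2, "system:kube-scheduler", "get", "core", "kube-system", "endpoints", "kube-scheduler", "", "200"),
    (4, "system:unsecured", "get", "core", "kube-system", "configmaps", "ingress-uid", "", "200"),
    (9, "system:anonymous", "get", "core", "default", "pods", "", "", "403"),
]
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)


def top_requests(conn, limit=5):
    """Return the highest-volume request categories, ignoring 403 responses."""
    query = (
        "SELECT SUM(count) AS total, user, verb, resource, name "
        "FROM audit WHERE response != '403' "
        "GROUP BY user, verb, apigroup, namespace, resource, name, subresource "
        "ORDER BY total DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


top = top_requests(conn)
for total, user, verb, resource, name in top:
    print(total, user, verb, resource, name)
```

Unlike the original command, this sketch also groups by namespace, so identically named objects in different namespaces would be counted separately.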
cluster/gce/gci/configure-helper.sh
Outdated
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"]
TLS bootstrapping will start giving kubelet credentials unique usernames, so maybe target the kubelet group instead of the username?
Done. Left this for legacy upgrades though.
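A rough sketch of why a group-based rule survives per-node usernames while the legacy username rule does not. Everything here is an assumption for illustration: the `userGroups` field name, the `system:nodes` group, the `system:node:node-1` username, and the idea that every field a rule sets must match.

```python
def identity_matches(rule, username, groups):
    """True if the request identity satisfies every identity field the rule sets."""
    users = rule.get("users", [])
    user_groups = rule.get("userGroups", [])
    if users and username not in users:
        return False
    if user_groups and not any(g in user_groups for g in groups):
        return False
    return True


# Legacy rule keyed on the shared username, kept for upgraded clusters.
legacy_rule = {"level": "None", "users": ["kubelet"]}
# Group rule that still matches once each kubelet has its own username.
group_rule = {"level": "None", "userGroups": ["system:nodes"]}

# A kubelet with a unique (hypothetical) per-node username.
print(identity_matches(legacy_rule, "system:node:node-1", ["system:nodes"]))
print(identity_matches(group_rule, "system:node:node-1", ["system:nodes"]))
```

Keeping the two as separate rules means a request only has to satisfy one of them, which is what lets the legacy username and the group coexist.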
      - group: "" # core
        resources: ["events"]

  # Secrets & ConfigMaps can contain sensitive & binary data,
👍
It would be nice to eventually log configmap data for auditing config pushes.
I think there are a couple problems with logging configmap data:
- It could be arbitrarily large, which might not play nice with the auditing backend
- It could be binary data. Not explicitly a problem, but makes the audit logs more difficult to process
- Could contain arbitrary user data, such as PII
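To make the trade-off concrete, here is a small sketch of what each audit level would keep from a request, assuming the usual meaning of the levels (Metadata keeps no bodies, Request adds the request body, RequestResponse adds both). The event layout and the `build_audit_entry` helper are hypothetical, not the apiserver's implementation.

```python
def build_audit_entry(level, event):
    """Return the parts of `event` that would be recorded at `level`."""
    if level == "None":
        return None  # request is dropped entirely
    entry = {key: event[key] for key in ("user", "verb", "resource", "name")}
    if level in ("Request", "RequestResponse"):
        entry["requestObject"] = event.get("requestObject")
    if level == "RequestResponse":
        entry["responseObject"] = event.get("responseObject")
    return entry


# Made-up example of a config push.
sample_event = {
    "user": "alice",
    "verb": "update",
    "resource": "configmaps",
    "name": "app-config",
    "requestObject": {"data": {"key": "possibly large or binary value"}},
    "responseObject": {"data": {"key": "possibly large or binary value"}},
}

# At Metadata level the configmap payload never reaches the audit backend.
print(build_audit_entry("Metadata", sample_event))
```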
    resources: ${known_apis}
  # Default level for known APIs
  - level: RequestResponse
    resources: ${known_apis}
Sorry if I need reminding, but this works with the rule above it because events go through this list in order until they match something, right?
Correct.
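A minimal sketch of that first-match behavior, assuming rules are checked top to bottom and the first matching rule decides the level. The `KNOWN_APIS` list, the read-only rule, and the `"None"` fallback are hypothetical stand-ins, not the contents of the actual policy.

```python
def first_matching_level(rules, verb, resource, default="None"):
    """Return the level of the first rule that matches; later rules are ignored."""
    for rule in rules:
        verbs = rule.get("verbs")
        resources = rule.get("resources")
        if verbs and verb not in verbs:
            continue
        if resources and resource not in resources:
            continue
        return rule["level"]
    return default  # assumed fallback when nothing matches


KNOWN_APIS = ["pods", "deployments"]  # hypothetical stand-in for ${known_apis}

rules = [
    # More specific rule first: a hypothetical lower level for read-only verbs.
    {"level": "Request", "verbs": ["get", "list", "watch"], "resources": KNOWN_APIS},
    # Catch-all for the same resources; only reached if the rule above did not match.
    {"level": "RequestResponse", "resources": KNOWN_APIS},
]

print(first_matching_level(rules, "get", "pods"))
print(first_matching_level(rules, "update", "pods"))
```

Swapping the two rules would make the catch-all win for every verb, which is why the ordering matters.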
lgtm
Thanks @ericchiang
/lgtm
FYI, here is some data on the dropped requests. This comes from an unloaded 3-node cluster running for 3 days, plus a single default-suite e2e run. The logs were analyzed using https://github.com/timstclair/kube-contrib/blob/devel/devel/scripts/auditdb.go (only looks at
Total requests: 1,294,914
And for fun:
I.e. 6612 probing requests from the anonymous internet
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ericchiang, mikedanese, timstclair
Squashed. Reapplying LGTM.
    # TODO(#46983): Change this to the ingress controller service account.
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
The ingress controller also creates a config map.
The create is low-volume (I only saw a single request) and also has higher security implications, so I'd rather not drop it here.
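To illustrate, here is a sketch of how restricting the rule to the `get` verb leaves the create request in the audit log. The `rule_applies` helper and its all-fields-must-match behavior are assumptions for illustration, not the apiserver's code.

```python
def rule_applies(rule, user, namespace, verb):
    """True only when the user, namespace, and verb all match the rule."""
    return (
        user in rule["users"]
        and namespace in rule["namespaces"]
        and verb in rule["verbs"]
    )


drop_rule = {
    "level": "None",
    "users": ["system:unsecured"],
    "namespaces": ["kube-system"],
    "verbs": ["get"],
}

# The high-volume read is dropped by this rule...
print(rule_applies(drop_rule, "system:unsecured", "kube-system", "get"))
# ...while the one-off create falls through to later rules and is still logged.
print(rule_applies(drop_rule, "system:unsecured", "kube-system", "create"))
```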
@k8s-bot pull-kubernetes-e2e-kops-aws test this
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
@timstclair: The following tests failed, say
Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)
Set up the audit policy configuration for GCE & GKE. Here is the high-level summary of the policy:
- Metadata
- RequestResponse
- Request
- Metadata
- /version, swagger or healthchecks
In addition to the above, I spent time analyzing the noisiest lines in the audit log from a cluster that soaked for 24 hours (and ran a batch of e2e tests). Of those top requests, the ones identified as low-risk (all read-only, except updates to kube-system endpoints by controllers) are dropped.
I suspect we'll want to tweak this a bit more once we've had time to soak it on some real clusters.
For kubernetes/enhancements#22
/cc @sttts @ericchiang