Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write audit policy file for GCE/GKE configuration #46897

Merged
merged 1 commit into from
Jun 6, 2017

Conversation

timstclair
Copy link

@timstclair timstclair commented Jun 3, 2017

Setup the audit policy configuration for GCE & GKE. Here is the high level summary of the policy:

  • Default logging everything at Metadata
  • Known write APIs default to RequestResponse
  • Known read-only APIs default to Request
  • Except secrets & configmaps are logged at Metadata
  • Don't log events
  • Don't log /version, swagger or healthchecks

In addition to the above, I spent time analyzing the noisiest lines in the audit log from a cluster that soaked for 24 hours (and ran a batch of e2e tests). Of those top requests, those that were identified as low-risk (all read-only, except update kube-system endpoints by controllers) are dropped.

I suspect we'll want to tweak this a bit more once we've had a time to soak it on some real clusters.

For kubernetes/enhancements#22

/cc @sttts @ericchiang

@timstclair timstclair added area/platform/gce area/provider/gcp Issues or PRs related to gcp provider release-note-none Denotes a PR that doesn't merit a release note. labels Jun 3, 2017
@timstclair timstclair added this to the v1.7 milestone Jun 3, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 3, 2017
@k8s-ci-robot k8s-ci-robot requested review from sttts and ericchiang June 3, 2017 01:21
@k8s-github-robot k8s-github-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 3, 2017
Copy link
Contributor

@ericchiang ericchiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. It'll be great for other deployment tools to reference when they go to turn this on.

- group: "" # core
resources: ["endpoints", "services"]
- level: None
users: ["system:unsecured"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system:unsecured is high-volume? Isn't that the insecure port?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps 'high-volume or low-risk'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and filed #46983. If you'd prefer I remove this until the issue is fixed I can do that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW - This specific request is very high volume. I left an empty cluster sitting over the weekend, and this was the 5th highest request category:

$ sqlite3 -header ~/logs/audit-policy/super-audit-2.db "SELECT SUM(count) AS count, user, verb, apigroup, namespace, resource, name, subresource FROM audit WHERE response != '403' GROUP BY user, verb, apigroup, resource, name, subresource ORDER BY count DESC LIMIT 5;" | column -tn -s "|"
count  user                                                     verb    apigroup                   namespace                     resource                name                                subresource
89069  system:kube-controller-manager                           update  core                       kube-system                   endpoints               kube-controller-manager             
89067  system:kube-controller-manager                           get     core                       kube-system                   endpoints               kube-controller-manager             
89063  system:kube-scheduler                                    update  core                       kube-system                   endpoints               kube-scheduler                      
89061  system:kube-scheduler                                    get     core                       kube-system                   endpoints               kube-scheduler                      
71446  system:unsecured                                         get     core                       kube-system                   configmaps              ingress-uid                         

(Using https://github.com/timstclair/kube-contrib/blob/devel/devel/scripts/auditdb.go for the database)

Copy link
Contributor

@ericchiang ericchiang Jun 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Audit logging already paying off!

Let's just wait for #46983

EDIT. I mean let's just merge as is and have the fix for #46983 change it.

- group: "" # core
resources: ["configmaps"]
- level: None
users: ["kubelet"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With TLS bootstrapping that will start giving kubelet credentials unique usernames, maybe target the kubelet group instead of the username?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Left this for legacy upgrades though.

- group: "" # core
resources: ["events"]

# Secrets & ConfigMaps can contain sensitive & binary data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to eventually log configmap data for auditing config pushes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a couple problems with logging configmap data:

  1. It could be arbitrarily large, which might not play nice with the auditing backend
  2. It could be binary data. Not explicitly a problem, but makes the audit logs more difficult to process
  3. Could contain arbitrary user data, such as PII

resources: ${known_apis}
# Default level for known APIs
- level: RequestResponse
resources: ${known_apis}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I have to be reminded but this works with the rule above it because events go through this list in order until they match something right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@ikehz
Copy link
Contributor

ikehz commented Jun 5, 2017

lgtm

Copy link
Author

@timstclair timstclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ericchiang

- group: "" # core
resources: ["endpoints", "services"]
- level: None
users: ["system:unsecured"]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and filed #46983. If you'd prefer I remove this until the issue is fixed I can do that.

- group: "" # core
resources: ["configmaps"]
- level: None
users: ["kubelet"]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Left this for legacy upgrades though.

resources: ${known_apis}
# Default level for known APIs
- level: RequestResponse
resources: ${known_apis}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

@ericchiang
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 5, 2017
@timstclair
Copy link
Author

timstclair commented Jun 5, 2017

FYI, here is some data on the dropped requests. This comes from an unloaded 3-node cluster running for 3 days, plus a single default-suite e2e run. The logs were analyzed using https://github.com/timstclair/kube-contrib/blob/devel/devel/scripts/auditdb.go (only looks at RequestComplete stage)

Total requests: 1,294,914
Requests caught by rules:

  • kube-proxy rule: 2,388
  • ingress (unsecured) rule: 74,430
  • kubelet rule: 71,409
  • controller-manager/scheduler/endpoint-controller: 412,174
  • apiserver rule: 24,169
    Total dropped requests: 584,570 (45% of total)

And for fun:

sqlite> SELECT SUM(count) AS count FROM audit WHERE user = 'system:anonymous' AND response IN ('401', '403');
count
6612

I.e. 6612 probing requests from the anonymous internet

@mikedanese
Copy link
Member

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ericchiang, mikedanese, timstclair

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2017
@timstclair
Copy link
Author

Squashed. Reapplying LGTM.

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 5, 2017
@timstclair timstclair added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 5, 2017
# TODO(#46983): Change this to the ingress controller service account.
users: ["system:unsecured"]
namespaces: ["kube-sytem"]
verbs: ["get"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ingress controller also creates a config map.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The create is low-volume (I only saw a single request) and also has higher security implications, so I'd rather not drop it here.

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 5, 2017
@timstclair
Copy link
Author

@k8s-bot pull-kubernetes-e2e-kops-aws test this

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Copy link
Contributor

@timstclair: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-unit 00d52ed link @k8s-bot pull-kubernetes-unit test this
pull-kubernetes-e2e-gce-etcd3 00d52ed link @k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 46897, 46899, 46864, 46854, 46875)

@k8s-github-robot k8s-github-robot merged commit ea4764b into kubernetes:master Jun 6, 2017
@ericchiang ericchiang mentioned this pull request Jul 26, 2017
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/platform/gce area/provider/gcp Issues or PRs related to gcp provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants