Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sig-api-machinery: Add scale targets to CRDs to GA KEP #1015

Merged
merged 3 commits into from
Jul 29, 2019

Conversation

jpbetz
Copy link
Contributor

@jpbetz jpbetz commented Apr 29, 2019

This proposes that our scale targets for CRD to GA be based on API call latency SLIs/SLOs and kubernetes thresholds.

For details on scalability data gathered, see: https://docs.google.com/document/d/1tEstPQvzGvaRnN-WwGUWx1H9xHPRCy_fFcGlgTkB3f8/edit#

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 29, 2019
@jpbetz
Copy link
Contributor Author

jpbetz commented Apr 29, 2019

@roycaihw @liggitt @sttts We're still gathering some data, but early feedback welcome!

@liggitt
Copy link
Member

liggitt commented Apr 29, 2019

cc @wojtek-t @kubernetes/sig-scalability-misc

@k8s-ci-robot k8s-ci-robot added the sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. label Apr 29, 2019
@deads2k deads2k mentioned this pull request Apr 29, 2019
keps/sig-api-machinery/20180415-crds-to-ga.md Outdated Show resolved Hide resolved
keps/sig-api-machinery/20180415-crds-to-ga.md Outdated Show resolved Hide resolved
keps/sig-api-machinery/20180415-crds-to-ga.md Outdated Show resolved Hide resolved
keps/sig-api-machinery/20180415-crds-to-ga.md Outdated Show resolved Hide resolved
@liggitt
Copy link
Member

liggitt commented Apr 30, 2019

the feedback from @wojtek-t was helpful. I think the approach should be:

  • align with native SLI/SLO
  • make it clear the overhead of the conversion webhook is either explicitly excluded or capped at Xms per call (and make sure we can measure/compensate for that in the scale tests that ensure we hit our targets)
  • make the variables for our tests and claimed scale for a particular custom resource type:
    • the resource size
    • max number of custom resource instances per namespace
    • max number of custom resource instances cluster-wide

@liggitt
Copy link
Member

liggitt commented Apr 30, 2019

I'd still like to see real-world feedback inform those max instance size and count-per-namespace and count-per-cluster numbers.

In the absence of that, setting up initial scale targets along the lines of O(64KB), O(1000) per namespace, O(10000) per cluster seems reasonable in a hand-wavy way, given other count targets in https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md

@jpbetz
Copy link
Contributor Author

jpbetz commented May 2, 2019

I've rewritten this section to complement https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md and https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md. The actual thresholds and max resource size are not finalized, but I've penciled in "O(10KB), O(2000) per namespace, O(10000) per cluster" and we can substitute in better value once we've nailed them down.

I agree read world feedback would be valuable, maybe there is a group we could survey? I'll ask around.

@jpbetz jpbetz force-pushed the ga-kep-scalability-links branch 3 times, most recently from 0620d46 to edb4659 Compare May 2, 2019 18:49
@smarterclayton
Copy link
Contributor

Some known CRD scale factors:

  1. Clusters with 100-200 CRDs (fixed base), many of which have 1-5 CR instances.
  2. Average size of CRs 2KB, max 150KB (very large CRs which contain structured info that might represent interconnected components and chunks of text)
  3. Some high churn CRDs like ProwJobs on a cluster with 10k-30k instances in a single namespace, frequent lists, infrequent gets and updates.
  4. Large numbers of watches on many different CRDs for common / shared config. Example, cluster configuration represented as 30-40 CRD, of which there are 20+ controllers each having single item CR watches on some of those.

@liggitt
Copy link
Member

liggitt commented May 2, 2019

  1. Clusters with 100-200 CRDs (fixed base), many of which have 1-5 CR instances.

That's a good dimension (number of distinct custom resource types) we forgot to represent. The way the custom resource server is structured, each custom resource handler is independent, so this is no problem for the server to support, but is good to explicitly create/exercise O(100s) of CRDs in parallel in a test. The main impact I see for that would actually be on clients that pull down discovery docs for all group/version/resources, but that's not really specific to CRDs.

@smarterclayton
Copy link
Contributor

yeah any O(N) CRD things should be caught quick. Or where there is a large constant factor per CRD (like openapi).

@smarterclayton
Copy link
Contributor

smarterclayton commented May 2, 2019

For this cluster (reasonably representative of high CRD use):

$ kubectl get crd | wc -l
62
$ kubectl get crd -o json | jq -c '.' | wc -c
493222

Most of those CRDs have openapi specs with field values for explain and validation. I think it could get higher given that some may not have all the docs they need defined yet

@jpbetz jpbetz force-pushed the ga-kep-scalability-links branch from edb4659 to e191d18 Compare May 2, 2019 22:18
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the high-level, it looks fine. I would like to think more about the details, but in the worst case, I believe we could change them later too (I'm a bit overloaded now, and I don't want to block this).

keps/sig-api-machinery/20180415-crds-to-ga.md Outdated Show resolved Hide resolved
@jpbetz jpbetz force-pushed the ga-kep-scalability-links branch 3 times, most recently from 2ca9f99 to e618668 Compare July 29, 2019 22:08
@liggitt
Copy link
Member

liggitt commented Jul 29, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2019
@liggitt
Copy link
Member

liggitt commented Jul 29, 2019

/assign @lavalamp @deads2k

can be found by taking the (median) object size of the custom resource and finding
the the matching row in this table:

| Object size | Suggested Maximum Limit: scope=namespace (5s p99 SLO) | Suggested Maximum Limit: scope=cluster (30s p99 SLO) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the cluster scope have a longer SLO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to be consistent with how the SLOs for native types are defined (https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md#api-call-latency-slisslos-details). @wojtek-t, do you know the background?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's connected with the number of objects - listing objects is basically some constant time + something proportional to number of objects processed. And within single namespace (scope=namespace) we offficially allow much smaller number of objects.

@jpbetz jpbetz force-pushed the ga-kep-scalability-links branch from e618668 to eda69e7 Compare July 29, 2019 23:08
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2019

The above object size and suggested maximum limits in the Custom Resources per
Definition table applies to these conversion webhook SLOs. For example, for a
list request for 1500 custom resource objects that are 10k in size, the resource
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the table trying to say "return a single object in 50ms, up to 1500 10k objects in 1s, 10000 10k objects in 6 seconds"?

If so, that mathematically doesn't make much sense, the latter implies that a single object must be processed in .6ms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inlined the object count details in the webhook SLOs to make it easier to understand. The 50ms for a single object is higher to account for a request serving constant factor. I've added a note.

@jpbetz jpbetz force-pushed the ga-kep-scalability-links branch from eda69e7 to 0582af5 Compare July 29, 2019 23:30
@jpbetz jpbetz force-pushed the ga-kep-scalability-links branch from 0582af5 to 762a118 Compare July 29, 2019 23:40
@lavalamp
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, lavalamp, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2019
@k8s-ci-robot k8s-ci-robot merged commit b636feb into kubernetes:master Jul 29, 2019
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

can be found by taking the (median) object size of the custom resource and finding
the the matching row in this table:

| Object size | Suggested Maximum Limit: scope=namespace (5s p99 SLO) | Suggested Maximum Limit: scope=cluster (30s p99 SLO) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's connected with the number of objects - listing objects is basically some constant time + something proportional to number of objects processed. And within single namespace (scope=namespace) we offficially allow much smaller number of objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.