Modify all health checks to be specified via enums #2078
The set of health checks to be executed was dependent on a combination of check enums and boolean options. This change modifies the health checks to be governed strictly by a set of enums.

Next steps:
- tightly couple category IDs to names
- tightly couple checks to their parent categories
- programmatic control over check ordering

Signed-off-by: Andrew Seigner <[email protected]>
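To illustrate what "governed strictly by a set of enums" can look like, here is a minimal sketch in which the caller passes a list of category values and the checker set is derived only from that list, with no boolean options involved. The type `CheckCategory`, its constants, and every check description other than "can create ClusterRoles"/"can create ClusterRoleBindings" are hypothetical stand-ins, not linkerd's actual API, and each check is a placeholder that always passes.

```go
package main

import "fmt"

// CheckCategory is a hypothetical enum; the names are illustrative only.
type CheckCategory int

const (
	KubernetesAPIChecks CheckCategory = iota
	LinkerdPreInstallClusterChecks
	LinkerdPreInstallSingleNamespaceChecks
	LinkerdDataPlaneChecks
)

// checker pairs a description with the function that performs the check.
type checker struct {
	category    CheckCategory
	description string
	check       func() error
}

// HealthChecker holds the checkers selected for one run.
type HealthChecker struct {
	checkers []*checker
}

// NewHealthChecker selects checks purely from the requested categories, in
// the order given; no boolean flags switch individual checks on or off.
func NewHealthChecker(categories []CheckCategory) *HealthChecker {
	hc := &HealthChecker{}
	for _, c := range categories {
		switch c {
		case KubernetesAPIChecks:
			hc.add(c, "can initialize the client")
		case LinkerdPreInstallClusterChecks:
			hc.add(c, "can create ClusterRoles")
			hc.add(c, "can create ClusterRoleBindings")
		case LinkerdPreInstallSingleNamespaceChecks:
			hc.add(c, "can create Roles")
			hc.add(c, "can create RoleBindings")
		case LinkerdDataPlaneChecks:
			hc.add(c, "data plane proxies are ready")
		}
	}
	return hc
}

// add registers a placeholder check that always passes; a real check would
// query the cluster.
func (hc *HealthChecker) add(c CheckCategory, description string) {
	hc.checkers = append(hc.checkers, &checker{
		category:    c,
		description: description,
		check:       func() error { return nil },
	})
}

func main() {
	// A cluster-wide pre-install run: the caller picks categories instead of
	// setting a (hypothetical) singleNamespace bool.
	hc := NewHealthChecker([]CheckCategory{KubernetesAPIChecks, LinkerdPreInstallClusterChecks})
	for _, c := range hc.checkers {
		fmt.Println(c.description, c.check() == nil)
	}
}
```

Because every toggleable group of checks lives under one category, switching between a cluster-wide and a single-namespace install becomes a matter of passing a different category value.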
The `linkerd check` command organized the various checks via loosely coupled category IDs, category names, and checkers themselves, all with ordering defined by consumers of this code. This change removes category IDs in favor of category names, groups all checkers by category, and enforces ordering at the `HealthChecker` level. Part of #1471, depends on #2078. Signed-off-by: Andrew Seigner <[email protected]>
⭐️ This is great! Much easier to reason about now that those boolean variables are gone.
```go
		},
	})

	// TODO: refactor with LinkerdPreInstallSingleNamespaceChecks
	roleType := "ClusterRole"
	roleBindingType := "ClusterRoleBinding"
```
Now that you've split the RBAC checks into multiple separate methods, I think it's clearer to hardcode everything rather than worry about code reuse. I'm inclined to just remove these local vars. Something like:
```diff
diff --git a/pkg/healthcheck/healthcheck.go b/pkg/healthcheck/healthcheck.go
index 24b9722e..31c99ce2 100644
--- a/pkg/healthcheck/healthcheck.go
+++ b/pkg/healthcheck/healthcheck.go
@@ -316,23 +316,19 @@ func (hc *HealthChecker) addLinkerdPreInstallClusterChecks() {
 		},
 	})
 
-	// TODO: refactor with LinkerdPreInstallSingleNamespaceChecks
-	roleType := "ClusterRole"
-	roleBindingType := "ClusterRoleBinding"
-
 	hc.checkers = append(hc.checkers, &checker{
 		category:    LinkerdPreInstallClusterCategory,
-		description: fmt.Sprintf("can create %ss", roleType),
+		description: "can create ClusterRoles",
 		check: func() error {
-			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", roleType)
+			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", "ClusterRole")
 		},
 	})
 	hc.checkers = append(hc.checkers, &checker{
 		category:    LinkerdPreInstallClusterCategory,
-		description: fmt.Sprintf("can create %ss", roleBindingType),
+		description: "can create ClusterRoleBindings",
 		check: func() error {
-			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", roleBindingType)
+			return hc.checkCanCreate("", "rbac.authorization.k8s.io", "v1beta1", "ClusterRoleBinding")
 		},
 	})
```
The same goes for the checks in the `addLinkerdPreInstallSingleNamespaceChecks` func.
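As a quick sanity check on the suggested diff, the hardcoded descriptions should print exactly what the removed `fmt.Sprintf("can create %ss", ...)` calls produced, so `linkerd check` output stays the same. The helper `describeOld` below is hypothetical; it only reproduces the removed formatting pattern.

```go
package main

import "fmt"

// describeOld reproduces the fmt.Sprintf pattern that the suggested diff removes.
func describeOld(kind string) string {
	return fmt.Sprintf("can create %ss", kind)
}

func main() {
	// Pairs of (resource kind, hardcoded description from the diff).
	pairs := [][2]string{
		{"ClusterRole", "can create ClusterRoles"},
		{"ClusterRoleBinding", "can create ClusterRoleBindings"},
	}
	for _, p := range pairs {
		fmt.Printf("%-20s same=%v\n", p[0], describeOld(p[0]) == p[1])
	}
}
```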
heh, i did exactly that in the next PR: https://github.com/linkerd/linkerd2/pull/2080/files#diff-d4056ff163bcf2aeacefb2a34164563cR270
Awesome, yep, carry on!
In 2.13, the default inbound and outbound HTTP request queue capacity decreased from 10,000 requests to 100 requests (in PR #2078). This change results in proxies shedding load much more aggressively while under high load to a single destination service, resulting in increased error rates in comparison to 2.12 (see #11055 for details).

This commit changes the default HTTP request queue capacities for the inbound and outbound proxies back to 10,000 requests, the way they were in 2.12 and earlier. In manual load testing I've verified that increasing the queue capacity results in a substantial decrease in 503 Service Unavailable errors emitted by the proxy: with a queue capacity of 100 requests, the load test described [here] observed a failure rate of 51.51% of requests, while with a queue capacity of 10,000 requests, the same load test observes no failures.

Note that I did not modify the TCP connection queue capacities or the control plane request queue capacity. These were previously configured by the same variable before #2078, but were split out into separate vars in that change. I don't think the queue capacity limits for TCP connection establishment or for control plane requests are currently resulting in instability the way the decreased request queue capacity is, so I decided to make a more focused change to just the HTTP request queues for the proxies.

[here]: #11055 (comment)

---

* Increase HTTP request queue capacity (linkerd/linkerd2-proxy#2449)

Signed-off-by: Eliza Weisman <[email protected]>
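To see why queue capacity changes the error rate under bursty load, here is a toy model of a bounded request queue that sheds (rejects) requests once full, the way a proxy would answer with a 503. It is written in Go for consistency with the rest of this page; it is not the proxy's implementation (linkerd2-proxy is written in Rust), and the tick-based arrival/service numbers are arbitrary assumptions, not the load test from #11055.

```go
package main

import "fmt"

// boundedQueue is a toy stand-in for a proxy's HTTP request queue.
type boundedQueue struct {
	ch chan int
}

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{ch: make(chan int, capacity)}
}

// tryEnqueue returns false instead of blocking when the queue is full,
// which models shedding the request.
func (q *boundedQueue) tryEnqueue(req int) bool {
	select {
	case q.ch <- req:
		return true
	default:
		return false
	}
}

// drain models the backend serving up to n queued requests.
func drain(q *boundedQueue, n int) {
	for i := 0; i < n; i++ {
		select {
		case <-q.ch:
		default:
			return
		}
	}
}

// simulate returns how many requests were shed over the given ticks.
func simulate(capacity, ticks, arrivalsPerTick, servedPerTick int) int {
	q := newBoundedQueue(capacity)
	shed := 0
	for t := 0; t < ticks; t++ {
		for i := 0; i < arrivalsPerTick; i++ {
			if !q.tryEnqueue(i) {
				shed++
			}
		}
		drain(q, servedPerTick)
	}
	return shed
}

func main() {
	for _, capacity := range []int{100, 10000} {
		shed := simulate(capacity, 50, 120, 100)
		fmt.Printf("capacity %5d: shed %d of %d requests\n", capacity, shed, 50*120)
	}
}
```

With these toy numbers, capacity 100 sheds 1000 of 6000 requests while capacity 10,000 sheds none, because the deeper queue absorbs each burst. The trade-off is that queued requests wait longer, and under sustained overload even a large queue eventually fills.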
Modify all health checks to be specified via enums
The set of health checks to be executed was dependent on a combination
of check enums and boolean options.
This change modifies the health checks to be governed strictly by a set
of enums. This change does not add or remove any checks, but rather
moves checks into more granular categories, such that any set of checks
that are toggle-able are defined together under a single category.
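This is a first step in cleaning up the `linkerd check` code, and moving
towards #1471.

Next steps:
- tightly couple category IDs to names
- tightly couple checks to their parent categories
- programmatic control over check ordering

Signed-off-by: Andrew Seigner <[email protected]>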