-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Condition guidance #4521
Changes from 1 commit
3a2c17b
f0fdf1c
5cab966
511866a
49d3a5a
5f972cf
305d8d8
dd082f4
0a9d1eb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -308,17 +308,47 @@ response reduces the complexity of these clients. | ||||||||||||||||||||||
|
|||||||||||||||||||||||
##### Typical status properties | |||||||||||||||||||||||
|
|||||||||||||||||||||||
**Conditions** represent the latest available observations of an object's | |||||||||||||||||||||||
state. They are an extension mechanism intended to be used when the details of | |||||||||||||||||||||||
an observation are not a priori known or would not apply to all instances of a | |||||||||||||||||||||||
given Kind. For observations that are well known and apply to all instances, a | |||||||||||||||||||||||
regular field is preferred. An example of a Condition that probably should | |||||||||||||||||||||||
have been a regular field is Pod's "Ready" condition - it is managed by core | |||||||||||||||||||||||
controllers, it is well understood, and it applies to all Pods. | |||||||||||||||||||||||
**Conditions** provide a standard mechanism for higher-level status reporting | |||||||||||||||||||||||
from a controller. They are an extension mechanism which allows tools and other | |||||||||||||||||||||||
controllers to collect summary information about resources without needing to | |||||||||||||||||||||||
understand resource-specific status details. Conditions should complement more | |||||||||||||||||||||||
detailed information about the observed status of an object written by a | |||||||||||||||||||||||
controller, rather than replace it. For example, the "Available" condition of a | |||||||||||||||||||||||
Deployment can be determined by examining `readyReplicas`, `replicas`, and | |||||||||||||||||||||||
other properties of the Deployment. However, the "Available" condition allows | |||||||||||||||||||||||
other components to avoid duplicating the availability logic in the Deployment | |||||||||||||||||||||||
controller. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Objects may report multiple conditions, and new types of conditions may be | |||||||||||||||||||||||
added in the future or by 3rd party controllers. Therefore, conditions are | |||||||||||||||||||||||
represented using a list/slice, where all have similar structure. | |||||||||||||||||||||||
represented using a list/slice of objects, where each condition has a similar | |||||||||||||||||||||||
structure. Conditions are most useful when they follow some consistent | |||||||||||||||||||||||
conventions: | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Conditions should be added to explicitly convey properties that users and | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really agree. I think there's room in the world for two kinds of conditions:
This assumes some sort of bridge controller which understands all the reportables and can summarize them into the consumables. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we allow Condition types to be namespaced (Tim's idea), it would be possible to summarize all namespaced conditions with a consistent polarity using a "bridge controller". Existing Conditions would need some sort of annotation to indicate their expected polarity, since they are generally mixed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think there's two ideas here?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm lost on this thread - I don't know what a "bridge controller" is for? Regarding polarity, we could have an indicator, as Daniel suggests, so it is self-describing. But "good/bad" is a pretty vague proclamation. And we'd have to make it optional ANYWAY, so ... If the goal is to have machines understand arbitrary conditions, we need an indicator of normal or abnormal polarity. Is that the goal? Or do we expect that most consumers of conditions are either a) humans who can read the name and infer polarity; b) programs coded by humans to look for specific conditions with a priori known polarity? We can't simply say that Conditions are "summarizations" of other info, because they are ALSO the only extension point for external status. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like to reach the point where machines can manage arbitrary Conditions. Evolving that from the current state may require additional indicator fields, depending on implementation strategy. If there are existing status contributors with mixed polarity within an object, we'll definitely need a "Polarity" field. If we perform the status aggregation in the "main" controller for the object, we can do this object-by-object and might be able to avoid Polarity. I interpreted @lavalamp 's idea of a "bridge controller" as a single controller which could operate across all Kinds and perform this aggregation based on mechanical rules. Doing this aggregation within the object's status avoids duplicating the aggregation logic across many clients; doing this in a "bridge controller" avoids duplicating the aggregation in many controllers (which may or may not be as much of an issue). I'm not sure how many aggregated conditions are needed (vs creating additional objects for "consumable" measurements that are composite). We got away with one in Knative, but singletons always come back to greet you. The advantage of codifying the condition aggregation of "aspect of" in a controller is that it allows new controllers to contribute observations without needing to modify the logic of the main controller to recognize the new conditions. Playing with an experiment we did in Knative, it would be possible to add "Severity" and "Polarity" fields:
🗡️ - For legacy conditions, we might want to prioritize updating those that should be treated as "Error" to set a polarity, then those that should be prioritized as "Warning". If we proceed object-at-a-time, the Polarity field might be able to be dropped, but this would probably require a certain amount of explicit handling of legacy conditions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not inherently against more self-describing, extensible status, but tat's a significant change from Conditions today. Ready is a condition of its own and we don't have "virtual" conditions. So clients would need to be taught this new structure, which is obviously backwards-incompatible. CF the design around Pod There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In other words, if you want something like this, I think it needs to be totally additive (a new concept) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we update the API conventions to reflect current actual usage (encompassing core and out of core) and then suggest new usage in a separate thread? I think this PR does a good job of the former. |
|||||||||||||||||||||||
components care about rather than requiring those properties to be inferred | |||||||||||||||||||||||
from other observations. Once defined, the meaning of a Condition can not be | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe worth expanding on "meaning"? Specifically roll-up sorts of signals are valuable because consumers do not have to do look at N raw signals and make the same inference AND because that set of N raw signals can evolve into M raw signals, and the roll-up is handled automatically. See for example pod readiness gates - the raw signals got more complex, but consumers only had to worry about the "Ready" condition. In that case the internal composition of "Ready" changed, but the external semantic did not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't really enforce this in code, so users can't really rely on this. I'd therefore phrase it as a recommendation rather than requirement. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think "meaning" here is the "intent" of the Condition -- it may become possible to more accurately measure the intent. For example, adding "is there enough disk space" to the set of conditions that indicate "this Node is able to accept work" or "the Pods on this Node are likely to function properly" -- but it would not be acceptable to swap "able to accept work" intent for an "Pods function properly" intent; the latter would need to be a new Condition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My original comment was that the text could be clearer about what "meaning" means - how abstract that is vs how it is literally derived. But I think it also unclear if you mean that conditions should exclusively be used as "summaries" of other information or that "summaries" are one of several reasonable uses of conditions? And I am still looking for answers to my question on principle. Why is a "ready" pronouncement better a a condition than as a field? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My read is that conditions have consistent metadata attached to them (at least as of #1624) as to when and why they last transitioned states, including now the |
|||||||||||||||||||||||
changed arbitrarily - it becomes part of the API, and has the same backwards- | |||||||||||||||||||||||
and forwards-compatibility concerns of any other part of the API. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Conditions should have a consistent polarity. **A "positive" polarity where | |||||||||||||||||||||||
"True" indicates normal operation and "False" indicates a failure is | |||||||||||||||||||||||
recommended.** (Note that this is an explicit reversal from earlier | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clarify this is a guide for choosing condition names and polarity, not a guide for consuming arbitrary conditions. As written, it sounds like I can write a client that assumes all conditions with a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would like to reach that point sometime in the future, but you're correct that we can't do this for a long time, if ever. Adjusted this to suggest that consistency within a resource was most important consideration, and positive polarity should be chosen if there isn't already a consistent polarity. |
|||||||||||||||||||||||
recommendations of "abnormal-true" polarity. Experience indicates that humans | |||||||||||||||||||||||
are better at interpreting the "positive" polarity.) | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* Condition names should clearly indicate what "True" and "False" mean. This | |||||||||||||||||||||||
may require adjusting condition type names, for example changing the type | |||||||||||||||||||||||
name from "QuotaExhausted" to "ResourcesAvailable". | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* The `Unknown` status is the default, and typically indicates that | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The term
|
|||||||||||||||||||||||
reconciliation has not yet finished (or that the resource state may not | |||||||||||||||||||||||
yet be observable). | |||||||||||||||||||||||
|
|||||||||||||||||||||||
* It's helpful to have a common top-level condition which summarizes more | |||||||||||||||||||||||
detailed conditions. Simple consumers may simply query the top-level | |||||||||||||||||||||||
condition. The `Ready` and `Succeeded` condition types may be used for | |||||||||||||||||||||||
long-running and bounded-execution objects, respectively. | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this also needs to be clearly oriented at the condition producer; as phrased it could be misunderstood by a simple consumer to mean that they can consistently expect to be able to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again, this is the long-term situation that I'd like to steer the world towards, but you're correct that this could be read by many consumers to demand more from API implementers than is likely in the short term. Amended, PTAL. |
|||||||||||||||||||||||
|
|||||||||||||||||||||||
The `FooCondition` type for some resource type `Foo` may include a subset of the | |||||||||||||||||||||||
following fields, but must contain at least `type` and `status` fields: | |||||||||||||||||||||||
|
@@ -347,20 +377,10 @@ Use of the `Reason` field is encouraged. | ||||||||||||||||||||||
Use the `LastHeartbeatTime` with great caution - frequent changes to this field | |||||||||||||||||||||||
can cause a large fan-out effect for some resources. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Conditions should be added to explicitly convey properties that users and | |||||||||||||||||||||||
components care about rather than requiring those properties to be inferred from | |||||||||||||||||||||||
other observations. Once defined, the meaning of a Condition can not be | |||||||||||||||||||||||
changed arbitrarily - it becomes part of the API, and has the same backwards- | |||||||||||||||||||||||
and forwards-compatibility concerns of any other part of the API. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Condition status values may be `True`, `False`, or `Unknown`. The absence of a | |||||||||||||||||||||||
condition should be interpreted the same as `Unknown`. How controllers handle | |||||||||||||||||||||||
`Unknown` depends on the Condition in question. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Condition types should indicate state in the "abnormal-true" polarity. For | |||||||||||||||||||||||
example, if the condition indicates when a policy is invalid, the "is valid" | |||||||||||||||||||||||
case is probably the norm, so the condition should be called "Invalid". | |||||||||||||||||||||||
|
|||||||||||||||||||||||
The thinking around conditions has evolved over time, so there are several | |||||||||||||||||||||||
non-normative examples in wide use. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
|
@@ -374,9 +394,9 @@ and should assume an Open World. | ||||||||||||||||||||||
An example of an oscillating condition type is `Ready` (despite it running | |||||||||||||||||||||||
afoul of current guidance), which indicates the object was believed to be fully | |||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if this changes the guidance around positive/negative conditions, we can drop the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for catching this, updated. |
|||||||||||||||||||||||
operational at the time it was last probed. A possible monotonic condition | |||||||||||||||||||||||
could be `Failed`. A `True` status for `Failed` would imply failure with no | |||||||||||||||||||||||
retry. An object that was still active would generally not have a `Failed` | |||||||||||||||||||||||
condition. | |||||||||||||||||||||||
could be `Succeeded`. A `True` status for `Succeeded` would imply completion | |||||||||||||||||||||||
and that the resource was no longer active. An object that was still active | |||||||||||||||||||||||
would generally not have a `Succeeded` condition. | |||||||||||||||||||||||
|
|||||||||||||||||||||||
Some resources in the v1 API contain fields called **`phase`**, and associated | |||||||||||||||||||||||
`message`, `reason`, and other status fields. The pattern of using `phase` is | |||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this really inverts the previous guidance, which was sort of agonized over (and with which I happen to agree). Available (or Ready in the previous edit) are omnipresent and (I think?) are never "unknown".
Why are those correct as Conditions rather than fields? What principal guides one to decide between those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I have only one argument to make - it's this one. As written, many status fields should just become conditions, and I don't think that's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were going to have a distinct subresource for conditions, which we could RBAC distinctly, then I could see an argument forming around it. That doesn't answer the question for "core" conditions like ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Tim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at Deployment, the current set of Conditions are a bit subtle for a newcomer:
Oddly, a deployment which does not have any changes in flight is "Progressing=True", which suggests a positive polarity, but maybe a poor name choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need to look at those more carefully before I could agree that Deployment's conditions are a good example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of this speaks to the question of principle - what is the litmus test that helps me decide - field or condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose the following guidance:
Expose the underlying measurements used to make decisions as fields.
If the outcome of that decision is interesting to a human (roughly: does it affect the behavior or functionality of the object in an externally observable way), provide a single Condition which summarizes whether that aspect of the controller algorithm is operating correctly ("within the envelope").
If a controller is adding status to an object and cannot modify the object to set fields, it must summarize the decision and record it in a Condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditions summarize characteristics of object lifecycle that must be generic across multiple types, have a need to convey both state and human message, and can be extended by third parties to add more nuance to existing lifecycle.
An object in Kube is a distributed state machine. The state specified to that object should always be fields. Some objects have consistent fields that describe status of the state machine. One of those is observedGeneration. Another one is conditions. Some conditions define generic, multi-object state machine characteristics like Available, Progressing, Ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part about "must be generic across multiple types" may be true for core kubernetes types, but it does not appear to be true in the general cases of non-core types.
Since people use these guidelines for non-core types, I think we need to acknowledge how they are used outside of core.
I've looked at 364 different .go files and found that most conditions are only used in a single file. By inspection it appears that many of those are specific to a single kind.
Here is a list:
https://gist.github.com/erictune/47d5bfb0fd0c1a7273ecaa5a9fcf5d52#file-topconditions-txt
Details on the method at https://gist.github.com/erictune/47d5bfb0fd0c1a7273ecaa5a9fcf5d52