Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide recommended .status.conditions schema #1624

Merged
merged 3 commits into from
May 8, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
144 changes: 144 additions & 0 deletions keps/sig-api-machinery/1623-standardize-conditions/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
# KEP-1623: Standardize Conditions.

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Noteworthy choices](#noteworthy-choices)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes i.e., [kubernetes/kubernetes], we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

- [ ] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] KEP approvers have approved the KEP status as `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

While many Kuberentes APIs have `.status.conditions`, the schema of `condition` varies a lot between them.
There is very little commonality at the level of serialization, proto-encoding, and required vs optional.
Conditions are central enough to the API to make a common golang type with a fixed schema.
The schema can be a strong recommendation to all API authors.

## Motivation

Allow general consumers to expect a common schema for `.status.conditions` and share golang logic for common Get, Set, Is for `.status.conditions`.
The pattern is well-established and we have a good sense of the schema we now want.

### Goals

1. For all new APIs, have a common type for `.status.conditions`.
2. Provide common utility methods for `HasCondition`, `IsConditionTrue`, `SetCondition`, etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very excited to see this proposal!

As the list of the common utility methods is still in discussion, wanted to list a couple of condition util methods we found to be very frequently used at Rancher - callbacks DoUntilTrue/Once/Do and ReasonAndMessageFromError (method setting condition fields from error obj): https://github.com/rancher/norman/blob/146f45dafa623a34864cbeeb3a39d5903daa2995/condition/condition.go#L131

3. Provide recommended defaulting functions that set required fields and can be embedded into conversion/default functions.

### Non-Goals

1. Update all existing APIs to make use of the new condition type.

## Proposal

Introduce a type into k8s.io/apimachinery/pkg/apis/meta/v1 for `Condition` that looks like
```go
type Condition struct {
// Type of condition in CamelCase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've documented CamelCase, but we've also told people to use conditions as an extension mechanism. To do that safely, we should probably support or endorse company.com/label style names...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect multiple controllers to be updating the same object's status with different Conditions?

If we assume that a single controller owns the status of the object, then I don't think we need namespaced labels for the types (since the controller should be able to ensure uniqueness).

If we expect multiple controllers to be updating status, then this is more of a problem -- I'd be concerned about other implications (controller fighting) in that design, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect multiple controllers to be updating the same object's status with different Conditions?

Yes. NPD, kubelet, and the node controller all set node conditions.

If we expect multiple controllers to be updating status, then this is more of a problem -- I'd be concerned about other implications (controller fighting) in that design, though.

The entire k8s API system is about making it safe for humans and controllers (and multiple controllers) work together on particular objects, both on their status and spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A while back we documented Conditions a bit more. The use of conditions has evolved a lot from "a single controller" to "how 3rd parties add extension status".

Case in point - pod readinessGates expect conditions to be populated to indicate "readiness" from an external POV (eg. LB programming).

I raised the prefixing at that time, but we didn't really resolve it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've documented CamelCase, but we've also told people to use conditions as an extension mechanism. To do that safely, we should probably support or endorse company.com/label style names...

That seems reasonable. company.com/CamelCase and CamelCase for first-class conditions? I do think that some are realistically more first class than others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define "first-class" ?

labels are GENERALLY not using CamelCase, but in this case I won't balk.

// +required
Type string `json:"type" protobuf:"bytes,1,opt,name=type"`
// Status of the condition, one of True, False, Unknown.
// +required
Status ConditionStatus `json:"status" protobuf:"bytes,2,opt,name=status"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're normalizing, should we consider bigger changes - like making this an optional bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're normalizing, should we consider bigger changes - like making this an optional bool?

I think we've been happy with this as an enumerated string to cover Unknown as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is happy? All I see are calls to ParseBool. "Unknown" as a value is just weird because we also have to handle a non-specified condition, which is truly unknown.

TL;DR I am not super happy with Conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You called this out as unresolved.

I guess I'm arguing for a larger, deeper, not really compatible change.

I think the acknowledgment of unknown has worked well. It gives an option for acknowledging awareness of a condition without the ability to inspect the current value. For instance, an inability to reach an external system results in an Unknown status for the "is-it-installed" condition. You cannot do that with the absence of a condition. I think the concern about ParseBool is well handled by an IsConditionTrue and IsConditionFalse which comments here indicate are commonly used.

If we were rebooting Condition, I would argue harder for *bool as the totally unambiguous representation. Given that you're not rebooting, just normalizing, I will waive my objection. I just wanted it on the record, I guess. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that everyone is already using "Unknown" as a value, I think we have to keep it. :/

Documentation should make clear that "Unknown" is a fact about the writer of the condition, and not a claim about the object.

Good: {Ready, Unknown} to mean: "it might or might not be ready, we don't know."

Good: {IPAssigned, Unknown} to mean: "it might or might not have gotten an IP address, we don't know."

Very bad: {IPAssigned, Unknown} to mean: "it has an IP address, but we don't know what it is."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

But also <no key "IPAssigned"> is equivalent to {IPAssigned, Unknown} for most intents.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{IPAssigned, Unknown} gives a hint to other participants that the "IPAssigned" condition may still be reconciled, while <no key "IPAssigned"> makes it more difficult for other participants to know whether IP assignment is applicable in this case. But this is mostly about the case for *bool vs bool, rather than the string enum that we have vs what we wish.

Would it be reasonable to fill in status: "" as status: "Unknown" somewhere in the apiserver side for Conditions (read side or write side)? My gut says "no", but my heart says "please".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think status: "" should be canonicalized as status: "Unknown".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Default to", or just "assumed to mean"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to fill in status: "" as status: "Unknown" somewhere in the apiserver side for Conditions (read side or write side)? My gut says "no", but my heart says "please".

I would require condition writers to say what they mean and specify the status value.

I'd much rather have a condition-adding patch that typos "statuss":"True" get rejected with a "status is required" error than get accepted with status defaulted to "Unknown"

// Last time the condition transitioned from one status to another.
// This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
// +required
LastTransitionTime metav1.Time `json:"lastTransitionTime" protobuf:"bytes,3,opt,name=lastTransitionTime"`
liggitt marked this conversation as resolved.
Show resolved Hide resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A side note (from a project that standardized this ourselves for our CRDs):

We found that using metav1.Time added a non-trivial amount of bookkeeping using a condition update semantic of "compute the new status, then update if any conditions changed value". To avoid this, we did the following:

  1. Sort the Conditions by Type
  2. Compare the slices with equality.Semantic, ignoring LastTransitionTime
  3. Only update Conditions which were not equal.

If you're curious, see knative/serving#1663 and https://github.com/knative/pkg/blob/master/apis/volatile_time.go

// The reason for the condition's last transition in CamelCase.
// The specific API may choose whether or not this field is considered a guaranteed API.
// +required
Reason string `json:"reason" protobuf:"bytes,4,opt,name=reason"`
// A human readable message indicating details about the transition.
// This field is never considered a guaranteed API and may be empty/missing.
// +optional
Message string `json:"message,omitempty" protobuf:"bytes,5,opt,name=message"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's moot discussion, but for optional field don't we want to make the type *string?
@liggitt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's moot discussion, but for optional field don't we want to make the type *string?
@liggitt

I really dislike pointers in golang code. I'll spell this however is required to allow omiting to be acceptable and empty string to be valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Types like string "work" both as primitives and pointers for omitempty. The distinction is that without a pointer, you can't tell if the user said "" or didn't say anything at all. It probably doesn't matter here.

The last argument, then, is consistency. we have documented that optional fields be pointers. If you are saying this is a required field, but "" is allowed, that's fine, but then omitempty is probably wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last argument, then, is consistency. we have documented that optional fields be pointers. If you are saying this is a required field, but "" is allowed, that's fine, but then omitempty is probably wrong.

Ok. I'll remove omit empty and make it required with "" as allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end result here is really confusing to me. The combination of This field may be empty and +required does not make a lot of sense to me. It seems like our standard has been to use pointers for optional fields. Although I agree that pointers can be a pain to work with, I think an optional field is easier to understand than a "required" field that can be empty.

}
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to define a type for metav1.Conditions []metav1.Condition, to support adding helper methods for operating on the set of conditions.

Cribbing from this interface, we've found the following useful:

type ConditionManager interface {
	// SetCondition sets or updates the Condition on Conditions for Condition.Type.
	// If there is an update, Conditions are stored back sorted.
	SetCondition(new Condition)

	// ClearCondition removes the non terminal condition that matches the ConditionType
	ClearCondition(t ConditionType) error

	// MarkTrue sets the status of t to true, and then marks the happy condition to
	// true if all dependents are true.
	MarkTrue(t ConditionType)

	// MarkTrueWithReason sets the status of t to true with the reason, and then marks the happy
	// condition to true if all dependents are true.
	MarkTrueWithReason(t ConditionType, reason, messageFormat string, messageA ...interface{})

	// MarkUnknown sets the status of t to Unknown and also sets the happy condition
	// to Unknown if no other dependent condition is in an error state.
	MarkUnknown(t ConditionType, reason, messageFormat string, messageA ...interface{})

	// MarkFalse sets the status of t and the happy condition to False.
	MarkFalse(t ConditionType, reason, messageFormat string, messageA ...interface{})
}

The ConditionManager interface also codifies patterns around having a single top-level "happy" Condition (aka "Ready" or "Succeeded") and supports feeding dependent conditions into the top-level condition. Note that the MarkUnknown and MarkFalse methods both require reason and message fields, but MarkTrue does not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great candidate for client-go and/or controller helper libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great candidate for client-go and/or controller helper libraries.

Yeah, probably something very similar to that interface above like https://github.com/openshift/library-go/blob/master/pkg/operator/v1helpers/helpers.go#L49-L108

Copy link
Member

@MikeSpreitzer MikeSpreitzer May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evan: does your suggestion require the helper methoda to be defined in the metav1 package? If so, then it does not suffice to define an interface; we would need to also write the implementations for metav1.Conditions in metav1.


This is not strictly compatible with any of our existing conditions because of either proto ordinals,
required vs optional, or omitEmpty or not.
However, it encapsulates the best of what we've learned and will allow new APIs to have a unified type.

### Noteworthy choices
1. `lastTransitionTime` is required.
Some current implementations allow this to be missing, but this makes it difficult for consumers.
By requiring it, the actor setting the field can set it to the best possible value instead of having clients try to guess.
2. `reason` is required.
The actor setting the value should always describe why the condition is the way it is, even if that value is, unknown unknowns.
No other actor has the information to make a better choice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to mention whether reason needs to be non-empty when the condition is True. (For example, if Ready=True, the reason may be superfluous.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to mention whether reason needs to be non-empty when the condition is True. (For example, if Ready=True, the reason may be superfluous.)

It should always be non-empty. I'll update to be more clear.

3. `lastHeartbeatTime` is removed.
This field caused excessive write loads as we scaled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, other resource types named this field differently, such as lastProbeTime. I agree that the cost has outweighed the benefit, though.

If an API needs this concept, it should codify it separately and possibly using a different resource.

### Graduation Criteria

Because meta/v1 APIs are necessarily v1, this would go direct to GA.
Using a meta/v1beta1 isn't a meaningful distinction since this type is embedded into other types which own their own versions.
liggitt marked this conversation as resolved.
Show resolved Hide resolved

### Upgrade / Downgrade Strategy

This KEP isn't proposing that existing types be changed.
This means that individual upgrade/downgrade situations will be handled discretely.
By providing recommended defaulting functions, individual APIs will be able to more easily transition to the new condition type.

### Version Skew Strategy

Standard defaulting and conversion will apply.
APIs which have extra values for this type may have to go through an intermediate version that drops them or accept
that certain optional fields of their conditions will be dropped.
Depending on the individual APIs and when their extra fields are deprecated, this could be acceptable choice.

## Implementation History

## Drawbacks

1. There may be some one-time pain when new versions are created for APIs that wish to consume this common schema.
Switching is not strictly required, but it is encouraged.

## Alternatives

1. We could recommend a schema and not provide one. This doesn't seem very nice to consumers.

19 changes: 19 additions & 0 deletions keps/sig-api-machinery/1623-standardize-conditions/kep.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
title: KEP Template
kep-number: 1623
authors:
- "@deads2k"
owning-sig: sig-apimachinery
participating-sigs:
- sig-apimachinery
- sig-apps
- sig-architecture
status: implementable
creation-date: 2020-03-23
reviewers:
- "@lavalamp"
- "@smarterclayton"
approvers:
- "@derekwaynecarr"
- "@liggitt"
see-also:
replaces: