Skip to content

Conversation

gravufo
Copy link

@gravufo gravufo commented Jan 17, 2025

Description of your changes

This PR will fix the issue where the status sub-object would be cleared when adding the Finalizer. This would cause Observe-only resources to wait until the next SyncPeriod/PollPeriod to become Ready=true even though it should have been set at the first loop.

Fixes #791

Note: I am really unsure if these are the right places to do those changes, so please don't hesitate to leave comments pointing to the right locations.

I have:

  • Read and followed Crossplane's [contribution process].
  • Run earthly +reviewable to ensure this PR is ready for review.
  • Added or updated unit tests.
  • Linked a PR or a [docs tracking issue] to [document this change].
  • Added backport release-x.y labels to auto-backport this PR.

@gravufo gravufo requested a review from a team as a code owner January 17, 2025 22:14
@gravufo gravufo requested a review from turkenh January 17, 2025 22:14
Copy link
Member

@turkenh turkenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gravufo for the PR and sorry for the late response here!

The PR looks good to me in general, except adding Statused to the managed interface feels a bit off since there is ConditionedStatus which also controls part of status and related to why unit test not working as expected (see my comment below).

If we want to proceed with the proposed approach, we will also need a controller tools PR so that all managed resources satisfy the extended interface.

Another solution could be, instead of changing managed reconciler, we could fix APIFinalizer's AddFinalizer() method such that it won't override the status. A simple deepcopy and using copied object in the Update call would be helpful, also eliminating the need to update managed interface/reconciler.

func (a *APIFinalizer) AddFinalizer(ctx context.Context, obj Object) error {
	if meta.FinalizerExists(obj, a.finalizer) {
                return nil
        }
        meta.AddFinalizer(obj, a.finalizer)
-       return errors.Wrap(a.client.Update(ctx, obj), errUpdateObject)
+       // Copy object to ensure we don't modify the original object, e.g. not to lose status.
+       o := obj.DeepCopyObject().(Object)
+       return errors.Wrap(a.client.Update(ctx, o), errUpdateObject)
 }

While this change looks simpler (and wouldn't require a crossplane tools), I am not comfortable since it will have a bigger blast radius due to APIFinalizer has other usages. If there are callers of that method assuming they got the latest object after the add finalizer call, they might get broken. For example, the next update call will fail since the object don't have the latest resource version any more 🤔 We could patch status in flight to latest meta+spec, but again things are getting weird as we are working with Objects (no schema).

Sooo, probably the direction in this PR is the best way to go providing a very scoped fix to the problem.

@@ -0,0 +1,29 @@
/*
Copyright 2019 The Crossplane Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
Copyright 2019 The Crossplane Authors.
Copyright 2025 The Crossplane Authors.

// See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties

// A Status reflects the observed status of a resource.
type Status map[string]runtime.RawExtension
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply runtime.RawExtension? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to be able to reach in and get/set conditions directly?

},
want: want{result: reconcile.Result{Requeue: true}},
},
"AddFinalizerShouldNotClearStatus": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to comment out managed.SetStatus(managedPreOp.GetStatus()) and expected this test to fail. But it passed again which implies that we don't actually test what we want. Upon further investigation, issue seems to be related how our mock stores the conditions (i.e. not changes status at all).

@gravufo
Copy link
Author

gravufo commented May 30, 2025

Thanks @gravufo for the PR and sorry for the late response here!

No problem, I didn't push too much on this one, because I am very hesitant with my proposed fix. The problem is that this fixes only a single use-case, but doesn't really fix all the other use-cases (wanting to modify status during a Create/Update for instance). Basically, any call that will update the object spec makes us lose the status sub-object modifications. In this case it's a little worse because we literally lose the full sub-object since it's a brand new MR which had no status inside kubernetes yet.

I really don't know what the proper way forward would be to fix all use-cases. Fixing them one by one with a "hack" kind of feels weird and also it would be easy to reintroduce the problem if there are any new calls done during this flow in the future.

I am open to ideas :)

// GetStatus of this Composed resource.
func (cr *Unstructured) GetStatus() xpv1.Status {
status := xpv1.Statused{}
if err := fieldpath.Pave(cr.Object).GetValueInto("status", &status); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about status.conditions and .status being stored independently.


// A Statused may have status set or retrieved. Status is typically
// a struct containing various runtime status properties.
type Statused interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we need this... perhaps we just need Pave.SyncStatus(old, new)?

// The AddFinalizer call may overwrite the managed object with what is
// currently in the kube API which would make us lose information that may
// have been populated by the Observe call.
managed.SetStatus(managedPreOp.GetStatus())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to make a new pave function that copies the original pre-op status into managed rather than introduce the .Set/GetStatus?

I think it has the same effect and does not spread out the API

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely don't mind, I simply followed initial advice provided on the issue in this comment.
Either way, I'm really unsure about this fix overall.

@turkenh
Copy link
Member

turkenh commented Jun 2, 2025

No problem, I didn't push too much on this one, because I am very hesitant with my proposed fix. The problem is that this fixes only a single use-case, but doesn't really fix all the other use-cases (wanting to modify status during a Create/Update for instance). Basically, any call that will update the object spec makes us lose the status sub-object modifications. In this case it's a little worse because we literally lose the full sub-object since it's a brand new MR which had no status inside kubernetes yet.

FWIW, we are good with other cases in the managed reconciler, and this (after observe, patching finalizer) is the only remaining one.

  • It does a kube.Update() after Create for late-initialization, but this triggers a new reconciler and next time the status is persisted properly. See this comment.
  • There is no kube.Update() after external.Update(), so status being persisted successfully.

It is clear what we need to do, the question is simply how :D

  1. Follow the existing convention and define interfaces to work with Status (what this PR doing).
    i. Need crossplane-tools update to generate new interfaces.
    ii. Status and status conditions being managed independently via 2 different getter/setter.
  2. Make an exception and directly patch unstructured object by paving this time.
    i. The first time we will be using pave/unstructured in managed reconciler.
  3. Update APIFinalizer's AddFinalizer() method, so that it doesn't lose un-persisted status.
    i. Possible behavior change for other callers of AddFinalizer().
    ii. We will still need unstructured/pave.

I still lean on (1) but no strong opinions if we chose (2) or (3). @negz, interested to hear your thoughts.

@gravufo
Copy link
Author

gravufo commented Jun 2, 2025

  • It does a kube.Update() after Create for late-initialization, but this triggers a new reconciler and next time the status is persisted properly. See this comment.

Maybe I didn't understand properly, but usually the create is only called once, so if that first time it's called we lose the status, we'll never restore what the Create wanted to save, no?

@turkenh
Copy link
Member

turkenh commented Jun 3, 2025

  • It does a kube.Update() after Create for late-initialization, but this triggers a new reconciler and next time the status is persisted properly. See this comment.

Maybe I didn't understand properly, but usually the create is only called once, so if that first time it's called we lose the status, we'll never restore what the Create wanted to save, no?

You're right. However, in practice, I am not aware of any information that is unique to the Create call and not obtained in an Observe call. The only exception could be sensitive information that external API returning only after Create (e.g. API secret key), but those are stored into secrets as connection details instead of object status. So, the next Observe is expected to form the latest status by making a fresh observation and we're not worried about the status after Create getting lost.

@negz
Copy link
Member

negz commented Jun 7, 2025

@turkenh and I chatted about this a bit the other day. I agree updating resource.Managed to have GetStatus and SetStatus methods would be the most idiomatic way to do it. That'd require a lot of updates across the ecosystem though, and ultimately those methods would need to return and accept opaque data. So maybe we may as well just use fieldpath.Pave?

It kinda kills me that we have to do this (save and restore status) at all though. It seems like an anti-pattern in that it basically works around optimistic concurrency. If we ever added another controller that wanted to update MR status we'd have a race:

  1. MR reconciler saves status
  2. MR reconciler adds finalizer
  3. Other reconciler updates status (bumping resourceVersion)
  4. MR reconciler restores status, undoing 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AddFinalizer clears Status subobject leading to delayed object readiness
4 participants