[Sub-]handlers (sort of) break eventual consistency at high load? #641
Comments
Yes, that is a known issue. Sadly, it is difficult to fix. Several (2 or 3) attempts were made in the past, all failed (zalando-incubator/kopf#279 (comment)). My comment from there:
The issue can happen even with normal handlers, not necessarily sub-handlers, if the object changes before the full handling cycle is complete. In that case, the last seen essence of the object is persisted as the diff-base, as seen after the last successful handler. The preceding handlers, however, could have seen a different state (before the changes) and can end up with an inconsistent end-state. The issue will be addressed for sure. Thanks for refreshing it and providing a repro!

In one of the latest attempts (#163), I tried to keep a hash/digest/checksum of the object's essence for each individual handler and subhandler once it is finished. If, on the next cycle, the hash does not match the current essence, the object has changed, and the handler/subhandler should be re-executed with the current (i.e. latest) state.

Sadly, I do not remember why it failed; I need to re-read #182 carefully and re-play the whole changeset. Maybe all the problems that prevented this "eventual consistency" fix from being applied are solved by now. But the codebase has changed a lot since Aug 2019.
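The per-handler digest idea described in the comment above could look roughly like the following. This is only a sketch of the concept in plain Python, not the code from #163 and not kopf's API: the function names `essence_digest` and `handlers_to_rerun`, the `seen` progress record, and the `spec.field` layout are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json


def essence_digest(essence: dict) -> str:
    """Stable digest of an object's essence (assumed to be JSON-serialisable)."""
    payload = json.dumps(essence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def handlers_to_rerun(essence: dict, seen: dict[str, str]) -> list[str]:
    """Return the handlers whose recorded digest differs from the current essence."""
    current = essence_digest(essence)
    return [name for name, digest in seen.items() if digest != current]


# Hypothetical progress record: handler id -> digest of the essence it handled.
seen: dict[str, str] = {}
seen["sub0"] = essence_digest({"spec": {"field": 17}})  # ran before the change
seen["sub1"] = essence_digest({"spec": {"field": 19}})  # ran after the change
seen["sub2"] = essence_digest({"spec": {"field": 19}})

# On the next cycle the object is at 19, so only sub0 is stale.
stale = handlers_to_rerun({"spec": {"field": 19}}, seen)
print(stale)
```

With a record like this, a handler that finished against an older state is detected and can be scheduled again, instead of being treated as done.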
Thank you for your detailed answer.
I am pretty sure we are hitting this bug now. Does anyone have suggested workarounds? The only thing I can think of doing for now is to put a timer on the handlers in our code that are susceptible to this, so that even if a change is missed, the timer will eventually trigger the handler. Obviously this is not very satisfying. If anyone has better suggestions, I'd love to hear them.
@hapatrick My workaround was to split watched entities over multiple namespaces and replicate the controller to reduce the load per controller. I think adding timers would make it worse, as the problem stems from the fact that the controller does not have enough time to process a handler and its subhandlers before the next change.
Long story short
When a handler has multiple long-running subhandlers (e.g. external API calls) and the rate of events is below the default batching window, some subhandlers may never be executed, even after the event stream stops.
Description
From my understanding kopf guarantees that handlers and subhandlers will process the latest change soon. This means that it may skip processing events and go to the latest version when the operator is available to do so (event batching).
An interesting side effect of this is that if the final state of a resource does not trigger a handler (or subhandler), the latest event is not guaranteed to be processed.
When a handler has multiple subhandlers (or a single one raising a TemporaryError), it will require multiple lifecycles for an event to be processed. In the following scenario, events can come faster than they can be processed:
When this happens, the resource may change between the subhandlers' lifecycles.
This has two very weird side effects:
This is very different from running a single handler without subhandlers, and it can add so much complexity on high-load systems that the only advantage of subhandlers (idempotence) seems a very weak argument.
I think the documentation warning should be much bigger and scarier, because subhandlers may seem an elegant solution initially but may cause your system to fall apart in production under high load.
Here is a small repro script with a bit of plumbing using the documentation example CRD
Execution output:
[PARENT] 17
↳ [SUB 0] 17
[PARENT] 19
↳ [SUB 1] 19
[PARENT] 19
↳ [SUB 2] 19
[PARENT] 19
↳ [SUB 0] 19
[PARENT] 19
↳ [SUB 1] 19
[PARENT] 19
↳ [SUB 2] 19
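To make the first half of the output above easier to reason about, here is a toy model in plain Python with no kopf involved. It assumes, as the output suggests, that each processing cycle runs one pending subhandler against whatever the object looks like at that moment. The `run_cycles` function and its `versions` parameter are hypothetical names for this illustration only; this is neither the repro script nor kopf's scheduling code.

```python
def run_cycles(subhandlers, versions):
    """Toy model: one pending subhandler runs per cycle and sees the object's
    value at that cycle. `versions[i]` is the value during cycle i; once the
    list is exhausted, the last value stays in effect."""
    done = []  # (subhandler name, value it saw)
    pending = list(subhandlers)
    cycle = 0
    while pending:
        current = versions[min(cycle, len(versions) - 1)]
        name = pending.pop(0)
        done.append((name, current))
        cycle += 1
    return done


# The object is at 17 during the first cycle and at 19 afterwards.
seen = run_cycles(["SUB 0", "SUB 1", "SUB 2"], versions=[17, 19])
for name, value in seen:
    print(f"[{name}] {value}")
```

In this model the first subhandler completes against 17 while the other two complete against 19, so a single logical handling pass ends up based on two different states of the resource.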
Keywords
subhandlers, idempotence, eventual consistency