
Keep a live in-memory "view" of all matching resources in the cluster #661

Merged: nolar merged 11 commits into master from live-views on Feb 24, 2021

Conversation

nolar (Owner) commented on Jan 31, 2021

Indexers automatically maintain in-memory overviews of resources (indices), grouped by keys that are usually calculated based on these resources.

See the detailed description and several recipes in the .rst file in the PR's changes. The essence:

import kopf

# Index all pods that lack the "excluded" label, keyed by (namespace, name),
# with the pod's phase as the indexed value.
@kopf.index('pods', labels={'excluded': kopf.ABSENT})
def pod_phases(namespace, name, status, **_):
    return {(namespace, name): status.get('phase')}

# Every 5 seconds, put the currently running pods into each "kex" object's
# status by reading the in-memory index (no Kubernetes API calls involved).
@kopf.timer('kex', interval=5)
def tick(patch, pod_phases: kopf.Index, **_):
    patch.status['children'] = [
        f"{pod_namespace}/{pod_name}"
        for (pod_namespace, pod_name), pod_phase in pod_phases.items()
        if pod_phase == 'Running'
    ]
    print(patch.status['children'])

The indices can be used for cross-resource awareness: e.g., when a resource of kind X is changed, it can get all the information about all resources of kind Y without talking to the Kubernetes API.
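
For illustration, a minimal sketch of such cross-resource awareness, reusing the pod_phases index from the example above (the handler name and the reaction logic are placeholders, not part of the PR):

import kopf

# A change to a "kex" object (kind X) inspects all pods (kind Y) via the
# in-memory index, without a single call to the Kubernetes API.
@kopf.on.update('kex')
def check_children(namespace, pod_phases: kopf.Index, **_):
    pending = [
        pod_name
        for (pod_namespace, pod_name), pod_phase in pod_phases.items()
        if pod_namespace == namespace and pod_phase != 'Running'
    ]
    if pending:
        raise kopf.TemporaryError(f"Pods not running yet: {pending}", delay=10)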

The indices are optimized for lookups by keys (O(1), the same as Python's dicts), though iterating over them is also an option. Updates & deletions are O(k), where "k" is the number of keys per object (not the overall number of keys!).
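
For example, inside any handler that requests the pod_phases index from above (a sketch, assuming the index exposes the usual read-only mapping interface; the pod name is hypothetical):

# O(1) lookup by key, the same as with a plain dict:
phase = pod_phases.get(('default', 'my-pod'))

# Iterating over the whole index is also possible, but is O(n) over all keys:
running = [key for key, phase in pod_phases.items() if phase == 'Running']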

A side-note: The in-memory indices are a step further towards cross-resource handlers (an example: when a child pod is changed, the cross-resource handler should be invoked in the context of its parent, not of the child itself, with all other pods' information at hand). In that case, only one resource is known from the arriving event, while the other resource must either be remembered in-memory or fake-patched with the full info of each child to trigger the handling. The latter approach looks stressful for K8s & the K8s API, so the in-memory way seems the only viable solution.


Things left to do (can take time):

  • Decide: do we need a key= callback, or must the key structure be frozen (mapping to K8s API uniqueness: resource, namespace, name)? What if 2+ handlers for the same function have different key structures?
    • No. Custom keys complicate things and give no benefits. A minimalistic (namespace, name) key is enough. Resources are not needed as a part of the key, as they can be replaced by a cache name/id.
  • Decide: are views retriable? What do the TemporaryErrors/PermanentErrors mean? Backoffs? Timeouts?
    • No, not retriable. However, the interpretation of errors has to be decided: do they remove the cache entry on permanent error? Mark it as out-of-date on a temporary error?
  • Decide: can we implement the same with the existing handlers/causes/registries (watching-changing-spawning), or do we need a new type?
    • A new type and a new registry are needed. The cache handlers are executed separately from all other handlers, to ensure that the caches are pre-populated before the regular handlers run. Besides, they might have their own properties: e.g. ttl.
  • Decide: naming: "cacheing" (rare), "caching" (common), "indexing", or something else. Formally, it is not caching in the sense of a faster memoisation technique.
    • Indexing. Also: indices (the user-facing structures), indexers (internal managers of indices).
  • Implement: proper Cache/Caches classes (currently a defaultdict). Separate the read-only (operator) and read-write (framework) accessors.
  • Implement: filtering in and filtering out (i.e., not just deletion, but changing labels/annotations/fields/values to become included/excluded).
  • EXCLUDED: Implement: a condition (sync/async) to wait for changes in the daemon — for immediate reaction in another resource, not just time-intervalled. (maybe as a separate enhancement).
  • EXCLUDED: TTL for cache entries — to save memory on long runs. (questionable! no resync is possible!).
  • EXCLUDED: Up-to-date/out-of-sync/stale read-only flags. (see errors interpretation above).
  • Clean up the code.
  • Write the tests.
  • Write the docs.
  • Manually test that all docs examples do actually work.

nolar added the "enhancement" (New feature or request) label on Jan 31, 2021
nolar mentioned this pull request on Jan 31, 2021
nolar force-pushed the live-views branch 2 times, most recently from 8642449 to 8fbadd3 on February 6, 2021

nolar marked this pull request as ready for review on February 19, 2021

nolar force-pushed the live-views branch 3 times, most recently from 68035e1 to c9df0f0 on February 21, 2021
Signed-off-by: Sergey Vasilyev <[email protected]>
If the resource is non-indexable, there is no need to create the individual toggles, either for the resource kind or for each individual resource object: such resources do not block the operator's readiness (but can be blocked themselves).

In practice, even if created, those toggles are turned on and/or dropped as soon as the processing routine is entered and finds that there are no indexing handlers (nothing to do). This wastes computing power and memory on startup, which can affect operators in massive clusters, for no benefit.

Instead, only create the resource kind's & resource object's toggles when and if there are indexing handlers to be executed.

As a particular case, when an operator has no indices at all (assumed to be the default and the most common case), create no toggles, or at most one (for the global operator block, until all specialised per-resource-kind blockers are checked & skipped). Without this fix, there would be as many useless toggles as there are resource kinds (usually below ten) plus resource objects (which can number in the hundreds or thousands).

Signed-off-by: Sergey Vasilyev <[email protected]>
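
(An editorial sketch of the lazy-toggle idea described in this commit message; the class and function names below are hypothetical illustrations, not Kopf's actual internals:)

import asyncio

class ReadinessToggle:
    """A hypothetical on/off flag that the operator's readiness can wait on."""
    def __init__(self) -> None:
        self._event = asyncio.Event()
    def turn_on(self) -> None:
        self._event.set()
    async def wait(self) -> None:
        await self._event.wait()

def prepare_toggles(resource_kinds: dict[str, list]) -> dict[str, ReadinessToggle]:
    # Create a readiness-blocking toggle only for the kinds that actually
    # have indexing handlers; non-indexable kinds never block the readiness.
    return {kind: ReadinessToggle()
            for kind, indexing_handlers in resource_kinds.items()
            if indexing_handlers}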
nolar (Owner, Author) commented on Feb 24, 2021

So far, I gave it a couple of days to settle down and to check if there are any other afterthoughts. There are none. It can be merged.

The only concern left is that it could be beneficial in the future if other built-in types were specially interpreted, similar to the 1st-level dict now: e.g., accept a multi-level dict for multi-level indices, or merge a list/set into the indexed store instead of storing them as values. The current interpretation of only the 1st-level dict feels asymmetrical.

In that case, it would require backward-incompatible changes (now, such values are treated as values).
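
For illustration, the current behaviour as described above (a sketch; the handler name is a placeholder):

import kopf

@kopf.index('pods')
def pods_by_namespace(namespace, name, **_):
    # Only the 1st-level dict is interpreted: "namespace" becomes the index key.
    # The nested dict and the list are treated as opaque values, not as deeper
    # index levels or as items to be merged into the store.
    return {namespace: {'name': name, 'tags': ['a', 'b']}}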

One solution would be to wrap these types now in something like kopf.literal() to disable their interpretation, which is an overcomplication. Another solution would be to never interpret the built-ins, but to always expect an explicit return kopf.Index({key: val}) (or return kopf.Index({key: kopf.Store([...])}) for merging the items). This also looks overcomplicated from the API/DSL point of view, and it is wordy.

I believe this forward compatibility cannot be ensured now. It can be extended later with additional options to @kopf.index() or with additional wrappers as shown above, to hint how the structural results should be interpreted (indexed/stored/merged), if at all needed.

nolar merged commit 8a08e0a into master on Feb 24, 2021
nolar deleted the live-views branch on February 24, 2021