
Investigate reducing memory usage on systems with many resources (secrets, configmaps, pvcs, etc) #766

@tesshuflower

Description

Describe the feature you'd like to have.

VolSync is an operator that watches many different resource types across all namespaces. On systems with many resources, its memory usage can balloon.

The most likely cause is a system with many secrets (configmaps and pvcs are also potential causes).

VolSync currently watches the following:

https://github.com/backube/volsync/blob/main/controllers/replicationdestination_controller.go#L159-L166

but there are other resources too - such as:

  • configmaps

The controller-runtime client reads through the cache by default, so it will add a watch on these resource types as soon as they are first queried for.

What is the value to the end user? (why is it a priority?)

How will we know we have a good solution? (acceptance criteria)

Additional context

Potential solutions:

  • https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/

    • Option to filter out resources from the cache by label or field
    • Note that the potentially dangerous part of this is that anything filtered out becomes "invisible" to the client.
    • For secrets/configmaps/pvcs we do need to query for user-created resources, so this may not be viable unless we bypass the cache when querying (i.e. use a client with a reader that doesn't go to the cache for these queries)
    • This may be particularly useful for resources that VolSync creates and then watches (jobs, services, serviceaccounts, roles, rolebindings)
      • need to confirm, but I don't think we care about watching/querying any of those resources that aren't created by VolSync.
      • potential gotcha here: does this break any createOrUpdate operations, for example if a resource with the same name exists but has been filtered out of the cache?
      • Guessing the resources in the list above are less problematic from a memory perspective than secrets and configmaps.
  • Another potential option is to disable caching for certain resource types - see an example here: https://github.com/external-secrets/external-secrets/pull/729/files

  • Can also cache with metadata only (I think this is used in the external secrets update above as well?):
    ✨ metadata-only watches kubernetes-sigs/controller-runtime#1174
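As a rough illustration of the label-filter option and the cache-bypass idea above, here is a toy model in plain Go. It is a sketch under stated assumptions, not controller-runtime code: `filteredCache`, `getWithFallback` and the label key/value are hypothetical. In controller-runtime the uncached path would correspond to something like the manager's API reader.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes resource.
type object struct {
	Kind, Name string
	Labels     map[string]string
}

// reader is the minimal read interface shared by the cache and the live path.
type reader interface {
	get(kind, name string) (object, bool)
}

// apiServer is the source of truth; each get counts as one live API call.
type apiServer struct {
	objects []object
	calls   int
}

func (s *apiServer) get(kind, name string) (object, bool) {
	s.calls++
	for _, o := range s.objects {
		if o.Kind == kind && o.Name == name {
			return o, true
		}
	}
	return object{}, false
}

// filteredCache stores only objects that carry the given label.
type filteredCache struct{ store map[string]object }

func newFilteredCache(s *apiServer, key, value string) *filteredCache {
	c := &filteredCache{store: map[string]object{}}
	for _, o := range s.objects {
		if o.Labels[key] == value {
			c.store[o.Kind+"/"+o.Name] = o
		}
	}
	return c
}

func (c *filteredCache) get(kind, name string) (object, bool) {
	o, ok := c.store[kind+"/"+name]
	return o, ok
}

// getWithFallback tries the cache first and goes to the live reader only
// when the cache has no entry, since "not in cache" no longer means "absent".
func getWithFallback(cache, live reader, kind, name string) (object, bool) {
	if o, ok := cache.get(kind, name); ok {
		return o, true
	}
	return live.get(kind, name)
}

func main() {
	const key, value = "example.com/created-by", "volsync" // hypothetical label
	server := &apiServer{objects: []object{
		{Kind: "Job", Name: "mover", Labels: map[string]string{key: value}},
		{Kind: "Secret", Name: "user-secret"}, // user-created, unlabeled
	}}
	cache := newFilteredCache(server, key, value)

	_, inCache := cache.get("Secret", "user-secret")
	_, found := getWithFallback(cache, server, "Secret", "user-secret")
	fmt.Println("visible in cache:", inCache)
	fmt.Println("found with fallback:", found, "live calls:", server.calls)
}
```

The same model shows the createOrUpdate concern: a cache-only lookup of `user-secret` reports "not found" even though the object exists, so a create issued on that basis would collide with the existing resource.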

All of these could increase API traffic when we're doing a lot of reconciles, since we'd have to get the resources from the API server every time.
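A back-of-the-envelope way to see the traffic trade-off is to count round trips in a toy model. The sketch below is plain Go with hypothetical names (`liveReader`, `cachedReader`, `reconcileAll`); the read-through cache is a simplification, since a real informer cache is kept fresh by a watch rather than by fetching on demand.

```go
package main

import "fmt"

// liveReader counts round trips to a (fake) API server.
type liveReader struct{ calls int }

func (r *liveReader) get(name string) string {
	r.calls++
	return "data-for-" + name
}

// cachedReader fetches each name at most once, then serves it from memory.
type cachedReader struct {
	live  *liveReader
	store map[string]string
}

func (c *cachedReader) get(name string) string {
	if v, ok := c.store[name]; ok {
		return v
	}
	v := c.live.get(name)
	c.store[name] = v
	return v
}

// reconcileAll simulates `rounds` reconcile passes, each reading every name.
func reconcileAll(get func(string) string, names []string, rounds int) {
	for i := 0; i < rounds; i++ {
		for _, n := range names {
			_ = get(n)
		}
	}
}

func main() {
	names := []string{"a", "b", "c"}

	uncached := &liveReader{}
	reconcileAll(uncached.get, names, 100)

	backing := &liveReader{}
	cached := &cachedReader{live: backing, store: map[string]string{}}
	reconcileAll(cached.get, names, 100)

	fmt.Println("uncached API calls:", uncached.calls)
	fmt.Println("cached API calls:", backing.calls)
}
```

In this model the uncached path costs one API call per resource per reconcile, while the cached path costs one per resource in total, at the price of holding those resources in memory.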

Metadata

Labels

enhancement (New feature or request)
