-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce excessive number of running datamovers relative to loadConcurrency values #8344
Comments
This is indeed a problem. |
Some more inputs: Therefore, the current issue could be alleviated somehow if users follow the recommendation --- unless necessary, the resource configuration should not be set. On the other hand, it is not a bad idea to have some overflow for the number of backupPod against loadConcurrency --- the snapshot expose also takes considerable time, it helps to improve the pipeline if we have several backupPods doing the snapshot expose while some others within the loadConcurrency running VGDP. Therefore, a well-designed strategy which considers node-selection, node-concurrency may make the ultimate solution. |
Yes, I noticed. Which is why I didn't say it blocked scheduling, it only skewed it. The existing state model allows datamovers to remain in state Pending without issue when pod limits and node limits are hit. I've rarely seen anyone go through and set priority options. And developers tend not to want to set a default due to user priority will vary. Anything with priority went ahead over datamover Pods in the large applications. Few people bother to set Pod priorities or use PriorityClass (k8s 1.24 required). Tends to be the bottom of the pile for options to implement. And no one wants to go through and set priority for them all, especially in shared environments where admin may not even know what the applications are. Setting a node-agent-config podResources is absolutely required for something resembling sane scheduling right now to get Burstable or Guaranteed QoS. Once kopia starts the memory usage of the datamover will go from a few MBs to sometimes GBs depending on workload and changes since last backups. Using BestEffort with that behavior is just a recipe for evictions. Currently testing with user-created evictions with kubectl evict-pod. Clearing out a hundred or so datamover pods is not desirable from failed state progression. |
@Lyndon-Li I've thought of another thing that might help with scheduling besides PriorityClass and priority options. Currently the node-agent-config podResources have to be set sufficiently high to account for the largest workload in the backup or restore and get Burstable or Guaranteed QoS. Being able to set resource profiles for sets of volume would make scheduling less disruptive in Burstable and Guaranteed QoS by setting resource profiles more accurately. This was already an issue with the node-agent backups in Velero 1.14 but you can't exactly swap out the resource value in the middle of a running Pod. Datamover Pods don't have that inherent restriction. |
Actually, the default behavior is
Could you share an example of how would the profiles are set? Are you talking about the case when the resource policy is not |
Describe the problem/challenge you have
During backup with EnableCSI and snapshot-move-data enabled there is a large number of datamovers started after the initial resource backup finishes.
Some of our applications have upwards of 100 PVCs resulting in nearly a hundred datamover pods.
The existing design model creates the datamovers but only lets them perform the backup up to the loadConcurrency configuration in parallel. While the waiting datamover pods use little in actual resources, the resource limits has to be set to what is required to backing up the largest in resource usage and skews scheduling of other workloads.
The loadConcurrency cannot be increased beyond a certain point or kopia begins to run out memory from the nodes.
This example debug logs illustrate a small example. When loadConcurrency is constrained to 1 as well and the loadAffinity to a single node with 10 PVCs. The result is 10 datamovers only one of which executes.
bundle-2024-10-23-15-47-30.tar.gz
Describe the solution you'd like
Do not deploy datamover pods until load concurrency policy allows datamovers to run.
Anything else you would like to add:
Proposed Solution:
Instead of deploying the datamover during Prepared. Move the concurrency check and the datamover pod creation into this state.
Largely remains unchanged but without the datamover pod creation.
Copy the VolumeSnapshot, VolumeSnapshotContents, and create the PVC. On restore, the changes are just create the PVC from the application PVC spec. Do not create the datamover container.
Prepared -> DatamoverDeploying:
The concurrency check occurs on this state. On success, create the datamover Pod.
The DataUpload should move into the Prepared state when datamover pod has phase Running as is.
Environment:
velero version
):Client:
Version: main
Git commit: c53ab20-dirty
Server:
Version: main
Only changes are to Makefile and modify Dockerfile.ubi from github.com/openshift/velero to make a velero container compatible with OpenShift. Dockerfile from Velero will not work with OpenShift due permissions denied creating folders. I am aware of my less than stellar reputation. I do not have access to generic Kubernetes at all nor allowed to deploy generic Kubernetes. I have no choice but to use an alternate Dockerfile to get Velero in a container without Red Hat OADP additions and internal modifications.
A copy of the changes to Makefile and added Dockerfile.ubi are available at my branch at https://github.com/msfrucht/openshift-velero/tree/velero_in_openshift
kubectl version
):Client Version: 4.15.12
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.8
Kubernetes Version: v1.29.7+4510e9c
/etc/os-release
): Red Hat Enterprise Linux CoreOS 416.94.202408132101-0"Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.
The text was updated successfully, but these errors were encountered: