Open
Labels
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
Description
Driver Load Testing
Key
Controller Purple: Total
Node Light Blue: Total/Total Non-Evictable
Node Orange: Total Evictable (weird spike)
Configuration
- GKE cluster with 1 n1-standard-32 node
- Stable driver v0.5.1
- Disks are 6Gi
Timeline
3:35: Create 50 pods on one node
3:45: Delete the 50 pods
3:53: Start 50 pods again
3:59: Delete the 50 pods again
4:05: Start 100 pods on one node
4:15: Delete the 100 pods
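A minimal sketch of how a batch like the ones in the timeline could be generated, assuming each pod mounts one dynamically provisioned 6Gi volume and is pinned to the node through the standard `kubernetes.io/hostname` label. The function name, object names, `busybox` image, and storage class name are all hypothetical; the manifests actually used in this test are not shown in the issue.

```python
import json


def make_load_test_manifests(count, node_hostname,
                             storage_class="example-pd-storage-class",
                             disk_size="6Gi"):
    """Build a v1 List holding one PVC and one Pod per index.

    Every name here is illustrative; storage_class is a placeholder that
    must be replaced with a StorageClass that exists in the cluster.
    """
    items = []
    for i in range(count):
        claim_name = f"load-test-pvc-{i}"
        items.append({
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": claim_name, "labels": {"app": "load-test"}},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": storage_class,
                "resources": {"requests": {"storage": disk_size}},
            },
        })
        items.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": f"load-test-pod-{i}", "labels": {"app": "load-test"}},
            "spec": {
                # Pin every pod to the same node so all mounts hit one node driver.
                "nodeSelector": {"kubernetes.io/hostname": node_hostname},
                "containers": [{
                    "name": "sleeper",
                    "image": "busybox",
                    "command": ["sleep", "3600"],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }],
                "volumes": [{
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": claim_name},
                }],
            },
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Emit JSON that can be piped to `kubectl apply -f -`.
    print(json.dumps(make_load_test_manifests(50, "example-node"), indent=2))
```

Because every object carries the `app: load-test` label, one label selector is enough to delete the whole batch between runs.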
Test Results
Analysis
Test Limitations
- Only one node was used, so this is probably not the maximum possible stress on the controller. We do not appear to have hit the limit on controller operations with the max pods per node, because CPU was still scaling linearly.
- Node backoff is probably not optimized, so we probably did not hit the maximum possible stress on the node. However, with the current version of the driver and CSI sidecars and their current backoff settings, I believe we did hit the maximum stress.
- 6Gi disks are probably smaller than many production disks; larger disks may take longer, or more CPU/memory, to format.
Conclusions
- Controller memory seems to stay stable at ~50MiB regardless of load
  - Setting pod request of 60MiB seems reasonable
- Node memory seems to stay stable at ~40MiB regardless of load
  - Setting pod request of 50MiB seems reasonable
  - Setting pod limit too since this runs on user node; 150MiB seems reasonable
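The values above map onto Kubernetes memory requests and limits, where `Mi` is the quantity suffix for MiB. The sketch below shows the resulting `resources` stanzas as Python dicts. The helper name is hypothetical, and putting the whole figure on a single container is an assumption: the issue gives pod-level totals and does not say how they should be split between the driver and its sidecars.

```python
def memory_resources(request, limit=None):
    """Return a container `resources` stanza with a memory request and optional limit."""
    resources = {"requests": {"memory": request}}
    if limit is not None:
        resources["limits"] = {"memory": limit}
    return resources


# Controller: ~50MiB observed, 60MiB request, no limit proposed in the issue.
CONTROLLER_RESOURCES = memory_resources("60Mi")

# Node: ~40MiB observed, 50MiB request, plus a 150MiB limit because it
# runs on user nodes.
NODE_RESOURCES = memory_resources("50Mi", limit="150Mi")
```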
Next Steps
- Test with 100 pods on one node with "larger" PDs to stress the node more (harder/longer to format?)
- Test with more pods across more nodes to stress the controller more. We want to reach a point where CPU no longer increases with larger pod startup sizes, which would imply that some other component of the system is the bottleneck or that a parallelism limit has been reached.
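One way to make "CPU no longer increases" concrete is to compare the marginal CPU per added pod at each step against the slope of the first step. The sketch below is only an illustration of that idea: the function name, the `(pod_count, peak_cpu)` input shape, and the 10% tolerance are assumptions, not something the issue specifies.

```python
def find_saturation_point(samples, tolerance=0.1):
    """Return the pod count after which peak CPU stops growing, or None.

    samples: (pod_count, peak_cpu) pairs with strictly increasing pod_count.
    tolerance: a step counts as flat when its CPU-per-pod slope falls below
        this fraction of the slope measured over the first step.
    """
    if len(samples) < 3:
        return None  # need a baseline step plus at least one step to compare
    (p0, c0), (p1, c1) = samples[0], samples[1]
    baseline_slope = (c1 - c0) / (p1 - p0)
    if baseline_slope <= 0:
        return None  # no growth to begin with, so the baseline is not meaningful
    for (pa, ca), (pb, cb) in zip(samples[1:], samples[2:]):
        slope = (cb - ca) / (pb - pa)
        if slope < tolerance * baseline_slope:
            return pa  # CPU was already near its ceiling at this pod count
    return None
```

A return value of `None` means CPU was still scaling roughly linearly across all samples, which is the situation described under Test Limitations.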



