Driver Load Testing: Set pod CPU & mem Requests for driver Container & Node components #356

Driver Load Testing

Key

  • Controller charts, purple: total usage
  • Node charts, light blue: total / total non-evictable
  • Node charts, orange: total evictable (weird spike)

Configuration

  • GKE cluster with 1 n1-standard-32 node
  • Stable driver v0.5.1
  • Disks are 6Gi

Timeline

  • 3:35: Create 50 pods on one node (one such step is sketched below)
  • 3:45: Delete the 50 pods
  • 3:53: Start 50 pods again
  • 3:59: Delete the 50 pods again
  • 4:05: Start 100 pods on one node
  • 4:15: Delete the 100 pods
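
As a rough illustration of what one "start 50 pods" step might look like, here is a minimal sketch using client-go. The namespace, StorageClass name (`csi-gce-pd`), image, and object names are assumptions for illustration, not the exact manifests used in this test.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	const (
		namespace    = "default"
		storageClass = "csi-gce-pd" // assumed StorageClass backed by the PD CSI driver
		podCount     = 50
	)
	sc := storageClass
	ctx := context.Background()

	for i := 0; i < podCount; i++ {
		name := fmt.Sprintf("load-pod-%d", i)

		// One 6Gi PVC per pod, matching the disk size used in this test.
		pvc := &corev1.PersistentVolumeClaim{
			ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
			Spec: corev1.PersistentVolumeClaimSpec{
				AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
				StorageClassName: &sc,
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceStorage: resource.MustParse("6Gi"),
					},
				},
			},
		}

		// A minimal pod that mounts the PVC so the volume gets provisioned,
		// attached, formatted, and mounted by the driver.
		pod := &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
			Spec: corev1.PodSpec{
				Containers: []corev1.Container{{
					Name:         "pause",
					Image:        "k8s.gcr.io/pause:3.1",
					VolumeMounts: []corev1.VolumeMount{{Name: "data", MountPath: "/data"}},
				}},
				Volumes: []corev1.Volume{{
					Name: "data",
					VolumeSource: corev1.VolumeSource{
						PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: name},
					},
				}},
			},
		}

		if _, err := client.CoreV1().PersistentVolumeClaims(namespace).Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
		if _, err := client.CoreV1().Pods(namespace).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
}
```

The matching "delete" step would simply remove the same pods and PVCs again.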

Test Results

Charts (GKE container metrics):

  • CPU usage for the Controller
  • Memory usage for the Controller
  • CPU usage for the Node
  • Memory usage for the Node

Analysis

Test Limitations

  • Only one node was used, so this probably wasn't the maximum possible stress on the controller. It looks like we didn't hit the controller's ops limit even at the maximum pods per node, since CPU was still scaling linearly with load.
  • Node backoff is probably not optimized, so we probably didn't hit the maximum possible stress on the node (although with the current version of the driver/CSI sidecars and their current backoff settings, I believe we did hit the maximum).
  • 6Gi disks are probably smaller than many production disks; larger disks may take longer and use more CPU/memory to format.

Conclusions

  • Controller memory seems to stay stable at ~50MiB regardless of load
    • Setting a pod request of 60MiB seems reasonable
  • Node memory seems to stay stable at ~40MiB regardless of load
    • Setting a pod request of 50MiB seems reasonable
    • Since the node component runs on user nodes, setting a pod limit as well is prudent; 150MiB seems reasonable (see the sketch below)
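
As a concrete illustration of the proposed values, here is a minimal sketch using the Kubernetes Go API types. The actual change would go into the `resources:` stanza of the driver's controller Deployment and node DaemonSet manifests; nothing below is the driver's real manifest.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Controller container: memory request only, per the conclusion above.
	controller := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("60Mi"),
		},
	}

	// Node container: memory request plus a limit, since it runs on user nodes.
	node := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("50Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("150Mi"),
		},
	}

	fmt.Printf("controller requests: %v\nnode requests: %v\nnode limits: %v\n",
		controller.Requests, node.Requests, node.Limits)
}
```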

Next Steps

  • Test with 100 pods on one node using "larger" PDs to stress the node more (larger disks may be harder/slower to format)
  • Test with more pods across more nodes to stress the controller more (the goal is to reach a point where CPU no longer increases with larger pod startup counts, which would imply another component of the system is the bottleneck or a parallelism limit has been reached)

/cc @verult @msau42
