## Describe your problem

Currently, getting a PyTorch model at high concurrency is very slow, as shown below. The two test machines each have a maximum network bandwidth of 30 Gbps.
### Vineyard

| Concurrency | Get time | Network bandwidth observed in dstat |
|-------------|----------|-------------------------------------|
| 1           | 2.57s    | around 2000 MiB/s                   |
| 6           | 7.73s    | around 3800 MiB/s                   |
| 13          | 14.58s   | around 3800 MiB/s                   |
| 27          | 29.32s   | around 3800 MiB/s                   |
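For reference, numbers like these can be collected with a small driver along the following lines. This is only a sketch: `VINEYARD_ENDPOINT` and `MODEL_ID` are hypothetical placeholders for the actual deployment, and it assumes a vineyard version whose RPC client can fetch remote blobs via `client.get()`.

```python
# Minimal sketch of the concurrency benchmark; VINEYARD_ENDPOINT and
# MODEL_ID are hypothetical placeholders, not values from this report.
import multiprocessing as mp
import time

import vineyard

VINEYARD_ENDPOINT = ("vineyard-host", 9600)        # hypothetical RPC endpoint
MODEL_ID = vineyard.ObjectID("o0123456789abcdef")  # hypothetical object id

def timed_get(_):
    # Each worker opens its own connection and pulls the model blobs over
    # the network, which is what saturates the vineyardd machine's NIC.
    client = vineyard.connect(*VINEYARD_ENDPOINT)
    start = time.perf_counter()
    client.get(MODEL_ID)
    return time.perf_counter() - start

if __name__ == "__main__":
    for concurrency in (1, 6, 13, 27):
        with mp.Pool(concurrency) as pool:
            durations = pool.map(timed_get, range(concurrency))
        print(f"concurrency={concurrency:2d} slowest get: {max(durations):.2f}s")
```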
### iperf

| Concurrency | Network bandwidth observed in dstat | Total network bandwidth |
|-------------|-------------------------------------|-------------------------|
| 1           | around 1470 MiB/s                   | 12 Gbit/s (1500 MB/s)   |
| 6           | around 3700 MiB/s                   | 31.1 Gbit/s (3888 MB/s) |
| 13          | around 3650 MiB/s                   | 30.9 Gbit/s (3863 MB/s) |
| 27          | around 3650 MiB/s                   | 30.9 Gbit/s (3863 MB/s) |
## Solution

In real scenarios, PyTorch models are usually loaded on GPU machines, which generally have high-performance networks, so the bandwidth of a single vineyardd instance becomes the bottleneck. We can distribute the PyTorch model blobs among different Vineyard instances to increase the aggregate network bandwidth.
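A minimal sketch of what the sharding could look like from Python, assuming one vineyardd per machine reachable over RPC and a client that can put numpy arrays to each instance (with an IPC client on the local socket this is the standard path). The endpoints, the toy model, and the round-robin layout below are illustrative assumptions, not the final design:

```python
# Sketch of spreading model blobs across several vineyard instances.
# ENDPOINTS is a hypothetical list of per-machine vineyardd RPC endpoints.
import numpy as np
import torch
import vineyard

ENDPOINTS = [("node-1", 9600), ("node-2", 9600), ("node-3", 9600)]
clients = [vineyard.connect(host, port) for host, port in ENDPOINTS]

model = torch.nn.Sequential(  # toy stand-in for a real model
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
)

# Scatter: store each tensor on the next instance round-robin, keeping
# (name, instance index, object id) so the model can be reassembled.
layout = []
for i, (name, tensor) in enumerate(model.state_dict().items()):
    idx = i % len(clients)
    layout.append((name, idx, clients[idx].put(tensor.numpy())))

# Gather: a consumer pulls shards from all instances, so its download is
# spread over every machine's NIC instead of a single vineyardd's link.
state_dict = {
    name: torch.from_numpy(np.array(clients[idx].get(object_id)))
    for name, idx, object_id in layout
}
model.load_state_dict(state_dict)
```

In practice the gather step would fetch the per-instance shards concurrently, and vineyard's global objects, whose chunks live on different instances, would be the natural representation; but even this naive layout lets a 27-way get draw on multiple NICs rather than queueing behind one 30 Gbps link.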