## Describe your problem

Currently, getting a PyTorch model at high concurrency is very slow, as shown below. The two test machines each have a maximum network bandwidth of 30 Gbps.
### Vineyard

| Concurrency | Get time | Network bandwidth observed in dstat |
|-------------|----------|-------------------------------------|
| 1           | 2.57s    | around 2000 MiB/s                   |
| 6           | 7.73s    | around 3800 MiB/s                   |
| 13          | 14.58s   | around 3800 MiB/s                   |
| 27          | 29.32s   | around 3800 MiB/s                   |
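For reference, numbers like these can be collected with a small driver along the following lines. This is only a sketch: `VINEYARD_ENDPOINT` and `MODEL_ID` are hypothetical placeholders for the actual deployment, and it assumes a vineyard version whose RPC client can fetch remote blobs via `client.get()`.

```python
# Minimal sketch of the concurrency benchmark; VINEYARD_ENDPOINT and
# MODEL_ID are hypothetical placeholders, not values from this report.
import multiprocessing as mp
import time

import vineyard

VINEYARD_ENDPOINT = ("vineyard-host", 9600)        # hypothetical RPC endpoint
MODEL_ID = vineyard.ObjectID("o0123456789abcdef")  # hypothetical object id

def timed_get(_):
    # Each worker opens its own connection and pulls the model blobs over
    # the network, which is what saturates the vineyardd machine's NIC.
    client = vineyard.connect(*VINEYARD_ENDPOINT)
    start = time.perf_counter()
    client.get(MODEL_ID)
    return time.perf_counter() - start

if __name__ == "__main__":
    for concurrency in (1, 6, 13, 27):
        with mp.Pool(concurrency) as pool:
            durations = pool.map(timed_get, range(concurrency))
        print(f"concurrency={concurrency:2d} slowest get: {max(durations):.2f}s")
```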
### iperf

| Concurrency | Network bandwidth observed in dstat | Total network bandwidth |
|-------------|-------------------------------------|-------------------------|
| 1           | around 1470 MiB/s                   | 12 Gbit/s (1500 MB/s)   |
| 6           | around 3700 MiB/s                   | 31.1 Gbit/s (3888 MB/s) |
| 13          | around 3650 MiB/s                   | 30.9 Gbit/s (3863 MB/s) |
| 27          | around 3650 MiB/s                   | 30.9 Gbit/s (3863 MB/s) |
## Solution

In real scenarios, PyTorch models are usually loaded on GPU machines, which generally have high-performance networks, so the bandwidth of a single vineyardd instance becomes the bottleneck. We can distribute the PyTorch model blobs among different Vineyard instances to increase the aggregate network bandwidth.
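A minimal sketch of what the sharding could look like from Python, assuming one vineyardd per machine reachable over RPC and a client that can put numpy arrays to each instance (with an IPC client on the local socket this is the standard path). The endpoints, the toy model, and the round-robin layout below are illustrative assumptions, not the final design:

```python
# Sketch of spreading model blobs across several vineyard instances.
# ENDPOINTS is a hypothetical list of per-machine vineyardd RPC endpoints.
import numpy as np
import torch
import vineyard

ENDPOINTS = [("node-1", 9600), ("node-2", 9600), ("node-3", 9600)]
clients = [vineyard.connect(host, port) for host, port in ENDPOINTS]

model = torch.nn.Sequential(  # toy stand-in for a real model
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
)

# Scatter: store each tensor on the next instance round-robin, keeping
# (name, instance index, object id) so the model can be reassembled.
layout = []
for i, (name, tensor) in enumerate(model.state_dict().items()):
    idx = i % len(clients)
    layout.append((name, idx, clients[idx].put(tensor.numpy())))

# Gather: a consumer pulls shards from all instances, so its download is
# spread over every machine's NIC instead of a single vineyardd's link.
state_dict = {
    name: torch.from_numpy(np.array(clients[idx].get(object_id)))
    for name, idx, object_id in layout
}
model.load_state_dict(state_dict)
```

In practice the gather step would fetch the per-instance shards concurrently, and vineyard's global objects, whose chunks live on different instances, would be the natural representation; but even this naive layout lets a 27-way get draw on multiple NICs rather than queueing behind one 30 Gbps link.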