Speed too slowly if set cpu-limit for pod in Kubernetes. Or high cpu usage with cpu-nolimit. #982
Comments
To clarify: if you don't limit the CPU, it uses 600%, but you get about 7x the speed you get when you limit it to 100%? |
You probably don't want to set some of those constants so high, particularly max unverified bytes. The request algorithm will try to read ahead much too far and spend a lot of computation on excessively parallelizing downloads. More is not better here. On resource-constrained systems you actually want to lower those values. Could you let me know how you go with mostly defaults? |
The default storage implementation, file, is not very good and has very high system overhead. If you are caching, consider possum, otherwise mmap, if you are able to control file consistency/availability. |
The CPU profile you submitted makes me think that the request upload routine could do with some optimization; it probably needs to batch a bit. Otherwise it's a pretty healthy-looking profile. I think you could mostly improve on that by reducing EstablishedConnsPerTorrent to maybe 25 or so. |
@anacrolix Sorry for the late reply. I changed the storage to MMAP:
And I set the torrent to use MMAP storage:
Limit 1c CPU
No-limit CPU
|
I found a pattern when the CPU is not limited. My file-transfer model is one-to-many (1 -> N) over torrent: the main Linux node generates the torrent, and the other nodes download it from the main node. The main node used 5c CPU; the others used 0.8-1.5c CPU. @anacrolix Do you have any suggestions for me in this situation? Or is there a distributed cache that all clients on a local area network could share to reduce the CPU spent on computation? |
Is c 100%? The upload routines are not highly optimised. Probably the most helpful thing is to send a few CPU profiles when usage is highest. |
Yes, 1c is 1 CPU core. Give me a minute, I will collect some CPU profiles now. |
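For readers following along, here is a minimal sketch of capturing a CPU profile in-process with Go's standard `runtime/pprof` package. The thread does not say how the attached profiles were produced, so this is only one possible way. The `busy` function and `profileSize` helper are hypothetical stand-ins; in a real client the profiled work would be the period when upload CPU usage is highest.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

// captureCPUProfile records a CPU profile into w for as long as work runs.
func captureCPUProfile(w io.Writer, work func()) error {
	if err := pprof.StartCPUProfile(w); err != nil {
		return err
	}
	defer pprof.StopCPUProfile() // stops sampling and flushes the profile to w
	work()
	return nil
}

// busy spins for roughly ms milliseconds. It is a hypothetical stand-in
// for the period when the client is uploading and CPU usage peaks.
func busy(ms int) {
	deadline := time.Now().Add(time.Duration(ms) * time.Millisecond)
	for time.Now().Before(deadline) {
	}
}

// profileSize profiles work into memory and reports the profile size in
// bytes. It is only a quick check that capturing works.
func profileSize(work func()) (int, error) {
	var buf bytes.Buffer
	if err := captureCPUProfile(&buf, work); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	f, err := os.CreateTemp("", "cpu-*.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := captureCPUProfile(f, func() { busy(200) }); err != nil {
		log.Fatal(err)
	}
	fmt.Println("wrote CPU profile to", f.Name())
}
```

The resulting file can be opened with `go tool pprof` and attached to the issue.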
@anacrolix I found that uploading takes up a lot of CPU: pprof.bcs-image-proxy.samples.cpu.002.pb.gz |
@anacrolix Is there any way to optimize uploads? :) |
I haven't had a chance to look yet. The best chance of improving this is a few days spent developing a comprehensive integration performance test, but I or others would need sponsorship for that. I'll take a look at the profiles tonight. |
I highly suspect this workaround is causing issues: torrent/peer-conn-msg-writer.go Lines 99 to 110 in 3f5ef0b
The buffer size limit is rather small for offloading a lot of data, and it will be breaking up fast uploads into a significant number of syscalls to unaligned memory. I expect that fixing it could easily give 2-3x the throughput, possibly a lot more, and probably a lot less CPU. There is also some significant memory allocation overhead that I can see in the CPU profile. Could you provide a heap profile too? You should be able to collect that at the end of a long session. alloc_space and alloc_objects are what we want, and both should be included in a single heap profile. Allocation will be the next large contributor after the above is fixed, unless it is a symptom of the above too. |
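To make the concern about write limits concrete, here is a small sketch of how a cap on write size multiplies the number of writes that reach the connection. It is not the code from peer-conn-msg-writer.go: `writeChunked` and `callsFor` are hypothetical names, the limits are illustrative values rather than the library's, and `io.Discard` stands in for a socket where each `Write` would cost a syscall.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writeChunked drains buf into w using writes of at most limit bytes and
// returns how many Write calls were made. limit must be positive.
func writeChunked(w io.Writer, buf *bytes.Buffer, limit int) (int, error) {
	calls := 0
	for buf.Len() != 0 {
		next := buf.Next(limit) // at most limit bytes per write
		if _, err := w.Write(next); err != nil {
			return calls, err
		}
		calls++
	}
	return calls, nil
}

// callsFor reports how many writes it takes to send total bytes when each
// write is capped at limit bytes.
func callsFor(total, limit int) int {
	buf := bytes.NewBuffer(make([]byte, total))
	calls, _ := writeChunked(io.Discard, buf, limit) // io.Discard never fails
	return calls
}

func main() {
	const total = 1 << 20 // 1 MiB of pending upload data
	for _, limit := range []int{4 << 10, 64 << 10, 1 << 20} {
		fmt.Printf("limit %7d bytes -> %3d writes\n", limit, callsFor(total, limit))
	}
}
```

With these illustrative numbers, 1 MiB takes 256 writes at a 4 KiB cap, 16 at 64 KiB, and a single write with no effective cap. Whether the real limit is small enough to matter is exactly what the profiles are meant to show.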
@anacrolix I don't think memory is the issue; most of the memory is used by cache. This is the cgroup info from my pod:
Here is the heap profile as well: heap.gz |
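The heap profile requested earlier can be captured with the standard `runtime/pprof` package. This is a minimal sketch, assuming the profile is written in-process at the end of a long session; the thread does not say how heap.gz was produced, and `heapProfileSize` is a hypothetical helper used only to check that a profile comes out.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile writes a heap profile to w. Running a GC first makes
// the profile reflect allocations up to this point.
func writeHeapProfile(w io.Writer) error {
	runtime.GC()
	return pprof.WriteHeapProfile(w)
}

// heapProfileSize writes a heap profile into memory and returns its size
// in bytes.
func heapProfileSize() (int, error) {
	var buf bytes.Buffer
	if err := writeHeapProfile(&buf); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	f, err := os.CreateTemp("", "heap-*.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := writeHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	fmt.Println("wrote heap profile to", f.Name())
}
```

A single heap profile carries the alloc_space and alloc_objects sample types alongside the in-use ones; they can be selected when viewing, for example with `go tool pprof -sample_index=alloc_space` on the written file.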
I also ran into another problem. When many torrents are downloading at the same time, I sometimes get no speed at all:
|
@anacrolix Here is a newer heap profile: |
I found that if I adjusted the
But after I generate the torrent and let the other nodes download it, there is a long period at the beginning with no speed.
I guess there is some performance bottleneck in torrent uploads, but I don't know if there is any way to solve it. Or is there a private version that I can test? |
Just to clarify, I mean memory overhead as in the cost of allocation and management, not the actual amount of memory used. #982 (comment) remains my strongest suggestion. You can try to disable or remove that WebRTC workaround and see if it makes a difference. |
Should I just delete this paragraph and reference my private
|
Try master, where I've removed the limit except for webtorrent peer conns. On re-reading the code, I found that the limit is much larger than I had in my head. It might not have as much impact as I expected, but your feedback with master will be valuable. |
I used pprof-cpu-20241022.1.pb.gz |
In cpu 2, the CPU usage of |
I want to develop a file distributor based on torrent and deploy it in a Kubernetes cluster using a DaemonSet. I deployed chihaya/chihaya as the tracker.
Every node runs a torrent client:
When I need to distribute a file from one node, I generate the torrent first.
Then every torrent client (running in a K8s pod) starts downloading this torrent. But the speed is too slow, about 8-9 MiB/s.
(The K8s pod has its CPU limit set to 1c.)
If I remove the CPU limit, the torrent client uses about 6c CPU and reaches 60 MiB/s. I don't know where the problem is.
pprof file
cpu.pb.gz
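As a quick sanity check on the reported numbers, the sketch below compares throughput per CPU core in the two setups. It assumes 8.5 MiB/s as the midpoint of the reported 8-9 MiB/s and takes "about 6c" as exactly 6 cores; `perCore` and `speedup` are hypothetical helper names.

```go
package main

import "fmt"

// perCore returns throughput in MiB/s per CPU core used.
func perCore(mibPerSec, cores float64) float64 {
	return mibPerSec / cores
}

// speedup returns how many times faster the unlimited run is.
func speedup(limited, unlimited float64) float64 {
	return unlimited / limited
}

func main() {
	const (
		limitedSpeed   = 8.5  // MiB/s, midpoint of the reported 8-9 MiB/s at 1c
		unlimitedSpeed = 60.0 // MiB/s, reported with no CPU limit
		unlimitedCores = 6.0  // "about 6c" from the report, taken as exact here
	)
	fmt.Printf("limited:   %.1f MiB/s per core\n", perCore(limitedSpeed, 1))
	fmt.Printf("unlimited: %.1f MiB/s per core\n", perCore(unlimitedSpeed, unlimitedCores))
	fmt.Printf("speedup:   %.1fx\n", speedup(limitedSpeed, unlimitedSpeed))
}
```

Under these assumptions the per-core throughput is similar in both cases (8.5 vs 10 MiB/s), which suggests the transfer rate is bounded by how much CPU the client can use rather than by the network.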