Relatively lower performance than OpenSSH #69
Comments
Hi, thanks for taking the time to run this test, much appreciated. I don't have a 40 gigabit ethernet card. Since the first implementation I have noticed, testing on localhost, that sftpgo is a bit slower and uses more CPU than openssh, so I can confirm what you report, but the difference is not so big in my tests (280MB/s OpenSSH, 230MB/s SFTPGo/SFTP, 250MB/s SFTPGo/SCP). In real environments SFTPGo was able to completely saturate my network card, using more CPU than OpenSSH, but my tests were limited to 1 gigabit ethernet cards. Can you please test multiple parallel transfers and report whether you are able to saturate your 40 gigabit network? Can you repeat the same test using scp? SCP is implemented from scratch inside SFTPGo, while for sftp we use the pkg/sftp library. If scp performs better I'll try to see if pkg/sftp can be improved. Also please note that the golang ssh implementation currently does not support zlib compression, so if OpenSSH uses compression the real transferred size is smaller than the one transferred using SFTPGo. Anyway, I get the same results using both sftpgo and the very basic sample implementation here: https://github.com/pkg/sftp/tree/master/examples/go-sftp-server so we need to improve pkg/sftp |
I switched to Ubuntu 19.10 since scp sucks on Windows. I've noticed a huge improvement between Filezilla 3.39 (in repo) and the latest 3.46, maybe it's because:
Parallel test
The speed under Windows was 2 streams.
SCP
I got 208MB/s vs 235MB/s, and it seems that spaces in paths are not handled correctly: I tried 3 types of escaping, but scp returns immediately without transferring anything.
I also suspect it is an issue in Go's sftp package, but since I'm not familiar with Go I didn't look further. |
Thanks for reporting back, spaces in scp paths are now fixed, this command
should work now. I'll try to fix the Regarding the performance issue, I'll run some more tests myself (on localhost, I don't have a network like yours) and, if needed, I'll ask upstream. From your test results it seems that SFTPGo can easily saturate a gigabit connection but has issues when more bandwidth is available: 8 streams are served at 3 gigabit/s |
For a gigabit connection it's totally fine. I can help if you need more tests under high-bandwidth conditions. |
Hi, I did some profiling and the bottleneck seems to be the encryption. Without specifying a cipher, the default on my laptop is
setting
while setting
can you post your results using Can you also explain your use case in more detail? Thanks! |
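A rough way to see how much the cipher mode alone matters on a given machine is to time Go's standard-library AES in CTR and GCM mode. This is only a sketch: it is not SFTPGo or `golang.org/x/crypto/ssh` code, and the 32KB chunk size, the round count, and the all-zero key are assumptions chosen for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"time"
)

// throughputMBps calls encrypt `rounds` times, each over `size` bytes,
// and returns the observed rate in MB/s.
func throughputMBps(rounds, size int, encrypt func()) float64 {
	start := time.Now()
	for i := 0; i < rounds; i++ {
		encrypt()
	}
	elapsed := time.Since(start).Seconds()
	return float64(rounds) * float64(size) / (1 << 20) / elapsed
}

func main() {
	const size = 32 * 1024 // assumed chunk size, roughly one SSH packet
	const rounds = 4000    // assumed; raise it for steadier numbers

	key := make([]byte, 16) // all-zero AES-128 key, acceptable for timing only
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	src := make([]byte, size)
	dst := make([]byte, size+16) // extra room for the 16-byte GCM tag

	// AES-CTR: encryption only. In SSH a separate MAC is computed on top.
	ctr := cipher.NewCTR(block, make([]byte, aes.BlockSize))
	ctrRate := throughputMBps(rounds, size, func() {
		ctr.XORKeyStream(dst[:size], src)
	})

	// AES-GCM: encryption and authentication in a single pass.
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	// The nonce is reused here for benchmarking only; never do this with real data.
	nonce := make([]byte, gcm.NonceSize())
	gcmRate := throughputMBps(rounds, size, func() {
		gcm.Seal(dst[:0], nonce, src, nil)
	})

	fmt.Printf("AES-128-CTR: %.0f MB/s\nAES-128-GCM: %.0f MB/s\n", ctrRate, gcmRate)
}
```

The numbers depend heavily on the CPU and Go version, so they are only meaningful when compared on the same host.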
I've some interesting findings.
Here we go for the
|
Stream | SFTPGo MB/s | SFTPGo CPU% | OpenSSH MB/s | OpenSSH CPU% |
---|---|---|---|---|
1 | 500 | 105 | 360 | 100 |
2 | 950 | 195 | 680 | 195 |
3 | 1350 | 284 | 980 | 285 |
4 | 1650 | 332 | 1100 | 372 |
8 | 2400 | 520 | 1550 | 616 |
SFTP
SFTP also got a huge boost.
Stream | SFTPGo MB/s | SFTPGo CPU% | OpenSSH MB/s | OpenSSH CPU% |
---|---|---|---|---|
1 | 260 | 173 | 340 | 102 |
2 | 420 | 254 | 640 | 208 |
3 | 500 | 297 | 800 | 300 |
4 | 580 | 333 | 1000 | 400 |
8 | 700 | 390 | 1450 | 648 |
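Since the CPU% columns differ between the two servers, it may help to normalize throughput by CPU usage. The sketch below does that arithmetic for the SFTP table above; the `row` type and the `perCore` helper are names made up for this example.

```go
package main

import "fmt"

// row holds one line of the SFTP table above.
type row struct {
	streams                 int
	sftpgoMBps, sftpgoCPU   float64
	opensshMBps, opensshCPU float64
}

// perCore normalizes throughput to MB/s per 100% CPU (one fully used core).
func perCore(mbps, cpuPct float64) float64 {
	return mbps / (cpuPct / 100)
}

func main() {
	rows := []row{
		{1, 260, 173, 340, 102},
		{2, 420, 254, 640, 208},
		{3, 500, 297, 800, 300},
		{4, 580, 333, 1000, 400},
		{8, 700, 390, 1450, 648},
	}
	for _, r := range rows {
		fmt.Printf("%d stream(s): SFTPGo %.0f, OpenSSH %.0f MB/s per core\n",
			r.streams,
			perCore(r.sftpgoMBps, r.sftpgoCPU),
			perCore(r.opensshMBps, r.opensshCPU))
	}
}
```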
Indeed, the AES-GCM cipher can massively improve performance, but there seem to be some compatibility issues; Rebex SSH does not work with it either.
In my tests I used the SFTP CLI. I have now tested with Filezilla and I see that it does not support AES GCM.
In my tests on localhost, transferring a 1GB file, Filezilla's speed against OpenSSH is 190MB/s; against SFTPGo (forcing the same cipher, aes-256-ctr) it is 160 MB/s.
I have now tested SCP and I confirm that my simple implementation outperforms OpenSSH if the cipher is not the bottleneck (for example with aes128-gcm). This means that pkg/sftp could be optimized in some way, but if we cannot use AES GCM the bottleneck will remain the encryption. I did a quick test on a virtual machine, using centos8 and the Red Hat (FIPS certified) go-toolset that replaces the golang crypto implementation with openssl, but openssh still performs better when using AES CTR based ciphers. I could also try the boringssl golang branch, but I don't think it will perform much better than go-toolset from Red Hat. Maybe for your use case a load balancer, such as haproxy, could help to balance the load between several backends.
|
I've looked at this issue: https://github.com/golang/go/issues/20967
After applying the patch:
|
Great! And how about sftpgo performance when compiled against this patched Go version? Here is a patch that saves CPU profiling to
|
Hi, I applied and tested that patch myself. It only gives a small performance improvement; the bottleneck is now the MAC. When using AES GCM mode the MAC is implicit, from sftp -vvvv output:
while for aes ctr:
so we need a way to improve golang sha2 performance. We could try to patch crypto/ssh https://github.com/golang/crypto/blob/master/ssh/mac.go#L12 to use, for example, this sha2 implementation: |
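To get a feel for how much a separate MAC costs when AES-CTR is used, one can time HMAC-SHA256 from Go's standard library over packet-sized buffers. This is a hedged sketch and not the code path used by `crypto/ssh`; the helper names, the 32KB packet size, and the all-zero key are assumptions for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"time"
)

// packetMAC returns the HMAC-SHA256 tag of one packet.
func packetMAC(key, packet []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(packet)
	return mac.Sum(nil)
}

// macThroughputMBps tags `rounds` packets of `size` bytes and returns MB/s.
func macThroughputMBps(key []byte, rounds, size int) float64 {
	packet := make([]byte, size)
	start := time.Now()
	for i := 0; i < rounds; i++ {
		packetMAC(key, packet)
	}
	elapsed := time.Since(start).Seconds()
	return float64(rounds) * float64(size) / (1 << 20) / elapsed
}

func main() {
	key := make([]byte, 32) // all-zero key, acceptable for timing only
	rate := macThroughputMBps(key, 4000, 32*1024)
	fmt.Printf("HMAC-SHA256: %.0f MB/s\n", rate)
}
```

If this rate is well below the AES-CTR rate measured on the same machine, the MAC rather than the cipher limits throughput, which matches the observation above.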
Wow, we arrived at the same thing! I just finished my test and saw you reached the same conclusion.
https://github.com/minio/sha256-simd gave me a decent result:
While the original version is considerably slower:
|
Hi, I forked golang crypto and replaced the sha256 implementation with sha256-simd, but sha256 is still the bottleneck:
my laptop only supports AVX2, can you please try it on your hardware? Based on the benchmark you posted above it should support the SHA extensions. Here is a patch to replace the default golang sha256 with sha256-simd
|
Hi,
Combining both the AES and SHA patches, speed increased by more than 60%.
|
Great! So we are now closer to OpenSSH performance, and what about SCP? Thanks! |
SCP outperforms OpenSSH in
|
Hi, I did some minor performance improvements in pkg/sftp, you can test them using this diff
Can you also post your results for SCP downloads after applying the following patch?
this should decrease SCP download performance and give us an idea of what we can achieve by avoiding memory reallocation inside pkg/sftp. I retested SCP too, and while downloading a file via SCP is as fast as OpenSSH, uploads are slower than OpenSSH on my laptop. Can you please confirm that your benchmark is for SCP downloads? The main difference between SCP uploads and downloads is that for downloads we use sequential file reads, while for uploads we use random access writes. pkg/sftp uses random access for both reads and writes. Can you please post the results for SCP uploads and downloads too (removing the scp patch above)? I suspect that using sequential reads/writes will give more improvement than avoiding a memory reallocation for each packet, and I think we'll need to work on this, but I would like to see the results of the above tests on your hardware before I start writing this code. Thanks! |
Hi,
SCP patch
Yes, my tests are for downloads.
SCP upload
Due to ZFS cache and overhead it's hard to directly compare upload and download, but it's still a big difference.
With the performance improvements in pkg/sftp I get about 20% speed gain. |
Thanks for the results. In the coming weeks I'll try to write a patch that buffers reads and writes in memory to allow sequential disk access. I'll try to make the read and write chunk sizes configurable so we can, for example, use big chunks such as 1MB or so; we'll see if it improves anything |
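The buffering idea described above could look roughly like the sketch below: small offset-based writes are collected in memory and handed to the underlying file as one large write per chunk. This is not the patch mentioned in the thread and not pkg/sftp code; `chunkWriter` and `memFile` are hypothetical names, and the in-memory `memFile` merely stands in for a real file so the number of underlying writes can be counted.

```go
package main

import (
	"fmt"
	"io"
)

// chunkWriter collects contiguous WriteAt calls and forwards them
// to dst as one write per chunk.
type chunkWriter struct {
	dst       io.WriterAt
	buf       []byte // pending contiguous bytes
	start     int64  // file offset of buf[0]
	chunkSize int
}

// WriteAt buffers p; a non-contiguous offset flushes what is pending first.
func (w *chunkWriter) WriteAt(p []byte, off int64) (int, error) {
	if off != w.start+int64(len(w.buf)) {
		if err := w.Flush(); err != nil {
			return 0, err
		}
		w.start = off
	}
	w.buf = append(w.buf, p...)
	if len(w.buf) >= w.chunkSize {
		if err := w.Flush(); err != nil {
			return 0, err
		}
	}
	return len(p), nil
}

// Flush writes the pending bytes with a single underlying write.
func (w *chunkWriter) Flush() error {
	if len(w.buf) == 0 {
		return nil
	}
	if _, err := w.dst.WriteAt(w.buf, w.start); err != nil {
		return err
	}
	w.start += int64(len(w.buf))
	w.buf = w.buf[:0]
	return nil
}

// memFile is an in-memory io.WriterAt that counts underlying writes.
type memFile struct {
	data   []byte
	writes int
}

func (m *memFile) WriteAt(p []byte, off int64) (int, error) {
	end := int(off) + len(p)
	if end > len(m.data) {
		m.data = append(m.data, make([]byte, end-len(m.data))...)
	}
	copy(m.data[off:], p)
	m.writes++
	return len(p), nil
}

func main() {
	file := &memFile{}
	w := &chunkWriter{dst: file, chunkSize: 1 << 20} // 1MB chunks
	packet := make([]byte, 32*1024)
	var off int64
	for i := 0; i < 32; i++ { // 32 in-order packets of 32KB each
		if _, err := w.WriteAt(packet, off); err != nil {
			panic(err)
		}
		off += int64(len(packet))
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes stored with %d underlying write(s)\n", len(file.data), file.writes)
}
```

The trade-off is extra memory per open file and one additional copy per packet, so whether it helps depends on the storage.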
Hi, I wrote a really ugly patch that adds an allocator to pkg/sftp; it improves things somewhat. I posted some results here: maybe I'll push this really ugly patch to my repo in the coming days; at the moment I don't know how to get more improvements. I tested sequential access vs random access in my scp implementation and it doesn't change anything on my laptop (SSD disk). Implementing it in pkg/sftp requires a lot of effort and I won't write this patch, at least for now, since it seems useless |
Here is my proof of concept allocator https://github.com/drakkan/sftp In my tests (on localhost) upload performance is now very similar to my scp implementation. Downloads improved too but are still slower than scp downloads; I looked at the pkg/sftp code several times but for now I don't understand the reason. Can you please post the results on your hardware?
Please note that this is ugly code and it should be used for testing purposes only |
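The general idea behind such an allocator is to reuse packet buffers instead of allocating a new one for every packet. The sketch below shows that idea with a simple free list; it is not the proof of concept linked above, and `pagePool`, `pageSize`, and the `allocs` counter are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

const pageSize = 32 * 1024 // assumed buffer size, roughly one SFTP data packet

// pagePool hands out fixed-size buffers and takes them back for reuse.
type pagePool struct {
	mu     sync.Mutex
	free   [][]byte
	allocs int // number of fresh allocations, kept for illustration
}

// Get returns a recycled buffer if one is available, else a new one.
func (p *pagePool) Get() []byte {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.free); n > 0 {
		b := p.free[n-1]
		p.free = p.free[:n-1]
		return b
	}
	p.allocs++
	return make([]byte, pageSize)
}

// Put returns a buffer to the pool; undersized buffers are dropped.
func (p *pagePool) Put(b []byte) {
	if cap(b) < pageSize {
		return
	}
	p.mu.Lock()
	p.free = append(p.free, b[:pageSize])
	p.mu.Unlock()
}

func main() {
	pool := &pagePool{}
	for i := 0; i < 1000; i++ { // simulate 1000 packets handled one at a time
		buf := pool.Get()
		buf[0] = byte(i) // stand-in for filling the packet
		pool.Put(buf)
	}
	fmt.Printf("1000 packets served with %d allocation(s)\n", pool.allocs)
}
```

A real implementation also has to decide when a buffer is safe to release, for example once the response for a request has been written, which is where most of the complexity lies.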
Hi,
Download
Upload
The result is very promising, especially the huge gain in parallel workloads. I'm not sure of the reason, but OpenSSH is quicker than in the Feb 8th test even with a slower cipher. We are very close in single/dual stream now, and for more than 4 streams we are equal in speed. However, OpenSSH still has lower CPU usage. I'll do a profile when I have time. |
Hi, thanks for your tests. Your results are different from mine: in my tests uploads are quicker than downloads, but I have to use a ramfs based filesystem since my laptop's SSD is not quick enough for these tests, so my results are probably not realistic. Do your baseline results include these 2 patches? https://github.com/drakkan/sftp/commits/copy I think these could be merged quickly upstream. I added support for the proxy protocol, can you please try to balance the load between two or more instances? For example using an haproxy configuration like this one (tested on ArchLinux with haproxy 2.1.3)
you have to set In these tests you have to use an SQL based data provider or different sqlite databases. If you share the same sqlite database between two or more instances you can randomly get a "database is locked" error, as sometimes happens in this test case: https://github.com/drakkan/sftpgo/blob/master/sftpd/sftpd_test.go#L1093 Thanks for your patience |
Hi,
Download
aes128-ctr:
[email protected] 8 streams:
Upload
aes128-ctr:
[email protected] 8 streams:
|
Thanks! If you use the I would like to see the results behind a proxy too so we have all the info. My allocator patch can take a while before being accepted upstream: I need to rewrite it in an acceptable way and this will require a refactoring in pkg/sftp. I would like to summarize all the info and the required patches described in this issue and add them to the performance section of the README, are you interested in submitting a pull request? |
Hi, with HAProxy we get increased performance, but at high load the CPU usage of HAProxy itself is also high; for example, I hit a CPU bottleneck at 8 streams.
Download
aes128-ctr:
Upload
aes128-ctr:
|
Thanks. This is strange: on my laptop I see a very small performance loss using haproxy. So haproxy can help with a better CPU or if the load is balanced between different servers. I would like to summarize all the collected info and add it to the README. This way a user interested in SFTPGo performance can quickly find the relevant info without reading all the posts here. Are you interested in sending a pull request? |
I'll try to summarize all the elements when I have time. What I don't understand is that, even with only one backend, going through HAProxy gives me a performance improvement for 1-3 streams. Is the PROXY implementation more efficient than normal TCP? |
no hurry, thanks!
The proxy implementation simply reads the initial proxy header and after that it is a normal TCP connection. This is what happens on the Go side; maybe haproxy itself does some other optimizations. Do you get the same result connecting on localhost? When connecting through haproxy I have a small performance loss (tested on localhost only, using the sftp CLI) |
Hi, I can confirm that running locally with HAProxy gives me a small performance loss: And download & upload speeds are very close. Edit: updating to go 1.14 gives me about a 10MB/s gain. |
ok, so haproxy has some internal optimizations and it can be useful on localhost too if the CPU is not the bottleneck. Regarding go 1.14, I also noticed the performance increase, but I will be a bit conservative here: the binaries for the 0.9.6 release (which should happen quite soon) will still be built using 1.13.x. I have no code of my own that needs to handle EINTR, but some dependencies could. The optimizations in my copy branch are now merged upstream and I updated sftpgo to use pkg/sftp git master, so they are available to anyone using sftpgo git now |
To summarize, to match OpenSSH performance we need:
There is no specific patch for SFTPGo itself |
Hi, have you thought about creating an experimental branch, to get more feedback? |
Hi, now that I have released 0.9.6 I want to try to improve my allocator patch and discuss its inclusion in pkg/sftp, if this cannot happen maybe I'll create an experimental branch, let's see |
Including an experimental release then, so current (edit: and new) users can test the new version without the need to compile from source... |
@HiFiPhile I'm working to submit a PR with an allocator for pkg/sftp, I have 4 different implementations, can you please report your results using my test branches?
these changes should be applied to the Optimized configuration. The internal benchmark here: https://github.com/drakkan/sftp/blob/allocator/allocator_test.go#L54 has a clear winner, but I think you will get very similar results in a real test thanks! |
Hi,
Here is the trace of allocator 2: |
Sorry my bad, the optimized mode is disabled by default, you have to explicitly enable it:
|
Now it looks better :) They are quite similar in speed:
|
Thanks! Are these benchmarks for AES-CTR? sha256-simd is in git master already; it improves performance on arm64 too (but on arm64 OpenSSH is much faster, 70MB/s vs 110MB/s for both uploads and downloads on a jetson nano). For the AES patch for Golang I cannot do anything: I'm unable to review that patch and to ensure that it is correct, so I'm a bit reluctant to provide packages compiled with a patched Go. I submitted a pull request to pkg/sftp using allocator1 (based on the internal benchmark it is the fastest one), if it will be merged I'll enable If my patch for pkg/sftp gets merged I would like a pull request to add the new results to the performance doc too. EDIT: I filed a bug to add support for GCM ciphers in Filezilla, and I sent a pull request to add support for aes256-gcm in golang crypto |
@HiFiPhile, can you please do a last test using current git + AES CTR patch? A pull request with the performance doc update would be really appreciated too. |
@HiFiPhile we could add to the performance doc a "Baseline next" configuration (or a better name), which is the current SFTPGo git master. What do you think? The "Optimized" configuration now needs to include only the AES-CTR patch for Go, since both sha256-simd and the SFTP allocator are now included in SFTPGo git master. P.S. based on the comment here we didn't use the fastest AES-CTR patch available; anyway, I hope that the Go developers will include one of these patches in Go 1.15 |
@drakkan how about naming it "devel"? |
No hurry, but since you now have only 2 disks the numbers will not be comparable with the previous ones; we need to document this or redo the other tests too. For this test I suggest using go 1.14.2. Thanks! |
If disk speed is relevant, and you have sufficient RAM, you could use |
Oops, only now read the other thread about ramdisks :-) |
Hi,
Thanks for this great project !
I did some tests in my environment and the transfer speed is much lower than with OpenSSH.
Under Filezilla I can get 500MB/s with OpenSSH, but only about 200MB/s with sftpgo.
In both cases I'm using AES256-CTR as cipher and SHA-256 as MAC. I've also tried AES128-CTR but nothing changes.
CPU usage of sftpgo is higher than OpenSSH:
In both cases I've got a maximum TCP window size of 4MB.