More optimization #397
Conversation
Hm… the benchmarks are wildly variable. As noted, the delayed benchmarks show a much bigger improvement in speed and throughput. Allocated memory used by Writes and ReadFrom is down ~99% and ~91% respectively; for Writes+Delay and ReadFrom+Delay, ~75% and ~70% respectively. Allocs per op are down about 14% for ops <= 32k, but up about 20% for bigger ops. These are nice:
Vs. this PR without concurrency updates: except for delayed operations, throughput stays pretty similar. Delayed Read/Write/ReadFrom all show more speed, much more throughput, and about 20–25% more allocs/op. ReadFrom without delays uses about 23 times more memory at each buffer size (could be bad). Writes <= 32k show 90% less memory usage, but bigger writes range from -50% (less) to +27% (more) memory (probably due to greater concurrency being enabled as buffer size grows). Throughputs are all relatively unchanged. With throughput being mostly a wash, and memory usage being down despite a larger number of allocs/op, the tradeoff for better performance on delayed lines is probably a good trade. Well, once I figure out the 23x more memory being used by ReadFrom.
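For readers unfamiliar with how numbers like allocs/op, B/op, and MB/s are produced: they come from Go's testing framework with -benchmem. A minimal, self-contained sketch of a latency-injecting writer benchmark, purely illustrative and not the package's actual benchmarks:

```go
package example

import (
	"io"
	"testing"
	"time"
)

// delayWriter simulates a high-latency link by sleeping before each write.
type delayWriter struct {
	w     io.Writer
	delay time.Duration
}

func (d delayWriter) Write(p []byte) (int, error) {
	time.Sleep(d.delay)
	return d.w.Write(p)
}

// BenchmarkDelayedWrite reports allocations (allocs/op, B/op) and throughput (MB/s)
// for 32 KiB writes through the delayed writer.
// Run with: go test -bench=DelayedWrite -benchmem
func BenchmarkDelayedWrite(b *testing.B) {
	buf := make([]byte, 32*1024)
	dst := delayWriter{w: io.Discard, delay: time.Millisecond}

	b.ReportAllocs()
	b.SetBytes(int64(len(buf)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := dst.Write(buf); err != nil {
			b.Fatal(err)
		}
	}
}
```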
Hi, I did a quick test after applying this patch (0001-replace-sync.Pool-with-our-allocator.zip); the results are quite similar:
Hi, with this patch applied I have a failure in the SFTPGo test cases here. Basically the server calls …
Force-pushed from e0c3eed to e44df2c
I have committed at least somewhat of an improvement on the … I'm not sure if I want to do that improvement now, or in a follow-up. I think the general improvements here have been waiting long enough that we can commit this and cut a release, then follow up with a more parallelized … P.S.: I mean, after we've resolved the apparent regression issue pointed out by @drakkan.
Oops, sorry: I'm using my branch and I completely forgot about this issue. Are you unable to reproduce the issue, and do you want a reproducer or a test case? If so I'll try to find some time, thanks.
Hi, I finally did some tests. As soon as I close the channel from SFTPGo I see this log client side:
but since I'm limiting bandwidth server side, SFTPGo hasn't received the 3014656 bytes yet, so the connection stays open (server side) until SFTPGo receives all the pending data. So in my test case I just need to increase the wait timeout: it now takes about 5 seconds, while before the pending data was sent in less than 1 second. There are now more parallel writes and therefore more pending data. In short, this is not a bug. Sorry for the noise and for being so lazy.
Hm, 🤔 I think we would prefer that the … If so, there is a certain amount of “maxConcurrency” that we should see. It might be good to tune it and find an appropriate timeout value based on that amount of …
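A rough way to size that timeout, sketched below under the assumption that the relevant quantity is the data that can still be in flight (up to maxConcurrency outstanding requests of maxPacket bytes each) divided by the server's bandwidth limit. The names here are illustrative, not pkg/sftp API:

```go
package example

import "time"

// waitBudget estimates how long a bandwidth-limited server may need to drain the
// data that can still be in flight when the client returns from Close: up to
// maxConcurrency outstanding requests of maxPacket bytes each.
func waitBudget(maxConcurrency, maxPacket int, bytesPerSecond int64) time.Duration {
	pending := int64(maxConcurrency) * int64(maxPacket)
	drain := time.Duration(pending) * time.Second / time.Duration(bytesPerSecond)
	return drain + time.Second // extra slack; the exact margin is a judgment call
}
```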
Hi, to understand the issue I used this test program
After the upload starts, from the SFTPGo web interface I select the connection and click "Disconnect"; this will call … The test program exits printing something like this:
From the SFTPGo web interface I see something like this: … so the connection is still alive even if the sftpclient program is now dead. The connection ends when the received size matches the one printed by the client. I think this data is accumulated inside a … I could be wrong, but I don't see any issue here:
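The test program itself is collapsed above; a minimal sketch of that kind of client, with placeholder host, credentials and paths (not the reporter's actual code), looks roughly like this:

```go
package main

import (
	"fmt"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

func main() {
	config := &ssh.ClientConfig{
		User:            "test",
		Auth:            []ssh.AuthMethod{ssh.Password("password")},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(), // test-only
	}

	// Placeholder address; SFTPGo commonly listens on 2022 in test setups.
	conn, err := ssh.Dial("tcp", "127.0.0.1:2022", config)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client, err := sftp.NewClient(conn)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	f, err := client.Create("upload.bin")
	if err != nil {
		panic(err)
	}

	// Write a sizable amount of data; with concurrent writes the server may
	// still be draining pending data after these calls return.
	buf := make([]byte, 1024*1024)
	var written int64
	for i := 0; i < 10; i++ {
		n, err := f.Write(buf)
		written += int64(n)
		if err != nil {
			fmt.Println("write error:", err)
			break
		}
	}
	if err := f.Close(); err != nil {
		fmt.Println("close error:", err)
	}
	fmt.Println("bytes written client side:", written)
}
```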
If I repeat the test using pkg/sftp v1.12 I have this output client side:
so no error is detected, you fixed this, great! Server side I have:
the received bytes (2457600 in this case) always match the ones printed from the client. If I use this branch the received bytes are always a bit smaller, for example:
and the server logs:
Do you mean this in the comment above?
I was just trying to understand the scope of what you were reporting. The fewer bytes on the server side could be from an error happening before the end of the full transfer, with that error taking priority over finishing the full transfer. As long as you're good with things, I'm probably good with things. Although, I've kind of wanted to build some tests of the parallel execution to make sure that it is actually committing things in order, but I think it would require reworking the …
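One way to build the kind of ordering check mentioned here, sketched with an illustrative recording io.WriterAt rather than the package's real test plumbing:

```go
package main

import (
	"fmt"
	"sync"
)

// recordingWriterAt discards data but remembers the offset of every WriteAt call.
type recordingWriterAt struct {
	mu      sync.Mutex
	offsets []int64
}

func (w *recordingWriterAt) WriteAt(p []byte, off int64) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.offsets = append(w.offsets, off)
	return len(p), nil
}

// inOrder reports whether the recorded offsets were non-decreasing,
// i.e. whether the writes were committed sequentially.
func (w *recordingWriterAt) inOrder() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	for i := 1; i < len(w.offsets); i++ {
		if w.offsets[i] < w.offsets[i-1] {
			return false
		}
	}
	return true
}

func main() {
	w := &recordingWriterAt{}
	// In a real test this writer would sit behind the server-side file handle
	// while the client performs a concurrent transfer.
	w.WriteAt(make([]byte, 4), 0)
	w.WriteAt(make([]byte, 4), 8) // out-of-order gap, for illustration
	w.WriteAt(make([]byte, 4), 4)
	fmt.Println("in order:", w.inOrder()) // prints: in order: false
}
```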
Hi, I'm only reporting the difference between v1.12/git master and this branch, just in case you want to investigate. I think losing some bytes in error cases is not a problem.
Agreed, dropping bytes is not automatically bad in an error state. At least, as long as any … Hm. 🤔 But now that I think about it, it's probably technically possible that we could write something with this code like … Although, without reverting to a sequential writing method, I don't think there is any way to guarantee that we don't possibly perform an overwrite like this… But then, callers of … I'll work on drafting some language. It should be something like, “note that in the case of an error, …”
Hi, please consider the upload/resume use case:
If the written bytes are not sequential, the resume will produce a corrupted file. Am I missing something? (I haven't read the code.)
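A sketch of the resume flow being described, with hypothetical paths and reduced error handling; the key assumption is that the server-side file size is taken as the resume offset, which is exactly what breaks if earlier writes landed out of order:

```go
package example

import (
	"io"
	"os"

	"github.com/pkg/sftp"
)

// resumeUpload restarts an interrupted upload from the size the server
// already has. If a previous attempt wrote bytes out of order before being
// killed, this offset is wrong and the resumed file ends up corrupted.
func resumeUpload(client *sftp.Client, localPath, remotePath string) error {
	local, err := os.Open(localPath)
	if err != nil {
		return err
	}
	defer local.Close()

	// The server-side size is taken as "how much arrived so far".
	info, err := client.Stat(remotePath)
	if err != nil {
		return err
	}
	offset := info.Size()

	// Skip the part the server claims to have.
	if _, err := local.Seek(offset, io.SeekStart); err != nil {
		return err
	}

	remote, err := client.OpenFile(remotePath, os.O_WRONLY|os.O_APPEND)
	if err != nil {
		return err
	}
	defer remote.Close()

	_, err = io.Copy(remote, local)
	return err
}
```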
Yes, that’s exactly the case that I’m referring to. The problem is that we cannot just automatically …
However, if we’re transferring a file, and there is no pre-existing data then we have:
this will probably not work if the client is killed/forcibly terminated or similar. It is a separate/additional step. Please read this part of the specs:
What do you think about making non-sequential writes configurable?
I can definitely change the code around to do sequential confirmed writes, but that is also going to kill a lot of the performance gains for high-latency connections. On the other hand, it makes the most common case code extremely simple.
If we don't want to make sequential writes configurable, I think this is the best thing to do. From what I understand, sequential writes can lead to unexpected bugs if the client process is forcibly terminated. We still need to allow sequential writes for SFTP servers that ignore the offset in … Thank you for all these efforts!
Do you mean non-sequential writes here? Because sequential writes are kind of the most reliable way to avoid bugs, even if the client process terminates. At least in that case there isn't a possibility of an out-of-order write to a file. I'm thinking I'll make two parallel types of functions, one sequential (like … My only remaining concern then is the …
yes, sorry for the typo
as long as it is configurable it is good for me
the user has to explicitly set a max packet > 32768 to trigger this usage. If so, I think a documentation note is enough
No, this happens any time the write buffer is > max packet, regardless of whatever the user has set explicitly. (My idea on the out-of-order read/write testing was even to set a … So by default, if I try and write, say, a …
Oh yes
but since io.Copy uses a 32768-byte buffer, I still think a doc note could be enough
Right, but …
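For context on the io.Copy point: io.Copy (and io.CopyBuffer) hand the transfer to the destination's ReadFrom whenever it implements io.ReaderFrom, which an sftp.File does, so the 32768-byte copy buffer never actually applies there. A caller who really wants fixed-size, strictly sequential writes would have to loop explicitly; a sketch:

```go
package example

import "io"

// copySequential copies src to dst in fixed 32 KiB chunks using plain Write calls,
// bypassing any ReadFrom/WriteTo fast paths that io.Copy would otherwise use.
// This only illustrates the caller-side workaround discussed above.
func copySequential(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 32*1024)
	var written int64
	for {
		n, err := src.Read(buf)
		if n > 0 {
			wn, werr := dst.Write(buf[:n])
			written += int64(wn)
			if werr != nil {
				return written, werr
			}
		}
		if err == io.EOF {
			return written, nil
		}
		if err != nil {
			return written, err
		}
	}
}
```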
OK, the default behavior is now to avoid concurrent/out-of-order writes. You must explicitly enable it with … I have also written concurrent reads into … I found a bug in my … I also realized that there was a race condition accessing and setting … I've polished all the other code as well, and I've tried to document all the concurrent code well enough. It would be good if we can get some solid testing on this, and/or eyeballs on the code. 😰
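For reference, a minimal usage sketch of opting in from the client side, assuming the option names this work exposes on the client (UseConcurrentWrites; MaxPacket shown only to illustrate request sizing); check the released API of github.com/pkg/sftp for the exact set of options:

```go
package example

import (
	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
)

// newConcurrentClient builds an SFTP client that explicitly opts in to
// concurrent (possibly out-of-order) writes; the default remains sequential.
func newConcurrentClient(conn *ssh.Client) (*sftp.Client, error) {
	return sftp.NewClient(conn,
		sftp.UseConcurrentWrites(true), // opt in to out-of-order writes
		sftp.MaxPacket(32768),          // request size per packet
	)
}
```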
@puellanivis thank you for working on this. I did some tests and the issue reported above is fixed. The SFTPGo test cases have no issues now either. I also did some manual testing, looking at used file descriptors etc., and all seems fine. I haven't read the code in depth; at first look it seems OK. I wonder why some packet methods have a pointer receiver and others don't. What do you think about something like this?
If I understand the code correctly, the issue reported here is not fixed; am I wrong?
After some (annoying to set up, but) quick benchmarks, yeah, it does look like using pointers here would work slightly better.
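For readers following the receiver question, an illustrative comparison (examplePacket is a made-up type, not one of the package's packets):

```go
package example

// examplePacket is a stand-in for a request packet with a few fields.
type examplePacket struct {
	ID     uint32
	Handle string
	Offset uint64
	Data   []byte
}

// Value receiver: every call copies the whole struct (including the string and
// slice headers), and any mutation would be lost.
func (p examplePacket) idValue() uint32 { return p.ID }

// Pointer receiver: only a pointer is passed, no struct copy, and the method
// could mutate the packet if it needed to.
func (p *examplePacket) idPointer() uint32 { return p.ID }
```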
Thank you for confirming. Do you want to integrate this change yourself, or do you prefer a separate PR?
Force-pushed from 0115f12 to 0c6e973
Force-pushed (…es a two-stage write) from 0c6e973 to 861a8ea
Thank you! I think this PR should be merged now so that others can test it too. Do you have other pending changes?
I don’t think there’s anything on my todo list, so it should be good to merge. Then we can get some people to maybe bleeding-edge update to master, and give it a good workout ahead of cutting a release.
This branch is used in sftpgo main now, thank you.
@puellanivis thanks for this feature. I never thought I'd use concurrent writes, but they turn out to be very useful on high-latency networks: I got a performance improvement from 5 MB/s to 15 MB/s by enabling them, great!
- MarshalBinary for packets must now include 4 empty bytes at the start for filling in the length of the packet (a sketch of this shape follows after the list).
- marshalPacket, which can return both the packet and payload as separate byte-slices, allowing the payload to be written separately without having to copy it into a separate marshaled packet slice first.
- The chan result used when sending packets to the server guarantees at-most-once delivery on each channel, meaning we can halve the space overhead from allocating a buffered channel (which happens on nearly every request).
- request-server_test.go and client_integration_test.go: addressed issues that were causing integration tests to lock up, be flaky or just not work.
- WriterAt … ReadAt, WriteAt, ReadFrom.
- If ReadAt or WriteAt can be done in one request, short-circuit concurrency and just do the request straight.
- If ReadFrom can be done in one request, short-circuit to a synchronous loop (this is to be absolutely sure the io.Reader is read to io.EOF, even though it's strongly likely that it will only ever run one loop).
- WriteTo … efficiently with the Map→Worker→Reduce paradigm. For sure though, it won't end up as similar as the other three are to each other.
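The first two items above amount to reserving space for the length field up front and keeping the payload out of the header buffer. A hedged sketch of that shape (function and field names are illustrative, not the package's internals):

```go
package example

import "encoding/binary"

// marshalExample shows the two ideas from the list above: the marshaled header
// starts with 4 empty bytes that are later filled with the packet length, and the
// (potentially large) payload is returned as its own slice so it can be written
// to the connection without being copied into the header buffer.
func marshalExample(packetType byte, id uint32, payload []byte) (header, data []byte) {
	header = make([]byte, 4, 4+1+4) // 4 bytes reserved for the length field
	header = append(header, packetType)
	header = binary.BigEndian.AppendUint32(header, id)

	// The length field counts everything after the length field itself.
	length := uint32(len(header) - 4 + len(payload))
	binary.BigEndian.PutUint32(header[:4], length)

	return header, payload
}
```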