-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
io: Copy leaves file zero bytes in Azure AKS with CIFS — Go1.15 regression #42400
Comments
Thank you for reporting this @alhails! Kindly paging @tklauser @ianlancetaylor @alexbrainman. |
Try |
I've reproduced locally by mounting a CIFS share to my NAS drive(!). I get the same behaviour here ie file is left zero bytes, so maybe the azure kernel is out of the frame? PS I'm using debian buster (kernel 4.19.0-12-amd64) if this is still relevant. |
@randall77 Thanks for the suggestion I still get the same behaviour (locally on the CIFS share) with...
|
With amended copy...
I get the same output for both local disk (where file is written correctly) and on CIFS (where is left empty)
PS my test source.txt just contains "123" PPS There is no error returned |
I agree that it sounds like a problem with the This is problematic, though, since we do want to use CC @acln0 |
Tentatively marking as a release blocker for 1.16, since we don't want this kind of surprising breakage in Go. We can remove the label if we discover that the problem only occurs on some limited kernel range or something. |
I reproduced the failure from source (go1.15.4) and then made the tweak you suggested and the problem went away.
If this confirms it is an issue with |
Change https://golang.org/cl/267958 mentions this issue: |
It's unfortunate that it doesn't seem to work for CIFS, where it could have helped significantly. On my home system, I have a SMB share that I tested the program against. Everything worked correctly. My SMB server is FreeBSD 11.2, and my client is Debian 5.8.0-2-amd64. I don't know that we can draw any conclusion from this. It is also unclear to me how exactly CIFS and SMB differ, so perhaps this test I performed is not informative at all. They do differ at least in magic file system numbers, as far as This looks like a kernel bug, but even if we knew for a fact that this is a kernel bug, and we knew the precise range of affected kernels, trying to detect those specific kernel versions from within As for detecting file systems, I think we could call In any case, I have sent CL 267958 for testing and for discussion. I'm not sure that is the right approach at all, but I think it's worth at least a try and a discussion. @alhails Can you please try https://go-review.googlesource.com/c/go/+/267958? Can you also try it with the arguments in reverse? @ianlancetaylor What do you think about calling |
@acln0 Not an experienced contributor, but I think I did this right... I did a Yes this does seem to fix the issue. I also confirmed that it was still making the |
@acln0 Not quite sure what you meant by...
|
Thank you for testing. Sorry for not being clear enough with "in reverse". If we are to choose the way of detecting file system types inside
The fourth case, local -> local, probably works fine, as far as we know. But as I am typing this, I now realize that since you are on Linux pre-5.3, your initial test case was likely remote -> remote, otherwise The other two cases, where only one of the files is on a remote file system, may also be interesting, because the CL I sent only checks the file system type for the destination file. This may be insufficient in the general case, because the bug may exist when the source file is mounted remotely and the destination file is mounted locally. If there is any way that you could check with your existing CIFS share and with a 5.3+ kernel, that would be very helpful. On my machine, with the SMB setup I describe above (5.8 kernel), everything works fine, but if I check the file system type using this function:
... I get |
Ok I've tested all four scenarios with both my installed go (go1.15.2) and your patch All of the scenarios seem to work apart from...
Upgrading my kernel isn't something I've done before. Is this is something that I can do without rebuilding my machine, or upgrading my distribution from debian buster? Update I see it's possible using backports and had a quick look but I've got some dependency issues... |
Rather than mess around with my setup, I set up an alpine vm (kernel 5.4.72-0-lts) This seems to work fine with the same two sets of four tests as I ran above. So maybe this is an indication that it's a problem with the older kernel? PS I did notice that |
From https://man7.org/linux/man-pages/man2/copy_file_range.2.html
From that, I'd guess this syscall should not be tried on kernels prior to 5.3. |
From what @alhails describes in #42400 (comment) the failure occurs only if the file is copied from CIFS to CIFS. Thus, it looks like we would in the worst case need to check the file system for the source and the destination file, i.e. two Given the man page note and @alhails's confirmation that the failure no longer occurs on kernel 5.4, to me it seems the safest solution is to disable use of @ianlancetaylor @acln0 what do you think? |
Change https://golang.org/cl/268338 mentions this issue: |
Could we do the kernel version check at runtime startup if |
I think this would be a much more invasive change. The current solution checks the kernel version once on the first call to |
There is also a similar check for EXDEV which may no longer apply, given that we have raised the effective requirement for copy_file_range to Linux 5.3 or later. |
@acln0 good point. From quickly skimming through kernel code (check on 5.10-rc3) it indeed looks like we can no longer hit |
Agreed. SGTM for 1.17. I don't think the check can possibly do any harm now. |
@gopherbot please backport to 1.15. This is a regression which may be undetectable. |
Backport issue(s) opened: #42550 (for 1.15). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Bugs in the error handling while uploading a file to the backend could cause incomplete files, e.g. golang/go#42400 which could affect the local backend. Proactively add sanity checks which will treat an upload as failed if the reported upload size does not match the actual file size.
Bugs in the error handling while uploading a file to the backend could cause incomplete files, e.g. golang/go#42400 which could affect the local backend. Proactively add sanity checks which will treat an upload as failed if the reported upload size does not match the actual file size.
Possibly related: docker/for-linux#1015 (comment) |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes (1.15.3)
What operating system and processor architecture are you using?
The issue is reproduced on Azure AKS node (Ubuntu 16.04 kernel 4.15.0-1098-azure)
What did you do?
We have a golang API running in Azure AKS. The initial problem was noticed with file upload via http multipart request. After upgrade to go 1.15, uploaded files were left as zero bytes. Before this, uploads were working fine.
My initial suspicion was that the problem was related to the addition of the copy_file_range syscall in os.File ReadFrom, but also our use of a CIFS file share for our running pod.
I have only been able to reproduce in AKS, not locally.
I've reproduced the issue with a very simple golang code example..
When this is executed on the container file system, the file is copied correctly.
When this is executed on a CIFS volume in the container, the file is created but is left as zero bytes.
I now suspect that the issue is with the kernel or CIFS, so have raised an issue with the team responsible for the azure linux kernel - https://answers.launchpad.net/ubuntu/+source/linux-azure/+question/693837
However I thought I might also raise it here, in case anyone here has any thoughts on this issue.
Thanks
Alastair
The text was updated successfully, but these errors were encountered: