Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File.Copy: Clone file when possible on MacOS #77835

Closed
hamarb123 opened this issue Nov 3, 2022 · 10 comments · Fixed by #79243
Closed

File.Copy: Clone file when possible on MacOS #77835

hamarb123 opened this issue Nov 3, 2022 · 10 comments · Fixed by #79243
Assignees
Labels
area-System.IO os-mac-os-x macOS aka OSX tenet-performance Performance related issue
Milestone

Comments

@hamarb123
Copy link
Contributor

hamarb123 commented Nov 3, 2022

For File.Copy on macOS we're currently doing the following:

return fcopyfile(inFd, outFd, NULL, COPYFILE_ALL) == 0 ? 0 : -1;

If we instead use copyfile() we can use the COPYFILE_CLONE option (with the existing COPYFILE_ALL obviously, which I think the clone option may include all of the flags of anyway), which allows the file to be instantly copied on supported filesystems, and falls back to exactly the same implementation when it's not available (note - this behaves identically (except with symlinks (ie. this copies the symlink itself I think) and when the destination exists, which we can special case), it just allows APFS to immediately set the data location to the existing file, and when it's updated it will be create a copy since APFS is a copy-on-write filesystem). This is obviously much faster (I tested it on a hard drive with a 30GB file and it copied instantly).

We can implement that in this file:

public static void CopyFile(string sourceFullPath, string destFullPath, bool overwrite)

by splitting the OSX-like platforms code, like we have with the code that sets the creation date.

We should be able to implement it with effectively the same behaviour as whatever the current behaviour is with relative ease. Note: apps like finder clone files in this way.

I'd like to implement this in a PR if approved (ideally by .NET 8).

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 3, 2022
@ghost
Copy link

ghost commented Nov 3, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

For File.Copy on macOS we're currently doing the following:

return fcopyfile(inFd, outFd, NULL, COPYFILE_ALL) == 0 ? 0 : -1;

If we instead use copyfile() we can use the COPYFILE_CLONE option (with the existing COPYFILE_ALL obviously), which allows the file to be instantly copied on supported filesystems, and falls back to exactly the same implementation when it's not available (note - this behaves identically (except with symlinks (ie. this copies the symlink itself I think) and when the destination exists, which we can special case), it just allows APFS to immediately set the data location to the existing file, and when it's updated it will be create a copy since APFS is a copy-on-write filesystem). This is obviously much faster (I tested it on a hard drive with a 30GB file and it copied instantly).

We can implement that in this file:

public static void CopyFile(string sourceFullPath, string destFullPath, bool overwrite)

by splitting the OSX-like platforms code, like we have with the code that sets the creation date.

We should be able to implement it with exactly the same behaviour as whatever the current behaviour is with relative ease.

I'd like to implement this in a PR if approved (ideally by .NET 8).

Author: hamarb123
Assignees: -
Labels:

area-System.IO

Milestone: -

@hamarb123
Copy link
Contributor Author

Can someone also tag this with the performance improvement tag (tenet I think).

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Nov 3, 2022
@adamsitnik
Copy link
Member

It was discussed in the past: #26713

This comment is most interesting: #26713 (comment)

Note that we had to remove COPYFILE_CLONE and switch to calling clonefile() API instead since the former had some unintended sideeffects: mono/mono#10020

cc @filipnavara who contributed dotnet/corefx#37583

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Nov 3, 2022
@adamsitnik adamsitnik added this to the Future milestone Nov 3, 2022
@filipnavara
Copy link
Member

There's more than one closed PR that attempted to do this. The work can be resurrected. The main problem was duplicating logic between the managed side of the code and the native side of the code that dealt with the file sharing emulation.

@hamarb123
Copy link
Contributor Author

The behaviour discovered in mono/mono#10020 is now documented in the man pages:

     COPYFILE_CLONE         Try to clone the file instead.  This is a best try flag i.e. if cloning fails, fallback to copying the file.  This flag is equivalent to (COPYFILE_EXCL | COPYFILE_ACL | COPYFILE_STAT |
                            COPYFILE_XATTR | COPYFILE_DATA | COPYFILE_NOFOLLOW_SRC).  Note that if cloning is successful, progress callbacks will not be invoked.  Note also that there is no support for cloning directories: if
                            a directory is provided as the source and COPYFILE_CLONE_FORCE is not passed, this will instead copy the directory.  Since this flag implies COPYFILE_NOFOLLOW_SRC, symbolic links themselves will be
                            cloned instead of their targets.  Recursive copying however is supported, see below for more information.  (This is only applicable for the copyfile() function.)

@filipnavara
Copy link
Member

I think the last attempt was #33159.

@hamarb123
Copy link
Contributor Author

If you could find links to all of the previous attempts that'd be great (if they all link to each other in the description that is fine as well) so I can see what was there and what went wrong with it.

If you're happy for me to have a go, can you please 'assign' this issue to me, thanks.

@hamarb123
Copy link
Contributor Author

The main problem was duplicating logic between the managed side of the code and the native side of the code that dealt with the file sharing emulation.

Can you please explain to me what the issue with file locking is specifically related to this feature. I read through the issue you commented with and some other linked ones (including the ones about file locking itself), and I don't understand what it has to do with File.Copy.

The solution that I plan to implement is (with appropriate platform checks and error handling):

  1. Attempt to do a COPYFILE_CLONE_FORCE with copyfile() as the very first thing we do (ie. top line in FileSystem.Unix.cs's CopyFile function I linked in an earlier comment)
  2. If it fails, use the existing code since it seems to work fine for copying a file
  3. If it succeeds, check if we made a symlink and delete it and fall back to existing code. This should be quite fast as copying a symlink should be much, much quicker than copying the average file anyway, so the additional check should end up faster for most cases than what we currently have.
  4. If overwrite = false, that already works with COPYFILE_CLONE_FORCE since it won't overwrite without errors
  5. If overwrite = true, we need to check at some point if the file exists, COPYFILE_CLONE_FORCE should create the EEXIST error in the case the file already exists, and we can deal with it then by deleting it and retrying, this saves us the extra syscall for when the file doesn't exist, and isn't that expensive imo when it does (we could also just check if it does exist, but I think this way is better).

COPYFILE_CLONE_FORCE:

     COPYFILE_CLONE_FORCE   Clone the file instead.  This is a force flag i.e. if cloning fails, an error is returned.  This flag is equivalent to (COPYFILE_EXCL | COPYFILE_ACL | COPYFILE_STAT | COPYFILE_XATTR | COPYFILE_DATA
                            | COPYFILE_NOFOLLOW_SRC).  Note that if cloning is successful, progress callbacks will not be invoked.  Note also that there is no support for cloning directories: if a directory is provided as the
                            source, an error will be returned.  Since this flag implies COPYFILE_NOFOLLOW_SRC, symbolic links themselves will be cloned instead of their targets.  (This is only applicable for the copyfile()
                            function.)

@filipnavara
Copy link
Member

.NET uses the POSIX advisory locks to implement the FileShare option of FileStream on Linux/macOS. File.Copy respects these locks on both the source and target file. The current implementation uses managed code to get file handles for both the source and target file, which happens to do all the necessary advisory lock checks.

If clonefile is used you work with paths instead of handles. Moreover, the destination file handle cannot be open and the file cannot exists. Depending on some conditions you may need to fallback to the regular file copy which needs the handles. That results in a tricky situation that doesn't fit well with the current code structure. The previous PRs handled this by duplicating some of the managed logic inside the SystemNative_CopyFile native method. It worked and passed the unit tests but it was not pretty. Alternative option would be to have a new C shim function specifically for clonefile and have all the advisory lock logic, destination file checks, and fallbacks in the managed code.

hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• Use copyfile (with COPYFILE_CLONE_FORCE) when possible on macOS to clone the file while still keeping file locking logic intact
• Split common Unix logic into multiple functions that the macOS implementation uses parts of at different times
• Add string version of ResolveLinkTarget to save the allocation since part of the code needs it
• Need to add tests to check the file is actually cloned so we know if it works or not
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 5, 2022
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• I missed setting StringMarshalling on the LibraryImport
• I missed partial on the CopyFile method implementations
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• Apparently 'Both partial method declarations must be unsafe or neither may be unsafe'
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• I love partial methods
• Move unsafe keyword into the OSX method rather than in method declaration
• Fix accessibility modifier of CopyFile in FileSystem.Unix.cs
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• Fix indentation (should have used spaces)
• Import Microsoft.Win32.SafeHandles in FileSystem.CopyFile.OSX.cs file
• Fix for SA1205 'Partial elements should declare an access modifier'
• Fix typo in FileSystem.CopyFile.OtherUnix.cs of 'startedCopyFile'
• Fix missing openDst parameter code in StartCopyFile
• Fix not checking the nullability in StandardCopyFile, which led to 'Possible null reference argument for parameter'
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• Add missing parameter value (openNewFile) to OpenCopyFileDstHandle
• Fix misnamed variable usages throughout some code
• Properly qualify Interop.ErrorInfo
• This should be the last fix for build issues for this set
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 5, 2022
• Add missing nullable ? for dstHandle (which obtains the file lock)
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 6, 2022
• Test failures were since copying a file onto itself should cause an error, this fixes that
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 6, 2022
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 6, 2022
…ain)

• The variable in the if statements weren't updated
• Handling for if readlink fails
hamarb123 added a commit to hamarb123/runtime that referenced this issue Dec 6, 2022
…t#77835'

• The source path should have used sourceFullPath rather than destPath
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 6, 2023
@hamarb123
Copy link
Contributor Author

Can someone re-open my PR #79243 please.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 9, 2023
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Mar 1, 2023
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Mar 31, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 13, 2023
@adamsitnik adamsitnik modified the milestones: Future, 8.0.0 Jun 13, 2023
@adamsitnik adamsitnik added the os-mac-os-x macOS aka OSX label Jun 13, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO os-mac-os-x macOS aka OSX tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants