Fix macos filename encoding (obsolete PR) #936
This seems totally implausible. It's saying that when you run
@rfjakob You're right, if I run
I do not presume to understand exactly why macFUSE volumes behave differently than normal volumes on Mac, but since the problem seemed to be established (it certainly fits what I have been experiencing and what is described in issue #850), and since a fix has been found by the Cryptomator team, I thought it was useful to implement it for gocryptfs too.
These links all talk about problems sharing files from macos to linux and windows, no?
That is the case for the Cryptomator link, yes. But the problem is broader than sharing files between systems, because a drive mounted with gocryptfs is treated by macOS the same way it would treat a drive on a remote Linux machine. The explainer at https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/ describes why the problem does not appear on a normal Mac volume:
"What happens in HFS+ is that, if you try to name an item using Form C, the file system automatically converts it to Form D. So although you may have provided two different names, HFS+ normalises them both to Form D, in which they really are identical. (...) APFS doesn’t normalise, which makes it a bit more efficient, but a normalisation layer in macOS ensures that file and directory names are normalised, so APFS behaves just like HFS+ did. Except that they don’t always. (...) More likely and more serious are the conflicts which can occur when using different methods of accessing non-Mac file systems. Thomas Tempelmann has demonstrated this using a share on a NAS, which he mounted first using NFS, then created a file with a name in Form C. When mounted via SMB on a Mac, as that filename is un-normalised, it can’t be accessed"
So I don't find it surprising that a problem occurs when mounting a volume with macFUSE: gocryptfs encodes filenames in NFC, but the normalisation layer which macOS normally applies to HFS+/APFS volumes does not apply here. This is why it needs to be implemented separately, as it was in Cryptomator.
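For illustration, the Form C / Form D difference quoted above can be shown with a few lines of Go. This is only a minimal sketch using the standard library: it does not normalize anything, it just compares two hard-coded spellings of "café". The names cafeNFC, cafeNFD and describe are made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Two spellings of "café" that render identically on screen.
const (
	cafeNFC = "caf\u00e9"  // precomposed é (U+00E9), Form C
	cafeNFD = "cafe\u0301" // e followed by combining acute accent (U+0301), Form D
)

// describe reports the rune count and the UTF-8 byte length of a name.
func describe(name string) (nRunes, nBytes int) {
	return utf8.RuneCountInString(name), len(name)
}

func main() {
	r1, b1 := describe(cafeNFC)
	r2, b2 := describe(cafeNFD)
	fmt.Printf("NFC: %d runes, %d bytes, % x\n", r1, b1, cafeNFC)
	fmt.Printf("NFD: %d runes, %d bytes, % x\n", r2, b2, cafeNFD)
	// A byte-wise comparison, which is what a lookup without a
	// normalization layer amounts to, sees two different names.
	fmt.Println("byte-wise equal:", cafeNFC == cafeNFD)
}
```

A file created under one spelling therefore cannot be found under the other unless some layer normalizes the name first.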
This PR aims at solving the issue described in #850
Since I am not a professional programmer, I used Claude Sonnet 4 to try to implement the solution used in Cryptomator to solve a similar problem. In the course of this PR, another problem was discovered regarding the implementation of dirstream on macOS, and I tried to solve it as well.
For the sake of transparency, here is a detailed summary of the changes implemented in this PR. I hope it's not too verbose, but I thought it might help to better understand the logic behind the PR and allow you to correct it if need be:
macOS Compatibility Improvements for gocryptfs
This document describes the major changes made to improve gocryptfs compatibility on macOS, addressing critical issues with Unicode filename handling and directory listing functionality.
1. Unicode Normalization Fix for Forward Mode
Problem Statement
On macOS, there is a fundamental mismatch between how different applications handle Unicode normalization for filenames containing accented characters (like "café"):
- NFC: café = c + a + f + é (4 characters, where é is U+00E9)
- NFD: café = c + a + f + e + ́ (5 characters, where é = e + combining acute accent U+0301)
This caused serious usability problems in gocryptfs on macOS:
- touch, and cat couldn't read files created by Finder
Solution: Cryptomator-Inspired Approach
The implementation adopts the approach used by Cryptomator, specifically described here:
Core Principles
Algorithm Flow
Implementation Details
Normalization Functions
Added core normalization functions in internal/fusefrontend/node_dir_ops.go.
File Creation Operations
Updated all file creation operations to normalize input to NFC:
- internal/fusefrontend/node_open_create.go
- internal/fusefrontend/node_dir_ops.go
- internal/fusefrontend/node.go
Lookup with Fallback and Migration
The core innovation is in internal/fusefrontend/node_prepare_syscall.go, where the prepareAtSyscall function was completely rewritten to implement Cryptomator's lookup logic.
Migration Logic
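The lookup-with-fallback and migration behaviour can be sketched roughly as follows. This is not the PR's prepareAtSyscall or migrateFilename code: the dir type is a hypothetical in-memory stand-in for the ciphertext directory, the NFC-first lookup order is an assumption, and toNFC/toNFD are toy helpers that only handle "é" (real code would presumably use norm.NFC.String and norm.NFD.String from golang.org/x/text/unicode/norm).

```go
package main

import (
	"fmt"
	"strings"
)

// Toy stand-ins for real Unicode normalization; they only know about "é".
func toNFC(s string) string { return strings.ReplaceAll(s, "e\u0301", "\u00e9") }
func toNFD(s string) string { return strings.ReplaceAll(s, "\u00e9", "e\u0301") }

// dir is a hypothetical in-memory directory: plaintext name -> file content.
type dir map[string]string

// lookup tries the NFC form first, falls back to the NFD form, and migrates
// an NFD hit to its NFC name. It returns the name the file is stored under
// after the call.
func (d dir) lookup(name string) (string, bool) {
	nfc := toNFC(name)
	if _, ok := d[nfc]; ok {
		return nfc, true
	}
	nfd := toNFD(name)
	if content, ok := d[nfd]; ok {
		// Migration: "rename" the legacy NFD entry to its NFC form.
		d[nfc] = content
		delete(d, nfd)
		return nfc, true
	}
	return "", false
}

func main() {
	d := dir{"cafe\u0301.txt": "created under an NFD name"}
	stored, ok := d.lookup("caf\u00e9.txt")
	fmt.Printf("found=%v, stored as % x\n", ok, stored)
	_, stillNFD := d["cafe\u0301.txt"]
	fmt.Println("NFD entry still present:", stillNFD)
}
```

After the first successful fallback the entry lives under its NFC name, so later lookups succeed on the first try.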
The migrateFilename function handles moving NFD files to NFC.
Directory Listing
Updated Readdir in internal/fusefrontend/node_dir_ops.go to return NFD for macOS GUI compatibility.
Files Modified
- internal/fusefrontend/node_prepare_syscall.go: Core lookup and migration logic
- internal/fusefrontend/node_dir_ops.go: Normalization functions and directory operations
- internal/fusefrontend/node_open_create.go: File creation normalization
- internal/fusefrontend/node.go: Symlink/device/hardlink normalization
2. Unicode Normalization Fix for Reverse Mode
Problem in Reverse Mode
Reverse mode presents an encrypted view of plaintext files. The Unicode normalization problem affects how encrypted filenames are decrypted back to plaintext names that must match existing files on disk.
After decrypting an encrypted filename, the resulting plaintext name might not match the actual file on disk due to Unicode normalization differences:
- café (NFC form)
- café (NFD form) - created by Finder
Solution Approach
Unlike forward mode, reverse mode should not modify the plaintext filesystem. Instead, we implement fallback lookup logic: if the decrypted name doesn't exist on disk, try the alternate Unicode normalization form.
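A read-only fallback of this kind might look like the sketch below. It is not the PR's rDecryptName code: resolveOnDisk and the exists callback are hypothetical names, and toNFC/toNFD are toy helpers that only handle "é" (real code would presumably use golang.org/x/text/unicode/norm).

```go
package main

import (
	"fmt"
	"strings"
)

// Toy stand-ins for real Unicode normalization; they only know about "é".
func toNFC(s string) string { return strings.ReplaceAll(s, "e\u0301", "\u00e9") }
func toNFD(s string) string { return strings.ReplaceAll(s, "\u00e9", "e\u0301") }

// resolveOnDisk returns the spelling of name that exists according to the
// exists callback. It tries the name as decrypted first, then its NFC and
// NFD forms. It never renames or creates anything.
func resolveOnDisk(name string, exists func(string) bool) (string, bool) {
	for _, candidate := range []string{name, toNFC(name), toNFD(name)} {
		if exists(candidate) {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical plaintext directory containing one NFD-named file.
	onDisk := map[string]bool{"cafe\u0301.txt": true}
	exists := func(n string) bool { return onDisk[n] }

	// The decrypted name arrives in NFC form.
	got, ok := resolveOnDisk("caf\u00e9.txt", exists)
	fmt.Printf("found=%v, on-disk spelling % x\n", ok, got)
}
```

The function only reports which spelling exists, so the plaintext directory is left untouched.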
Key Principles
Implementation
Enhanced rDecryptName Function
In internal/fusefrontend_reverse/rpath.go, added Unicode normalization fallback logic.
Key Differences from Forward Mode
Files Modified
- internal/fusefrontend_reverse/rpath.go: Added Unicode imports and enhanced rDecryptName() with fallback lookup logic
- internal/fusefrontend_reverse/node_dir_ops.go: Added Unicode imports and helper functions
3. Directory Stream Implementation - macOS Compatibility Fix
Problem Discovery
During the implementation of Unicode normalization, we discovered a critical bug in directory listing functionality. While file operations worked correctly, directory listings (via ls, Finder, etc.) would show empty directories even when files were present. This might be the same problem as the "ghost" mountpoint mentioned in #898.
Symptoms
- ls, find, Finder) showed empty directories
Root Cause Analysis
The issue was traced to a presumed incompatibility between the go-fuse library's NewLoopbackDirStreamFd function and macOS/APFS filesystem behavior:
- os.File.Readdirnames() could successfully read directory entries
- fs.NewLoopbackDirStreamFd() consistently returned nil entries immediately, indicating "end of directory"
Why This Affects macOS Specifically
Solution: Platform-Specific Directory Stream
Implemented a custom directory stream that is only used on macOS (runtime.GOOS == "darwin"), while maintaining the standard go-fuse loopback implementation on other platforms.
Implementation
Custom Directory Stream Features
The customDirStream struct implements all required interfaces:
- fs.DirStream - Core directory stream interface
- fs.FileReaddirenter - For reading directory entries
- fs.FileSeekdirer - For seeking within directory streams
- fs.FileReleasedirer - For cleanup
- fs.FileFsyncdirer - For sync operations
Key implementation details:
- os.File.Readdirnames(-1)
Files Modified
- internal/fusefrontend/file_dir_ops.go: Added runtime import, modified OpendirHandle() to use platform-specific directory streams, implemented customDirStream type
4. Platform Compatibility
All implementations are macOS-specific (guarded by runtime.GOOS == "darwin") because: