-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data loss on OS X in conjunction with hardlinks+symlinks #88
Comments
I can reproduce that OS X can be made to hardlink directories, when you hardlink a symlink that points to a directory. Normally it shouldn't be possible to hardlink directories, but OS X is facepalm-worthily terrible in this regard. In this case it leads us to lose data, because it allows We probably cannot guard against this, as it opens the door to all kinds of race conditions. So it seems to me that the only solution is to stop hardlinking files and copy them instead. (The only case where we might still be able to hardlink is if the source file is in a directory under our control, e.g. in a cache, and we are sure that it's a file,) |
@joliss is this closed now? |
Yes, this is fixed now, and documented in https://github.com/joliss/broccoli/blob/master/docs/hardlink-issue.md. |
@joliss I think that instead of totaly avoid hardlinks, broccoli base read write api should iterate files and create directories instead of simlink them, it's huge loss in term of performance to not use hardlinks. Note: this approach can be activated conditionally when OSX is detected. to avoid iterating on other systems. what do you think ? |
(cc @rjackson) |
Also, we should take in consideration that, the more operations we add to the pipeline the slower broccoli become due to the copy->copy->copy .... cycle, especially with big files 😠 |
We were previously only hardlinking files (as it is not possible to hardlink a directory under Linux), but unfortunately if you hardlink a symlink that itself points to a directory you may have data loss. I completely agree that we should find a way around the copy bottleneck and plan to work on the issue at some point in the near future (as I believe that it will improve Ember's Broccoli build times). |
I've finally managed to reproduce this bug by following the six rules of hardlinking directories. I needed to speed up my build time, so I hacked around it by ignoring anything but regular files, and creating the folder from the dirname of these files. fs.stat always follows a symlink (or chain of symlinks) to its target, so even if we did manage to make link to a directory, it would be ignored. This isn't perfect, as it doesn't replicate the source structure exactly (e.g. empty directories, invalid symlinks), but it has more than halved my build times (~1s down to 400ms). Here's the code I'm currently using - my version of static-compiler / pickFiles uses this instead. Let me know if that's an acceptable workaround and I can submit a pull request / put that into a plugin. |
I know I'm probably missing something, and this has probably already been considered, but why don't we just prevent (or check for) the "hardlink to a symlink pointing to a directory" case? If that works then we can keep using hardlinks, get the fast build times, and avoid the potential data loss problem. |
@aexmachina I'm not sure either. Maybe @joliss can shed some light |
that's the point of my previous comment |
I can't think of a way to do it without race conditions. The file system can always change in arbitrary ways between when you check and when you do the hardlink. That's a short time window so it's pretty rare, but if the result is data loss "pretty rare" is still not good enough. Even if you could do it correctly for some subcases, it's easy to have bugs or race conditions you didn't think about in your code. "Data loss happens if I make a small mistake" isn't a good place to operate in. So sadly it seems we just cannot use hardlinks. The way forward will probably be using symlinks where we used hardlinks before, but it'll need some support in Broccoli core. It's on the radar. :) |
How about replicating the directory structure, hard-linking all regular files, and then copying all symlinks? Only issue I can see with that is it will break symlinks with relative paths, but maybe they can be rewritten. |
Ah I see. Thanks for the clarification. We have full control after a folder is in tmp though, right? One solution is to copy on the first tree, and then hard link on all subsequent ones. Another option is to lock the original file between the stat and the link calls. |
Don't know how much effect it would have on speed, but node-fs-ext has a pretty clean api for locking files. Let me know if there's any interest and I can benchmark that on my implementation of pickFiles. |
@zaius I am doing something like that in broccoli-caching-writer actually. It copies into the cache dir, then links from the cache dir to the dest. So links are only made to items in |
@zaius's suggestion sounds like a good one to me. |
I realise the "solution" approach in #179 seems to be to adopt symlinks, but since the problem is only relevant to OS/X I figured Amit Singh's six rules for hardlinking directories can actually be used to prevent hard-linking to directories from within I have test gist which sets up the structure as per the original gist, but also sets up tmp in a way that makes it impossible to create hard linked directories out of tmp to vendor and application sources. Any attempt will fail with a permission error. This leverages Amit Singh's last rule that states the destination (ie where the hardlink is being created) cannot have an ancestor that is a hard-linked directory. By setting up the structure used to demonstrate the dataloss for I don't see why this couldn't be adopted as the |
@thomasboyt suggests
rm -rf tmp
on OS X might cause files to be deleted outside of thetmp
directory, due to the way some Broccoli plugins use hardlinks combined with idiosyncracies in hardlink & symlink handling on OS X.https://gist.github.com/thomasboyt/9935811
This clearly warrants further investigation.
The text was updated successfully, but these errors were encountered: