Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it fast and easy to do multiple replacements #13

Closed
twpayne opened this issue Feb 1, 2019 · 7 comments
Closed

Make it fast and easy to do multiple replacements #13

twpayne opened this issue Feb 1, 2019 · 7 comments

Comments

@twpayne
Copy link
Contributor

twpayne commented Feb 1, 2019

chezmoi uses renameio to ensure that files in the user's home directory are replaced atomically.

renameio.TempDir takes the name of the directory where the file replacement will take place so that it can determine a suitable location for the interim file with which the final destination file will be replaced.

chezmoi calls renameio.TempDir once based on the destination directory, but this implicitly assumes that the user's home directory is on a single filesystem. If the user's home directory spans multiple filesystems then different temporary directories will be needed. If the user's home directory is on a single filesystem then the same temporary directory can be re-used.

The safe path is to call renameio.TempDir for every file replacement, but this is very slow (chezmoi might change thousands of files, there's no need to create thousands of temporary directories when only one is needed).

I think renameio needs a function like:

// MaybeNewTempDir returns either one of existingTempDirs if a member (any of
// the keys) can be used as a temporary directory for replacing dest, or a new
// temporary directory otherwise. It is the caller's responsibility to add the
// return value to future calls to MaybeNewTempDir if needed, i.e.
// MaybeNewTempDir does not modify existingTempDirs.
func MaybeNewTempDir(dest string, existingTempDirs map[string]struct{}) string {
  // ...
}
@stapelberg
Copy link
Collaborator

My first reaction is that this probably doesn’t belong into the renameio package, but of course we can revisit that later.

As for how to solve the problem, I’d suggest to get the list of mounts (/proc/self/mountinfo on Linux, not sure if there is a package which provides this info in a portable way) and call renameio.TempDir once for all mounts underneath your target directories.

What do you think?

@twpayne
Copy link
Contributor Author

twpayne commented Feb 2, 2019

Hmmm, I think this does fit in the renameio package because it's quite strongly coupled to how renameio works under-the-hood. A different package to renameio could provide the same functionality of atomic file updates but using a different mechanism which would then require a different implementation of MaybeNewTempDir. Therefore, for me, MaybeNewTempDir should be part of renameio.

I don't think it's necessary to parse the mount table. Another way to do it would be to stat(2) the destination and look at the st_dev field and only use a new temporary directory if the value has not been seen before. I'll see if I can rustle something up.

@stapelberg
Copy link
Collaborator

A different package to renameio could provide the same functionality of atomic file updates

I actually don’t see how? AFAIK, the only portable way across unices is what renameio implements. Other packages (which I have surveyed in golang/go#22397 (comment)) use the same approach.

The implementation you’re suggesting in #15 does not seem to require access to any internals of the renameio package. Hence, I’d prefer it if we could keep it in a separate import path (at least) until we know that it’s useful for a number of people.

I don't think it's necessary to parse the mount table. Another way to do it would be to stat(2) the destination and look at the st_dev field and only use a new temporary directory if the value has not been seen before.

That’s true, but parsing the mount table can be done once at startup, whereas using stat requires one additional system call per file. This might not matter much for the specific use-case you’re interested in, but it does matter for the use-case for which renameio was written (updating https://manpages.debian.org/).

@twpayne
Copy link
Contributor Author

twpayne commented Feb 3, 2019

That’s true, but parsing the mount table can be done once at startup, whereas using stat requires one additional system call per file. This might not matter much for the specific use-case you’re interested in, but it does matter for the use-case for which renameio was written (updating manpages.debian.org).

It's only necessary to call stat once per destination directory, not once per file, which is a significant difference. Of course it's up to the code using renameio as to whether it uses the cache or not, so updating manpages.debian.org would be unaffected (unless you wanted it to be).

A problem with parsing the mount table is that you then have to work out the relationship between the destination directory and the mount points. This is a hard problem and simple string prefix matching is definitely insufficient (see the discussion in golang/go#18358).

@stapelberg
Copy link
Collaborator

A problem with parsing the mount table is that you then have to work out the relationship between the destination directory and the mount points. This is a hard problem and simple string prefix matching is definitely insufficient

That’s a fair point. I suppose the stat-based cache is a good solution. Can you make use of it in chezmoi and we’ll move it to renameio once other people need it, too?

@twpayne
Copy link
Contributor Author

twpayne commented Feb 3, 2019

Yup, sounds good.

@twpayne
Copy link
Contributor Author

twpayne commented Feb 4, 2019

FWIW, twpayne/chezmoi#208 should fix this in chezmoi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants