don't clobber an existing file when building the site #1374
closes #1367
Analysis of the issue:
A file is created with the name given in its reference, even if this doesn't match the actual file name (for example because we use a case-insensitive OS, or symlinks).
For example, if we have a reference in `page.md` to `<img src=IMAGE.jpg>` and a symlink from `image.jpg` to an actual file on disk called `horse.jpg`, we will create `_file/IMAGE.xxxx.jpg`, and reference this correctly in `page.html`; the build will not carry any information about the original "true" filename.
However, if several references to the same file use inconsistent names that end up being the same file on disk (e.g. `_file/IMAGE.jpg` and `_file/image.jpg` on a case-insensitive system), then we might have an issue when serving the site: if the server itself is case-sensitive, it will only be able to serve one of the references, and will 404 on the other(s).
Approach taken in this PR:
Thankfully, the build process is already careful to write each file once. We now enforce this: before writing a file to a destination in the output root, we check that it is not clobbering an existing file. If that is the case, then it means that the destination has not been deduplicated properly, and on a case insensitive OS this is most probably due to inconsistent casing (or maybe inconsistent encoding of unicode characters) across the site.
Since this can cause problems when hosting, as described above, we now break the build. We report a "File name conflict over (found path)" error, with no assumption made about the nature of the conflict. Hopefully, if it's a user error, the cause will be visible by comparing the paths shown in the log.
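The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `writeFileNoClobber` is a hypothetical stand-in for the effects' write method, and the way the "found path" is looked up (listing the directory and comparing case-folded, NFC-normalized names) is an assumption made only to produce a readable error message.

```typescript
import {existsSync, mkdirSync, mkdtempSync, readdirSync, writeFileSync} from "node:fs";
import {tmpdir} from "node:os";
import {basename, dirname, join} from "node:path";

// Fold a file name the way a case-insensitive, normalization-insensitive
// file system might (assumption, used only for error reporting).
const fold = (name: string): string => name.normalize("NFC").toLowerCase();

// Hypothetical helper: write a file under outputRoot, refusing to overwrite.
function writeFileNoClobber(outputRoot: string, outputPath: string, contents: string): void {
  const destination = join(outputRoot, outputPath);
  const directory = dirname(destination);
  mkdirSync(directory, {recursive: true});
  if (existsSync(destination)) {
    // Detection relies only on existsSync; the lookup below just finds the
    // name actually stored on disk so the log shows both spellings.
    const wanted = basename(destination);
    const found = readdirSync(directory).find((name) => fold(name) === fold(wanted)) ?? wanted;
    throw new Error(`File name conflict over ${join(dirname(outputPath), found)}`);
  }
  writeFileSync(destination, contents);
}

// Usage: the second write targets a path that already exists, so it throws.
const root = mkdtempSync(join(tmpdir(), "site-"));
writeFileNoClobber(root, "_file/image.jpg", "first");
try {
  writeFileNoClobber(root, "_file/image.jpg", "second");
} catch (error) {
  console.error(String(error));
}
```

On a case-insensitive file system, a second write to `_file/IMAGE.jpg` would hit the same branch, because `existsSync` resolves both spellings to the same file.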
I have tested this with `IMAGE.jpg` and `image.jpg`, and also `ÉTÉ.jpg` vs. `été.jpg`. Since it's enforced in the effects' `copyFile` and `writeFile` methods, it's not limited to "files"; for instance, it also breaks if you use inconsistent naming in the `style` front matter option (`STYLE.css` vs. `style.css`).
Notes:
`output` directory that we just created; this ensures that there are no issues with symlinks. (Also, it's only used to report the error to the user, not to detect the situation.)
It's very difficult to create unit tests that will run on a case-sensitive Ubuntu; should I create a test to make sure that we fail on darwin (local testing) and win32 (CI testing)?
previous branches:
supersedes #1369
supersedes #1373