Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug when including symbolic link files to the package's checksum. #937

Merged
merged 3 commits into from
Sep 10, 2021

Conversation

bobeff
Copy link
Contributor

@bobeff bobeff commented Sep 7, 2021

When a file is a symbolic link to some other file the proper behavior when calculating the checksum of a package is to use the link path instead of the content of the pointed file.

When cloning a Git repository on Windows Git puts the link file path as the content of an ordinary file. When downloading tarball the symbolic link files cannot be properly established and Tar writes empty files instead. This leads to different packages checksums compared to the case when the package is downloaded via Git or downloaded on Linux either via Git or via tarball.

To fix the above problem we parsed the content of the tarball and extract the link paths from there. After that, we write a link file with the link path in it overriding the empty file extracted by Tar.

The getPackageFileListWithoutVcs procedure is fixed to return also the symbolic link files and to return files with / separator even on Windows (in order to achieve the same checksum).

bobeff and others added 2 commits September 7, 2021 17:46
When a file is a symbolic link to some other file the proper behavior
when calculating the checksum of a package is to use the link path
instead of the content of the pointed file.

Additionally the `getPackageFileListWithoutVcs` procedure is fixed to
return also the symbolic link files.

Related to nim-lang#127
When cloning a Git repository on Windows Git puts the link file path as
the content of an ordinary file. When downloading tarball the symbolic
link files cannot be properly established and Tar writes empty files
instead. This leads to different packages checksums compared to the case
when the package is downloaded via Git or downloaded on Linux either
via Git or via tarball.

To fix the problem we parsed the content of the tarball and extract the
link paths from there. After that, we write a link file with the link
path in it overriding the empty file extracted by Tar.

Additionally, the `getPackageFileListWithoutVcs` procedure is fixed to
return relative paths to files with `/` character as directory separator
even on Windows, because they are used in the package checksum
calculation and we must have the same checksum for the same package
regardless of the operating system or download method.

Related to nim-lang#127
Comment on lines 312 to 331
when defined(windows):
# On Windows symbolic link files are not being extracted properly by the
# `tar` command. They are extracted as empty files, but when cloning the
# repository with Git they are extracted as ordinary files with the link
# path in them. For that reason here we parse the tar file content to
# extract the symbolic links and add their paths manually to the content of
# their files.
let listCmd = &"{getTarExePath()} -ztvf {filePath} --force-local"
let (cmdOutput, cmdExitCode) = doCmdEx(listCmd)
if cmdExitCode != QuitSuccess:
raise nimbleError(tryDoCmdExErrorMessage(listCmd, cmdOutput, cmdExitCode))
let lines = cmdOutput.splitLines()
for line in lines:
if line.contains(" -> "):
let parts = line.split
let linkPath = parts[^1]
let linkNameParts = parts[^3].split('/')
let linkName = linkNameParts[1 .. ^1].foldl(a / b)
writeFile(downloadDir / linkName, linkPath)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like we will always have edge cases where our hashes are different. Maybe we should instead just ignore symlinks in our hash calculations? How do other package managers handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that for now all edge cases are handled properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, filtering the symbolic links will be harder to implement because where the package's checksum is calculated I already don't have the tar file. The information has to be moved through a few call stack levels in some data structure which will be used only by a part of the code paths. It will lead to a much uglier code IMHO.

@dom96
Copy link
Collaborator

dom96 commented Sep 8, 2021

To be honest I'm not sure why we need this extra logic. Why is it a bug that we use the content of the linked file instead of its path? The resulting hash won't be compromised, if the symlink changes then we'll get a different hash because the new linked file's contents will be different.

@bobeff
Copy link
Contributor Author

bobeff commented Sep 9, 2021

Why is it a bug that we use the content of the linked file instead of its path?

@dom96 On Windows, the links are not actually links, but ordinary files with link paths in them. Without the extra logic, we will have different checksums on Windows and POSIX systems.

@dom96
Copy link
Collaborator

dom96 commented Sep 9, 2021

Ahh, I get it. Thanks for the explanation!

I'll leave to @Araq to give his thoughts.

src/nimblepkg/download.nim Outdated Show resolved Hide resolved
@Araq Araq merged commit 820e609 into nim-lang:master Sep 10, 2021
@Araq
Copy link
Member

Araq commented Sep 10, 2021

Seems fine to me, should even work with Windows UNC paths.

@bobeff bobeff deleted the bugfix/symlinks-checksums branch September 20, 2021 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants