Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support relative paths to linked working trees #5898

Merged
merged 4 commits into from
Oct 29, 2024

Conversation

chrisd8088
Copy link
Member

This PR adds an alternative implementation of the function used by the git lfs prune command to locate all linked working trees, in order to ensure that Git LFS continues to be compatible with future versions of Git. This change addresses the recent CI job failures seen in our Build with latest Git jobs; for instance, in this job run for PR #5876.

Since the introduction of support in Git for multiple linked working trees, in version 2.5.0, paths to these working trees have been stored in the .git/worktrees internal file structure as full, absolute paths. At present, Git LFS reads some of these internal files in order to retrieve the full paths to all linked working trees, which it needs when running the git lfs prune command. The gitscanner.ScanIndex() method invoked by git lfs prune results in two git diff-index commands being run in each linked working tree. One of those, in particular, scans the staged changes in the working tree's Git index for pointers to Git LFS objects, as any such references mean that the corresponding Git LFS objects should be retained and not pruned.

The next release of Git (version 2.48.0) is expected to include changes to the representation of linked working tree file paths in the .git/worktrees internal file structure. As of commit git/git@717af91 the paths in the .git/worktrees/*/gitdir files will be stored as relative paths rather than absolute ones, while still supporting older gitdir files containing absolute paths.

While we could continue to read those gitdir files and try to detect relative paths, and then join those to the full path of the main working tree, we can avoid this complexity and the fragility it implies by using the git worktree list command instead. Since Git version 2.36.0 this command has offered both a --porcelain mode and a -z option, which ensures any file paths with unusual characters can be easily parsed.

We therefore introduce an alternative implementation of our GetAllWorktrees() function—which we rename from GetAllWorkTrees(), for consistency with the rest of our codebase—that uses the git worktree list command to retrieve a list of all linked working trees, plus the main working tree, if it exists (i.e., if the main repository is not bare).

Before adding our new implementation, we first expand our Go and shell test suites to check a number of additional cases involving linked working trees, such as trees with detached HEADs, bare main repositories, and working trees that have been deleted but for which Git still has internal metadata.

Because the git worktree list command provides a convenient prunable attribute which indicates whether a linked working tree actually still exists, we also update our legacy implementation to perform a similar check, which means the git lfs prune command can now quickly skip trying to run git diff-index in nonexistent working trees.

This PR should be most easily reviewed on a commit-by-commit basis.

The GetAllWorkTrees() function in our "git" package was introduced as
part of the implementation of the "git lfs prune" command in PR git-lfs#742.
This function examines the contents of a repository's ".git/worktrees"
folder and returns an array of Worktree structures for any linked
working trees it finds, plus one Worktree structure for the main
working tree, if one exists.  The "git lfs prune" command then checks
these working trees' current Git references and indexes for any Git LFS
objects it should retain in the Git LFS object cache.

In subsequent commits in this PR we expect to add an alternative
implementation of the GetAllWorkTrees() function which uses the
"git worktree list" command, as this will ensure we remain compatible
with some forthcoming changes in Git that will alter the contents
of the "gitdir" files in ".git/worktrees" hierarchy.  We will need
to retain the legacy implementation of the GetAllWorkTrees() function
as well, as prior to Git version 2.36.0 the "git worktree list" command
is either not available or does not support the full set of options we
will require.

Before introducing our new version of the GetAllWorkTrees() function,
we first expand the set of tests and checks we perform against it
and the "git lfs prune" command.  This will allow us to verify that
both implementations return identical data under the same conditions.

In particular, in our existing TestWorkTrees() Go test function we add
a linked working tree which references a detached HEAD in the form of
local tag instead of a branch, and confirm that its SHA is also used
as its name, and that its type is returned as RefTypeOther.

We also make use of the NoError() assertion to check the returned
error value from the GetAllWorkTrees() function.

As well, we add a new TestWorktreesBareRepo() Go test function which
checks that when GetAllWorkTrees() is run in a bare repository it
does not return a Worktree structure at all, as there is no main
working tree.  To create the bare repository we add a NewBareRepo()
wrapper function to our "util" test utility package.

In our t/t-prune-worktree.sh shell test script, we slightly expand
the set of checks performed by our "prune worktree" test, just to
clarify which objects are pruned by our "git lfs prune" command when
a linked working tree has been removed but the "git worktree prune"
command has not yet been run.

Finally, we add a "prune worktree (bare main)" test to our test
script which confirms that when the "git lfs prune" command is
run in a working tree linked to a bare repository, no Git LFS objects
are pruned, even if they match no retention configuration settings.
This is because the bare main repository has no Git remotes, so
all Git LFS objects are treated as never having been pushed to a
remote and are therefore all ineligible for pruning.
The GetAllWorkTrees() function in our "git" package was introduced as
part of the implementation of the "git lfs prune" command in PR git-lfs#742,
along with the Worktree structure and various related functions, such
as the pruneTaskGetRetainedWorktree() function run by the "git lfs prune"
command.

In all instances, we use the terms "Worktree" and "worktree" rather
than the camel-case "WorkTree" or "workTree", except in the name of
the GetAllWorkTrees() function and its associated TestWorkTrees() test
function.

Before we add another implementation of the GetAllWorkTrees() function
in a subsequent commit in this PR, we first revise the function's name
to match that of the structures it returns, which slightly improves the
consistency of our naming scheme.
The GetAllWorktrees() function in our "git" package was introduced as
part of the implementation of the "git lfs prune" command in PR git-lfs#742.
This function examines the contents of a repository's ".git/worktrees"
folder and returns an array of Worktree structures for any linked
working trees it finds, plus one Worktree structure for the main
working tree, if one exists.  The "git lfs prune" command then checks
these working trees' current Git references and indexes for any Git LFS
objects it should retain in the Git LFS object cache.

In subsequent commits in this PR we expect to add an alternative
implementation of the GetAllWorktrees() function which uses the
"git worktree list" command, as this will ensure we remain compatible
with some forthcoming changes in Git that will alter the contents
of the "gitdir" files in ".git/worktrees" hierarchy.  We will need
to retain the legacy implementation of the GetAllWorktrees() function
as well, as prior to Git version 2.36.0 the "git worktree list" command
is either not available or does not support the full set of options we
will require.

The "git worktree list" command reports when a linked working tree
no longer exists and so its data in the ".git/worktrees" hierarchy
could be pruned with the "git worktree prune" command.  Our new
implementation of the GetAllWorktrees() function will therefore be
able to return this prunable state as a flag in the Worktree structure.

The pruneTaskGetRetainedIndex() function run by our "git lfs prune"
command calls the ScanIndex() method of the GitScanner structure in
our "lfs" package, whose internal functions then create two
DiffIndexScanner structures and invoke their Scan() methods repeatedly.
That structure's initialization function, NewDiffIndexScanner(),
uses the DiffIndex() function in our "git" package to start a
"git diff-index" command and pipe its output into a buffer, which
is then consumed by the DiffIndexScanner's Scan() method.

As we run these "git diff-index" commands in each linked working
tree, if it no longer actually exists, the commands simply fail.
We silently ignore these failures because the DiffIndex() function
uses the gitBufferedStdout() function to start the command, and
that discards the stderr stream, we never call the Wait() method
of the underlying Cmd structure from the "os/exec" package, so we
also discard the exit code from the "git diff-index" command.

However, we can avoid running unnecessary "git diff-index" commands
entirely if we detect that a linked working tree no longer exists.
As this will be straightforward in our new implementation of the
GetAllWorktrees() function, we first update our existing version
of the function to perform a check similar to one made by Git,
and return a true flag value in our Worktree structure if the
working tree is missing.

We add a Prunable flag to the Worktree structure, and check it
in the pruneTaskGetRetainedWorktree() function so we only run the
pruneTaskGetRetainedIndex() function if a linked working tree's
Prunable flag is false.

We then expand our existing TestWorktrees() test function to
validate that when a linked working tree is removed, the Prunable
flag is set as we expect.

Note that our "prune worktree" test in the t/t-prune-worktree.sh test
script already performs checks of our "git lfs prune" command after
a linked working tree is removed.  These checks pass without the
changes in this commit because the "git diff-index" commands we try to
run in the working tree simply fail, and we then ignore those failures.
With the changes in this commit, the behaviour of the "git lfs prune"
command remains the same, but it no longer tries to run "git diff-index"
in non-extant working trees.
The GetAllWorktrees() function in our "git" package was introduced as
part of the implementation of the "git lfs prune" command in PR git-lfs#742.
This function examines the contents of a repository's ".git/worktrees"
folder and returns an array of Worktree structures for any linked
working trees it finds, plus one Worktree structure for the main
working tree, if one exists.  The "git lfs prune" command then checks
these working trees' current Git references and indexes for any Git LFS
objects it should retain in the Git LFS object cache.

As noted in previous commits in this PR, to check for Git LFS objects
which should be retained, we run the "git diff-index" command in each
extant linked working tree.  Specifically, the commands are started by
the DiffIndex() function in our "git" package, which is run by the
NewDiffIndexScanner() function called by the internal functions invoked
by the ScanIndex() method of the GitScanner structure in our "lfs"
package.  That method is run by the pruneTaskGetRetainedIndex() function
of our "git lfs prune" command.

In order to execute the "git diff-index" command in each linked working
tree, we require the full, absolute paths of the working trees.  At the
moment, our GetAllWorktrees() function expects that it can retrieve these
full paths from the "gitdir" files in each working tree's subdirectory
under the ".git/worktrees" directory.

However, forthcoming changes in Git, likely to be released in Git
version 2.48.0, will alter the contents of these "gitdir" files to
contain relative paths.  As a result, we will no longer be able to
rely on them to determine a full path to each linked working tree.
In particular, commit git/git@717af91
causes Git to now write relative paths into the "gitdir" files,
while still supporting any legacy absolute paths found in existing
"gitdir" files.

We can avoid the need to try to parse these relative paths and
determine the correct full path by using the "git worktree list"
command to retrieve the list of linked working trees, plus the
main working tree, if any.  This command returns full paths along
with the current Git references associated with each working tree,
plus several additional attributes.

Prior to Git version 2.36.0, though, the "git worktree list" command
either did not exist, lacked support for the "--porcelain" option,
or lacked support for the "-z" option, so we continue to use our
existing GetAllWorktrees() function's implementation when the
available Git version is lower than 2.36.0.  We therefore rename
the existing version of our GetAllWorktrees() function to
getAllWorktreesFromGitDir(), and invoke it at the top of our new
version of the GetAllWorktrees() function if the Git version is
not 2.36.0 or above.

Our new implementation of the GetAllWorktrees() function simply
calls the "git worktree list --porcelain -z" command and parses
the output, creating the same array of Worktree structures as if
we had parsed the contents of the ".git/worktrees" hierarchy
directly, except that we can depend on the Git command to resolve
any relative paths in "gitdir" files into absolute ones for us.

We do need to call the Clean() function of the "path/filepath" package
to ensure the directory paths are reformatted to match those returned
by the Dir() function of the same package, since that is used by our
legacy GetAllWorktrees() function.  On Windows in particular the
Clean() function will adjust the directory path separators to
align with the default for the OS.

Our parsing logic requires that we find at least an initial "worktree"
line and a subsequent "HEAD" line in the output from the "git worktree
list" command.  Assuming those are found and have non-empty values,
we can return a Worktree structure containing them.  However, if
we also find a "bare" attribute in a subsequent line, we do not
return a Worktree structure for this entry, as we want to ignore
bare main repositories in the same manner as our existing
GetAllWorktrees() function does.

Conveniently, the "git worktree list" command checks if the paths it
returns actually exist, and if they do not, it includes a "prunable"
attribute in its output.  If we detect this attribute we then set the
Prunable flag that we introduced into our Worktree structure in a prior
commit in this PR.
@chrisd8088 chrisd8088 requested a review from a team as a code owner October 29, 2024 02:06
git/git.go Show resolved Hide resolved
git/git.go Show resolved Hide resolved
@chrisd8088
Copy link
Member Author

I should note that the updated tests in this PR, both the Go and shell tests, run successfully using Git v2.35.0. That verifies that our existing GetAllWorkTrees() code is executed with a Git version prior to v.2.36.0 and that the revised tests work as expected.

Our CI jobs currently run the tests against Git v2.0.0 (which doesn't support multiple working trees at all) and Git v2.47.0, plus a build of Git from the upstream development project. The jobs with Git v2.0.0 therefore skip all the worktree-related tests and code, while the other two jobs both test our new implementation which uses git worktree list. In the case of the jobs with the build of upstream Git, relative paths are used internally by Git in its working tree metadata, and that's what exposed the incompatibility with our existing approach to reading the paths to linked working trees.

However, none of these runs verify that our old approach is still used and works for Git versions between 2.5.0 (when multiple working trees were introduced) and 2.35.0, which is the last version without support for the -z option to git worktree list. For those Git versions, we fall back to our legacy implementation, and so I've manually run tests with Git v2.35.0 to confirm that it still works and our tests all succeed.

@chrisd8088 chrisd8088 merged commit 53d69fd into git-lfs:main Oct 29, 2024
10 checks passed
@chrisd8088 chrisd8088 deleted the use-worktree-list branch October 29, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants