Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gix fetch with fast-forward support #548

Merged
merged 34 commits into from
Oct 3, 2022
Merged

gix fetch with fast-forward support #548

merged 34 commits into from
Oct 3, 2022

Conversation

Byron
Copy link
Owner

@Byron Byron commented Oct 1, 2022

  • fast determination of fast-forward, with the use of dates in the currently loaded 'level' nodes that will be iterated to determine a stop condition: if everything is already older than the commit we look for, stop.
  • use dry-run fetch in gix remote refs to provide more detailed information about what would be done to refs - it's OK to do as is I think, as fetch currently also does negotiation even in dry-run mode. Maybe that's something to avoid, maybe it's something that can help later. For now it will simply avoid receiving the pack entirely.
  • gix fetch to perform an actual fetch, with dry-run mode
  • a fetch-test that validates the output filenames - needs pack- prefix
  • an issue with progress reporting blocking the clone the first time you try (if the clone is fast enough)
  • deal with keep-alive lines which are empty data lines before the pack is set and only relevant in V1 which may not send progress before the pack is sent.
  • correctly set protocol version on spawned git process

Performance Issues

  • git_tempfile uses a NamedTempFile under the hood, and that's not buffered. Probably should be wrapped into a buf-writer to assure decent write performance. Right now, 60% of CPU time go into writing a tempfile. BufWriter should be around it in the bundle writer.

Performance Notes

  • fetch the whole linux kernel in ~53s with 8.5 cores, git does that in 3m with 3 cores (of all the times, 40s are used for data transfer of 3.8GB) for a ~3.4x speedup
    • with 1 core, 2m 25s, and git takes 3m 47s (1.55x speedup)
    • with 3 cores like git: 1m 17s (git takes 3m) (2.3x speedup)
    • we use 1.4GB peak memory, git takes 3.05GB
  • fetch git (like done on their CI) is done in 2.0s and git takes 5.9s for 2.95x speedup, and 1.3s of which are transferring 91MB
    • with 1 core in 5.4s (60MB peak memory), git takes 8.2s (1.5x speedup)
    • with 3 cores in 2.81 (63MB peak memory), git takes 6.24s (2.2x speedup)

For local clones, this time saving is absolutely significant as the whole history is typically cloned, and on CI it will be relative to the pack size which is greatly affected by depth=1. The latter seems to be used by actions/checkout@v3 though, and maybe one day @v4 will offer gitoxide instead if the configuration of auth works similarly to the on of git (extraHeaders for instance).

Out of scope

  • async implementation
  • worktree updates
  • merges

Previous PR

…aking)

Breaking indication in commit message has to happen later once things
statbilize a bit more.
…me)`.(#450)

It allows to stop traversals early if all commmits to be traversed are
older than a given time.
The implementation for non-dry-run exists as well, but the test doesn't
quite work yet.
prepare for making fast-forward calculations even if force is specified.
That way we emulate gits behaviour, which does the same unless it's
turned off.
The latter we don't allow yet as it should really be fast enough due
to the date-cutoff during traversal.
Also known as 'anonymous remotes'.
Unfortunately its test is in `git-repository`, but better than nothing.
That way it's easier where the error is coming from in terms of
the protocol.
This just prints the ref-map, which will be determined to have no change
at all.
…it still violates the protocol in dry-run mode as it actually has to
get the pack, apparently the server can't be told to stop otherwise.
Needs more experimentation as well.
This happens if the local transport is used, which goes by V1 by default
and it's probably good to try supporting it if it's easy enough, instead
of forcing V2 which we probably could.
Previously the NamedTempFile would receive every little write request
for millions of objects, consuming considerable amounts of time.

Now a buf writer alleviates this issue entirely.
Changed due to new naming of index and packs.
It is well-intended but is likely to hang the calling process
if for any reason not all process output was consumed.

Even though not waiting for the process leaves it running, it will
stop naturally once its output pipe breaks once once
our handles for it are inevitable dropped at the same time.
This demands explanation. Previously the issue was a blocking transport
implementation on drop, but that has been fixed by simply letting the
child process run into a broken pipe to stop it.

Now existing early won't hang because of the transport being dropped
with data still available on the stdout pipe.

The issue to fix here is that keep-alive packets are not currently
understood in V1, and git can send us empty packs.
Previously this was only done for ssh based connections, and requires
setting an environment variable.
Now V2 is actually used in our local tests as well.
Keepalive packets are side-band only empty datalines that should
just be ignored. This is now happening, allowing longer git
operations to work as they will send keepalive packets every 5 seconds,
and previously we would choke on it.

Note that empty datalines are never send otherwise, making it a
previously unused marker that can safely be skipped.
@Byron Byron merged commit f47c891 into main Oct 3, 2022
@Byron Byron deleted the fetch-pack branch October 3, 2022 11:16
@Byron Byron mentioned this pull request Oct 4, 2022
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant