`gix fetch` with fast-forward support #548

Byron · 2022-10-01T07:48:32Z

- fast determination of fast-forward, using the dates of the nodes in the currently loaded 'level' that will be iterated to determine a stop condition: if everything is already older than the commit we look for, stop.
- ~~use dry-run fetch in `gix remote refs` to provide more detailed information about what would be done to refs~~ - it's OK as is, I think, since fetch currently also performs negotiation even in dry-run mode. Maybe that's something to avoid, maybe it's something that can help later. For now, dry-run will simply avoid receiving the pack entirely.
- `gix fetch` to perform an actual fetch, with dry-run mode
- a fetch test that validates the output filenames (needs the `pack-` prefix)
- an issue with progress reporting blocking the clone the first time you try (if the clone is fast enough)
- deal with keep-alive lines, which are empty data lines sent before the pack and only relevant in V1, which may not send progress before the pack is sent
- correctly set the protocol version on the spawned git process
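The first item's stop condition can be sketched as a newest-first traversal from the new tip that looks for the old tip and gives up as soon as the newest queued commit is older than the one being looked for. This is a minimal sketch, assuming an in-memory `Graph` (a hypothetical type mapping ids to commit time and parents) and commit times that never decrease from parent to child; it illustrates the idea behind the cutoff sorting mode and is not gitoxide's implementation.

```rust
use std::collections::{BinaryHeap, HashMap, HashSet};

/// Hypothetical in-memory commit graph: id -> (commit time in seconds, parent ids).
type Graph = HashMap<u32, (i64, Vec<u32>)>;

/// Returns `true` if `ancestor` is reachable from `tip` by following parents,
/// i.e. moving a ref from `ancestor` to `tip` would be a fast-forward.
fn is_fast_forward(graph: &Graph, ancestor: u32, tip: u32) -> bool {
    // Commits older than the one we look for cannot lead to it (assuming no clock skew).
    let cutoff = graph[&ancestor].0;
    // Max-heap keyed by commit time, so the newest queued commit is popped first.
    let mut queue = BinaryHeap::new();
    let mut seen = HashSet::new();
    queue.push((graph[&tip].0, tip));
    seen.insert(tip);
    while let Some((time, id)) = queue.pop() {
        if id == ancestor {
            return true;
        }
        // The popped commit is the newest in the queue; if even it is older than
        // the ancestor, everything still queued is too, so stop early.
        if time < cutoff {
            return false;
        }
        for &parent in &graph[&id].1 {
            if seen.insert(parent) {
                queue.push((graph[&parent].0, parent));
            }
        }
    }
    false
}
```

For example, with commits `1 <- 2 <- 3` and a side branch `1 <- 4` where `4` is older than `2`, checking `4` against tip `3` stops as soon as only commit `1` is queued, because it is older than `4`, without walking any further history.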

Performance Issues

`git_tempfile` uses a `NamedTempFile` under the hood, and that's not buffered. It should probably be wrapped in a `BufWriter` to ensure decent write performance: right now, 60% of CPU time goes into writing the tempfile. The `BufWriter` should be placed around it in the bundle writer.
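To show why the buffering matters, the sketch below counts how many `write` calls reach the underlying sink with and without `std::io::BufWriter`. `CountingSink` and `count_writes` are hypothetical stand-ins for the unbuffered temp file, assuming that each call on the real file handle costs a system call.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that only counts `write` calls and bytes, standing in
/// for an unbuffered file handle where every call is a syscall.
#[derive(Default)]
struct CountingSink {
    calls: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Performs `n` small writes, directly or through a `BufWriter`, and returns
/// (calls that reached the sink, bytes that reached the sink).
fn count_writes(n: usize, buffered: bool) -> io::Result<(usize, usize)> {
    let chunk = [0u8; 20]; // a small write, as for many tiny objects
    if buffered {
        let mut w = BufWriter::new(CountingSink::default());
        for _ in 0..n {
            w.write_all(&chunk)?;
        }
        // `into_inner` flushes the remaining buffered bytes first.
        let sink = w.into_inner().map_err(|e| e.into_error())?;
        Ok((sink.calls, sink.bytes))
    } else {
        let mut sink = CountingSink::default();
        for _ in 0..n {
            sink.write_all(&chunk)?;
        }
        Ok((sink.calls, sink.bytes))
    }
}
```

With 100,000 writes of 20 bytes, the unbuffered path issues one call per write, while `BufWriter`'s default buffer (currently 8 KiB) coalesces them into a few hundred calls carrying the same bytes.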

Performance Notes

- fetch the whole Linux kernel in ~53s with 8.5 cores; git does that in 3m with 3 cores (of these times, 40s are spent transferring 3.8GB of data), for a ~3.4x speedup
  - with 1 core: 2m 25s, while git takes 3m 47s (1.55x speedup)
  - with 3 cores like git: 1m 17s (git takes 3m) (2.3x speedup)
  - we use 1.4GB peak memory, git takes 3.05GB
- fetch git (as done on their CI) in 2.0s, while git takes 5.9s, for a 2.95x speedup; 1.3s of this is spent transferring 91MB
  - with 1 core: 5.4s (60MB peak memory), git takes 8.2s (1.5x speedup)
  - with 3 cores: 2.81s (63MB peak memory), git takes 6.24s (2.2x speedup)
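As a quick consistency check, the speedups can be recomputed from the timings as git's wall-clock time divided by gitoxide's. The table is transcribed from the notes above with minutes converted to seconds; `RUNS` and `speedups` are illustrative names.

```rust
/// (scenario, gitoxide seconds, git seconds), transcribed from the notes above.
const RUNS: [(&str, f64, f64); 6] = [
    ("linux, all cores vs git with 3", 53.0, 180.0),
    ("linux, 1 core", 145.0, 227.0),
    ("linux, 3 cores", 77.0, 180.0),
    ("git, all cores", 2.0, 5.9),
    ("git, 1 core", 5.4, 8.2),
    ("git, 3 cores", 2.81, 6.24),
];

/// Speedup is git's wall-clock time divided by gitoxide's.
fn speedups() -> Vec<(&'static str, f64)> {
    RUNS.iter().map(|&(name, gix, git)| (name, git / gix)).collect()
}
```

The recomputed ratios agree with the rounded figures above to within 0.05.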

For local clones, this time saving is significant, as the whole history is typically cloned. On CI, it will be proportional to the pack size, which `depth=1` greatly reduces. The latter seems to be used by `actions/checkout@v3`, though, and maybe one day `@v4` will offer gitoxide instead, if auth can be configured similarly to git (`extraHeaders`, for instance).

Out of scope

- async implementation
- worktree updates
- merges

Previous PR

fetch pack #539


It is well-intended but likely to hang the calling process if, for any reason, not all of the process output was consumed. Even though not waiting for the process leaves it running, it will stop naturally once its output pipe breaks, which happens as soon as our handles to it are inevitably dropped.
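The behavior described above, a producer that stops on its own once the reading end goes away, can be illustrated in-process. This is an analogy rather than the transport code: a thread stands in for the child process and a bounded `std::sync::mpsc::sync_channel` stands in for its stdout pipe, with a failed `send` playing the role of a broken pipe. The function name `run_producer_then_drop` is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a producer that keeps "writing output" until the reader is gone,
/// consumes `consume` items, then drops the reader without draining it.
/// Returns how many items the producer managed to send before stopping.
fn run_producer_then_drop(consume: usize) -> usize {
    // Capacity 1 models a small pipe buffer: the producer blocks when it is full.
    let (tx, rx) = mpsc::sync_channel::<u64>(1);
    let producer = thread::spawn(move || {
        let mut sent: usize = 0;
        // `send` fails once the receiver is dropped: our stand-in for EPIPE.
        while tx.send(sent as u64).is_ok() {
            sent += 1;
        }
        sent
    });
    // Read only part of the output.
    for _ in 0..consume {
        rx.recv().expect("producer is still running");
    }
    // Dropping our handle is what lets the producer stop on its own.
    drop(rx);
    producer.join().expect("producer thread panicked")
}
```

Dropping the receiver before joining is what keeps this from hanging; joining first would block forever, because the producer never finishes while its output is neither read nor closed.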

This needs an explanation. Previously, the issue was a transport implementation that blocked on drop, but that has been fixed by simply letting the child process run into a broken pipe, which stops it. Now exiting early won't hang just because the transport is dropped while data is still available on the stdout pipe. The issue to fix here is that keep-alive packets are not currently understood in V1, and git can send us empty packets.


Keep-alive packets are side-band-only empty data lines that should simply be ignored. This is now done, allowing longer git operations to work: they send keep-alive packets every 5 seconds, which we previously choked on. Note that empty data lines are never sent otherwise, making them a previously unused marker that can safely be skipped.
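Skipping keep-alives can be sketched with a small pkt-line reader. In git's side-band format, each pkt-line starts with four hex digits giving its total length including that header, `0000` is a flush packet, and the first payload byte names the band (1 for pack data, 2 for progress, 3 for errors); a keep-alive is a band-1 packet with no data, `0005\x01`. `collect_pack_data` is a hypothetical, simplified reader, not gitoxide's packet-line implementation.

```rust
/// Extracts pack data (side-band channel 1) from a buffer of pkt-lines up to the
/// first flush-pkt ("0000"), skipping keep-alives, i.e. band-1 packets without payload.
/// Returns `None` for malformed input or a band-3 error.
fn collect_pack_data(mut input: &[u8]) -> Option<Vec<u8>> {
    let mut pack = Vec::new();
    loop {
        // Each pkt-line starts with 4 hex digits giving its total length, header included.
        let header = input.get(..4)?;
        let len = usize::from_str_radix(std::str::from_utf8(header).ok()?, 16).ok()?;
        if len == 0 {
            return Some(pack); // flush-pkt: end of this section
        }
        if len < 4 || len > input.len() {
            // Special packets such as V2's delimiter (0001) and response-end (0002),
            // or truncated input: out of scope for this sketch.
            return None;
        }
        let payload = &input[4..len];
        input = &input[len..];
        match payload.split_first() {
            // Band 1 without data is a keep-alive: safe to skip.
            Some((&1, data)) if data.is_empty() => continue,
            // Band 1 with data: pack bytes.
            Some((&1, data)) => pack.extend_from_slice(data),
            // Band 2 is progress text; a real client would show it to the user.
            Some((&2, _)) => continue,
            // Band 3 (fatal error), unknown bands, or a packet without a band byte.
            _ => return None,
        }
    }
}
```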

Byron added 30 commits October 1, 2022 15:31

first test to validate new sort-by-date with cutoff sorting mode (breaking)

0699c7e

Breaking indication in commit message has to happen later once things stabilize a bit more.

feat: add `commit::Sorting::ByCommitTimeNewestFirstCutoffOlderThan(time)`. (#450)

86a99a9

It allows stopping traversals early if all commits to be traversed are older than a given time.

first non-fastforward tests in dry-run mode (#450)

2d0d782

The implementation for non-dry-run exists as well, but the test doesn't quite work yet.

add test to validate actual fast-forwards, without dry-run (#450)

e3b937e

refactor (#450)

8334148

refactor (#450)

c0d3ced

prepare for making fast-forward calculations even if force is specified.

always perform fast-forward checks even if force is specified. (#450)

a34370c

That way we emulate git's behaviour, which does the same unless it's turned off. We don't allow turning it off yet, as it should be fast enough thanks to the date cutoff during traversal.
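The resulting decision can be summarized as: the fast-forward check always runs, and `force` only decides what happens when it fails. In git, this check is what `fetch.showForcedUpdates` turns off. The `Mode` enum and `update_mode` below are hypothetical and only loosely modeled on the `remote::fetch::update::refs::Mode` mentioned in this PR, whose actual variants may differ.

```rust
use std::fmt;

/// Hypothetical outcome of a ref update (not gitoxide's actual type).
#[derive(Debug, PartialEq, Eq)]
enum Mode {
    FastForward,
    Forced,
    RejectedNonFastForward,
}

impl fmt::Display for Mode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Mode::FastForward => "fast-forward",
            Mode::Forced => "forced-update",
            Mode::RejectedNonFastForward => "rejected (non-fast-forward)",
        })
    }
}

/// `is_fast_forward` is computed even when `force` is set, so a forced update
/// can be told apart from one that would have been a fast-forward anyway.
fn update_mode(is_fast_forward: bool, force: bool) -> Mode {
    match (is_fast_forward, force) {
        (true, _) => Mode::FastForward,
        (false, true) => Mode::Forced,
        (false, false) => Mode::RejectedNonFastForward,
    }
}
```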

thanks clippy

83f2156

Frame for gix fetch (#450)

5b72d27

support for --url for arbitrary urls when fetching (#450)

8c7351c

Also known as 'anonymous remotes'.

feat!: remove gix remote --url in favor of determining the intention similar to `git fetch` (#450)

92bbe33

a sketch of how fetching could work, but it fails with the line-reader for some reason (#450)

ac17e11

fix: Link up Io error as source for error chaining; add Debug to `client::fetch::Response` (#450)

266395e

properly implement fetch::Arguments::is_empty() (#450)

df36ede

Unfortunately its test is in `git-repository`, but better than nothing.

Improve error description with local context (#450)

8e9e0b2

That way it's easier to tell where the error is coming from in terms of the protocol.

Fetch works, but needs more printing of information after the fact (#450)

57fab9a

fix build (#450)

d034882

fix journeytests (#450)

9c9df03

better error messages if no ref-specs are provided where needed (#450)

981488b

first basic printing of result when no change was found (#450)

cd1d2aa

This just prints the ref-map, which will be determined to have no change at all.

Display for remote::fetch::update::refs::Mode (#450)

169a979

first sketch of printing fetch results, but… (#450)

13ac9ba

…it still violates the protocol in dry-run mode, as it actually has to get the pack; apparently the server can't be told to stop otherwise. Needs more experimentation as well.

fix hang in V1 mode (#450)

ce9b591

This happens if the local transport is used, which defaults to V1. It's probably good to try supporting V1 if it's easy enough, instead of forcing V2, which we probably could do.

support for handshake information in gix fetch (#450)

c47dcc6

adjust fetch-progress range (#450)

6e2a237

also inform about the location of the new pack and index files (#450)

d782ff0

fix: increase pack-receive performance using a BufWriter (#450)

a745512

Previously the `NamedTempFile` would receive every little write request for millions of objects, consuming considerable amounts of time. Now a `BufWriter` alleviates this issue entirely.

prefix created pack files with pack- (#450)

e489b10

fix journey-test expectations (#450)

f10a3e0

Changed due to new naming of index and packs.
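The naming convention behind the `pack-` prefix can be sketched as a small helper: the pack and its index share the hex-encoded pack hash and differ only in their extension, as in git's `objects/pack` directory. `pack_file_names` is a hypothetical helper; where the hash comes from is out of scope here.

```rust
/// Builds `pack-<hex>.pack` and `pack-<hex>.idx` from a raw hash.
fn pack_file_names(hash: &[u8]) -> (String, String) {
    // Lowercase, zero-padded hex, two characters per byte.
    let hex: String = hash.iter().map(|b| format!("{:02x}", b)).collect();
    (format!("pack-{hex}.pack"), format!("pack-{hex}.idx"))
}
```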

Byron added 4 commits October 3, 2022 12:00

fix: set the protocol version for local git transports as well. (#450)

41b0c19

Previously this was only done for ssh-based connections, and it requires setting an environment variable.
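Requesting a protocol version from a spawned local process works through git's `GIT_PROTOCOL` environment variable, which server programs such as `git-upload-pack` read (for example `version=2`). The sketch only builds the `std::process::Command` without spawning it; `upload_pack_command` and `protocol_env` are hypothetical helpers, not gitoxide's transport code.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Prepares (but does not spawn) a local upload-pack invocation that asks for
/// the given protocol version; version 0 means "don't request anything".
fn upload_pack_command(repo_path: &str, version: u8) -> Command {
    let mut cmd = Command::new("git-upload-pack");
    cmd.arg(repo_path);
    if version > 0 {
        cmd.env("GIT_PROTOCOL", format!("version={version}"));
    }
    cmd
}

/// Reads back the `GIT_PROTOCOL` value explicitly set on `cmd`, if any.
fn protocol_env(cmd: &Command) -> Option<&OsStr> {
    cmd.get_envs()
        .find(|(key, _)| *key == OsStr::new("GIT_PROTOCOL"))
        .and_then(|(_, value)| value)
}
```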

fix test expectations to handle V1/V2 differences. (#450)

e616174

Now V2 is actually used in our local tests as well.

Byron merged commit f47c891 into main Oct 3, 2022

Byron deleted the fetch-pack branch October 3, 2022 11:16

Byron mentioned this pull request Oct 4, 2022

Fetch and clone support (bare) #450

Open

27 tasks
