fix git remote performance on dolt push#10538
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes GitBlobstore performance when pushing to a Git remote by avoiding unnecessary fetches for cached non-manifest keys, batching non-manifest writes to push alongside the manifest in a single commit/push, and making git fetch ignore the remote’s default refspecs to avoid failures from stale tracking refs.
Changes:
- Add
--refmap=togit fetchto prevent processing the remote’s configured fetch refspecs and avoid ref conflicts. - Defer non-manifest
Put/Concatenateremote writes and flush them duringCheckAndPut("manifest")as a single commit+push. - Update tests to account for deferred non-manifest writes by explicitly flushing via
CheckAndPut("manifest").
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| go/store/blobstore/internal/git/impl.go | Adjusts FetchRef to use --refmap= so fetch only applies the explicit refspec. |
| go/store/blobstore/git_blobstore.go | Implements deferred non-manifest writes, cache-first read behavior for non-manifest keys, and manifest-flush batching. |
| go/store/blobstore/git_blobstore_test.go | Updates remote-managed tests to flush deferred writes via manifest CheckAndPut. |
| go/store/blobstore/git_blobstore_chunked_put_test.go | Updates chunked put tests for deferred write behavior and remote verification after flush. |
| go/store/blobstore/git_blobstore_helpers_test.go | Removes helper tests for functions deleted/refactored by the read-path changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@coffeegoddd DOLT
|
|
@coffeegoddd DOLT
|
macneale4
left a comment
There was a problem hiding this comment.
One nit, and a question about removed tests. If the tests were moved because they are no longer relevant, go ahead and ship. It seems like they are still useful though, unless I'm not understanding something.
| return f.pushRefWithLease(ctx, remote, srcRef, dstRef, expectedDstOID) | ||
| } | ||
|
|
||
| func TestGitBlobstoreHelpers_resolveCommitForGet(t *testing.T) { |
There was a problem hiding this comment.
why is this no longer required? Seems like you'd still want to know when these resources aren't found.
There was a problem hiding this comment.
these were for some helper methods that were removed, we have coverage elsewhere
| require.Equal(t, []byte("abcdefghij"), got) | ||
|
|
||
| // Flush deferred writes and verify remote state. | ||
| _, err = bs.CheckAndPut(ctx, "", "manifest", 3, bytes.NewReader([]byte("m1\n"))) |
There was a problem hiding this comment.
nit. the use of m1\n makes is look like this is a magic string, when I think it's just like saying foobar\n - so it's clear to people that it's nonsense data input.
|
@coffeegoddd DOLT
|
|
@coffeegoddd DOLT
|
Prior to this change pushes that would take ~6s to DoltHub took ~70s to a Git remote. This pr removes fetch for non-manifest keys that are already in the cache on the read path which brings a push to a git remote down to around ~17s. One the write path, tablefiles are now only written to the remote at the same time the manifest is written, in a single commit and push call. This change brings the performance down to about ~8s in my testing. This optimization doesn't change the Blobstore interface, but does introduce a risk for a caller-side bug, by incorrectly calling Put(tf1), CheckAndPut(manifest_referencing_tf1_and_tf2), Put(tf2). This changes the visibility contract that a Put is already on the remote.
Also, fixes git fetch failures caused by stale remote-tracking refs: add --refmap="" so fetch only processes our explicit refspec and ignores the remote's default fetch refspecs, which can fail with ref directory/file conflicts unrelated to our ref.