blobstore: add chunked-object mode to GitBlobstore #10424

Merged
coffeegoddd merged 21 commits into main from db/gitblobstore-next-2a
Feb 9, 2026

@coffeegoddd coffeegoddd commented Feb 5, 2026

This PR introduces GitBlobstore, a Blobstore implementation backed by a git repository’s object database (bare repo or .git dir). Keys
are stored as paths in the tree of a commit pointed to by a configured ref (e.g. refs/dolt/data), enabling Dolt remotes to be hosted on
standard git remotes.

High-level design

• Storage model
  • Each blobstore key maps to a git tree path under the ref’s commit.
  • Small objects are stored as a single git blob at <key>.
  • Large objects (when chunking is enabled) are stored as a git tree at <key> containing part blobs:
    • <key>/00000001, <key>/00000002, … (lexicographically ordered)
  • No descriptor header / no stored total size; size is derived by summing part blob sizes.
  • Roll-forward only: this PR supports the above formats; it does not include backward compatibility for any older descriptor-based chunking formats.
• Per-key versioning
  • Get/Put/CheckAndPut return a per-key version equal to the object ID at <key>:
    • inline: blob OID
    • chunked: tree OID
• Idempotent Put
  • For non-manifest keys, Put fast-succeeds if <key> already exists (assumes the content-addressed semantics common in NBS/table files), returning the existing per-key version without consuming the reader.
  • manifest remains mutable and is updated via CheckAndPut.
• CheckAndPut semantics
  • CheckAndPut performs CAS against the current per-key version at <key> (not against the HEAD commit hash).
  • Implementation uses a ref-level CAS retry loop:
    • re-checks version at current HEAD
    • only consumes/hashes the reader after the expected version matches
    • retries safely if the ref advances due to unrelated updates
• Blob↔tree transitions
  • Handles transitions between inline blob and chunked tree representations by proactively removing conflicting index paths before staging new entries (avoids git index file-vs-directory conflicts).

Internal git plumbing additions

Adds/uses a unified internal GitAPI abstraction to support:
• resolving path objects and types (blob vs tree)
• listing tree entries for chunked reads
• removing paths from the index in bare repos
• staging and committing new trees, with configurable author/committer identity fallback

Base automatically changed from db/gitblobstore-next-2 to main February 5, 2026 22:45
@coffeegoddd coffeegoddd force-pushed the db/gitblobstore-next-2a branch 2 times, most recently from 6e772da to 608904f Compare February 5, 2026 23:04
@coffeegoddd coffeegoddd requested a review from Copilot February 5, 2026 23:07

Copilot AI left a comment

Pull request overview

This PR introduces an opt-in chunked-object representation for GitBlobstore to work around Git's single-blob size limitations. The implementation adds a MaxPartSize configuration option that, when enabled, splits large objects into multiple part blobs referenced by a descriptor blob. The descriptor uses a simple text format with a magic string ("DOLTBS1") for easy detection.

Changes:

  • Added internal gitbs package with descriptor encoding/parsing, range operations, and part path generation
  • Enhanced Get path to transparently detect and read chunked objects via descriptor parsing
  • Refactored Put and CheckAndPut to support chunked writes when MaxPartSize is configured
  • Improved Put to use create-only CAS (zero OID) when creating refs to prevent losing concurrent writes
  • Added comprehensive test coverage for chunked operations, including multipart reads and range queries

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| go/store/blobstore/internal/gitbs/descriptor.go | Implements descriptor encoding/parsing with validation and DOLTBS1 magic detection |
| go/store/blobstore/internal/gitbs/descriptor_test.go | Tests for descriptor round-trip, validation, and error cases |
| go/store/blobstore/internal/gitbs/descriptor_helpers_test.go | Tests for internal descriptor parsing helpers |
| go/store/blobstore/internal/gitbs/ranges.go | Implements range normalization and part slicing logic for offset/length operations |
| go/store/blobstore/internal/gitbs/ranges_test.go | Tests for range operations including negative offsets and boundary spanning |
| go/store/blobstore/internal/gitbs/ranges_helpers_test.go | Tests for internal range helper functions including overflow cases |
| go/store/blobstore/internal/gitbs/oid.go | OID validation accepting both upper and lower case hex characters |
| go/store/blobstore/internal/gitbs/parts_path.go | Generates deterministic fanout paths for part blobs under reserved namespace |
| go/store/blobstore/internal/gitbs/parts_path_test.go | Tests for part path generation including case normalization |
| go/store/blobstore/git_blobstore.go | Main implementation: adds MaxPartSize config, multipart reader, chunked write logic, and improved ref creation CAS |
| go/store/blobstore/git_blobstore_parts.go | Implements idempotent part blob staging to ensure reachability |
| go/store/blobstore/git_blobstore_parts_test.go | Tests part staging idempotency and reachability verification |
| go/store/blobstore/git_blobstore_multipart_test.go | Unit tests for multipart reader including offset handling and error cases |
| go/store/blobstore/git_blobstore_helpers_test.go | Test utilities and tests for refactored helper functions |
| go/store/blobstore/git_blobstore_chunked_get_test.go | Integration tests for reading chunked objects with various range operations |
| go/store/blobstore/git_blobstore_chunked_put_test.go | Integration tests for writing and verifying chunked objects |
| go/store/blobstore/git_blobstore_chunked_checkandput_test.go | Integration tests for CheckAndPut with chunked objects |
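
The range normalization and part slicing that the summary attributes to ranges.go might look roughly like the sketch below. It is not the file's actual contents: `normalizeRange`, `sliceParts`, and `partSlice` are hypothetical names, and the conventions (a negative offset counts back from the end, a zero length means "to the end", an over-long length is clamped) are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// normalizeRange turns (offset, length) into an absolute [start, end)
// range over an object of total bytes, using the assumed conventions
// described above.
func normalizeRange(total, offset, length int64) (start, end int64, err error) {
	if length < 0 {
		return 0, 0, errors.New("negative length")
	}
	if offset < 0 {
		offset += total
	}
	if offset < 0 || offset > total {
		return 0, 0, errors.New("offset out of range")
	}
	end = total
	// Comparing against the remainder avoids overflow in offset+length.
	if length > 0 && length < total-offset {
		end = offset + length
	}
	return offset, end, nil
}

// partSlice says which bytes to read from one part.
type partSlice struct {
	Index  int   // position of the part in order
	Offset int64 // first byte to read within the part
	Length int64 // number of bytes to read from the part
}

// sliceParts maps an absolute [start, end) range onto consecutive parts.
func sliceParts(partSizes []int64, start, end int64) []partSlice {
	var out []partSlice
	var pos int64 // absolute offset of the current part's first byte
	for i, size := range partSizes {
		partEnd := pos + size
		if partEnd > start && pos < end {
			from := start
			if pos > from {
				from = pos
			}
			to := end
			if partEnd < to {
				to = partEnd
			}
			out = append(out, partSlice{Index: i, Offset: from - pos, Length: to - from})
		}
		pos = partEnd
	}
	return out
}

func main() {
	sizes := []int64{4, 4, 2} // three parts, 10 bytes in total

	// A range that spans the boundary between the first two parts.
	start, end, _ := normalizeRange(10, 2, 4)
	fmt.Println(sliceParts(sizes, start, end))

	// A negative offset: three bytes starting five bytes from the end.
	start, end, _ = normalizeRange(10, -5, 3)
	fmt.Println(sliceParts(sizes, start, end))
}
```

A multipart reader could walk the returned slices in order and concatenate the bytes, so callers see one contiguous range regardless of where part boundaries fall.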

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
608904f ok 5937471
version total_tests
608904f 5937471
correctness_percentage
100.0

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
5ccd811 ok 5937471
version total_tests
5ccd811 5937471
correctness_percentage
100.0

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
0cb918f ok 5937471
version total_tests
0cb918f 5937471
correctness_percentage
100.0

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.


Comment on lines +635 to +646
// tree -> blob OR tree overwrite: remove old child entries under <key>/...
entries, err := gbs.api.ListTree(ctx, parent, key)
if err != nil {
	return err
}
if len(entries) == 0 {
	return nil
}
paths := make([]string, 0, len(entries))
for _, e := range entries {
	paths = append(paths, key+"/"+e.Name)
}

Copilot AI Feb 6, 2026

When overwriting an existing tree at key, this only removes the immediate children (key/<name>). If any child is itself a subtree, removing key/<subdir> won’t remove leaf entries like key/<subdir>/file, so stale files can remain staged and end up in the new commit. This should remove all descendant index entries under key/ (e.g. list recursively and remove leaf paths) to guarantee a clean overwrite.

Suggested change
- // tree -> blob OR tree overwrite: remove old child entries under <key>/...
- entries, err := gbs.api.ListTree(ctx, parent, key)
- if err != nil {
- 	return err
- }
- if len(entries) == 0 {
- 	return nil
- }
- paths := make([]string, 0, len(entries))
- for _, e := range entries {
- 	paths = append(paths, key+"/"+e.Name)
- }
+ // tree -> blob OR tree overwrite: remove all existing entries under <key>/...
+ var paths []string
+ var walkTree func(prefix string) error
+ walkTree = func(prefix string) error {
+ 	entries, err := gbs.api.ListTree(ctx, parent, prefix)
+ 	if err != nil {
+ 		return err
+ 	}
+ 	for _, e := range entries {
+ 		childPath := prefix + "/" + e.Name
+ 		paths = append(paths, childPath)
+ 		// If this child is itself a tree, recurse into it so we remove
+ 		// all descendant entries from the index.
+ 		_, childTyp, err := gbs.api.ResolvePathObject(ctx, parent, childPath)
+ 		if err != nil {
+ 			if git.IsPathNotFound(err) {
+ 				// Entry disappeared concurrently; just skip it.
+ 				continue
+ 			}
+ 			return err
+ 		}
+ 		if childTyp == git.ObjectTypeTree {
+ 			if err := walkTree(childPath); err != nil {
+ 				return err
+ 			}
+ 		}
+ 	}
+ 	return nil
+ }
+ if err := walkTree(key); err != nil {
+ 	return err
+ }
+ if len(paths) == 0 {
+ 	return nil
+ }

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
3e4401c ok 5937471
version total_tests
3e4401c 5937471
correctness_percentage
100.0

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
6c31dfd ok 5937471
version total_tests
6c31dfd 5937471
correctness_percentage
100.0

@coffeegoddd coffeegoddd force-pushed the db/gitblobstore-next-2a branch from 6c31dfd to b7bb09d Compare February 9, 2026 20:46

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
b7bb09d ok 5937471
version total_tests
b7bb09d 5937471
correctness_percentage
100.0

@coffeegoddd coffeegoddd merged commit e5f2f60 into main Feb 9, 2026
23 of 24 checks passed
@coffeegoddd

@reltuk and I had a conversation about this IRL and are moving forward to get the Git remotes feature out.

2 participants