fix(binlog): handle empty TEXT/BLOB columns and missing chunks in serialization#10603

Closed
scbrown wants to merge 1 commit into dolthub:main from scbrown:fix/binlog-text-serialization-crash

@scbrown scbrown commented Mar 2, 2026

Fixes #10601

Summary

When log_bin=1 is enabled, DOLT_COMMIT on a table with TEXT/BLOB columns panics in encodeBytesFromAddress because:

  1. Empty TEXT/BLOB columns store a zero hash in the tuple — the serializer tries to load this from the ChunkStore, which returns EmptyChunk
  2. The assertTrue in nodeStore.Read converts ChunkStore misses into unrecoverable panics

Changes

  • binlog_type_serialization.go: Add an addr.IsEmpty() check in encodeBytesFromAddress so that an empty TEXT/BLOB column serializes as a zero-length value with the length prefix size that matches the blob type. Wrap errors with address and type context.
  • node_store.go: Convert assertTrue(c.Size() > 0, ...) panic to a returned fmt.Errorf so the server stays up and logs the issue.
  • binlog_type_serialization_test.go: Add TestTextSerializer_MissingChunk that reproduces the ChunkStore miss using separate storage backends and explicit cache purging.

Testing

  • Unit tests pass (go vet clean, TestTextSerializer suite, new TestTextSerializer_MissingChunk)
  • Verified on a production Dolt v1.83.0 server with an 11K+ row table containing 4 TEXT columns
  • INSERT + DOLT_COMMIT with empty string, NULL, and populated TEXT columns all succeed with binlog enabled

…log serialization

When log_bin=1 is enabled and DOLT_COMMIT is called on a table with TEXT/BLOB
columns, the binlog serializer can crash in two ways:

1. Empty TEXT/BLOB columns store a zero hash (all zeros) in the tuple.
   encodeBytesFromAddress attempts to load this from the ChunkStore, which
   fails with "empty chunk returned from ChunkStore". Fix: check addr.IsEmpty()
   before the ChunkStore lookup and serialize as a zero-length value.

2. On large tables, the shared NodeStore cache can evict blob tree nodes.
   When the binlog serializer reads evicted nodes, it falls through to
   ChunkStore.Get which returns EmptyChunk, triggering an assertion panic
   in nodeStore.Read. Fix: convert the assertTrue panic to a returned error
   so the server stays up and logs the issue instead of crashing.

Includes a test (TestTextSerializer_MissingChunk) that reproduces the
ChunkStore miss by using separate NodeStore/ChunkStore instances and
purging the shared cache to force the fallthrough path.
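
The reproduction strategy can be modeled in miniature. The sketch below is a toy, not the Dolt test: `sharedCache`, `nodeStore`, and `purge` are hypothetical types and helpers, chosen only to show why a read succeeds while the shared cache is warm and fails once the cache is purged and the reader's own backend has never seen the chunk.

```go
package main

import "fmt"

// sharedCache is a hypothetical cache shared by several node stores.
type sharedCache map[string][]byte

// nodeStore is a hypothetical store: a shared cache in front of a private backend.
type nodeStore struct {
	cache   sharedCache
	backend map[string][]byte
}

// Write stores the value in both the shared cache and this store's backend.
func (ns nodeStore) Write(key string, val []byte) {
	ns.cache[key] = val
	ns.backend[key] = val
}

// Read checks the shared cache first and falls through to the private backend.
func (ns nodeStore) Read(key string) ([]byte, error) {
	if v, ok := ns.cache[key]; ok {
		return v, nil
	}
	if v, ok := ns.backend[key]; ok {
		return v, nil
	}
	return nil, fmt.Errorf("chunk %q not found in backend", key)
}

// purge empties the shared cache, simulating eviction.
func purge(c sharedCache) {
	for k := range c {
		delete(c, k)
	}
}

func main() {
	cache := sharedCache{}
	writer := nodeStore{cache: cache, backend: map[string][]byte{}}
	reader := nodeStore{cache: cache, backend: map[string][]byte{}}

	writer.Write("blob-node", []byte("text value"))

	_, err := reader.Read("blob-node")
	fmt.Println("before purge:", err)

	purge(cache)
	_, err = reader.Read("blob-node")
	fmt.Println("after purge:", err)
}
```

Using two backends is what makes the miss deterministic: with a single backend the fallthrough would still find the chunk after the purge.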

Reproduction: INSERT + DOLT_COMMIT on any table with TEXT columns while
log_bin=1 is enabled. Observed on a production server with an 11K+ row table
containing 4 TEXT columns.
Contributor

fulghum commented Mar 3, 2026

Thank you for sending this PR. This was helpful to quickly see what was going on in the code. I've made a slightly different change in #10621, so I'll close out this PR.

@fulghum fulghum closed this Mar 3, 2026

Development

Successfully merging this pull request may close these issues.

BUG: Binlog serialization panics on TEXT/BLOB columns during DOLT_COMMIT
