fix(binlog): handle empty TEXT/BLOB columns and missing chunks in serialization#10603
Closed
scbrown wants to merge 1 commit intodolthub:mainfrom
Closed
fix(binlog): handle empty TEXT/BLOB columns and missing chunks in serialization#10603scbrown wants to merge 1 commit intodolthub:mainfrom
scbrown wants to merge 1 commit intodolthub:mainfrom
Conversation
…log serialization When log_bin=1 is enabled and DOLT_COMMIT is called on a table with TEXT/BLOB columns, the binlog serializer can crash in two ways: 1. Empty TEXT/BLOB columns store a zero hash (all zeros) in the tuple. encodeBytesFromAddress attempts to load this from the ChunkStore, which fails with "empty chunk returned from ChunkStore". Fix: check addr.IsEmpty() before the ChunkStore lookup and serialize as a zero-length value. 2. On large tables, the shared NodeStore cache can evict blob tree nodes. When the binlog serializer reads evicted nodes, it falls through to ChunkStore.Get which returns EmptyChunk, triggering an assertion panic in nodeStore.Read. Fix: convert the assertTrue panic to a returned error so the server stays up and logs the issue instead of crashing. Includes a test (TestTextSerializer_MissingChunk) that reproduces the ChunkStore miss by using separate NodeStore/ChunkStore instances and purging the shared cache to force the fallthrough path. Reproduction: INSERT + DOLT_COMMIT on any table with TEXT columns while log_bin=1 is enabled. Observed on a production server with 11K+ row table containing 4 TEXT columns.
Contributor
|
Thank you for sending this PR. This was helpful to quickly see what was going on in the code. I've made a slightly different change in #10621, so I'll close out this PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #10601
Summary
When
log_bin=1is enabled,DOLT_COMMITon a table with TEXT/BLOB columns panics inencodeBytesFromAddressbecause:assertTrueinnodeStore.Readconverts ChunkStore misses into unrecoverable panicsChanges
binlog_type_serialization.go: Addaddr.IsEmpty()check inencodeBytesFromAddress— serialize empty TEXT/BLOB as zero-length value with correct length prefix size for the blob type. Wrap errors with address and type context.node_store.go: ConvertassertTrue(c.Size() > 0, ...)panic to a returnedfmt.Errorfso the server stays up and logs the issue.binlog_type_serialization_test.go: AddTestTextSerializer_MissingChunkthat reproduces the ChunkStore miss using separate storage backends and explicit cache purging.Testing