VStream: Prevent buffering entire transactions (OOM risk), instead send chunks to client #18849
mattlord merged 22 commits into vitessio:main
Codecov Report: coverage diff, main vs. #18849
| Metric | main | #18849 | +/- |
|---|---|---|---|
| Coverage | 69.73% | 69.81% | +0.07% |
| Files | 1608 | 1610 | +2 |
| Lines | 214776 | 215360 | +584 |
| Hits | 149781 | 150347 | +566 |
| Misses | 64995 | 65013 | +18 |
Pull request overview
This PR addresses a critical OOM (Out Of Memory) issue in VTGate when streaming very large transactions (multi-GB) through VStream. The root cause was that VTGate buffered entire transactions in memory before sending them to clients, even when transactions were chunked from tablets.
Key Changes:
- Introduces a configurable transaction chunk size threshold (default 128MB) that triggers lock-based contiguous delivery for large transactions
- Implements dynamic lock acquisition when transactions exceed the threshold to ensure non-interleaved delivery across shards while still allowing chunked transmission
- Adds comprehensive unit and e2e tests to verify chunking behavior and prevent transaction interleaving
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| proto/vtgate.proto | Adds transaction_chunk_size field to VStreamFlags for configurable chunking threshold |
| go/vt/proto/vtgate/vtgate.pb.go | Generated protobuf code for the new transaction_chunk_size field |
| go/vt/proto/vtgate/vtgate_vtproto.pb.go | Generated vtproto code for serialization/deserialization of transaction_chunk_size |
| go/vt/vtgate/vstream_manager.go | Core implementation: adds transaction state tracking, dynamic lock acquisition for large transactions, and chunked event sending |
| go/vt/vtgate/vstream_manager_test.go | New unit test verifying that large transactions from one shard don't interleave with events from other shards |
| go/test/endtoend/vreplication/vstream_test.go | Updates e2e tests with 1KB chunk size to ensure chunking is actually tested |
| go/test/endtoend/vreplication/initial_data_test.go | Adds helper function to insert large transactions for testing chunking behavior |
| go/test/endtoend/vreplication/vreplication_test.go | Minor refactoring to move connection initialization earlier in test function |
I also added the "needs website docs" label, as we'll need to update this page: https://vitess.io/docs/24.0/reference/vreplication/vstream/
go/vt/vtgate/vstream_manager.go
Outdated
    // Large incomplete transaction detected - acquire lock to prevent interleaving
    // Lock will be held across subsequent callbacks until transaction completes
At this point there may already be interleaved events, no? Is that a problem?
There will not actually be any interleaved events.
This is because the current (default) behavior is to send transactions atomically, only once all events from BEGIN to COMMIT have been accumulated.
There are two cases when we reach this code:
- The lock is not held: other streams are sending transactions atomically. While we wait to acquire the lock, some atomic transactions may complete just before us, but because they are atomic that is fine. All other streams are then halted while we send our chunked transaction.
- The lock is held: another shard is holding the lock and chunking. In this case we wait until that shard has finished its transaction and then acquire the lock, so the other shard's complete, chunked transaction is sent before we begin chunking our shard's transaction.
In either case, events from different transactions on different shards are not interleaved. The only interleaving happens at the transaction level, i.e., whole transactions interleaved across shards (not at the intra-transaction event level).
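The two cases above can be illustrated with a minimal sketch. It is not the code from vstream_manager.go: the `event`, `sink`, and `shardStream` types and the `sendChunk`/`sendAtomic` helpers are hypothetical names chosen for illustration, and the shard names are made up. It only assumes what the comment describes: one shared lock, held by a chunking shard from its first chunk until COMMIT, and taken briefly by shards that send whole transactions.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for a VStream event.
type event struct {
	shard string
	kind  string // "BEGIN", "ROW", or "COMMIT"
}

// sink models the single client stream shared by all shard streams.
type sink struct {
	mu   sync.Mutex // serializes delivery across shards
	sent []event
}

// shardStream keeps per-shard state between callbacks.
type shardStream struct {
	out     *sink
	holding bool // true while this shard holds out.mu for a chunked transaction
}

// sendChunk forwards one batch of a large transaction. The lock is acquired
// on the first chunk and released only when a batch ends with COMMIT.
func (s *shardStream) sendChunk(batch []event) {
	if len(batch) == 0 {
		return
	}
	if !s.holding {
		s.out.mu.Lock()
		s.holding = true
	}
	s.out.sent = append(s.out.sent, batch...)
	if batch[len(batch)-1].kind == "COMMIT" {
		s.holding = false
		s.out.mu.Unlock()
	}
}

// sendAtomic forwards a complete transaction while holding the lock.
func (s *shardStream) sendAtomic(txn []event) {
	s.out.mu.Lock()
	defer s.out.mu.Unlock()
	s.out.sent = append(s.out.sent, txn...)
}

// interleaved reports whether an event from another shard appears inside
// an open BEGIN..COMMIT span.
func interleaved(sent []event) bool {
	open := "" // shard with a transaction in progress, if any
	for _, e := range sent {
		if open != "" && e.shard != open {
			return true
		}
		switch e.kind {
		case "BEGIN":
			open = e.shard
		case "COMMIT":
			open = ""
		}
	}
	return false
}

// run sends one chunked transaction and three atomic ones concurrently.
func run() []event {
	out := &sink{}
	a := &shardStream{out: out}
	b := &shardStream{out: out}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		a.sendChunk([]event{{"-80", "BEGIN"}, {"-80", "ROW"}})
		a.sendChunk([]event{{"-80", "ROW"}})
		a.sendChunk([]event{{"-80", "ROW"}, {"-80", "COMMIT"}})
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < 3; i++ {
			b.sendAtomic([]event{{"80-", "BEGIN"}, {"80-", "ROW"}, {"80-", "COMMIT"}})
		}
	}()
	wg.Wait()
	return out.sent
}

func main() {
	sent := run()
	fmt.Println("events:", len(sent), "interleaved:", interleaved(sent))
}
```

Whichever goroutine wins the lock first, whole transactions may alternate between the two shards, but no event lands inside another shard's BEGIN..COMMIT span.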
    // defaultTransactionChunkSizeBytes is the default threshold for chunking transactions.
    // 0 (the default value for protobuf int64) means disabled, clients must explicitly set a value to opt in for chunking.
    const defaultTransactionChunkSizeBytes = 0
Do we still need this now that the flag's default is also 0? I guess it makes it cleaner when we later make it opt-out. Totally fine to leave it here.
Yes, I think it's good to have it there for future ease of updating, and I added a comment related to that. It also makes the code explicit and self-documenting.
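As a rough illustration of why the named constant helps, here is a sketch of how a requested value might be resolved against the default. `effectiveChunkSize` and `chunkingEnabled` are hypothetical helpers, not functions from the PR, and treating negative values as unset is an assumption of this sketch.

```go
package main

import "fmt"

// defaultTransactionChunkSizeBytes mirrors the constant quoted above:
// 0 means chunking is disabled unless the client opts in.
const defaultTransactionChunkSizeBytes = 0

// effectiveChunkSize is a hypothetical helper: a positive client value wins,
// anything else falls back to the default. Treating negative values as unset
// is an assumption made for this sketch.
func effectiveChunkSize(requested int64) int64 {
	if requested > 0 {
		return requested
	}
	return defaultTransactionChunkSizeBytes
}

// chunkingEnabled reports whether chunking would apply for the requested value.
func chunkingEnabled(requested int64) bool {
	return effectiveChunkSize(requested) > 0
}

func main() {
	fmt.Println(chunkingEnabled(0), chunkingEnabled(1024))
}
```

With this shape, switching to opt-out later would only require changing the constant to a non-zero value; callers would stay untouched.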
Description
There is a bug that causes OOM errors in vtgate when a very large transaction (e.g., multi-GB) made up of many reasonably sized operations is sent over VStream.
The problem is caused by this logic: we buffer the entire transaction in eventss before sending it. Very large transactions (e.g., multi-GB) can cause OOM errors. An example is described here.
This PR aims to fix this by allowing locking across multiple event batches received from a tablet. This allows chunks of a transaction to be sent even before the COMMIT is received, while still preserving the order for the VStream (i.e., even for multi-shard streams, transactions cannot be interleaved; each transaction is sent in its entirety before the next transaction of any shard is sent).
I am open to putting this behind a flag. There may be performance implications from this additional locking.
Another approach may be a size in bytes, like vstream_packet_size for a tablet, but for the vtgate. If that size in bytes is exceeded, a lock is acquired, and we then start sending the transaction as chunks (and stop accumulating it in memory). Open to discussion on this.
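The size-based approach described above can be sketched as follows. This is an illustrative model, not the PR's implementation: `batch`, `accumulator`, and `simulate` are hypothetical names, sizes are arbitrary, and lock acquisition is left out so that only the memory behavior is shown.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for one group of events received from a tablet.
type batch struct {
	sizeBytes int64
	commit    bool // true if this batch ends the transaction
}

// accumulator buffers batches until COMMIT, or until the threshold is exceeded.
type accumulator struct {
	threshold     int64 // 0 disables chunking
	buffered      []batch
	bufferedBytes int64
	chunking      bool  // true once the current transaction exceeded the threshold
	peakBytes     int64 // largest amount held in memory at once
}

// add records a batch and returns the batches to send now, or nil while buffering.
func (a *accumulator) add(b batch) []batch {
	a.buffered = append(a.buffered, b)
	a.bufferedBytes += b.sizeBytes
	if a.bufferedBytes > a.peakBytes {
		a.peakBytes = a.bufferedBytes
	}
	if a.threshold > 0 && a.bufferedBytes > a.threshold {
		a.chunking = true // from here on, stop accumulating this transaction
	}
	if !b.commit && !a.chunking {
		return nil
	}
	out := a.buffered
	a.buffered, a.bufferedBytes = nil, 0
	if b.commit {
		a.chunking = false // the next transaction starts out buffered again
	}
	return out
}

// simulate feeds one transaction of n equally sized batches and reports the
// peak buffered bytes and the number of sends to the client.
func simulate(threshold int64, n int, size int64) (peak int64, sends int) {
	a := &accumulator{threshold: threshold}
	for i := 0; i < n; i++ {
		if out := a.add(batch{sizeBytes: size, commit: i == n-1}); out != nil {
			sends++
		}
	}
	return a.peakBytes, sends
}

func main() {
	peak, sends := simulate(0, 10, 40)
	fmt.Println("disabled:", peak, sends)
	peak, sends = simulate(100, 10, 40)
	fmt.Println("threshold 100:", peak, sends)
}
```

In this model, the peak buffered size with chunking disabled grows with the whole transaction, while with a threshold it stays bounded by roughly the threshold plus one batch.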
Testing
For reproduction, I added a test that fails without this change (it asserts that transactions should NOT be accumulated before sending).
With these changes, the test passes.
Docs PR: vitessio/website#2028