Skip to content

VStream: Prevent buffering entire transactions (OOM risk), instead send chunks to client#18849

Merged
mattlord merged 22 commits intovitessio:mainfrom
twthorn:fix_vtgate_oom
Dec 5, 2025
Merged

VStream: Prevent buffering entire transactions (OOM risk), instead send chunks to client#18849
mattlord merged 22 commits intovitessio:mainfrom
twthorn:fix_vtgate_oom

Conversation

@twthorn
Copy link
Contributor

@twthorn twthorn commented Oct 31, 2025

Description

There is a bug that causes OOM errors for vtgate when a very large transaction (e.g., multi-GB) but with many reasonably sized operations is sent over VStream.

The problem is caused by this logic. We buffer with eventss the entire transaction before sending it. Very large transactions eg multi-GB can cause OOM errors. Example described here

it was from an 11GB size transaction (11 MB per row * 1000 rows). So it seems there was some issue with breaking this transaction up between tablet to vtgate and vtgate to client. We are still looking into it, but wanted to share early findings

This PR aims to fix this by allowing for locking across multiple received event batches from a tablet. And thus allows for sending chunked transactions even before the COMMIT is received, while still preserving the order for the VStream (ie even for multi-shard, the transactions cannot be interleaved, each transaction is sent in its entirety before sending the next transaction of any shard).

I am open to putting this behind a flag. There may be performance implication from this additional locking.

Another approach may be a size in bytes like vstream_packet_size for a tablet, but for the vtgate. If that size in bytes is exceeded a lock is acquired, and then we will start sending the transaction as chunks (and stop accumulating it in memory). Open to discussion on this.

Testing

For reproduction, I added a test that fails without this change (asserts that transactions should NOT be accumulated before sending):

tthornton at tthornt-ltm9dwz in ~/git/twthorn/vitess on fix_vtgate_oom [$]
$ go test -v ./go/vt/vtgate -run TestVStreamLargeTransactionMemory 2>&1 | tail -30
=== RUN   TestVStreamLargeTransactionMemory
E1103 15:49:17.252008   26813 vstream_manager.go:442] Error in vstream for keyspace:"TestVStream" shard:"-20" gtid:"pos": context canceled
context ended while streaming from TestVStream/-20
    vstream_manager_test.go:2403: Max memory used: 57 MB
    vstream_manager_test.go:2407:
                Error Trace:    /Users/tthornton/git/twthorn/vitess/go/vt/vtgate/vstream_manager_test.go:2407
                Error:          "57" is not less than "25"
                Test:           TestVStreamLargeTransactionMemory
                Messages:       Memory usage should stay low due to immediate transaction chunk sending. VTGate should not accumulate all 50 chunks (~50MB) before sending.
--- FAIL: TestVStreamLargeTransactionMemory (0.01s)
FAIL
FAIL    vitess.io/vitess/go/vt/vtgate   1.147s
FAIL

With these changes, the test passes.

Docs PR: vitessio/website#2028

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

AI Disclosure

@vitess-bot
Copy link
Contributor

vitess-bot bot commented Oct 31, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Oct 31, 2025
@github-actions github-actions bot added this to the v24.0.0 milestone Oct 31, 2025
…nd chunks to client

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 80.32787% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.81%. Comparing base (79af4c1) to head (0fa4663).
⚠️ Report is 42 commits behind head on main.

Files with missing lines Patch % Lines
go/vt/vtgate/vstream_manager.go 80.32% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18849      +/-   ##
==========================================
+ Coverage   69.73%   69.81%   +0.07%     
==========================================
  Files        1608     1610       +2     
  Lines      214776   215360     +584     
==========================================
+ Hits       149781   150347     +566     
- Misses      64995    65013      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@twthorn twthorn added Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature) and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Nov 3, 2025
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
@twthorn twthorn requested a review from mattlord November 4, 2025 02:32
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
…n/vstream_test

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a critical OOM (Out Of Memory) issue in VTGate when streaming very large transactions (multi-GB) through VStream. The root cause was that VTGate buffered entire transactions in memory before sending them to clients, even when transactions were chunked from tablets.

Key Changes:

  • Introduces a configurable transaction chunk size threshold (default 128MB) that triggers lock-based contiguous delivery for large transactions
  • Implements dynamic lock acquisition when transactions exceed the threshold to ensure non-interleaved delivery across shards while still allowing chunked transmission
  • Adds comprehensive unit and e2e tests to verify chunking behavior and prevent transaction interleaving

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
proto/vtgate.proto Adds transaction_chunk_size field to VStreamFlags for configurable chunking threshold
go/vt/proto/vtgate/vtgate.pb.go Generated protobuf code for the new transaction_chunk_size field
go/vt/proto/vtgate/vtgate_vtproto.pb.go Generated vtproto code for serialization/deserialization of transaction_chunk_size
go/vt/vtgate/vstream_manager.go Core implementation: adds transaction state tracking, dynamic lock acquisition for large transactions, and chunked event sending
go/vt/vtgate/vstream_manager_test.go New unit test verifying that large transactions from one shard don't interleave with events from other shards
go/test/endtoend/vreplication/vstream_test.go Updates e2e tests with 1KB chunk size to ensure chunking is actually tested
go/test/endtoend/vreplication/initial_data_test.go Adds helper function to insert large transactions for testing chunking behavior
go/test/endtoend/vreplication/vreplication_test.go Minor refactoring to move connection initialization earlier in test function

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mattlord mattlord added the NeedsWebsiteDocsUpdate What it says label Nov 25, 2025
@mattlord
Copy link
Member

I also added the needs website docs label as we'll need to update this page: https://vitess.io/docs/24.0/reference/vreplication/vstream/

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
@twthorn twthorn added the NeedsWebsiteDocsUpdate What it says label Nov 26, 2025
Comment on lines +896 to +897
// Large incomplete transaction detected - acquire lock to prevent interleaving
// Lock will be held across subsequent callbacks until transaction completes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point there may already be interleaved events, no? Is that a problem?

Copy link
Contributor Author

@twthorn twthorn Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will actually not be interleaved events.

This is because current/default behavior is to send transactions atomically, only once all events from BEGIN to COMMIT are accumulated.

There are two cases when we reach this code:

  1. The lock is not held - this means that other streams are sending transactions atomically. When we go to acquire the lock, atomic transactions may have completed just before us. But they are atomic so it's fine. All other streams will be halted while we send our chunked transaction.
  2. The lock is held - this means another shard is holding the lock and chunking. In this case we wait until that shard has finished its transaction. Then we acquire the lock, thus the complete, chunked transaction of another shard will be sent prior to us beginning to chunk our shard's transaction.

In either case, the events between different transactions of different shards are not interleaved. The only interleaving happens at the transaction-level, ie whole transactions interleaved across shards (not inter-transaction event level)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
@twthorn twthorn requested a review from mattlord December 2, 2025 20:42
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
@mattlord mattlord removed the NeedsWebsiteDocsUpdate What it says label Dec 4, 2025
Copy link
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving now, although I'd like to discuss my comments before we merge (still need a second reviewer). Please let me know what you think.

Nice work on this, @twthorn ! I appreciate your patience and persistence on this. ❤️

Comment on lines +78 to +80
// defaultTransactionChunkSizeBytes is the default threshold for chunking transactions.
// 0 (the default value for protobuf int64) means disabled, clients must explicitly set a value to opt in for chunking.
const defaultTransactionChunkSizeBytes = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this now that the flag's default is also 0? I guess it makes it cleaner when we later make it opt-out. Totally fine to leave it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's good to have it there for future ease of updating, and i added a comment related to that. It also makes the code explicit and self documenting

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
@mattlord mattlord merged commit 488ef3d into vitessio:main Dec 5, 2025
103 of 105 checks passed
twthorn added a commit to slackhq/vitess that referenced this pull request Dec 11, 2025
…nd chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
tanjinx added a commit to slackhq/vitess that referenced this pull request Dec 30, 2025
#764)

* VStream: Prevent buffering entire transactions (OOM risk), instead send chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

* Fix static code checks

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

* Remove utils import

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

* Fix keyspaces to watch test

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

---------

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Co-authored-by: Tanjin Xu <109303790+tanjinx@users.noreply.github.com>
twthorn added a commit to slackhq/vitess that referenced this pull request Jan 28, 2026
…nd chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
twthorn added a commit to slackhq/vitess that referenced this pull request Jan 28, 2026
…nd chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
twthorn added a commit to slackhq/vitess that referenced this pull request Jan 29, 2026
…nd chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
tanjinx pushed a commit to slackhq/vitess that referenced this pull request Jan 30, 2026
* Improve cgroup metric management (vitessio#18791)

Signed-off-by: Matt Lord <mattalord@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* VStream: Prevent buffering entire transactions (OOM risk), instead send chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Run VStream copy only when VGTID requires it, use TablesToCopy in those cases (vitessio#18938)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Regenerate vtgate.pb.go proto file

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Fix tests

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Complete PR vitessio#18791 backport: Update metrics_cgroup.go

Apply missing changes from PR vitessio#18791 to metrics_cgroup.go:
- Replace cgroup1Manager and cgroup2Manager with single cgroupManager
- Add errCgroupMetricsNotAvailable error variable
- Add sync.Once for lazy initialization
- Remove cgroup v1 support, only support cgroup v2
- Simplify implementation with unified cgroup manager

This fixes compilation errors in metrics_cgroup_test.go.

* Add missing github.com/containerd/cgroups dependency

Required by metrics_cgroup.go for cgroup v1/v2 support.

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Fix cgroups import to use v3

The v1 cgroups package is incompatible with Go 1.24.10.
Use cgroups/v3 consistently throughout the file.

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Fix goimports formatting

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

---------

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
sbaker617 pushed a commit to slackhq/vitess that referenced this pull request Feb 5, 2026
* Improve cgroup metric management (vitessio#18791)

Signed-off-by: Matt Lord <mattalord@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* VStream: Prevent buffering entire transactions (OOM risk), instead send chunks to client (vitessio#18849)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Run VStream copy only when VGTID requires it, use TablesToCopy in those cases (vitessio#18938)

Signed-off-by: twthorn <thomaswilliamthornton@gmail.com>
Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Regenerate vtgate.pb.go proto file

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Fix tests

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Complete PR vitessio#18791 backport: Update metrics_cgroup.go

Apply missing changes from PR vitessio#18791 to metrics_cgroup.go:
- Replace cgroup1Manager and cgroup2Manager with single cgroupManager
- Add errCgroupMetricsNotAvailable error variable
- Add sync.Once for lazy initialization
- Remove cgroup v1 support, only support cgroup v2
- Simplify implementation with unified cgroup manager

This fixes compilation errors in metrics_cgroup_test.go.

* Add missing github.com/containerd/cgroups dependency

Required by metrics_cgroup.go for cgroup v1/v2 support.

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Fix cgroups import to use v3

The v1 cgroups package is incompatible with Go 1.24.10.
Use cgroups/v3 consistently throughout the file.

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

* Fix goimports formatting

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>

---------

Signed-off-by: Thomas Thornton <thomaswilliamthornton@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report: VTGates OOM on large transactions

4 participants