Skip to content

Conversation

@spilchen
Copy link
Contributor

Retry an insert with testutils.SucceedsSoon. This helps to resolve a testflake where the insert fails because it runs with a timestamp from before the schema change that created the table. Normally, the transaction gets pushed, but it can flake in multi-tenant environments.

Release note: none
Closes #157251

@spilchen spilchen self-assigned this Nov 18, 2025
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@spilchen spilchen marked this pull request as ready for review November 18, 2025 20:54
@spilchen spilchen requested a review from a team as a code owner November 18, 2025 20:54
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fqazi reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spilchen)


pkg/sql/catalog/lease/lease_test.go line 1815 at r1 (raw file):

	// Not sure whether run in the past and so sees clock uncertainty push.
	testutils.SucceedsSoon(t, func() error {

Were you able to confirm this eliminates the flake? Just want to make sure the timestamp actually gets moved up. Also the transaction isn't stuck in an error state.

Copy link
Contributor Author

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi)


pkg/sql/catalog/lease/lease_test.go line 1815 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Were you able to confirm this eliminates the flake? Just want to make sure the timestamp actually gets moved up. Also the transaction isn't stuck in an error state.

Unfortunately, I wasn't able to reproduce this. I ran this on my local machine (without my fix), but no luck:

dev test pkg/sql/catalog/lease --ignore-cache -f TestTableCreationPushesTxnsInRecentPast --stress --count 10000

This issue does pop up once in a while: #140254 (Jan2025), #116810 (Dec2023), #114366 (Nov2023), #113208 (Oct2023)

If you notice the frequency slowed down. There was a fix (#116552) that went in Dec2023. That added an override for the uncertainty interval (went to 1 second). I am wondering if that's a better approach? Maybe setting it to 2 seconds.

As a test, I set the MaxOffset to a small value (100ms), and then I was able to repro quite easy. So, bumping MaxOffset to a second definitely helps things. I then applied my SucceedsSoon fix with a small MaxOffset to see what happens when the timestamp isn't pushed. And the test failed with an error like this:

lease_test.go:1815: condition failed to evaluate within 45s: pq: current transaction is aborted, commands ignored until end of transaction block

So, using SucceedsSoon requires restart of the transaction, which kind of defeats the purpose of the test. I either need to use SucceedsSoon for the entire testcase or try bumping the MaxOffset. I'm going to go with doubling of MaxOffset.

Retry an insert with testutils.SucceedsSoon. This helps to resolve a
testflake where the insert fails because it runs with a timestamp from
before the schema change that created the table. Normally, the
transaction gets pushed, but it can flake in multi-tenant environments.

Release note: none
Closes cockroachdb#157251
@spilchen spilchen force-pushed the gh-157251/251118/1521/push-txns-test-fix/pr-ready branch from f18de8d to b3d45c1 Compare November 19, 2025 12:57
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@fqazi reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @spilchen)


pkg/sql/catalog/lease/lease_test.go line 1815 at r1 (raw file):

Previously, spilchen wrote…

Unfortunately, I wasn't able to reproduce this. I ran this on my local machine (without my fix), but no luck:

dev test pkg/sql/catalog/lease --ignore-cache -f TestTableCreationPushesTxnsInRecentPast --stress --count 10000

This issue does pop up once in a while: #140254 (Jan2025), #116810 (Dec2023), #114366 (Nov2023), #113208 (Oct2023)

If you notice the frequency slowed down. There was a fix (#116552) that went in Dec2023. That added an override for the uncertainty interval (went to 1 second). I am wondering if that's a better approach? Maybe setting it to 2 seconds.

As a test, I set the MaxOffset to a small value (100ms), and then I was able to repro quite easy. So, bumping MaxOffset to a second definitely helps things. I then applied my SucceedsSoon fix with a small MaxOffset to see what happens when the timestamp isn't pushed. And the test failed with an error like this:

lease_test.go:1815: condition failed to evaluate within 45s: pq: current transaction is aborted, commands ignored until end of transaction block

So, using SucceedsSoon requires restart of the transaction, which kind of defeats the purpose of the test. I either need to use SucceedsSoon for the entire testcase or try bumping the MaxOffset. I'm going to go with doubling of MaxOffset.

Yeah, lets just bump up MaxOffset this isn't too frequent. So, hopefully it decreases likelihood further.

@spilchen
Copy link
Contributor Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 19, 2025

@craig craig bot merged commit a8320b5 into cockroachdb:master Nov 19, 2025
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/catalog/lease: TestTableCreationPushesTxnsInRecentPast failed

3 participants