-
Notifications
You must be signed in to change notification settings - Fork 4k
pkg/sql/catalog/lease: make table creation push test resilient #158014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/sql/catalog/lease: make table creation push test resilient #158014
Conversation
fqazi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fqazi reviewed 1 of 1 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @spilchen)
pkg/sql/catalog/lease/lease_test.go line 1815 at r1 (raw file):
// Not sure whether run in the past and so sees clock uncertainty push. testutils.SucceedsSoon(t, func() error {
Were you able to confirm this eliminates the flake? Just want to make sure the timestamp actually gets moved up. Also the transaction isn't stuck in an error state.
spilchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @fqazi)
pkg/sql/catalog/lease/lease_test.go line 1815 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Were you able to confirm this eliminates the flake? Just want to make sure the timestamp actually gets moved up. Also the transaction isn't stuck in an error state.
Unfortunately, I wasn't able to reproduce this. I ran this on my local machine (without my fix), but no luck:
dev test pkg/sql/catalog/lease --ignore-cache -f TestTableCreationPushesTxnsInRecentPast --stress --count 10000
This issue does pop up once in a while: #140254 (Jan2025), #116810 (Dec2023), #114366 (Nov2023), #113208 (Oct2023)
If you notice the frequency slowed down. There was a fix (#116552) that went in Dec2023. That added an override for the uncertainty interval (went to 1 second). I am wondering if that's a better approach? Maybe setting it to 2 seconds.
As a test, I set the MaxOffset to a small value (100ms), and then I was able to repro quite easy. So, bumping MaxOffset to a second definitely helps things. I then applied my SucceedsSoon fix with a small MaxOffset to see what happens when the timestamp isn't pushed. And the test failed with an error like this:
lease_test.go:1815: condition failed to evaluate within 45s: pq: current transaction is aborted, commands ignored until end of transaction block
So, using SucceedsSoon requires restart of the transaction, which kind of defeats the purpose of the test. I either need to use SucceedsSoon for the entire testcase or try bumping the MaxOffset. I'm going to go with doubling of MaxOffset.
Retry an insert with testutils.SucceedsSoon. This helps to resolve a testflake where the insert fails because it runs with a timestamp from before the schema change that created the table. Normally, the transaction gets pushed, but it can flake in multi-tenant environments. Release note: none Closes cockroachdb#157251
f18de8d to
b3d45c1
Compare
fqazi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fqazi reviewed 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @spilchen)
pkg/sql/catalog/lease/lease_test.go line 1815 at r1 (raw file):
Previously, spilchen wrote…
Unfortunately, I wasn't able to reproduce this. I ran this on my local machine (without my fix), but no luck:
dev test pkg/sql/catalog/lease --ignore-cache -f TestTableCreationPushesTxnsInRecentPast --stress --count 10000This issue does pop up once in a while: #140254 (Jan2025), #116810 (Dec2023), #114366 (Nov2023), #113208 (Oct2023)
If you notice the frequency slowed down. There was a fix (#116552) that went in Dec2023. That added an override for the uncertainty interval (went to 1 second). I am wondering if that's a better approach? Maybe setting it to 2 seconds.
As a test, I set the MaxOffset to a small value (100ms), and then I was able to repro quite easy. So, bumping MaxOffset to a second definitely helps things. I then applied my
SucceedsSoonfix with a small MaxOffset to see what happens when the timestamp isn't pushed. And the test failed with an error like this:lease_test.go:1815: condition failed to evaluate within 45s: pq: current transaction is aborted, commands ignored until end of transaction blockSo, using
SucceedsSoonrequires restart of the transaction, which kind of defeats the purpose of the test. I either need to useSucceedsSoonfor the entire testcase or try bumping the MaxOffset. I'm going to go with doubling of MaxOffset.
Yeah, lets just bump up MaxOffset this isn't too frequent. So, hopefully it decreases likelihood further.
|
TFTR! bors r+ |
Retry an insert with testutils.SucceedsSoon. This helps to resolve a testflake where the insert fails because it runs with a timestamp from before the schema change that created the table. Normally, the transaction gets pushed, but it can flake in multi-tenant environments.
Release note: none
Closes #157251