Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: broken tx retries for cluster clients after #697 #709

Merged
merged 3 commits into from
Dec 24, 2024

Conversation

rueian
Copy link
Collaborator

@rueian rueian commented Dec 22, 2024

No description provided.

return nil
}
count.m[p]++
count.m[cc]++
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make naming consistent.

break
}
}
if last == cmds.InitSlot {
return nil
}
} else if init {
cc := c.pslots[last]
count.m[cc] += inits
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't count cmd.InitSlot commands to c.pslots[last] in the code above this, so add the count back here.

@rueian rueian force-pushed the fix-cluster-tx-retry branch 3 times, most recently from dbe3db5 to 6bfd890 Compare December 23, 2024 02:26
@rueian rueian changed the title fix: broken tx retris for cluster clients after #697 fix: broken tx retries for cluster clients after #697 Dec 23, 2024
@wyxloading
Copy link
Contributor

wyxloading commented Dec 23, 2024

Find one extreme case:

MULTI
MULTI
GET foo
EXEC

We will get [4]RedisResult with all successful results. And resp[3] (which means EXEC result) will get a len == 3 array, that's not accurate right. The second MULTI should get a error ERR MULTI calls can not be nested and the transaction can EXEC though.

EDIT: maybe we should find the nearest successful MULTI command, but not the nearest MULTI command, if we want to find the whole transaction willing to retry ?

@rueian
Copy link
Collaborator Author

rueian commented Dec 23, 2024

The second MULTI should get a error ERR MULTI calls can not be nested and the transaction can EXEC though.

I just tried the double MULTI case but got an EXECABORT when doing EXEC. I think we shouldn't retry the transaction if the nearest MULTI command fails with a Redis error.

@wyxloading
Copy link
Contributor

wyxloading commented Dec 23, 2024

I think we shouldn't retry the transaction if the nearest MULTI command fails with a Redis error.

make sense though.

BTW i got different response from redis:7.4 cluster

cmd:

MULTI
MULTI
PTTL foo
EXEC

response

OK
ERR MULTI calls can not be nested
QUEUED
1) (integer) -2

EDIT:
If we sent the second MULTI command after PTTL foo, EXEC with same result.
If key foo belong to slot in state migrating, it will get ASK error, seems like it will fit in the code that find out a transaction and retry logic

@rueian rueian force-pushed the fix-cluster-tx-retry branch from 6bfd890 to f61ee99 Compare December 23, 2024 05:25
Comment on lines +679 to +682
for mi = i; mi >= 0 && !isMulti(commands[mi]) && !isExec(commands[mi]); mi-- {
}
for ei = i; ei < len(commands) && !isMulti(commands[ei]) && !isExec(commands[ei]); ei++ {
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both mi and ei cursor will stop at either MULTI or EXEC to avoid crossing tx boundaries.

@rueian rueian force-pushed the fix-cluster-tx-retry branch from f61ee99 to b4c543d Compare December 23, 2024 05:47
@rueian
Copy link
Collaborator Author

rueian commented Dec 23, 2024

I think we shouldn't retry the transaction if the nearest MULTI command fails with a Redis error.

make sense though.

BTW i got different response from redis:7.4 cluster

cmd:

MULTI
MULTI
PTTL foo
EXEC

response

OK
ERR MULTI calls can not be nested
QUEUED
1) (integer) -2

EDIT: If we sent the second MULTI command after PTTL foo, EXEC with same result. If key foo belong to slot in state migrating, it will get ASK error, seems like it will fit in the code that find out a transaction and retry logic

I tested these cases on Valkey 8, and they all failed with EXECABORT. But anyway, I think we should just not retry on these wired cases.

@wyxloading
Copy link
Contributor

LGTM mostly, but i have to test this PR again tomorrow

@rueian rueian force-pushed the fix-cluster-tx-retry branch from b4c543d to fbdca04 Compare December 23, 2024 15:48
cluster.go Outdated
}
for ei = i; ei < len(commands) && !isMulti(commands[ei]) && !isExec(commands[ei]); ei++ {
}
if mi >= 0 && mi < ei && ei < len(commands) && isMulti(commands[mi]) && isExec(commands[ei]) && resps[mi].val.string == ok { // a transaction is found.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the nearest MULTI didn't succeed, we don't retry the tx.

cluster.go Outdated
}
for ei = i; ei < len(commands) && !isMulti(commands[ei]) && !isExec(commands[ei]); ei++ {
}
if mi >= 0 && mi < ei && ei < len(commands) && isMulti(commands[mi]) && isExec(commands[ei]) && resps[mi].val.string == "QUEUED" { // a transaction is found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resps[mi].val.string should be OK or just verify there is no error ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be OK.

cluster.go Outdated
continue // the transaction has been added to the retries, go to the next cmd.
}
}
if hasInit && (mi < i && i < ei && mi >= 0 && isMulti(commands[mi]) || i > ei && ei >= 0 && isMulti(commands[ei])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isMulti(commands[ei]) maybe a typo ? should be isExec(commands[ei])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a typo. If commands[ei] is a MULTI then we should not retry the command on i. However, I found that it is actually a dead code, i will never be larger than ei at this point, so the condition is removed now.

@rueian rueian force-pushed the fix-cluster-tx-retry branch from 443734d to aafed64 Compare December 24, 2024 06:29
@rueian rueian marked this pull request as ready for review December 24, 2024 07:01
@rueian rueian requested a review from wyxloading December 24, 2024 07:49
Copy link
Contributor

@wyxloading wyxloading left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@wyxloading
Copy link
Contributor

IMO we should publish a release for it.

@rueian rueian force-pushed the fix-cluster-tx-retry branch from aafed64 to e5cfe35 Compare December 24, 2024 08:37
@rueian rueian force-pushed the fix-cluster-tx-retry branch from 9792175 to 060a937 Compare December 24, 2024 09:41
@rueian
Copy link
Collaborator Author

rueian commented Dec 24, 2024

IMO we should publish a release for it.

Yeap, but we just forget handling ASKING correctly. I have made a new commit for it.

@rueian rueian merged commit 99daad7 into main Dec 24, 2024
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants