Prevent race conditions in concurrent MLS commit requests #2525
stefanwire merged 15 commits into develop
3e4487c to 5ce9adc
```haskell
. zUser claimant
)

-- TODO(SB) generalise such that two prepared commits can be sent
```
pcapriotti left a comment
Looks good if CI agrees. Minor comments follow.
```haskell
lockAcquired <- acquireCommitLock gid epoch ttl
when (lockAcquired == NotAcquired) $
  throwS @'MLSStaleMessage
bracket
```
Why not acquire the lock on the opening side of the bracket?
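A minimal sketch of what the suggestion could look like, assuming the lock is released by deleting its row: both the acquisition and the stale-message check run in the opening action of `bracket`. If that action throws, `bracket` never runs the release, so a commit that lost the race cannot delete a lock held by someone else. Because `bracket` masks asynchronous exceptions while acquiring, there is also no gap between taking the lock and installing its release. The in-memory `LockTable`, `releaseCommitLock`, `withCommitLock`, `tryWithCommitLock` and the `StaleMessage` exception are hypothetical stand-ins for the Cassandra table and `throwS @'MLSStaleMessage`, and the TTL is ignored.

```haskell
import Control.Exception (Exception, bracket, throwIO, try)
import Control.Monad (when)
import Data.IORef

-- Hypothetical simplified key types.
type GroupId = String
type Epoch = Int

data LockAcquired = Acquired | NotAcquired deriving (Eq, Show)

-- Hypothetical stand-in for throwS @'MLSStaleMessage.
data StaleMessage = StaleMessage deriving (Eq, Show)
instance Exception StaleMessage

-- Hypothetical in-memory stand-in for the mls_commit_locks table (no TTL).
type LockTable = IORef [(GroupId, Epoch)]

-- Atomic "insert if not exists": only succeeds if nobody holds the key.
acquireCommitLock :: LockTable -> GroupId -> Epoch -> IO LockAcquired
acquireCommitLock tbl gid epoch =
  atomicModifyIORef' tbl $ \held ->
    if (gid, epoch) `elem` held
      then (held, NotAcquired)
      else ((gid, epoch) : held, Acquired)

releaseCommitLock :: LockTable -> GroupId -> Epoch -> IO ()
releaseCommitLock tbl gid epoch =
  atomicModifyIORef' tbl $ \held -> (filter (/= (gid, epoch)) held, ())

-- Acquire (and reject) on the opening side: if 'acquire' throws,
-- 'bracket' skips the release, so we never free someone else's lock.
withCommitLock :: LockTable -> GroupId -> Epoch -> IO a -> IO a
withCommitLock tbl gid epoch action =
  bracket acquire (\() -> releaseCommitLock tbl gid epoch) (\() -> action)
  where
    acquire = do
      r <- acquireCommitLock tbl gid epoch
      when (r == NotAcquired) $ throwIO StaleMessage

-- Convenience wrapper turning a lost race into a Left.
tryWithCommitLock :: LockTable -> GroupId -> Epoch -> IO a -> IO (Either StaleMessage a)
tryWithCommitLock tbl gid epoch = try . withCommitLock tbl gid epoch
```

A nested call on the same group and epoch, standing in for a competing commit, gets `Left StaleMessage` while the outer holder keeps its lock until its own release runs.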
```haskell
addRemoteMLSClients = "update member_remote_user set mls_clients = mls_clients + ? where conv = ? and user_remote_domain = ? and user_remote_id = ?"

acquireCommitLock :: PrepQuery W (GroupId, Epoch, Int32) Row
acquireCommitLock = "insert into mls_commit_locks (group_id, epoch) values (?, ?) if not exists using ttl ?"
```
I'm surprised that this works. I was under the impression that you can't have TTL values as "question mark" parameters.
We tried it locally by temporarily removing the release part of the bracket, and observed the entries appear and then disappear once the TTL expired. Does this manual test count, or should I look into it more deeply?
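Assuming the usual lightweight-transaction semantics (the conditional insert applies only if no live row exists for the key, and a row stops being visible once its TTL has passed), the behaviour seen in that manual test can be written down as a small pure model with an explicit clock. All names here (`LockRow`, `tryInsertLock`, `Seconds`) are hypothetical. The model also makes the weak spot of a TTL-based lock visible: a row written at t = 0 with a TTL of 10 no longer blocks a competing insert at t = 12, whether or not the first holder has finished.

```haskell
-- Hypothetical simplified types for the model.
type GroupId = String
type Epoch = Int
type Seconds = Int

-- One lock row: its key and the time at which its TTL runs out.
data LockRow = LockRow { rowKey :: (GroupId, Epoch), expiresAt :: Seconds }
  deriving (Show)

data LockAcquired = Acquired | NotAcquired deriving (Eq, Show)

-- Model of "insert ... if not exists using ttl ?" evaluated at time 'now':
-- expired rows are dropped first, then the insert applies only if no
-- live row with the same key remains.
tryInsertLock :: Seconds -> Seconds -> (GroupId, Epoch) -> [LockRow] -> (LockAcquired, [LockRow])
tryInsertLock now ttl key rows
  | any ((== key) . rowKey) live = (NotAcquired, live)
  | otherwise = (Acquired, LockRow key (now + ttl) : live)
  where
    live = filter ((> now) . expiresAt) rows
```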
Ah, one more thing. What happens if the TTL expires before commit processing has finished? I think in that case there's a race between the thread that acquired the lock and any threads receiving a competing commit, since the former still behaves as if it holds the lock. For extra safety, maybe we should add a timeout T to the commit processing block and set the lock TTL to slightly more than T. This should minimise the probability of such failures.
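A sketch of that proposal, assuming the processing deadline T and the lock TTL are both kept in whole seconds (Cassandra TTLs are in seconds, while `System.Timeout.timeout` takes microseconds). `Deadline`, `lockTtl` and `withDeadline` are hypothetical names, and the margin is a free parameter. Note that `timeout` abandons the action by throwing an asynchronous exception into it, so whether stopping commit processing at an arbitrary point leaves a consistent state is a separate question this sketch does not answer.

```haskell
import Data.Int (Int32)
import System.Timeout (timeout)

-- Hypothetical processing deadline T, in seconds.
newtype Deadline = Deadline Int

-- TTL for the lock row: the deadline plus a safety margin (both in seconds),
-- so the row should not expire while processing is still allowed to run.
lockTtl :: Deadline -> Int -> Int32
lockTtl (Deadline secs) marginSecs = fromIntegral (secs + marginSecs)

-- Run commit processing, giving up once the deadline has passed.
-- 'timeout' takes microseconds and yields Nothing if the action did not finish.
withDeadline :: Deadline -> IO a -> IO (Maybe a)
withDeadline (Deadline secs) = timeout (secs * 1000000)
```

With T = 30 and a 5 second margin, the lock row would be written with a TTL of 35.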
Not sure this improves things. If we add a timeout and commit processing can't meet it, we are quite likely left in a broken state. If we don't add a timeout and commit processing doesn't finish in time, we lose the guarantee the lock gives, but there's still a chance that no competing commit arrives before processing finishes. Alternatively, we could increase the lock TTL to make this problem less likely.
9f420de to 06bd07d
https://wearezeta.atlassian.net/browse/FS-437
Checklist
- `make git-add-cassandra-schema` to update the cassandra schema documentation.
- `changelog.d`