fix(ludicrous): Fix logical race condition in concurrent execution of mutations #7269
The following crash is seen in alpha when running high-load writes in ludicrous mode for a long time.
The error was due to a logical race condition in concurrent execution of mutations in ludicrous mode. Consider the following mutation scenario:
mutation M1 -> (uid:12, name:"alice")
mutation M2 -> (uid:12, name:"bob")
The conflict keys for both of these mutations will be the same. Assume that processing of M1 has already been started by the e.worker() goroutine, and then M2 arrives. M2 will have a dependency on M1, so the inDeg of M2 will be 1. But by the time M2 goes to check whether it has inDeg == 0 here -> https://github.com/dgraph-io/dgraph/blob/65be0bd5d84439de328c57db8a51ff7b5041adc9/worker/executor.go#L249 it is possible that M1 has completed, unblocked M2, reduced the inDeg of M2 to 0, and started the processing of M2. In the check at line 249, M2 will then see inDeg as 0 and start its processing again. This causes a double done on the watermark.
This issue can be reproduced consistently by adding a sleep after releasing the lock here -> https://github.com/dgraph-io/dgraph/blob/65be0bd5d84439de328c57db8a51ff7b5041adc9/worker/executor.go#L247 and then running mutations on the same <uid, predicate> multiple times.
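The interleaving described above can be modelled in a few lines of Go. This is only an illustrative sketch, not the code in worker/executor.go and not necessarily the fix applied in this PR: the mutation and graph types, the started flag, and the tryStart, finish and simulate helpers are all hypothetical names. It shows one way to make a double start impossible, which is to perform the inDeg check and the claim of the mutation under the same lock, so that the worker path and the submit path cannot both win.

```go
package main

import (
	"fmt"
	"sync"
)

// mutation is a hypothetical stand-in for a node in the dependency graph.
type mutation struct {
	name    string
	inDeg   int  // number of unfinished mutations this one depends on
	started bool // set once some goroutine has claimed this mutation
}

// graph guards every mutation with a single lock, purely for illustration.
type graph struct {
	sync.Mutex
	runs map[string]int // how many times each mutation was processed
}

// tryStart claims m if it is runnable and not yet claimed. The check and the
// claim happen under the same lock, so at most one caller can succeed.
func (g *graph) tryStart(m *mutation) bool {
	g.Lock()
	defer g.Unlock()
	if m.inDeg != 0 || m.started {
		return false
	}
	m.started = true
	g.runs[m.name]++
	return true
}

// finish models a dependency (M1) completing and unblocking next (M2).
func (g *graph) finish(next *mutation) {
	g.Lock()
	next.inDeg--
	g.Unlock()
}

// simulate replays the scenario: M1 completes, then both the worker path and
// the submit path observe inDeg == 0 for M2 and try to start it.
func simulate() int {
	g := &graph{runs: map[string]int{}}
	m2 := &mutation{name: "M2", inDeg: 1}

	g.finish(m2) // M1 is done, so M2.inDeg goes from 1 to 0

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.tryStart(m2)
		}()
	}
	wg.Wait()
	return g.runs["M2"]
}

func main() {
	fmt.Println("M2 processed", simulate(), "time(s)")
}
```

Without the started flag, both callers would pass the inDeg == 0 check and M2 would be processed twice, which corresponds to the double done on the watermark described above.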