This could exhibit the same issue if OnRecoveryComplete is able to register another state machine
True, enumerating over a copy would be the safer choice. At least in that case the failure would not come from concurrency, which comes as a surprise to users.
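To make the "enumerate over a copy" point concrete, here is a minimal Python analogue. It is not the Orleans code: the class names, `register_state_machine`, `on_recovery_complete`, and both `notify_*` methods are hypothetical stand-ins, and a Python dict stands in for whatever collection the manager actually uses. It only illustrates the general failure mode, where a callback registers a new entry while the same thread is still enumerating.

```python
class QuietMachine:
    """Hypothetical state machine whose callback does nothing."""

    def on_recovery_complete(self):
        pass


class RegisteringMachine:
    """Hypothetical state machine that registers a child from its callback."""

    def __init__(self, manager, child_name):
        self._manager = manager
        self._child_name = child_name

    def on_recovery_complete(self):
        self._manager.register_state_machine(self._child_name, QuietMachine())


class StateMachineManager:
    """Hypothetical stand-in for the manager discussed in this thread."""

    def __init__(self):
        self._machines = {}

    def register_state_machine(self, name, machine):
        self._machines[name] = machine

    def notify_unsafe(self):
        # Enumerates the live dictionary, so a registering callback breaks it.
        for machine in self._machines.values():
            machine.on_recovery_complete()

    def notify_over_copy(self):
        # Enumerates a snapshot, so callbacks may register without breaking it.
        for machine in list(self._machines.values()):
            machine.on_recovery_complete()


def build_manager():
    manager = StateMachineManager()
    manager.register_state_machine("a", RegisteringMachine(manager, "a-child"))
    manager.register_state_machine("b", QuietMachine())
    return manager


def demo():
    unsafe = build_manager()
    try:
        unsafe.notify_unsafe()
        unsafe_failed = False
    except RuntimeError:
        # Python reports "dictionary changed size during iteration".
        unsafe_failed = True

    safe = build_manager()
    safe.notify_over_copy()
    return unsafe_failed, sorted(safe._machines)
```

One design consequence of the snapshot: a machine registered during the notification pass is not itself notified in that pass, so the real code would need to decide whether that is acceptable.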
Do you know where the concurrency is coming from?
The recovery process itself is done safely, but as new activations are created, so are the instances of state machines, as part of the grain's ctor. Each of them registers itself in the state machine manager as its ctor runs. That process happens outside the work loop of the manager, and it can modify the list of state machines while that list is being enumerated to notify all state machines that recovery is complete. Those two operations happen concurrently on different threads.
Locking on the recovery notification means that concurrent attempts by state machines to register themselves have to wait until all SMs are notified. That is, RegisterStateMachine(name, sm) correctly locks, but that lock has no effect if the enumeration does not hold the same lock (which it now does in this PR).
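The locking argument above can be sketched as follows. This is a Python analogue under stated assumptions, not the PR's implementation: `LockedManager`, `register_state_machine`, and `notify_recovery_complete` are hypothetical names, and `threading.RLock` is used because it is reentrant for the owning thread, as the C# `lock` statement is. The sketch shows a registration from another thread waiting until the notification pass has released the lock.

```python
import threading


class LockedManager:
    """Hypothetical manager where registration and notification share one lock."""

    def __init__(self):
        self._lock = threading.RLock()  # reentrant for the owning thread
        self._machines = {}

    def register_state_machine(self, name, callback):
        with self._lock:
            self._machines[name] = callback

    def notify_recovery_complete(self):
        # Holding the lock for the whole pass keeps other threads out.
        with self._lock:
            for callback in self._machines.values():
                callback()


def demo():
    manager = LockedManager()
    observed = []
    attempting = threading.Event()

    def registrar():
        attempting.set()
        # Blocks here until the notification pass releases the lock.
        manager.register_state_machine("late", lambda: None)

    worker = threading.Thread(target=registrar)

    def first_callback():
        # Runs on the notifying thread, which currently owns the lock.
        worker.start()
        attempting.wait()
        worker.join(timeout=0.2)  # give the worker a chance to try
        observed.append("late" in manager._machines)

    manager.register_state_machine("first", first_callback)
    manager.notify_recovery_complete()
    worker.join()
    return observed, sorted(manager._machines)
```

Because the lock is reentrant, it only excludes other threads. A callback that registers from the notifying thread itself would still pass through the lock and modify the collection mid-enumeration, which is the case raised at the top of this thread.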
That's how I see it; hope it makes sense!
StateMachineManager is supposed to be created per-activation, though, so all of this should be happening on the same thread. Do you have a repro somewhere that I could look at?
You are right, how did I miss that! Yeah, it happens sometimes in the "automatic job scenario":
https://github.com/ledjon-behluli/DurableStateMachines/blob/main/playground/DurableStateMachines.CTS/Program.cs#L49
I did not look too closely into that, as I was under the assumption that the SM manager was per-silo, so I got fooled by the lack of a lock.
The lock is good anyway