Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancelling timed messages fails sometimes #136

Closed
garin1000 opened this issue Mar 5, 2023 · 3 comments
Closed

Cancelling timed messages fails sometimes #136

garin1000 opened this issue Mar 5, 2023 · 3 comments

Comments

@garin1000
Copy link
Contributor

Hi,

In my project, I send/cancel timed messages very rapidly. They are used as per-message timeouts for receiving a sequence of messages containing bulk data, which is quite fast on localhost (2.5ms per message). After a received message, I cancel the corresponding timeout and create a new one for the next block of data.

Rarely (not exactly reproducable, maybe every 1000 timed sends), cancelling does not work, so that I receive a timed message which should already have been cancelled.

I added some debut prints and noticed an issue when inserting and cancelling a lot of timed messages in short time:

Sometimes, my code calls EventSender::send_with_timer() and EventSender::cancel_timer() while EventReceiver::enque_timers() is still working on removing timers. The remove_timer_receiver queue will then also iterate over the new remove message for a timer, which hadn't even been added to the timer BTreeMap. In the next call of EventSender::enque_timers(), the timer_receiver queue contains the timer, which should already have been cancelled. Of course, this timer then fires at some time.

The EventReceiver::enque_timers() functions seems to be called too rarely and then handles several crossbeam messages at once, which increases the probability of this race condition. In my opinion, the actual issue is that timer creation and cancellation are handled in different queues. Due to this, the creation/cancellation order can be reversed.
If you have them in a single queue, the order would always be correct, and it would be guaranteed, that a timer is created before is is cancelled.

Looking forward to your opinion on this issue :-)

Best,
Garin

@lemunozm
Copy link
Owner

lemunozm commented Mar 6, 2023

Thanks for sharing this and for doing this excellent debug process. Good catch! It took me really little time to get into the problem, and yes, indeed:

the actual issue is that timer creation and cancellation are handled in different queues

That is exactly what is happening. And the solution, as you say, is to merge both queues. Probably using an enum to differentiate the event could be enough:

enum Timer<E> {
    Create(Duration, E)
    Cancel(TimerId)
} 

Would you like to make a PR with the fix?

@garin1000
Copy link
Contributor Author

I didn't implement a fix, yet. I try to find some time this evening and make a PR.

I will probably keep the tuples, and just change timer.1 to take the new Timer enum. That then would be

enum Timer<E> {
    Create(E),
    Cancel
}

@lemunozm
Copy link
Owner

lemunozm commented Mar 6, 2023

Thanks @garin1000, no hurry!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants