gcoap: Suspected crosstalk between requests (possible NULL call) #14390
Labels: Area: CoAP, Area: network, Type: bug
From the way gcoap memos are set up, I suspect that, under race conditions, a timeout can slip through to a new request.
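To make the suspected interleaving concrete, here is a minimal single-threaded C model of it. It is a sketch under stated assumptions, not gcoap code: `memo_t`, `send_request`, `timer_fires`, `handle_response`, `process_timeout_event` and `run_race` are hypothetical names. The pool has a single slot so that reuse is guaranteed, and the "possible NULL call" is modelled, as an assumption, as the timeout path invoking a handler pointer that the new request left NULL. The key step is that clearing the timeout in the response path only stops a timer that is still armed; an event that the timer has already queued stays queued and later acts on whichever request owns the memo by then.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not gcoap's types or functions. */
typedef void (*resp_handler_t)(int request_id);

typedef struct {
    bool in_use;
    int request_id;
    resp_handler_t handler;  /* may be NULL (assumed source of the NULL call) */
    bool timer_armed;        /* timeout timer running, not yet fired */
    bool event_queued;       /* timer fired; its event waits in the queue */
} memo_t;

typedef struct {
    int acted_on;            /* request id the stale event hit, 0 if none */
    bool null_call;          /* true if it would have called a NULL handler */
} outcome_t;

static memo_t memo;          /* single-slot pool, so the slot is always reused */
static int last_timeout_for; /* request id passed to the last handler call */

static void on_timeout_cb(int request_id) { last_timeout_for = request_id; }

static void send_request(int id, resp_handler_t handler)
{
    assert(!memo.in_use);
    memo.in_use = true;
    memo.request_id = id;
    memo.handler = handler;
    memo.timer_armed = true; /* note: event_queued is left untouched */
}

/* Timer expiry: the event is posted; nothing looks at the memo yet. */
static void timer_fires(void)
{
    memo.timer_armed = false;
    memo.event_queued = true;
}

/* Response path: "clearing" the timeout cannot unqueue a posted event. */
static void handle_response(void)
{
    memo.timer_armed = false;
    memo.in_use = false;
}

/* Event loop: the queued timeout acts on whoever owns the memo now. */
static outcome_t process_timeout_event(void)
{
    outcome_t out = { 0, false };
    memo.event_queued = false;
    if (!memo.in_use) {
        return out;            /* nobody owns the memo: harmless */
    }
    out.acted_on = memo.request_id;
    if (memo.handler == NULL) {
        out.null_call = true;  /* real code would jump through NULL here */
    } else {
        memo.handler(memo.request_id);
    }
    memo.in_use = false;
    return out;
}

/* Replays the suspected interleaving with a given handler for request 2. */
static outcome_t run_race(resp_handler_t second_handler)
{
    memo = (memo_t){ 0 };
    last_timeout_for = 0;
    send_request(1, on_timeout_cb);
    timer_fires();                   /* request 1 times out ...               */
    handle_response();               /* ... but its response is handled first */
    send_request(2, second_handler); /* request 2 reuses the memo             */
    assert(memo.event_queued);       /* request 1's timeout is still queued   */
    return process_timeout_event();
}
```

With a non-NULL handler, `run_race` reports that request 2 received request 1's timeout (the crosstalk); `run_race(NULL)` additionally flags the NULL call.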
Steps to reproduce
None taken yet. I have not observed this; I have only gone through the code paths and the documentation.
I expect this to be rather rare in practice, and hard to reproduce (even locally, let alone remotely, which is why I'm not giving this the security issue treatment).
Expected results and steps forward
gcoap should be free of such races.
I had briefly hoped that there could be a bipartite scheme of access (we only clear gcoap's timeouts in the message handler, leave the memos as `ALMOST_UNUSED`, and only set the memos to `UNUSED` once all timeout events have been processed), but it seems that the event queue doesn't work that way.
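The bipartite scheme could look roughly like the following C sketch. It rests on one assumption that the issue itself doubts: that a memo can tell whether its timeout event is still pending (`event_pending` here). The `ALMOST_UNUSED` state follows the text (prefixed `MEMO_` below); all other names, such as `slot_t`, `slot_alloc` and `slot_on_timeout_event`, are hypothetical. A slot in `MEMO_ALMOST_UNUSED` is never handed out by the allocator, and only processing its pending timeout event returns it to `MEMO_UNUSED`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; ALMOST_UNUSED is the issue's idea, not a gcoap state. */
typedef enum { MEMO_UNUSED, MEMO_USED, MEMO_ALMOST_UNUSED } slot_state_t;

typedef struct {
    slot_state_t state;
    int request_id;
    bool timer_armed;
    bool event_pending;  /* ASSUMPTION: the memo can know this */
} slot_t;

#define POOL_SIZE 1      /* one slot, so any reuse is of the same memo */
static slot_t pool[POOL_SIZE];

/* Only fully UNUSED slots are handed out. */
static slot_t *slot_alloc(int request_id)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].state == MEMO_UNUSED) {
            assert(!pool[i].event_pending); /* invariant of the scheme */
            pool[i].state = MEMO_USED;
            pool[i].request_id = request_id;
            pool[i].timer_armed = true;
            return &pool[i];
        }
    }
    return NULL;         /* pool exhausted (or still draining) */
}

/* Timer expiry: an event for this slot is now in the queue. */
static void slot_timer_fires(slot_t *s)
{
    s->timer_armed = false;
    s->event_pending = true;
}

/* Message handler: clear the timeout; park the slot if an event is queued. */
static void slot_on_response(slot_t *s)
{
    s->timer_armed = false;
    s->state = s->event_pending ? MEMO_ALMOST_UNUSED : MEMO_UNUSED;
}

/* Timeout event handler: returns true only for a genuine timeout. */
static bool slot_on_timeout_event(slot_t *s)
{
    assert(s->event_pending);
    s->event_pending = false;
    bool genuine = (s->state == MEMO_USED);
    s->state = MEMO_UNUSED;  /* no event refers to the slot any more */
    return genuine;
}
```

The cost of this design is that a slot stays unavailable until its stale event drains, so with a small memo pool a new request can be refused for a short while.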
My preferred design would be to allow `event_timeout_clear` to say "sorry, events have been set in motion", and then to cancel whatever would touch that memo. (That would mean that incoming responses to requests that are just timing out would be discarded even though they arrived at the host, but that is already a race.) The downside is that this is comparatively intrusive, as the event timer inherits its uncommunicative behavior from xtimer.
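The preferred design can be sketched in C with a clear function that reports whether it came too late. `fake_timeout_clear` and its `TIMEOUT_ALREADY_POSTED` result are hypothetical stand-ins for the proposed behavior of `event_timeout_clear`, and the memo type and handlers are likewise illustrative. When clearing fails, the response path drops the response and leaves the memo to the queued timeout event, so the memo is never released while an event for it is in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type for the proposed "too late" report. */
typedef enum { TIMEOUT_CLEARED, TIMEOUT_ALREADY_POSTED } clear_result_t;

typedef struct {
    bool armed;    /* timer running */
    bool posted;   /* timer fired and its event sits in the queue */
} fake_timeout_t;

typedef struct {
    bool in_use;
    int responses_delivered;
    int timeouts_delivered;
    fake_timeout_t tmo;
} memo3_t;

/* Timer expiry: post the event only if the timer was still armed. */
static void fake_timer_fire(fake_timeout_t *t)
{
    if (t->armed) {
        t->armed = false;
        t->posted = true;
    }
}

/* Hypothetical clear that admits when it is too late. */
static clear_result_t fake_timeout_clear(fake_timeout_t *t)
{
    if (t->posted) {
        return TIMEOUT_ALREADY_POSTED;
    }
    t->armed = false;
    return TIMEOUT_CLEARED;
}

/* Response path: returns true if the response was delivered. */
static bool on_response(memo3_t *m)
{
    assert(m->in_use);
    if (fake_timeout_clear(&m->tmo) == TIMEOUT_ALREADY_POSTED) {
        return false;   /* drop it; the queued timeout event owns the memo */
    }
    m->responses_delivered++;
    m->in_use = false;  /* safe: no event for this memo is in flight */
    return true;
}

/* Event-loop handler for a posted timeout. */
static void on_timeout_event(memo3_t *m)
{
    assert(m->tmo.posted);
    m->tmo.posted = false;
    m->timeouts_delivered++;
    m->in_use = false;
}

/* Starts a request on memo m with its timeout armed. */
static void start_request(memo3_t *m)
{
    *m = (memo3_t){ .in_use = true, .tmo = { .armed = true } };
}
```

The extra information the timer layer has to expose (the `posted` flag here) is exactly the part that makes this approach intrusive.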
In the meantime, I will continue with #14178 and disregard that race, as whatever resolution we find can still be applied there.
Still evaluating other possibilities.
Versions
I did the code path checks on master as of May (8a2b089).