Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcoap: Suspected crosstalk between requests (possible NULL call) #14390

Open
chrysn opened this issue Jun 29, 2020 · 0 comments
Open

gcoap: Suspected crosstalk between requests (possible NULL call) #14390

chrysn opened this issue Jun 29, 2020 · 0 comments
Assignees
Labels
Area: CoAP Area: Constrained Application Protocol implementations Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@chrysn
Copy link
Member

chrysn commented Jun 29, 2020

From the way gcoap memos are set up, I suspect that timeouts can under race conditions slip to a new request:

  • A response event timeout is set on a memo.
  • Very late, a response comes in.
  • The timeout fires, and the timeout event gets placed to on the queue.
  • The response is processed, and event_timeout_clear gets called. That function's documentation says that it may be a no-op if the event has been queued already. (This is where I found it: I wanted to do the same also on empty ACKs and was astounded by event_timeout_clear not giving clear indication whether the CB was stopped or not in gcoap: Process CON responses #14178)
  • After the response was processed by the application code, the memo is set to GCOAP_MEMO_UNUSED.
  • Now there is a window of admittedly not much time until the event loop will process the timeout -- a bit more after gcoap: Process CON responses #14178, but a race is a race, and there could also be other events in the queue that delay things.
  • In that time, a user thread could send a gcoap message. This either sets up a new timeout event in the old's place, or nulls it. (The violation of documentation happens here already: event_timeout_set demands that the used event must stay valid until the event has been processed.)
  • The event gets processed, and _on_resp_timeout gets fired.
  • The new request now (immediately after being sent) receives a timeout event, or the event fires with its callback address NULLed.

Steps to reproduce

were not taken yet. I have not observed this, just went through either the code paths or the documentation.

I expect this to be rather rare in practice, and hard to reproduce (even locally, let alone remotely, which is why I'm not giving this the security issue treatment).

Expected results and steps forward

gcoap should be free of such races.

I had briefly hoped that there could be a bipartite scheme of access (we only clear gcoap's timeouts in the message handler, leave the memos as ALMOST_UNUSED, and only set the memos to UNUSED when all timeout events were processed), but it seems that the event queue doesn't work that way.

My preferred design would be to allow event_timeout_clear to say "sorry events have been set in motion" and then cancel whatever would touch that memo. (That would mean that incoming responses to just timing-out requests would be discarded even though they arrived at the host, but it's already a race). That has the downside of being comparatively intrusive (as the events timer inherits its uncommunicative behavior from xtimer).

In the mean time, I will continue #14178 disregarding that race, as however this is resolved can still be applied there.

Still evaluating other possibilities.

Versions

master of May (8a2b089) is where I did the code path checks

[edit: formatting]

@chrysn chrysn added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: CoAP Area: Constrained Application Protocol implementations labels Jun 29, 2020
@chrysn chrysn self-assigned this Jun 29, 2020
@miri64 miri64 added this to the Release 2020.07 milestone Jul 6, 2020
@aabadie aabadie added the Area: network Area: Networking label May 20, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: CoAP Area: Constrained Application Protocol implementations Area: network Area: Networking Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

4 participants