Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] LwIP: Incorrect task when not using LWIP_TCPIP_CORE_LOCKING #28590

Open
rojer opened this issue Aug 8, 2023 · 4 comments
Open

[BUG] LwIP: Incorrect task when not using LWIP_TCPIP_CORE_LOCKING #28590

rojer opened this issue Aug 8, 2023 · 4 comments
Labels
bug Something isn't working needs triage

Comments

@rojer
Copy link
Contributor

rojer commented Aug 8, 2023

Reproduction steps

LwIP offers two modes to run TCP/IP stack: a dedicated task or in-line.
In the former case, access to LwIP's state is serialized by requiring that all APIs be invoked from the dedicated task (throwing a closure on the task's work queue if necessary).
In the latter, a mutex is used and it is sufficient to LOCK_TCPIP_CORE();.
tcpip_api_call() function is provided by LwIP and will do the right thing, depending on whether LWIP_TCPIP_CORE_LOCKING is set or not.
There are advantages and disadvantages to both approaches but the point is, both are supported by the stack and either one can be used by a particular platform.
However, CHIP assumes that LWIP_TCPIP_CORE_LOCKING is used while not explicitly requiring it, which results in incorrect locking model being used if LWIP_TCPIP_CORE_LOCKING is not set (basically no serialization), and exposes applications to races within networking stack with unpredictable results.

Notably, ESP32 runs LwIP in dedicated task mode by default (LWIP_TCPIP_CORE_LOCKING is a config option but is not the default).

CHIP should wither assert that LWIP_TCPIP_CORE_LOCKING is enabled or (preferably) wrap LwIP API calls in tcpip_api_call.

Bug prevalence

Always

GitHub hash of the SDK that was being used

4f658f6

Platform

esp32

Platform Version(s)

No response

Anything else?

// Lock LwIP stack
LOCK_TCPIP_CORE();

this assumes core locking is enabled, LOCK_TCPIP_CORE is a no-op otherwise.

tcpip_api_call should be used to synchronously invoke an API function, using locking or sync closure if necessary.

@shubhamdp
Copy link
Contributor

@rojer I have enabled the static assert in #28798 and skipped it for ASR and bouffalolab platforms. Please take a look.

@shubhamdp
Copy link
Contributor

shubhamdp commented Aug 23, 2023

Here is one more PR from @tx2rx for ASR: #28824

@rojer
Copy link
Contributor Author

rojer commented Aug 23, 2023

@shubhamdp that looks reasonable. if you need help testing non-core-locking case, i'd be happy to.

@rojer
Copy link
Contributor Author

rojer commented Sep 4, 2023

@shubhamdp i took a stab at it and #29057 is the result. this is pretty straightforward and seems to be working for me so far.

A note on LwIPReceiveUDPMessage: it will almost certainly be invoked in the context of a different thread/task than CHIP's: with LWIP_TCPIP_CORE_LOCKING=0 it will be the TCPIP thread for sure, with LWIP_TCPIP_CORE_LOCKING=1 we don't really know, some other thread handling the interface RX (having a dedicated task for that is a common pattern this is what ESP32 does too, with separate tasks for wifi and eth). Granted, the method doesn't do much before posting the packet to the event queue, but even "not much" is something that could be racy, so I added Lock/UnlockChipStack() calls.

Correction: turns out, i cannot use PlatformManager from Inet due to a dependency cycle. So we should be locking the stack but looks like at present we can't. I'll need guidance on how to break this cycle...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage
Projects
None yet
Development

No branches or pull requests

2 participants