Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: config write guard #2360

Merged
merged 1 commit into from
Nov 7, 2024
Merged

Conversation

schlimmchen
Copy link
Contributor

I was trying to tackle application of changed Dynamic Power Limiter settings in my feature branch implementing support for multiple inverters managed by the DPL. Then I noticed, that a bunch of nasty problem would go away if I knew that I could trust the config being consistent, i.e., if the config was only written synchronously with the main loop.

The latter ist currently not true: The config is usually manipulated in the context of the async webserver task, while it is read in the context of the main loop.

I wanted to keep the overhead when reading the config to a minimum, so locking a mutex each time the config shall be read is out of the picture, IMHO.

Writing the config happens rather seldom. So it should be okay if that takes a couple of milliseconds longer -- in the worst case.

My approach: Writing the config is only possible when acquiring a WriteGuard, which locks the "config write mutex" and waits for the signal (from the main loop) to start writing. The signal is needed to make writing the config synchronous with the main loop, and the mutex is needed to use the condition_variable but also to lock multiple writers from each other (which is less of a concern, to be honest).

I like this approach, because I like RAII. However, I am not sure if this is the best approach. Another idea would be to enqueue a callback which is then executed within the context of the main loop. We would need to copy and bind to that callback a significant amount of data, which seems off-putting. Also, access to the queue of callbacks then also need to be locked by a mutex.

@tbnobody I would like to know your opinion on this. Do you like this approach?

@LennartF22 We talked about this briefly. What are your thoughts on this approach?

@LennartF22
Copy link
Contributor

@schlimmchen I have not looked at the changes in detail yet, but it's definitely an improvement.

There are more synchronization issue between the WebApi and the main loop though, like here (if I'm not mistaken):

auto inv = Hoymiles.addInverter(inverter->Name, inverter->Serial);

@tbnobody
Copy link
Owner

Looks great! I am just curious if we would ever have to write the config in the context of the main loop (as it will generate a dead lock). Currently I cannot imagine any use case.

@schlimmchen schlimmchen marked this pull request as draft October 18, 2024 19:42
@schlimmchen
Copy link
Contributor Author

Thanks for your feedback!

I am just curious if we would ever have to write the config in the context of the main loop (as it will generate a dead lock).

Yes. I saw that, too. If one would inadvertently try to do this, the deadlock would happend when testing the change for the first time, and the config would not be saved at all, so it should be noticed early. And if we need to do that, we can adjust the WriteGuard c'tor to see whether the current task it the main loop task and then only lock the mutex, not waiting for the signal.

Let me polish and double-check this, then I will un-draft this PR (next week).

@schlimmchen
Copy link
Contributor Author

@tbnobody are you passionate about

auto const& config = Configuration.get();

versus

const auto& config = Configuration.get();

or whether it's consistent across the code base?

the configuration write guard is now required when the configuration
struct shall be mutated. the write guards locks multiple writers against
each other and also, more importantly, makes the writes synchronous to
the main loop. all code running in the main loop can now be sure that
(1) reads from the configuration struct are non-preemtive and (2) the
configuration struct as a whole is in a consistent state when reading
from it.

NOTE that acquiring a write guard from within the main loop's task will
immediately cause a deadlock and the watchdog will trigger a reset. if
writing from inside the main loop should ever become necessary, the
write guard must be updated to only lock the mutex but not wait for a
signal.
@schlimmchen
Copy link
Contributor Author

I am still happy with this. I double checked that generating a DTU serial works as expected using an erased ESP32:

[2024-10-22 21:27:07.684] rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[2024-10-22 21:27:07.690] configsip: 0, SPIWP:0xee
[2024-10-22 21:27:07.690] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2024-10-22 21:27:07.695] mode:DIO, clock div:2
[2024-10-22 21:27:07.701] load:0x3fff0030,len:1184
[2024-10-22 21:27:07.701] load:0x40078000,len:13232
[2024-10-22 21:27:07.707] load:0x40080400,len:3028
[2024-10-22 21:27:07.707] entry 0x400805e4
[2024-10-22 21:27:08.379] E (694) esp_core_dump_flash: No core dum���ѥѥ���found!
[2024-10-22 21:27:08.379] E (694) esp_core_dump_flash: No core dump partition found!
[2024-10-22 21:27:08.398] 
[2024-10-22 21:27:08.398] Starting OpenDTU
[2024-10-22 21:27:08.398] Initialize FS... E (13) esp_littlefs: ./components/esp_littlefs/src/littlefs/lfs.c:1367:error: Corrupted dir pair at {0x0, 0x1}
[2024-10-22 21:27:08.409] 
[2024-10-22 21:27:08.409] E (20) esp_littlefs: mount failed,  (-84)
[2024-10-22 21:27:08.415] E (23) esp_littlefs: Failed to initialize LittleFS
[2024-10-22 21:27:08.415] [    37][E][LittleFS.cpp:98] begin(): Mounting LittleFS failed! Error: -1
[2024-10-22 21:27:08.426] failed... trying to format...E (46) esp_littlefs: ./components/esp_littlefs/src/littlefs/lfs.c:1367:error: Corrupted dir pair at {0x0, 0x1}
[2024-10-22 21:27:08.437] 
[2024-10-22 21:27:08.437] E (47) esp_littlefs: mount failed,  (-84)
[2024-10-22 21:27:08.437] E (51) esp_littlefs: Failed to initialize LittleFS
[2024-10-22 21:27:08.475] failedReading configuration... [   105][E][vfs_api.cpp:105] open(): /littlefs/config.json does not exist, no permits for creation
[2024-10-22 21:27:08.486] Failed to read file, using default configuration
[2024-10-22 21:27:08.486] Check for default DTU serial... generate serial based on ESP chip id: 199980103760... done
[2024-10-22 21:27:08.754] done
[2024-10-22 21:27:08.754] Reading PinMapping... [   384][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
[2024-10-22 21:27:08.767] using default config done
[2024-10-22 21:27:08.767] Initialize Network... done
[2024-10-22 21:27:09.039] Setting Hostname... Initialize NTP... done
[2024-10-22 21:27:09.039] Initialize SunPosition... done
[2024-10-22 21:27:09.044] Initialize MqTT... done
[2024-10-22 21:27:09.044] Initialize WebApi... done
[2024-10-22 21:27:09.054] Initialize Display... done
[2024-10-22 21:27:09.054] Initialize LEDs... done
[2024-10-22 21:27:09.054] Initialize Hoymiles interface... Invalid pin config
[2024-10-22 21:27:09.060] Switch to WiFi mode
[2024-10-22 21:27:09.064] Setting Hostname... done
[2024-10-22 21:27:39.403] Disable search for AP... done

@schlimmchen schlimmchen marked this pull request as ready for review October 22, 2024 19:29
bool read();
bool write();
void migrate();
CONFIG_T& get();
CONFIG_T const& get();
Copy link

@mathieucarbou mathieucarbou Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this getter could become problematic: how it ensures that reads going through it will be protected by the mutex ?
What if another task / core modifies the config while another one calls this method to access a value ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, there is no assurance. I did not want to create overhead when reading the config structure. we assume the config structure is only read in context of the main loop. since writing is synchronized with the main loop using the changes in this PR, there is no issue when reading the structure in context of the main loop. however, other tasks also do read the structure, and indeed, those reads are still not protected through the changes in this PR. IMHO, that's a next step. it's going to be a pain to address this...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's not complicated: the config class just has to expose the mutex (or 2 methods) and where you read the config in the code, you lock the mutex before then unlock it.

The ESP-IDF and Arduino Core are full of designs like that, especially in the LWIP layer which now require to lock/unlock all operations since Matter was introduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not write it was complicated, just a pain 😉 What I meant was: large changeset. If we want to use a lock guard as well, we also need an additional scope, meaning indenting code, meaning even larger changeset. And things do indeed get complicated, or at least nasty, if the mutex must be released before a function is called, as to avoid recursive locking. I assume recursive locks are to be avoided, especially since we also use other mutexes and deadlocks are looming once we need to juggle multiple locks simultaneously.

The ESP-IDF and Arduino Core are full of designs like that

You are suggesting that we should not be afraid of the overhead of locking the same mutex dozens of times in the main loop in every round of the loop? Usually, I am very liberal when it comes to trading some performance for new and useful features, but locking mutexes? Am I paranoid?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree!
All depends on who reads, when and at which frequency.
I don't know enough the app internally.
Maybe otherwise returning copies or partial copies of the config when reading would be an option also
I was just highlighting the fact that there is no protection theorically.
But if in practice it never happens then it's good.

@tbnobody tbnobody merged commit 74e3947 into tbnobody:master Nov 7, 2024
8 of 9 checks passed
@schlimmchen schlimmchen deleted the config-write-guard branch November 7, 2024 18:52
Copy link

github-actions bot commented Dec 8, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants