Understanding SettingsLoad() logic #5054
Comments
This kind of reminds me of Windows' "Last Known Good Configuration". I think I have two concerns (for now)
|
2. This is the whole point of the code; see the other issue.
1. There is no intention to protect OTA against power failure (OTA is rare
relative to flash updates).
--
Hanoh
Sent from my iPhone
|
Feel free to fork and use your own code. This is the nice thing about open source. |
So are CRC failures. If you're getting a lot of these, it's probably because you have bad SPI flash, or your SPI flash is at end of life :) |
The fix is meant to solve the following case (your words), which is much more
frequent than a CRC error due to flash end of life or OTA:
“If a write error occurs it could be because of a power outage at the exact
moment of writing a new flash page it may be useful but it may as well also
be because a hardware watchdog exception occurred (which could be more
likely than a power failure).”
If the CRC is good for one of the sectors, I would definitely want to use it and
NOT have to reconfigure the device manually.
The watchdog reset is not caused by the setting values, because the flash write is
asynchronous to the values in RAM.
|
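The "use a sector if its CRC is good" idea from the comment above can be sketched roughly as follows. This is only an illustration: `SettingsBlock`, `BlockChecksum`, `StampChecksum` and `BlockIsValid` are hypothetical names, the layout is a stand-in for the real settings structure, and the checksum is a simple position-weighted sum rather than whatever the firmware actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical settings layout; the real structure is far larger.
struct SettingsBlock {
  uint32_t save_count;   // incremented on every save
  uint16_t crc;          // checksum over save_count and payload
  uint8_t payload[10];   // stand-in for the actual settings
};

// Position-weighted 16-bit sum (an assumption, not the firmware's algorithm).
uint16_t BlockChecksum(const SettingsBlock& block) {
  uint16_t sum = static_cast<uint16_t>(block.save_count ^ (block.save_count >> 16));
  for (size_t i = 0; i < sizeof(block.payload); i++) {
    sum = static_cast<uint16_t>(sum + block.payload[i] * (i + 1));
  }
  return sum;
}

// Called just before a block is written to flash.
void StampChecksum(SettingsBlock* block) {
  assert(block != nullptr);
  block->crc = BlockChecksum(*block);
}

// Called after a block is read back: true only if the stored and
// recomputed checksums agree.
bool BlockIsValid(const SettingsBlock& block) {
  return block.crc == BlockChecksum(block);
}
```

A write that is interrupted halfway typically leaves the payload and the stored checksum out of step, so `BlockIsValid` returns false for that copy while an older, fully written copy still verifies.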
I think you missed my point. Yes, the flash write happens after a setting has changed, but if the watchdog resets due to a flash failure then I definitely do not want to use any of the flash blocks, whether or not they can be verified as valid in one way or another. |
I’m trying to protect the common things that *CAN* be protected.
Failure of the flash hardware itself is out of scope; there is
nothing to do about it. The same goes for OTA.
A “power failure” while there is an in-flight write to one of the sectors
*can* be protected against.
Could you say why NOT to protect against *this* case?
I don’t think it is a good excuse not to use this protection because there
are cases that can’t be protected. We should do our best.
This reminds me of:
https://en.wikipedia.org/wiki/Broken_windows_theory
|
It's not an excuse, it's a reason. There is a big difference between knowing with absolute certainty that a previously written flash page is good and restoring from a known good initial configuration. But hey, I'm just an amateur programmer so I don't know much. Let's wait for a response from @arendst before we play ping pong some more? |
Looking at the code I see more issues
read_setting_from sector(sector) |
Fork the code and feel free to make PRs so that they can be discussed there. Opening an issue and suggesting code changes is not a constructive way of changing anyone's view on the way the code is implemented. Lastly, opening an issue on the same topic because the previous one was closed is not the way to go. |
@hhaim I suggest you make your changes and then do some tests to see if it solves your issue. If so, I might want to implement it. As of now, my 30+ devices have never had a CRC error causing them to reset to default. I also never need to remove AC from the devices. When AC was lost unexpectedly they all came back up as expected. Remember that there is only a flash write when a setting is going to be changed or the relay is switched. Then, depending on the state of
So your flash corruption could only happen during one of the above situations. |
@arendst thanks for the quick answer. There are two things mixed in this issue.
While fixing #2 I got to issue #1.
Regarding #2:
Regarding #1: |
@arendst is it possible to change
I think that I had such a situation in the past (my garage controller), and I lost the GPIO configuration because of power flickering... |
Is it possible that you guys have hardware that you have already damaged through the way you handle it? I have never had these issues, and I have a lot of Sonoff devices. |
by power flickering (the power which is delivered to my house). This is nothing unusual in the village where I live; it happens from time to time. There are moments when I have a "disco" in my house for 10-15 seconds. If I correctly understand the lines from sonoff.ino, then after 5 restarts during
If I understand it wrong (lines 2487 till 2511) then you can ignore my posts.
It is not a hardware problem (at least in the scenario that I described) but the Tasmota behavior when there are "fast restarts" during |
Please check #4645. In that issue Theo explained how to disable that feature if you don't need it. Thanks |
Sure, but there are no lines 2658 until 2680 in sonoff.ino anymore ;) I think those lines are now 2487 until 2511.
What I'm asking is to allow the user (via user_config_override.h) to slightly modify this feature by setting their own BOOT_LOOP_TIME or changing the "boot number thresholds". |
I second that. I would set it to 20 in my case to eliminate false positives
(3 is normal for our national power supplier).
|
BOOT_LOOP_TIME can be changed in my_user_config.h or user_config_override.h (if you use the override config file) by adding the following two lines to your relevant config file.
#undef BOOT_LOOP_TIME
#define BOOT_LOOP_TIME 20
|
Thanks. A different define is needed:
#define BOOT_COUNTS_TH 20
Change
if (RtcReboot.fast_reboot_count > 1) {
to
if (RtcReboot.fast_reboot_count > BOOT_COUNTS_TH + 1) {
The other numbers should be changed accordingly.
|
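For readers unfamiliar with the override pattern discussed in the two comments above, here is a small sketch of how the preprocessor resolves it. The default of 10 is an assumption (taken from the "10 second window" mentioned elsewhere in this thread), `BOOT_COUNTS_TH` is only the define proposed here and not an existing firmware option, and `RecoveryStepLimit` is a hypothetical helper added purely for illustration.

```cpp
#include <cassert>

// Firmware default, as it might appear in my_user_config.h (value assumed).
#define BOOT_LOOP_TIME 10

// User override, as it might appear in the override config file.
#undef BOOT_LOOP_TIME
#define BOOT_LOOP_TIME 20

// Hypothetical threshold from this thread. The #ifndef guard lets an
// override file define it first; 0 reproduces the unmodified comparisons.
#ifndef BOOT_COUNTS_TH
#define BOOT_COUNTS_TH 0
#endif

// Fast-reboot count that must be exceeded before recovery step `step`
// (1 to 5) is applied.
int RecoveryStepLimit(int step) {
  assert(step >= 1 && step <= 5);
  return BOOT_COUNTS_TH + step;
}
```

Because the `#undef` comes after the default, the later `#define` wins without a redefinition warning; the `#ifndef` guard is the complementary pattern for a define that the user may or may not supply.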
Get off your phone and make a PR then :) |
@hhaim We would highly appreciate a PR to see what a highly skilled programmer like you can teach us |
Why don't you guys buy a good UPS then? |
@jziolkowski we are talking about a UPS for the whole home's electricity. In my case only a big generator would do the job. |
@andrethomas thanks! Please correct me if I'm wrong: should I use a *lower* value of BOOT_LOOP_TIME rather than a *higher* one? With a lower value, the "power flickering" would have to be *really* fast to reset my configuration to the safe state (disabled rules, GPIO, Sonoff Basic module, etc.).
I *definitely* don't want to remove this feature as suggested by @ascillato. I don't want to brick my device (and have to remove it from the wall to reflash it over serial; this feature is here for a good reason) with a wrong configuration that leads to a "boot loop" or whatever. I suspect that something like the code below (but I'm not a Tasmota programmer) should give me (and maybe others, but again, please correct me if I'm wrong) better control of the "power flickering" situation:
#define BOOT_COUNTS_TH 1
// Disable functionality as possible cause of fast restart within BOOT_LOOP_TIME seconds (Exception, WDT or restarts)
if (RtcReboot.fast_reboot_count > BOOT_COUNTS_TH + 1) {      // Restarted BOOT_COUNTS_TH + 2 times
  Settings.flag3.user_esp8285_enable = 0;                    // Disable ESP8285 Generic GPIOs interfering with flash SPI
  if (RtcReboot.fast_reboot_count > BOOT_COUNTS_TH + 2) {    // Restarted BOOT_COUNTS_TH + 3 times
    for (uint8_t i = 0; i < MAX_RULE_SETS; i++) {
      if (bitRead(Settings.rule_stop, i)) {
        bitWrite(Settings.rule_enabled, i, 0);               // Disable rules causing boot loop
      }
    }
  }
  if (RtcReboot.fast_reboot_count > BOOT_COUNTS_TH + 3) {    // Restarted BOOT_COUNTS_TH + 4 times
    Settings.rule_enabled = 0;                               // Disable all rules
  }
  if (RtcReboot.fast_reboot_count > BOOT_COUNTS_TH + 4) {    // Restarted BOOT_COUNTS_TH + 5 times
    for (uint8_t i = 0; i < sizeof(Settings.my_gp); i++) {
      Settings.my_gp.io[i] = GPIO_NONE;                      // Reset user defined GPIO disabling sensors
    }
  }
  if (RtcReboot.fast_reboot_count > BOOT_COUNTS_TH + 5) {    // Restarted BOOT_COUNTS_TH + 6 times
    Settings.module = SONOFF_BASIC;                          // Reset module to Sonoff Basic
    // Settings.last_module = SONOFF_BASIC;
  }
}
I can then redefine BOOT_COUNTS_TH and BOOT_LOOP_TIME to fit my environment/needs. |
@benzino77 have a look at this PR: #5063
|
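To make the effect of the proposed threshold easier to reason about, the staged recovery can be modelled as a pure function that maps a fast-reboot count and a threshold to the set of actions that would fire. This is only a sketch of the proposal in this thread, not the firmware's code: the `RecoveryAction` flags and `RecoveryActions` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flags, one per recovery step in the snippet quoted above.
enum RecoveryAction : uint8_t {
  kDisableEsp8285Gpio  = 1 << 0,
  kDisableStoppedRules = 1 << 1,
  kDisableAllRules     = 1 << 2,
  kResetUserGpio       = 1 << 3,
  kResetModule         = 1 << 4,
};

// Returns the actions that would run for a given fast-reboot count.
// threshold == 0 corresponds to the comparisons without BOOT_COUNTS_TH.
uint8_t RecoveryActions(uint8_t fast_reboot_count, uint8_t threshold) {
  assert(threshold <= 200);  // keep the example well away from uint8_t limits
  uint8_t actions = 0;
  if (fast_reboot_count > threshold + 1) {
    actions |= kDisableEsp8285Gpio;
    if (fast_reboot_count > threshold + 2) actions |= kDisableStoppedRules;
    if (fast_reboot_count > threshold + 3) actions |= kDisableAllRules;
    if (fast_reboot_count > threshold + 4) actions |= kResetUserGpio;
    if (fast_reboot_count > threshold + 5) actions |= kResetModule;
  }
  return actions;
}
```

With a threshold of 20, three quick power flickers produce no action at all, whereas with a threshold of 0 the same three restarts already disable the stopped rules.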
BOOT_LOOP_TIME defines the number of seconds after a successful boot after which RtcReboot.fast_reboot_count is reset to 0, which essentially means that things are probably working as they should.
I think it's fair to assume that if you have very unstable AC and you want the device to believe that everything is OK, then you can decrease this value so that the reboot counter stops contributing to settings changes sooner (let's say 1 second). If no secondary reset occurs within that 1 second period, the firmware will assume everything is OK and continue to attempt to function as configured. I think it's fair to say that if a reset/reboot occurs because of a hardware configuration issue, it will most likely occur within milliseconds of boot, so this will still give you sufficient protection in case you configured a special function pin by accident which causes the WDT to meet one of its reset conditions.
The main challenge here is to find a balance between how many reboots you would expect within a certain time frame and the probability of having a rule that places your device in a boot loop. I think implementing the BOOT_COUNTS_TH code you proposed will not make much difference if you're trying to mitigate power on-off transitions from your utility provider, as it just adds another counter into the mix. The question is: what is the shortest possible time in which your utility company will power cycle your house/neighborhood? Will it cause your device to boot more than once in 10 seconds? If not, then redefining it to a lower value would not be beneficial.
Also note that the serious changes only start after the 3rd reboot within BOOT_LOOP_TIME seconds, so there really has to be something wrong with the device. I don't think any power utility in the world can shed and restore electricity to a neighborhood 3 times in succession without exceeding the 10 second window at least once, which would leave the configuration intact as expected. |
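The interaction between the window and the counter described above can be simulated in a few lines. This is a sketch under stated assumptions: `RebootTracker` and `PeakFastRebootCount` are hypothetical names, the counter is assumed to be incremented once per boot, and it is assumed to be cleared once the uptime reaches the window.

```cpp
#include <cassert>
#include <cstdint>

struct RebootTracker {
  uint32_t boot_loop_time;        // window in seconds, plays the role of BOOT_LOOP_TIME
  uint8_t fast_reboot_count = 0;  // on a device this would live in RTC memory

  explicit RebootTracker(uint32_t window_seconds) : boot_loop_time(window_seconds) {
    assert(window_seconds > 0);
  }

  // Called once at every boot; returns the count the recovery logic would see.
  uint8_t OnBoot() {
    if (fast_reboot_count < UINT8_MAX) fast_reboot_count++;
    return fast_reboot_count;
  }

  // Called with the uptime the device reached before the next reset.
  void OnUptime(uint32_t uptime_seconds) {
    if (uptime_seconds >= boot_loop_time) fast_reboot_count = 0;
  }
};

// Replays a sequence of boots, each running for uptimes[i] seconds, and
// returns the highest count observed at boot time.
uint8_t PeakFastRebootCount(RebootTracker& tracker, const uint32_t* uptimes, int boots) {
  uint8_t peak = 0;
  for (int i = 0; i < boots; i++) {
    uint8_t seen = tracker.OnBoot();
    if (seen > peak) peak = seen;
    tracker.OnUptime(uptimes[i]);
  }
  return peak;
}
```

For a flicker pattern with uptimes of 2, 3, 15 and 1 seconds, a 10 second window lets the count climb to 3 before it is cleared, while a 1 second window never lets it pass 1, which is the trade-off the comment above is weighing.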
@andrethomas thanks again!
Yes, it is definitely a challenge to find that balance. It is neither trivial nor obvious.
That was just a proposal for the Tasmota masters to comment on or point out the weaknesses of that approach. Thanks for that.
You are a man of little faith :) @hhaim thanks - looks promising |
Sure, use 1 second then to give some tolerance to hardware configuration issues and make sure your rules are correct before enforcing them on a device which is inside a wall or a tree trunk :) |
@benzino77 Sounds like a nice village to retire in... Theo made changes with the following merges: 9825d6f (Add resiliency to saved Settings) |
WOW! There is even SetOptionXX for that! @arendst @andrethomas @hhaim - thank you gents! |
:source understanding issue
I'm trying to make the handling of CRC errors more resilient, but I'm not sure I follow the current logic of the code.
In the case of the default mode "(rotate 0 = Save in next flash slot)"
This is what I expect from this function:
To read the settings from the sector that matches the following
However, it does not do that; instead
I would expect it to be
This is the more resilient CRC code
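The conditions and code originally attached to this post are not preserved on this page. As a rough illustration of the general idea only, and assuming (this is an assumption, not the author's stated rule) that the loader should prefer the slot that both verifies and carries the highest save counter, slot selection under rotation could look like the sketch below. `SlotInfo` and `NewestValidSlot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Summary of one rotating flash slot after it has been read and checked.
struct SlotInfo {
  bool crc_ok;          // result of the checksum verification
  uint32_t save_count;  // counter stored with the settings
};

// Returns the index of the valid slot with the highest save counter,
// or -1 if no slot verifies (the caller would then load defaults).
// Counter wrap-around is deliberately ignored in this sketch.
int NewestValidSlot(const SlotInfo* slots, size_t count) {
  assert(slots != nullptr || count == 0);
  int best = -1;
  for (size_t i = 0; i < count; i++) {
    if (!slots[i].crc_ok) continue;
    if (best < 0 || slots[i].save_count > slots[best].save_count) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

If the most recent write was interrupted, its slot fails verification and the function falls back to the previous save instead of resetting the device to defaults.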