More resilient to flash crc error #5032

Closed
hhaim opened this issue Jan 26, 2019 · 26 comments
Labels
enhancement Type - Enhancement that will be worked on fixed Result - The work on the issue has ended

Comments

@hhaim

hhaim commented Jan 26, 2019

Version: 6.2.1

I had a few power outages, and 10% of my devices reverted to the default configuration.

Looking at the code, I see that it could be more resilient to this issue, but it is not (the existing design seems aimed more at flash wear-out).

The code

void SettingsLoad() {

if (Settings.version > 0x06000000) { bad_crc = (Settings.cfg_crc != GetSettingsCrc()); } // if one slot has CRC error, break
  
}

  if (bad_crc ..) { SettingsDefault(); } // go to default 

This means that at the first CRC error we just stop and write the default configuration.
Wouldn't it be better to keep looking for a slot with a valid CRC and fail only if all slots have a bad CRC?
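
The proposal above can be sketched roughly as follows. This is a minimal, hypothetical model and not Tasmota's code: `Slot`, `Checksum`, and `FindNewestValidSlot` are invented names, the checksum is a simple stand-in for `GetSettingsCrc()`, and flash is modelled as an in-memory array.

```cpp
#include <cassert>
#include <cstdint>

const int kPayloadSize = 8;  // assumption: tiny stand-in for the real settings block

// Hypothetical model of one settings slot; not the real Tasmota layout.
struct Slot {
  uint32_t save_counter;          // higher value = written more recently
  uint8_t payload[kPayloadSize];  // stand-in for the configuration data
  uint16_t crc;                   // checksum stored alongside the data
};

// Stand-in checksum over counter and payload (the real firmware has its own routine).
uint16_t Checksum(const Slot& s) {
  uint32_t sum = (s.save_counter & 0xFFFF) ^ (s.save_counter >> 16);
  for (int i = 0; i < kPayloadSize; i++) {
    // Weight each byte by its position so swapped bytes change the result.
    sum += static_cast<uint32_t>(s.payload[i]) * static_cast<uint32_t>(i + 1);
  }
  return static_cast<uint16_t>(sum);
}

// Return the index of the newest slot whose stored CRC matches its contents,
// or -1 when every slot is corrupt. Only in the -1 case would the caller
// fall back to the default configuration.
int FindNewestValidSlot(const Slot* slots, int count) {
  assert(slots != nullptr && count > 0);
  int best = -1;
  for (int i = 0; i < count; i++) {
    if (slots[i].crc != Checksum(slots[i])) {
      continue;  // corrupt slot: skip it and keep looking
    }
    if (best < 0 || slots[i].save_counter > slots[best].save_counter) {
      best = i;
    }
  }
  return best;
}
```

With this shape, a write that was interrupted by a power loss costs only the most recent change: the loader falls back to the previous slot instead of to factory defaults.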

BTW, I see that in newer versions the boot count is not saved to flash at startup, which is a good thing for this situation:

(Change bootcount update (being first) flash write to 10 seconds after restart)

@ascillato
Contributor

Sorry. The CRC covers the whole block.

This is a feature that makes the device go to default values if you have flashed it from stock firmware to Tasmota without erasing.

The issue you are experiencing is a different one. Tasmota has another feature that checks for software crashes at boot time. If your device boots and crashes (or loses power) in less than 2 seconds, and that happens several times in a row, the config goes to default. This prevents devices from becoming unusable due to a bad config.

If you want to disable that, please search the issues. This has been addressed before, and Theo has answered with the code changes required so that you can compile it yourself.

Thanks

If you need further assistance, please just ask here or in the Tasmota Support Chat. Thanks.

@ascillato2 ascillato2 added duplicated Result - Duplicated Issue template missing/incomplete Action - Template Missing or incomplete (issue will be closed) labels Jan 26, 2019
@ascillato2
Collaborator

Support Information

See Wiki for more information.
See Chat for more user experience.

@hhaim
Author

hhaim commented Jan 26, 2019

Thanks for the quick response.
In my case the configuration was valid and there was no crash; the device just rebooted due to an external cause (home power). Why would it write the default configuration in this case? This configuration was loaded and worked fine before! Can you point me to the duplicate issue?

In any case, I think the issue I'm reporting is valid too (a power loss while writing the flash --> a CRC error that will cause the same problem) -- this should be fixed in addition to the "other" issue that you will point out.

@ascillato
Contributor

Sorry, I have already told you. If your device has power for less than 2 seconds, then loses it, then gets it again, then loses it, and that cycle repeats several times in a row, it looks exactly the same as your device crashing because of a bad config. The device cannot know that the series of fast reboots is caused by power loss rather than a bad config. That is why it happened to you. The feature is not going to be removed, which is why Theo, the owner, described the procedure.
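
A minimal sketch of the kind of counter described here, under stated assumptions: `RebootState`, `OnBoot`, `OnUptime`, and `LooksLikeBootLoop` are hypothetical names, the state is assumed to live in memory that survives a restart, and the 2-second threshold is the figure quoted in this thread. It is not Tasmota's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state kept in memory that survives a restart.
struct RebootState {
  uint8_t fast_reboot_count = 0;
};

const uint32_t kStableUptimeMs = 2000;  // "less than 2 seconds", per the comment above

// Called once at start-up: every boot counts as "fast" until proven stable.
void OnBoot(RebootState& state) {
  if (state.fast_reboot_count < 255) {
    state.fast_reboot_count++;  // saturate instead of wrapping around
  }
}

// Called periodically: once the device has stayed up long enough,
// the streak of fast reboots is forgotten.
void OnUptime(RebootState& state, uint32_t uptime_ms) {
  if (uptime_ms >= kStableUptimeMs) {
    state.fast_reboot_count = 0;
  }
}

// True when the streak exceeds the given limit. The counter cannot tell a
// crash from a power cut: both simply end the run before it became stable.
bool LooksLikeBootLoop(const RebootState& state, uint8_t limit) {
  assert(limit > 0);
  return state.fast_reboot_count > limit;
}
```

Five short power-ups in a row therefore look the same as five crashes, while a single run longer than the threshold clears the count.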

@ascillato
Contributor

#4645

@ascillato
Contributor

In any case, I think the issue I'm reporting is valid too (a power loss while writing the flash --> a CRC error that will cause the same problem) -- this should be fixed in addition to the "other" issue that you will point out.

In that case (bad CRC of the block) it is OK and safer to go to a default state.

@hhaim
Author

hhaim commented Jan 26, 2019

My suggestion is to be more resilient to power issues while the flash is being written.
There is a difference. Would a code diff help to understand it better?
What you are describing is a slightly different mechanism. It is called boot loop.
This mechanism does NOT write the default configuration; it just changes the module to Sonoff Basic (in RAM, not flash). Maybe I am missing something, but this is the code:

  if (RtcReboot.fast_reboot_count > 1) {          // Restart twice
    Settings.flag3.user_esp8285_enable = 0;       // Disable ESP8285 Generic GPIOs interfering with flash SPI
    if (RtcReboot.fast_reboot_count > 2) {        // Restart 3 times
      for (byte i = 0; i < MAX_RULE_SETS; i++) {
        if (bitRead(Settings.rule_stop, i)) {
          bitWrite(Settings.rule_enabled, i, 0);  // Disable rules causing boot loop
        }
      }
    }
    if (RtcReboot.fast_reboot_count > 3) {        // Restarted 4 times
      Settings.rule_enabled = 0;                  // Disable all rules
    }
    if (RtcReboot.fast_reboot_count > 4) {        // Restarted 5 times
      Settings.module = SONOFF_BASIC;             // Reset module to Sonoff Basic
//      Settings.last_module = SONOFF_BASIC;
      for (byte i = 0; i < MAX_GPIO_PIN; i++) {
        Settings.my_gp.io[i] = GPIO_NONE;         // Reset user defined GPIO disabling sensors
      }
    }

Would you be able to point out the old issue?

@ascillato
Contributor

Already did. Please read previous comments

@hhaim
Author

hhaim commented Jan 26, 2019

I disagree! The Tasmota image should be more resilient to power issues. I have 10 devices; it does not make sense that any power issue will get them back to default.

@ascillato
Contributor

Ok. There is no problem in disagreeing. Really.

The power issue that will lead to going to defaults is like a boot loop: on/off/on/off several times, where each on period is less than 2 seconds.

@hhaim
Author

hhaim commented Jan 26, 2019

I see the issue now. Thanks again.
I will comment this out -- this is again the boot loop, not the flash CRC error.
To be more resilient to flash issues, this issue should be fixed too. Could you reopen it?
BTW: I think it would be nice to have a SetOption or build option to disable the boot-loop protection.

The Tasmota image should be rock solid -- it does not make sense to me that my child, by switching the power on and off a few times, can make my smart home unusable.

@ascillato
Contributor

If you don't want the feature to go to default settings, there is no problem, please just delete the lines Theo had pointed out.

It is not a bug to be fixed. It is a feature that you don't need. Please, just delete it. Thanks

@ascillato
Contributor

Your child would need to play with the main breakers of your home to reproduce your issue, hehehe

@hhaim
Author

hhaim commented Jan 26, 2019

I do understand that. But this is not the point of this issue.
Let's say I do fix the boot loop issue by commenting out the lines.
I still have another issue.
Let's get to my child use-case doing random power/on/off -- I think we could be more resilient to this use-case too.

The problem with random power/on/off is that it could happen during a FLASH WRITE --> creating a CRC error for this block (it is not that difficult, e.g. during the boot count save after 10 sec).
On the next boot, the read could hit the last slot, which has an invalid CRC, and that will again trigger a flash write of the default configuration. A better way is to look for a VALID configuration in one of the 7 slots (maybe older, but VALID).

Hope this is clearer now.

@ascillato
Contributor

Let's get to my child use-case doing random power/on/off

A Sonoff device that is properly installed is connected to mains all the time. So to cut the power you would have to switch off the main breakers of your home. Does your child do that? Please don't let him/her do that. It is dangerous.

If you switch the relay on and off, the device itself stays powered all the time. Only the load is switched on and off.

Anyway, I get it. You don't want the feature of resetting the config to default. No problem. Just delete it. You can delete it for the boot loop and also for the CRC check.

@hhaim
Author

hhaim commented Jan 27, 2019 via email

@ascillato
Contributor

ascillato commented Jan 27, 2019

For your issue of going to default values, you should start using a syslog server (use level 4 in Tasmota for more debug information) in order to search for the root cause:

  • CRC issue
  • boot-loop
  • a button is pressed for more than 40 seconds

On any of those events, Tasmota publishes the reason for the config reset in the console.

Let's review all this.

The esp8266 is a very cheap and tiny device. It has an EEPROM that supports a limited number of write cycles. So some sort of EEPROM management and data checking is recommended for this type of device. Your FireTV doesn't have a cheap esp8266, and its memory is a very robust (and expensive) one that supports a huge amount of data being read and written. So there is no comparison between them, sorry.

So, as writes to the EEPROM need to be limited, Tasmota has EEPROM management in place:

  • it moves the location of data that is written very frequently (such as whether the relay is ON or OFF, to preserve the state when Tasmota restarts), so memory wear is reduced.
  • it has a CFG_HOLDER that is used to know whether the device is virgin or new, in which case it formats the memory with the default config.
  • it has a CRC check for the config block to get information that the device is reaching its end of life (no esp8266 device will work forever). You can see this type of issue on a Raspberry Pi too: if you use an SD card and write information every day, for example using it as an NVR for cameras, you will need to replace the SD card every year.
  • it has boot-loop protection that detects restarts. It works this way: if the device boots up but crashes and restarts in less than 2 seconds, then after the 3rd time in a row some parameters go to their defaults. If the boot loop continues, more parameters go to default, until all parameters are at default for a safe start.
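
The first item in the list above, moving the write location to spread wear, can be sketched as a simple rotation. This is an illustration under assumptions, not the firmware's code: `FlashModel`, `NextSlot`, and `Save` are hypothetical names, and the slot count of 7 is taken from an earlier comment in this thread.

```cpp
#include <cassert>
#include <cstdint>

const int kSlots = 7;  // assumption: slot count as mentioned earlier in this thread

// Hypothetical bookkeeping for a ring of settings slots.
struct FlashModel {
  uint32_t erase_count[kSlots] = {0};  // how often each slot was rewritten
  uint32_t save_counter = 0;           // total number of saves so far
};

// Pick the slot for the next save by rotating through all of them.
int NextSlot(const FlashModel& flash) {
  return static_cast<int>(flash.save_counter % kSlots);
}

// Record one save: the chosen slot is erased and rewritten once.
// Returns the index of the slot that was used.
int Save(FlashModel& flash) {
  int slot = NextSlot(flash);
  assert(slot >= 0 && slot < kSlots);
  flash.erase_count[slot]++;
  flash.save_counter++;
  return slot;
}
```

After 70 saves each of the 7 slots has been rewritten 10 times, rather than one location taking all 70 writes. A side effect of rotation is that older copies of the configuration remain in the other slots until they are overwritten.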

Remember that if you don't want them, you are able to delete any of them in the code. These are features, not bugs to be fixed.

So, all those protections are the resilience of this software. Not having them means not having resilience. If you don't have the boot-loop or CRC check, your device will indeed start with the stored config, but as something is wrong, this could lead to unexpected behaviours (options enabled that were disabled, incorrect data in variables, etc.). So you can disable the code for all of them if you want, but in that case the software will not be resilient. Sorry.

The probability of having bad data stored in the EEPROM when power is lost is indeed very low. This is due to how it is managed by the hardware. When Tasmota executes the command to store data in the EEPROM, the device draws a little more energy into an internal capacitor and then uses that stored energy to save the data. If power is lost at that precise moment, the data will be saved anyway. Most chips with EEPROM use this to avoid bad writes to memory. It is just a feature of the hardware.

If you search the issues, there are few reports of data being reset to default. Most of them are due to incorrect configuration of the buttons. Tasmota has multipress support for buttons, so if you press the button (or a switch is in the GND position) for 40 seconds, Tasmota will interpret that as a full reset. You can disable that with commands if you want.
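
To illustrate why a misconfigured switch can trigger this, here is a small sketch of long-press detection. `HoldTracker` and `Update` are hypothetical names and the sampling scheme is an assumption; only the 40-second figure comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kResetHoldMs = 40000;  // 40 seconds, as stated above

// Hypothetical tracker for how long an input has been continuously active.
struct HoldTracker {
  uint32_t held_ms = 0;
};

// Feed one sample of the input. Returns true once the input has been held
// continuously for at least kResetHoldMs; releasing it restarts the timer.
bool Update(HoldTracker& tracker, bool pressed, uint32_t elapsed_ms) {
  assert(elapsed_ms > 0);
  if (!pressed) {
    tracker.held_ms = 0;
    return false;
  }
  tracker.held_ms += elapsed_ms;
  return tracker.held_ms >= kResetHoldMs;
}
```

A wall switch wired to an input configured as a button, and left in the position that reads as "pressed", satisfies the same condition as a finger held on the button for 40 seconds.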

I have had several devices with Tasmota at home for more than a year and never had the unintended reset-to-default issue. I have had several power losses, but all of my devices came back online without problems.

Remember that if you have an unusual power loss with ON/OFF cycles of less than 2 seconds ON, more than 10 times in a row, you will have a case similar to a boot loop. It is very unusual for a blackout to do that. (Remember that this has nothing to do with pressing the buttons fast to turn the relay on and off. That will not affect your config. And turning it on and off is what your child will do. You will not have any problem with that.) Also, a bad CRC check will trigger only when the device is reaching its end of life. You don't need to worry about that in a new device. After some years you should. There is also an example in the issues of very fast EEPROM degradation caused by using the MEM variables of Tasmota's Rules feature. I don't remember the exact case right now, but using rules, one user was storing data in a MEM variable every few seconds, so after, I think, 3 months it started returning weird values. MEMs were not meant for such heavy usage, so they are stored in the same EEPROM position.

So, I hope the features of Tasmota are a bit clearer now.

(All these great features were designed and coded by Theo)

@ascillato2 ascillato2 added question Type - Asking for Information and removed duplicated Result - Duplicated Issue template missing/incomplete Action - Template Missing or incomplete (issue will be closed) labels Jan 27, 2019
@hhaim
Author

hhaim commented Jan 27, 2019 via email

@ascillato2
Collaborator

What am I missing?

I understand you don't agree with Tasmota's current EEPROM management (those are features, not bugs or issues). No problem at all with that. You can disable or delete the features you don't want in the code.

At this point your request is just to delete or disable some of Tasmota's EEPROM management features. Theo (@arendst, the owner) has explained in the referenced issue that the current features will remain. I also agree with that. Sorry.

I really want to thank you for this discussion. Very interesting. Sorry that the outcome is not the one you wanted. Anyway, I hope you will continue sharing ideas. Thanks

@hhaim
Author

hhaim commented Jan 27, 2019 via email

hhaim added a commit to hhaim/Sonoff-Tasmota that referenced this issue Jan 29, 2019
@andrethomas2 andrethomas2 added enhancement Type - Enhancement that will be worked on fixed Result - The work on the issue has ended labels Jan 30, 2019
@andrethomas2 andrethomas2 removed the question Type - Asking for Information label Jan 30, 2019
@andrethomas2
Collaborator

@arendst made changes with the following merges:

9825d6f (Add resiliency to saved Settings)
0007df1 (^^)

@meingraham
Collaborator

meingraham commented Jan 30, 2019

SetOption36 was introduced with version 6.3.0.14 and deprecated shortly thereafter in 6.3.0.15 when Dynamic Sleep was re-worked. Is there any concern that those who use 6.3.0.14 might get "bitten" by re-using SetOption36 here in 6.4.1.13 if they upgrade their firmware and have the "wrong" setting for this option?

@arendst
Owner

arendst commented Jan 30, 2019

Nope

@hhaim
Author

hhaim commented Jan 30, 2019

The upgrade function will override this field with 1. There is no issue.

@meingraham
Collaborator

meingraham commented Jan 30, 2019

Alright; nice and succinct response @arendst ;-)

OK, it will override whatever value is set in the "old" configuration to 1. When does it not override the prior SetOption36 setting?

I have 18 devices sitting on 6.3.0.14 with SetOption36 set to 250 operating quite nicely and very stable.

Say something changes in the not so near future (long enough for my brain cells to have forgotten about SetOption36 boot loop control). It could be that my router gets a firmware upgrade and I start having Core incompatibility issues. Or a new killer TASMOTA feature comes along and it's a must have. Whatever the reason, it warrants upgrading to a newer version of TASMOTA with SetOption36 used to control boot loop default restoration.

Mike

@arendst
Owner

arendst commented Jan 31, 2019

SetOption36 was removed long before the release of 6.4.0 and replaced by a forced sleep of 50. So you can test the result of this removal now.

Although I try to keep every change backward compatible, it is really a relief if some changes, like this one, just do not get backward compatibility. Remember, it all happened within the dev cycle between 6.3 and 6.4.

6 participants