Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lfs_fs_forceconsistency()->lfs_fs_deorphan() doesn't commit updated gstate #604

Open
kasper0 opened this issue Oct 17, 2021 · 8 comments
Open

Comments

@kasper0
Copy link

kasper0 commented Oct 17, 2021

Hi,
while debugging the lfs, I've noticed that lfs_fs_forceconsistency()->lfs_fs_deorphan() doesn't commit updated gstate after fixing orphans. The responsibility for saving new gstate lies with a function calling lfs_fs_forceconsistency(), but it fails if the function returns before making a commit (e.g. mkdir() when the directory already exists).

Unfortunately, it takes a long time to perform the consistency check on systems with limited resources and dozens of files. As a result, if the issue mentioned above occurs followed by a reboot or power outage, time will be wasted on another consistency check (in my case, 1600 files/ 30minutes).

Moreover, it would be great to perform the consistency check directly from API without using the side effect of mkdir/open/remove/rename.

@Ldd309
Copy link

Ldd309 commented Apr 14, 2023

Hi,
I have the same problem as you, do you have any solution?

@geky
Copy link
Member

geky commented Apr 14, 2023

Hi @kasper0, thanks for creating an issue, this is good feedback.

The motivation for delaying the gstate commit is that we effectively write the gstate for free in the common case of being followed by a commit. This can be significant on storage with large prog_sizes (SD, NAND, mostly).

If a function fails before it can commit, the pending gstate should at least still be saved in lfs_t, and written out on the next operation that commits. So you should only pay for lfs_fs_forceconsistency at most once per power-cycle. Let me know if you're seeing different behavior.

Though it makes sense to me to add a public lfs_fs_forceconsistency that can be called to 1. check consistency without other side effects and 2. commit the gstate eagerly.

@Ldd309
Copy link

Ldd309 commented Apr 17, 2023

Hi @kasper0, thanks for creating an issue, this is good feedback.

The motivation for delaying the gstate commit is that we effectively write the gstate for free in the common case of being followed by a commit. This can be significant on storage with large prog_sizes (SD, NAND, mostly).

If a function fails before it can commit, the pending gstate should at least still be saved in lfs_t, and written out on the next operation that commits. So you should only pay for lfs_fs_forceconsistency at most once per power-cycle. Let me know if you're seeing different behavior.

Though it makes sense to me to add a public lfs_fs_forceconsistency that can be called to 1. check consistency without other side effects and 2. commit the gstate eagerly.

@geky Thank you for your answer, but every time I open a file in the 'RDWR' mode, it takes a long time. The following is the time I printed out for each function.

-----lfs_fs_demove[0]
-----lfs_dir_fetch[3755]
-----lfs_dir_fetch[8498]
-----lfs_dir_fetch[13151]
-----lfs_fs_parent[21677]
-----lfs_dir_get[22126]
-----lfs_dir_fetch[26770]
-----lfs_fs_parent[35295]
-----lfs_dir_get[35746]
-----lfs_dir_fetch[40613]
-----lfs_fs_parent[49148]
-----lfs_dir_get[49601]
-----lfs_dir_fetch[54878]
-----lfs_fs_parent[63404]
-----lfs_dir_get[63858]
-----lfs_dir_fetch[70502]
-----lfs_fs_parent[79028]
-----lfs_dir_get[79476]
-----lfs_dir_fetch[86193]
-----lfs_dir_fetch[92808]
-----lfs_dir_fetch[97185]
-----lfs_fs_parent[105721]
-----lfs_dir_get[106182]
-----lfs_dir_fetch[111114]
-----lfs_fs_parent[119654]
-----lfs_dir_get[120116]
-----lfs_dir_fetch[125598]
-----lfs_fs_parent[134138]
-----lfs_dir_get[134598]
-----lfs_dir_fetch[141001]
-----lfs_fs_parent[149541]
-----lfs_dir_get[150002]
-----lfs_dir_fetch[154164]
-----lfs_fs_parent[162715]
-----lfs_dir_get[163191]
-----restart[163592]
-----lfs_fs_deorphan[163932]
-----lfs_fs_forceconsistency[164394]
path --- /testf
-----lfs_dir_get[3]
08-08 03:14:41 D/lfs_test lfs_test: file_system: tt open cost 285 ms.
08-08 03:14:41 D/lfs_test lfs_test: file_system: tt close cost 0 ms.
-----lfs_fs_demove[0]
-----lfs_dir_fetch[3754]
-----lfs_dir_fetch[8498]
-----lfs_dir_fetch[13152]
-----lfs_fs_parent[21676]
-----lfs_dir_get[22125]
-----lfs_dir_fetch[26769]
-----lfs_fs_parent[35295]
-----lfs_dir_get[35746]
-----lfs_dir_fetch[40615]
-----lfs_fs_parent[49139]
-----lfs_dir_get[49592]
-----lfs_dir_fetch[54868]
-----lfs_fs_parent[63394]
-----lfs_dir_get[63848]
-----lfs_dir_fetch[70496]
-----lfs_fs_parent[79020]
-----lfs_dir_get[79470]
-----lfs_dir_fetch[86187]
-----lfs_dir_fetch[92801]
-----lfs_dir_fetch[97178]
-----lfs_fs_parent[105703]
-----lfs_dir_get[106164]
-----lfs_dir_fetch[111104]
-----lfs_fs_parent[119645]
-----lfs_dir_get[120107]
-----lfs_dir_fetch[125591]
-----lfs_fs_parent[134132]
-----lfs_dir_get[134592]
-----lfs_dir_fetch[140995]
-----lfs_fs_parent[149536]
-----lfs_dir_get[149997]
-----lfs_dir_fetch[154160]
-----lfs_fs_parent[162700]
-----lfs_dir_get[163176]
-----restart[163577]
-----lfs_fs_deorphan[163917]
-----lfs_fs_forceconsistency[164379]
path --- /testf
-----lfs_dir_get[3]
08-08 03:14:41 D/lfs_test lfs_test: file_system: tt open cost 285 ms.
08-08 03:14:41 D/lfs_test lfs_test: file_system: tt close cost 0 ms.

@geky geky added needs investigation no idea what is wrong needs fix we know what is wrong labels Apr 17, 2023
@Ldd309
Copy link

Ldd309 commented Apr 23, 2023

Thank you for your prompt reply. I tried the 2.4.2 version of lfs and did not find this problem. I found that the lfs_fs_deorphan function in 2.5.0 returns a smaller value in (lfs_gstate_getorphans(&lfs->gstate), found)) each time, but In the lfs_fs_deorphan function, the judgment of the increase of the found variable is not performed, and finally the lfs_fs_deorphan function does not add or delete orphan blocks. I have some questions about the found variable, what does it do and is it necessary? Really appreciate your help!

@geky
Copy link
Member

geky commented Apr 26, 2023

This does look like a bug and deserves more scrutiny.

In theory the line you found in lfs_fs_deorphan should be setting the in-ram gstate so that the lfs_gstate_hasorphans returns false.

littlefs/lfs.c

Lines 4599 to 4602 in 6a53d76

// mark orphans as fixed
return lfs_fs_preporphans(lfs, -lfs_min(
lfs_gstate_getorphans(&lfs->gstate),
found));

I'm guessing we've somehow ended up in a state where there are no orphans in the filesystem, but the orphan flag is set. This shouldn't happen normally, since we atomically clear the orphan flag when cleaning up orphans, but that's not an excuse for handling this case poorly.

I'll try to reproduce this on my end, it would be good to have tests asserting that lfs_fs_deorphan runs at most once per boot.

I have some questions about the found variable, what does it do and is it necessary?

I think you're on to something, that found variable does look useless and problematic. It may be left over from a previous lfs_fs_deorphan implementation. I will poke around and see if it can be removed.

There are two ways lfs_fs_deorphan can be triggered. One is when we mount a filesystem and see that the orphan bit in the gstate object is set, indicating a powerless occurred while there were orphans in the filesystem. Two is when we end up needing to do recursive relocations, ie if when trying to relocate one metadata data block we found ourselves needed to relocate its parent metadata block.

One main goal of littlefs is to not require recursion, and since each metadata block has effectively two parents (one directory parent and one linked-list predecessor), this gets a bit tricky. At the same time, we really don't want to call the expensive lfs_fs_deorphan if at all possible.

The solution is to use an internal 10-bit counter stored in some of the reserved bits of the gstate to track the number of orphan. If we weren't able to fix all orphans when relocating, we fall back to lfs_fs_deorphan.

In theory it's possible to exit lfs_fs_deorphan early if we know the number of orphans, and this may have been attempted at one point. But the code as it is now clearly doesn't do that, and I'm not sure it would be worth adding at the cost of code size and fragility. It would not change the runtime complexity anyways.


If you're able to reproduce this bug consistently, it would be interesting to know if eba5553 fixes the underlying orphan counting issue. This was found in a PR adding more testing and was recently merged into the devel branch (staging area for an upcoming release).

@geky
Copy link
Member

geky commented Apr 27, 2023

I was able to artificially reproduce the repeatedly-deorphaning issue in the case there are no orphans in the filesystem, and went ahead and pushed up a fix+test here:
#811

I think this should help your case @Ldd309, lfs_fs_deorphan should only run at most once after mount now. But let me know if you're still seeing this issue.

@Ldd309
Copy link

Ldd309 commented Apr 27, 2023

Thank you very much for your help, I will try it soon

@geky
Copy link
Member

geky commented Apr 27, 2023

I've also gone ahead and created a PR to add the optional function described by @kasper0:
#812

This would also allow explicit bumping of the on-disk minor version, which is changing due to #497, so this is convenient timing.

@geky geky added fixed? and removed needs investigation no idea what is wrong needs fix we know what is wrong labels Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants