Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor release: v2.6 #814

Merged
merged 125 commits into from
May 4, 2023
Merged

Minor release: v2.6 #814

merged 125 commits into from
May 4, 2023

Conversation

geky
Copy link
Member

@geky geky commented May 1, 2023

Thanks for everyone who contributed/provided input into this!

Bringing in:

#497 Forward-looking erase-state CRCs
#572 Add littlefs-disk-img-viewer to README
#718 Add "chamelon" to the related projects section.
#752 Add test/bench runners, benchmarks, additional scripts
#800 Fix block-boundary truncate issues
#801 Add an assert for truthy-preserving bool conversions
#802 Add explicit assert for minimum block size of 128 bytes
#805 Fix issue where seeking to end-of-directory return LFS_ERR_INVAL
#807 Add littlefs2 crate to README
#809 Adopt Brent's algorithm for cycle detection
#811 Fix issue where lfs_fs_deorphan may run more than needed
#812 Add lfs_fs_mkconsistent

Draft of release notes follows:

Note!

This release bumps the on-disk minor version of littlefs from lfs2.0 -> lfs2.1.

This change is backwards-compatible, but after the first write with the new version, the image on disk will no longer be mountable by older versions of littlefs.

Read-only operations will not change the on-disk version, and the on-disk version can be explicitly bumped with lfs_fs_mkconsistent if desired.

This is the first on-disk minor version bump for littlefs, and is untested waters, so do open a GitHub issue if this creates any problems.

What's new?

  • FCRCs - forward-looking erase-state CRCs (#497)

    Found by @pjsg, littlefs was relying on a bad assumption about the order/atomicity of prog operations to detect incomplete metadata commits. In the case of power-failure, this could lead to littlefs progging the same location multiple times without an erase, leading to best case fixable corruption, worst case significantly reduced data retention.

    The solution implemented here is to store a checksum of the next erased prog-size in every commit. This way any incomplete commits that modify the erased state can be detected.

    In theory this could be implemented in a forward-compatible way, but littlefs currently treats all CRC tags as commit-ending CRC tags, so only updated versions of littlefs will be able to read metadata blocks containing FCRCs. This is the reason for the on-disk minor version bump. On the bright-side, future CRC-related extensions will be forward-compatible.

  • A complete overhaul of the testing/bench framework (#752)

    This does not affect users, but involves a long list of needed changes to make it easier to develop/bench littlefs:

    • Tests/benches now run in parallel, and much faster thanks to shared processes.
    • Tests are more reproducible, and narrow bugs can be reproduced locally repeatedly without leaving GDB.
    • More options for power-loss testing. Exhaustive testing of all 1 and 2-deep powerlosses have been added to CI.
    • More scripts related to perf measurements, plotting, tracing, etc.
    • A make help rule to help use/document the Makefile, which is sort of the entry point for littlefs development.
    • Added a CI rule to have @geky-bot comment on each PR with CI measurements (examples here). We'll see if this is an improvement over the GitHub statuses.

    The increased power-loss coverage has already found a couple of difficult to hit bugs (eba5553 and 0b11ce0), and the bench runner will be useful for evaluating littlefs's performance moving forward.

  • Added a couple more asserts to help common implementation issues (#801, #802)

  • Adopted Brent's algorithm for cycle detection (#809)

    This should significantly speed up cycle detection in the case the metadata linked-list contains a cycle.

  • Added lfs_fs_mkconsistent as proposed by @kasper0 (#812)

    lfs_fs_mkconsistent allows users to front-load the internal consistency checks littlefs needs to run before it can write to the filesystem.

    Conveniently this also allows a way to force littlefs to write the new on-disk minor version if that is desired.

  • A number of bug fixes, thanks to @ghost, @ajaybhargav, @colin-foster-in-advantage, @rvanschoren, @hgspbs, @sosthene-nitrokey, @kasper0, @Ldd309, and others (#800, #805, #811):

    • #800 - Fixed issues with truncate around block-aligned boundaries
    • #805 - Fixed issue where seeking to end-of-directory return LFS_ERR_INVAL
    • #811 - Fixed issue where lfs_fs_deorphan may run more than needed
    • eba5553 - Fixed issue where half-orphans can hide full orphans in nested power-loss situations
    • 0b11ce0 - Fixed incorrect calculation of extra space needed in mdir blocks
    • ba1c764 - Fixed issue where deorphan could get stuck circling between two half-orpans in power-loss situations
  • And a number of links to useful community projects related to littlefs, thanks to @tniessen, @yomimono, @elpiel (#572, #718, #807):

    If you have a littlefs-related project or a project that uses littlefs, consider opening a PR. Eventually my plan is to move these to a user-contributable wiki.

tniessen and others added 30 commits June 21, 2021 18:52
This is to try a different design for testing, the goals are to make the
test infrastructure a bit simpler, with clear stages for building and
running, and faster, by avoiding rebuilding lfs.c n-times.
This moves defines entirely into the runtime of the test_runner,
simplifying thing and reducing the amount of generated code that needs
to be build, at the cost of limiting test-defines to uintmax_t types.

This is implemented using a set of index-based scopes (created by
test.py) that allow different layers to override defines from other
layers, accessible through the global `test_define` function.

layers:
1. command-line overrides
2. per-case defines
3. per-geometry defines
- Indirect index map instead of bitmap+sparse array
- test_define_t and test_type_t
- Added back conditional filtering
- Added suite-level defines and filtering
- Added filtering based on suite, case, perm, type, geometry
- Added --skip, --count, and --every (will be used for parallelism)
- Implemented --list-defines
- Better helptext for flags with arguments
- Other minor tweaks
In the test-runner, defines are parameterized constants (limited
to integers) that are generated from the test suite tomls resulting
in many permutations of each test.

In order to make this efficient, these defines are implemented as
multi-layered lookup tables, using per-layer/per-scope indirect
mappings. This lets the test-runner and test suites define their
own defines with compile-time indexes independently. It also makes
building of the lookup tables very efficient, since they can be
incrementally populated as we expand the test permutations.

The four current define layers and when we need to build them:

layer                           defines         predefine_map   define_map
user-provided overrides         per-run         per-run         per-suite
per-permutation defines         per-perm        per-case        per-perm
per-geometry defines            per-perm        compile-time    -
default defines                 compile-time    compile-time    -
- Added --disk/--trace/--output options for information-heavy debugging

- Renamed --skip/--count/--every to --start/--stop/--step.

  This matches common terms for ranges, and frees --skip for being used
  to skip test cases in the future.

- Better handling of SIGTERM, now all tests are killed, reported as
  failures, and testing is halted irregardless of -k.

  This is a compromise, you throw away the rest of the tests, which
  is normally what -k is for, but prevents annoying-to-terminate
  processes when debugging, which is a very interactive process.
- Expanded test defines to allow for lists of configurations

  These are useful for changing multi-dimensional test configurations
  without leading to extremely large and less useful configuration
  combinations.

- Made warnings more visible durring test parsing

- Add lfs_testbd.h to implicit test includes

- Fixed issue with not closing files in ./scripts/explode_asserts.py

- Add `make test_runner` and `make test_list` build rules for
  convenience
- Added internal tests, which can run tests inside other source files,
  allowing access to "private" functions and data

  Note this required a special bit of handling our defining and later
  undefining test configurations to not polute the namespace of the
  source file, since it can end up with test cases from different
  suites/configuration namespaces.

- Removed unnecessary/unused permutation argument to generated test
  functions.

- Some cleanup to progress output of test.py.
Previously test defines were implemented using layers of index-mapped
uintmax_t arrays. This worked well for lookup, but limited defines to
constants computed at compile-time. Since test defines themselves are
actually calculated at _run-time_ (yeah, they have deviated quite
a bit from the original, compile-time evaluated defines, which makes
the name make less sense), this means defines can't depend on other
defines. Which was limiting since a lot of test defines relied on
defines generated from the geometry being tested.

This new implementation uses callbacks for the per-case defines. This
means they can easily contain full C statements, which can depend on
other test defines. This does means you can create infinitely-recursive
defines, but the test-runner will just break at run-time so don't do that.

One concern is that there might be a performance hit for evaluating all
defines through callbacks, but if there is it is well below the noise
floor:

- constants: 43.55s
- callbacks: 42.05s
- Added --exec for wrapping the test-runner with external commands, such as
  Qemu or Valgrind.

- Added --valgrind, which just aliases --exec=valgrind with a few extra
  flags useful during testing.

- Dropped the "valgrind" type for tests. These aren't separate tests
  that run in the test-runner, and I don't see a need for disabling
  Valgrind for any tests. This can be added back later if needed.

- Readded support for dropping directly into gdb after a test failure,
  either at the assert failure, entry point of test case, or entry point
  of the test runner with --gdb, --gdb-case, or --gdb-main.

- Added --isolate for running each test permutation in its own process,
  this is required for associating Valgrind errors with the right test
  case.

- Fixed an issue where explicit test identifier conflicted with
  per-stage test identifiers generated as a part of --by-suite and
  --by-case.
This mostly required names for each test case, declarations of
previously-implicit variables since the new test framework is more
conservative with what it declares (the small extra effort to add
declarations is well worth the simplicity and improved readability),
and tweaks to work with not-really-constant defines.

Also renamed test_ -> test, replacing the old ./scripts/test.py,
unfortunately git seems to have had a hard time with this.
This simplifies the interaction between code generation and the
test-runner.

In theory it also reduces compilation dependencies, but internal tests
make this difficult.
A small mistake in test.py's control flow meant the failing test job
would succesfully kill all other test jobs, but then humorously start
up a new process to continue testing.
GCC is a bit annoying here, it can't generate .cgi files without
generating the related .o files, though I suppose the alternative risks
duplicating a large amount of compilation work (littlefs is really
a small project).

Previously we rebuilt the .o files anytime we needed .cgi files
(callgraph info used for stack.py). This changes it so we always
built .cgi files as a side-effect of compilation. This is similar
to the .d file generation, though may be annoying if the system
cc doesn't support --callgraph-info.
This also adds coverage support to the new test framework, which due to
reduction in scope, no longer needs aggregation and can be much
simpler. Really all we need to do is pass --coverage to GCC, which
builds its .gcda files during testing in a multi-process-safe manner.

The addition of branch coverage leverages information that was available
in both lcov and gcov.

This was made easier with the addition of the --json-format to gcov
in GCC 9.0, however the lax backwards compatibility for gcov's
intermediary options is a bit concerning. Hopefully --json-format
sticks around for a while.
These scripts can't easily share the common logic, but separating
field details from the print/merge/csv logic should make the common
part of these scripts much easier to create/modify going forward.

This also tweaked the behavior of summary.py slightly.
On one hand this isn't very different than the source annotation in
gcov, on the other hand I find it a bit more readable after a bit of
experimentation.
"chamelon" implements a subset of littlefs (no global move state or
singly-linked list threaded through the directory tree) for use in the
MirageOS library operating system project. It is written entirely in
OCaml and is interoperable (with the above caveats) with the reference
implementation via FUSE.
Also renamed GCI -> CI, this holds .ci files, though there is a risk
of confusion with continuous integration.

Also added unused but generated .ci files to clean rule.
- Renamed explode_asserts.py -> pretty_asserts.py, this name is
  hopefully a bit more descriptive
- Small cleanup of the parser rules
- Added recognization of memcmp/strcmp => 0 statements and generate
  the relevant memory inspecting assert messages

I attempted to fix the incorrect column numbers for the generated
asserts, but unfortunately this didn't go anywhere and I don't think
it's actually possible.

There is no column control analogous to the #line directive. I thought
you might be able to intermix #line directives to put arguments at the
right column like so:

    assert(a == b);

    __PRETTY_ASSERT_INT_EQ(
    #line 1
           a,
    #line 1
                b);

But this doesn't work as preprocessor directives are not allowed in
macros arguments in standard C. Unfortunately this is probably not
possible to fix without better support in the language.
Yes this is more expensive, since small programs need to rewrite the
whole block in order to conform to the block device API. However, it
reduces code duplication and keeps all of the test-related block device
emulation in lfs_testbd.

Some people have used lfs_filebd/lfs_rambd as a starting point for new block
devices and I think it should be clear that erase does not need to have side
effects. Though to be fair this also just means we should have more
examples of block devices...
On one hand this seems like the wrong place for these tests, on the
other hand, it's good to know that the block device is behaving as
expected when debugging the filesystem.

Maybe this should be moved to an external program for users to test
their block devices in the future?
The main change here from the previous test framework design is:

1. Powerloss testing remains in-process, speeding up testing.

2. The state of a test, included all powerlosses, is encoded in the
   test id + leb16 encoded powerloss string. This means exhaustive
   testing can be run in CI, but then easily reproduced locally with
   full debugger support.

   For example:

   ./scripts/test.py test_dirs#reentrant_many_dir#10#1248g1g2 --gdb

   Will run the test test_dir, case reentrant_many_dir, permutation #10,
   with powerlosses at 1, 2, 4, 8, 16, and 32 cycles. Dropping into gdb
   if an assert fails.

The changes to the block-device are a work-in-progress for a
lazily-allocated/copy-on-write block device that I'm hoping will keep
exhaustive testing relatively low-cost.
With more features being added to test.py, the one-line status is
starting to get quite long and pass the ~80 column readability
heuristic. To make this worse this clobbers the terminal output
when the terminal is not wide enough.

Simple solution is to disable line-wrapping, potentially printing
some garbage if line-wrapping-disable is not supported, but also
printing a final status update to fix any garbage and avoid a race
condition where the script would show a non-final status.

Also added --color which disables any of this attempting-to-be-clever
stuff.
Before this was available implicitly by supporting both rambd and filebd
as backends, but now that testbd is a bit more complicated and no longer
maps directly to a block-device, this needs to be explicitly supported.
These have no real purpose other than slowing down the simulation
for inspection/fun.

Note this did reveal an issue in pretty_asserts.py which was clobbering
feature macros. Added explicit, and maybe a bit hacky, #undef _FEATURE_H
to avoid this.
As expected this takes a significant amount of time (~10 minutes for all
1 powerlosses, >10 hours for all 2 powerlosses) but this may be reducible in
the future by optimizing tests for powerloss testing. Currently
test_files does a lot of work that doesn't really have testing value.
geky and others added 23 commits April 21, 2023 00:28
This is a bit tricky since we need two different version of littlefs in
order to test for most compatibility concerns.

Fortunately we already have scripts/changeprefix.py for version-specific
symbols, so it's not that hard to link in the previous version of
littlefs in CI as a separate set of symbols, "lfsp_" in this case.

So that we can at least test the compatibility tests locally, I've added
an ifdef against the expected define "LFSP" to define a set of aliases
mapping "lfsp_" symbols to "lfs_" symbols. This is manual at the moment,
and a bit hacky, but gets the job done.

---

Also changed BUILDDIR creation to derive subdirectories from a few
Makefile variables. This makes the subdirectories less manual and more
flexible for things like LFSP. Note this wasn't possible until BUILDDIR
was changed to default to "." when omitted.
This uses the "github.event.pull_request.base.ref" variable as the
"lfsp" target for compatibility testing.
This just means a rewrite of the superblock entry with the new minor
version.

Though it's interesting to note, we don't need to rewrite the superblock
entry until the first write operation in the filesystem, an optimization
that is already in use for the fixing of orphans and in-flight moves.

To keep track of any outdated minor version found during lfs_mount, we
can carve out a bit from the reserved bits in our gstate. These are
currently used for a counter tracking the number of orphans in the
filesystem, but this is usually a very small number so this hopefully
won't be an issue.

In-device gstate tag:

  [--       32      --]
  [1|- 11 -| 10 |1| 9 ]
   ^----^-----^--^--^-- 1-bit has orphans
        '-----|--|--|-- 11-bit move type
              '--|--|-- 10-bit move id
                 '--|-- 1-bit needs superblock
                    '-- 9-bit orphan count
See SPEC.md for more info.

Also considered adding an explanation to DESIGN.md, but there's not a
great place for it. Maybe FCRCs are too low-level for the high-level
design document. Though may be worth reconsidering if DESIGN.md gets
revisited.
Add test/bench runners, benchmarks, additional scripts
Adopt Brent's algorithm for cycle detection
Add "chamelon" to the related projects section.
Add an assert for truthy-preserving bool conversions
Add explicit assert for minimum block size of 128 bytes
Fix issue where seeking to end-of-directory return LFS_ERR_INVAL
The underlying issue is that lfs_fs_deorphan did not updating gstate
correctly. The way it determined if there are any orphans remaining in
the filesystem was by subtracting the number of found orphans from an
internal counter.

This internal counter is a leftover from a previous implementation that
allowed leaving the lfs_fs_deorphan loop early if we know the number of
expected orphans. This can happen during recursive mdir relocations, but
with only a single bit in the gstate, can't happen during mount. If we
detect orphans during mount, we set this internal counter to 1, assuming
we will find at least one orphan.

But this presents a problem, what if we find _no_ orphans? If this happens
we never decrement the internal counter of orphans, so we would never
clear the bit in the gstate. This leads to a running lfs_fs_deorphan
on more-or-less every mutable operation in the filesystem, resulting in
an extreme performance hit.

The solution here is to not subtract the number of found orphans, but assume
that when our lfs_fs_deorphan loop finishes, we will have no orphans, because
that's the whole point of lfs_fs_deorphan.

Note that the early termination of lfs_fs_deorphan was dropped because
it would not actually change the runtime complexity of lfs_fs_deorphan,
adds code cost, and risks fragile corner cases such as this one.

---

Also added tests to assert we run lfs_fs_deorphan at most once.

Found by kasper0 and Ldd309
lfs_fs_mkconsistent allows running the internal consistency operations
(desuperblock/deorphan/demove) on demand and without any other
filesystem changes.

This can be useful for front-loading and persisting consistency operations
when you don't want to pay for this cost on the first write to the
filesystem.

Conveniently, this also offers a way to force the on-disk minor version
to bump, if that is wanted behavior.

Idea from kasper0
Fix issue where lfs_fs_deorphan may run more than needed
@geky geky added this to the v2.6 milestone May 1, 2023
@geky
Copy link
Member Author

geky commented May 1, 2023

@geky-bot won't run until this is merged, so here I am pretending to be the bot:

Tests passed ✓, Code: 16556 B, Stack: 1432 B, Structs: 772 B
Code Stack Structs Coverage
Default 16556 B 1432 B 772 B Lines 2289/2469 lines
Readonly 5990 B 448 B 772 B Branches 1174/1496 branches
Threadsafe 17362 B 1432 B 780 B Benchmarks
Migrate 18240 B 1736 B 776 B Readed 29369693876 B
Error-asserts 17192 B 1424 B 772 B Proged 1482874766 B
Erased 1568888832 B

@geky
Copy link
Member Author

geky commented May 4, 2023

Time to merge 👍

@geky geky merged commit ec3ec86 into master May 4, 2023
@geky
Copy link
Member Author

geky commented May 4, 2023

Humorously, GitHub now thinks littlefs is a python project
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants