Skip to content

fix: LittleFS Filename encoding UTF-8#444

Merged
Jason2866 merged 2 commits into
developfrom
lttlfs_utf8
Mar 27, 2026
Merged

fix: LittleFS Filename encoding UTF-8#444
Jason2866 merged 2 commits into
developfrom
lttlfs_utf8

Conversation

@Jason2866
Copy link
Copy Markdown

@Jason2866 Jason2866 commented Mar 27, 2026

Description:

Related issue (if applicable): LittleFS build can not support Taiwan traditional

  • Bug Fixes
    • Improved UTF-8 encoding support for filesystem operations

Checklist:

  • The pull request is done against the latest develop branch
  • Only relevant files were touched
  • Only one feature/fix was added per PR, more changes are allowed when changing boards.json
  • I accept the CLA

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 27, 2026

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: LittleFS Filename encoding UTF-8' directly matches the main objective of the PR, which is to fix LittleFS filename encoding to support UTF-8. The changes involve adding UTF-8 encoding specification for LittleFS filenames and partition table CSV parsing.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lttlfs_utf8

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Jason2866
Copy link
Copy Markdown
Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@builder/main.py`:
- Line 303: The CSV reader opens the partitions file with encoding="utf-8" which
does not strip a UTF-8 BOM and can leave a leading '\ufeff' on the first line
(breaking line.startswith("#") comment checks); change the open call that
references partitions_csv to use encoding="utf-8-sig" so the BOM is removed
automatically before parsing (update the with open(partitions_csv, ...) call in
builder/main.py accordingly).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a7966332-159f-4ea9-99ce-54487bdb99dc

📥 Commits

Reviewing files that changed from the base of the PR and between 7aab252 and 43ed175.

📒 Files selected for processing (1)
  • builder/main.py

Comment thread builder/main.py Outdated
Remove encoding specification when opening the CSV file.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
builder/main.py (1)

303-303: ⚠️ Potential issue | 🟡 Minor

AI summary inconsistency: Partition CSV encoding not actually changed.

The AI summary states that partition-table CSV parsing was updated to specify UTF-8 encoding, but this line has no encoding parameter and is not marked as changed. The commit message confirms encoding was actually removed.

Without explicit encoding="utf-8-sig", this relies on system default encoding, which may not be UTF-8 on Windows systems, potentially breaking partition files containing non-ASCII characters.

If the PR intends to fully support UTF-8 for internationalization (e.g., Taiwan traditional characters as mentioned in the linked issue), the partition CSV parsing should also handle UTF-8 properly:

🔧 Suggested fix for UTF-8 BOM handling
-    with open(partitions_csv) as fp:
+    with open(partitions_csv, encoding="utf-8-sig") as fp:

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@builder/main.py` at line 303, The partition CSV is opened without an explicit
encoding which can break on non-UTF-8 systems; change the open call that uses
open(partitions_csv) to explicitly specify encoding="utf-8-sig" (or "utf-8" if
BOM handling is not needed) so partition files with non-ASCII characters are
parsed consistently across platforms; update any related reads that consume fp
(the variable created by open(partitions_csv)) to continue working with the
encoded stream.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@builder/main.py`:
- Around line 55-56: Remove the invalid `_lfs.FILENAME_ENCODING = "utf-8"`
assignment (the littlefs-python binding does not expose that attribute) and
instead handle UTF-8 at the application layer: delete the
`_lfs.FILENAME_ENCODING` line in builder/main.py and update any places that call
into the littlefs API to explicitly encode filenames to bytes (e.g.,
filename.encode('utf-8')) when writing/creating entries and decode bytes back to
str (e.g., bytes_name.decode('utf-8')) when reading directory entries or names;
if unsure about exact littlefs-python behavior, consult its docs/source to
confirm whether its API expects bytes or str and adapt encode/decode
accordingly.

---

Outside diff comments:
In `@builder/main.py`:
- Line 303: The partition CSV is opened without an explicit encoding which can
break on non-UTF-8 systems; change the open call that uses open(partitions_csv)
to explicitly specify encoding="utf-8-sig" (or "utf-8" if BOM handling is not
needed) so partition files with non-ASCII characters are parsed consistently
across platforms; update any related reads that consume fp (the variable created
by open(partitions_csv)) to continue working with the encoded stream.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 367ff3ca-64ac-43ca-b56d-9174b74ab73a

📥 Commits

Reviewing files that changed from the base of the PR and between 43ed175 and 3b6b7f6.

📒 Files selected for processing (1)
  • builder/main.py

Comment thread builder/main.py
Comment on lines +55 to +56
from littlefs import lfs as _lfs
_lfs.FILENAME_ENCODING = "utf-8"
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

littlefs-python FILENAME_ENCODING attribute documentation

💡 Result:

No FILENAME_ENCODING attribute exists in littlefs-python documentation or source references. Filenames are stored as ASCII strings per littlefs specification. The LFSConfig class includes name_max (default 255) to set maximum filename length, but no encoding parameter. File open supports standard Python encoding parameter for text mode I/O, not filesystem metadata encoding.

Citations:


Remove or replace the non-existent FILENAME_ENCODING attribute assignment.

The FILENAME_ENCODING attribute does not exist in littlefs-python. Setting _lfs.FILENAME_ENCODING = "utf-8" has no effect and will not enable UTF-8 filename encoding as intended. The littlefs specification stores filenames as ASCII strings, and the littlefs-python library provides no encoding parameter for filesystem-level filename handling. Verify the correct mechanism for UTF-8 filename support in littlefs-python before proceeding with this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@builder/main.py` around lines 55 - 56, Remove the invalid
`_lfs.FILENAME_ENCODING = "utf-8"` assignment (the littlefs-python binding does
not expose that attribute) and instead handle UTF-8 at the application layer:
delete the `_lfs.FILENAME_ENCODING` line in builder/main.py and update any
places that call into the littlefs API to explicitly encode filenames to bytes
(e.g., filename.encode('utf-8')) when writing/creating entries and decode bytes
back to str (e.g., bytes_name.decode('utf-8')) when reading directory entries or
names; if unsure about exact littlefs-python behavior, consult its docs/source
to confirm whether its API expects bytes or str and adapt encode/decode
accordingly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRONG: The Attribut lfs.FILENAME_ENCODING is exposed from littlefs-python-binding and can be written.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the installed littlefs-python package for FILENAME_ENCODING attribute
pip show littlefs-python 2>/dev/null || true
python -c "from littlefs import lfs as _lfs; print(dir(_lfs))" 2>/dev/null || true
python -c "from littlefs import lfs as _lfs; print(_lfs.FILENAME_ENCODING)" 2>/dev/null || true

# Search for FILENAME_ENCODING in the installed package files
pip show -f littlefs-python 2>/dev/null | head -5 || true

# Find lfs module files
find / -name "lfs*.so" -o -name "lfs*.pyd" -o -name "lfs.py" -o -name "lfs.pyx" 2>/dev/null | grep -v proc | head -10 || true

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Try to find FILENAME_ENCODING in the littlefs-python GitHub source via gh or curl
gh api repos/jrast/littlefs-python/git/trees/HEAD --jq '.tree[].path' 2>/dev/null | head -20 || true

# Search for FILENAME_ENCODING in the littlefs-python repo
gh search code "FILENAME_ENCODING" --repo jrast/littlefs-python 2>/dev/null || true

# Also try searching across all of GitHub
gh search code "FILENAME_ENCODING lfs littlefs-python" 2>/dev/null | head -20 || true

Length of output: 469


@Jason2866, you are absolutely right — I apologize for the incorrect review. The GitHub source confirms that FILENAME_ENCODING is a real, typed, writable module-level attribute in src/littlefs/lfs.pyx:

FILENAME_ENCODING = 'ascii'  # default

It is declared as FILENAME_ENCODING: str = ... in the type stub (lfs.pyi) and is used internally as path.encode(FILENAME_ENCODING) for all path operations. Setting _lfs.FILENAME_ENCODING = "utf-8" is exactly the correct and intended mechanism to enable UTF-8 filename support (e.g., Traditional Chinese characters as referenced in the linked issue). My original comment was wrong.


✏️ Learnings added
Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 444
File: builder/main.py:55-56
Timestamp: 2026-03-27T11:08:05.033Z
Learning: In pioarduino/platform-espressif32 (builder/main.py), `lfs.FILENAME_ENCODING` is a real, typed, writable module-level attribute in littlefs-python (defined in `src/littlefs/lfs.pyx` as `FILENAME_ENCODING = 'ascii'` and declared in `lfs.pyi`). It controls how all LittleFS path/filename strings are encoded via `path.encode(FILENAME_ENCODING)`. Setting `_lfs.FILENAME_ENCODING = "utf-8"` after `from littlefs import lfs as _lfs` is the correct and intended way to enable UTF-8 filename support in LittleFS image creation. Do not flag this as invalid or non-existent.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 0
File: :0-0
Timestamp: 2025-12-23T17:13:00.012Z
Learning: In pioarduino/platform-espressif32 (builder/main.py), the LittleFS filesystem image builder uses parameters that intentionally differ from ESP-IDF defaults (read_size=1, prog_size=1, cache_size=block_size, lookahead_size=32, name_max=64 instead of ESP-IDF's 16, 16, 16, 16, 255 respectively). These values were determined through Arduino real-world testing and are necessary to prevent the firmware from reformatting the created filesystem image. Using ESP-IDF defaults causes reformatting issues.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 0
File: :0-0
Timestamp: 2025-12-28T16:31:36.068Z
Learning: In pioarduino/platform-espressif32 (builder/main.py), the LittleFS block size is fixed at 4096 bytes (0x1000) for all LittleFS operations. Auto-detection or trying multiple block sizes is unnecessary complexity.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 225
File: builder/main.py:99-108
Timestamp: 2025-07-13T20:03:29.695Z
Learning: In the ESP32 platform builder, the subprocess call to get site-packages directory in setup_python_paths() function is intentionally left without error handling because if this basic Python operation fails, it indicates a fundamental system issue that would cause the entire build process to fail anyway.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 246
File: builder/main.py:0-0
Timestamp: 2025-08-05T11:47:41.756Z
Learning: In the ESP32 platform builder, subprocess calls correctly use the virtual environment Python by explicitly passing PYTHON_EXE (which is set to the venv python path) as the first argument in the command array, rather than relying on environment variable propagation. The env=os.environ parameter only affects environment variables, not the executable path itself.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 371
File: builder/spiffsgen.py:620-629
Timestamp: 2025-12-27T16:33:38.710Z
Learning: In the pioarduino/platform-espressif32 project, when extracting SPIFFS files in builder/spiffsgen.py, Jason2866 prefers to check if rel_path is empty after lstrip('/') and skip such files with a warning message, rather than attempting to handle empty parent directories in os.makedirs calls. This treats empty paths as invalid/corrupted data.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 255
File: builder/penv_setup.py:265-275
Timestamp: 2025-08-10T16:59:15.533Z
Learning: In the ESP32 platform builder's penv_setup.py, the esptool_repo_path used in subprocess calls comes from PlatformIO's internal package management system (platform.get_package_dir()) which is a trusted source. Command injection concerns are not applicable here since this is build infrastructure code with controlled inputs, not user-facing code accepting arbitrary input.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 281
File: builder/frameworks/ulp.py:165-178
Timestamp: 2025-09-05T11:58:49.681Z
Learning: In the pioarduino/platform-espressif32 project, cmake executable paths in build scripts (like ulp.py) are provided as paths to the build engine (SCons/PlatformIO) rather than direct executable calls. The build engine handles cross-platform executable resolution automatically, so there's no need to add Windows .exe suffixes or special quoting - the current approach with str(Path(...) / "bin" / "cmake") works correctly across platforms.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 0
File: :0-0
Timestamp: 2025-10-19T17:30:42.403Z
Learning: The tool-esp-rom-elfs package must remain in COMMON_IDF_PACKAGES in platform.py because espidf.py requires it to set the ESP_ROM_ELF_DIR environment variable for the ESP-IDF build system, independent of the exception decoder monitor filter usage.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 281
File: builder/frameworks/component_manager.py:1047-1054
Timestamp: 2025-09-04T15:27:18.112Z
Learning: In the pioarduino/platform-espressif32 project, Jason2866 prefers not to add guard checks for missing arduino_libs_mcu in backup operations, preferring to let the operation proceed rather than silently skip when the directory might be missing.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 0
File: :0-0
Timestamp: 2025-09-23T12:35:35.508Z
Learning: For the pioarduino/platform-espressif32 project, Jason2866 has explicitly instructed to NEVER create issues or PRs outside of his repositories (pioarduino/*). All issue and PR creation must be strictly limited to his own repositories only, never in upstream or other repositories.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 0
File: :0-0
Timestamp: 2025-09-23T12:35:35.508Z
Learning: For the pioarduino/platform-espressif32 project, Jason2866 has explicitly instructed to NEVER create issues or PRs outside of his repositories (pioarduino/*). All issue and PR creation must be strictly limited to his own repositories only, never in upstream or other repositories.

Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 268
File: builder/penv_setup.py:87-96
Timestamp: 2025-08-26T14:52:29.044Z
Learning: For the pioarduino/platform-espressif32 project, Jason2866 prefers to trust system executables when falling back to PATH-based command resolution (like "uv"), rather than implementing additional validation checks for security concerns.

@Jason2866 Jason2866 merged commit 43810bc into develop Mar 27, 2026
1 check passed
@Jason2866 Jason2866 deleted the lttlfs_utf8 branch March 27, 2026 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant