Skip to content

expand:fix expand /dev/zero panic#10655

Closed
mattsu2020 wants to merge 11 commits intouutils:mainfrom
mattsu2020:expand_zero
Closed

expand:fix expand /dev/zero panic#10655
mattsu2020 wants to merge 11 commits intouutils:mainfrom
mattsu2020:expand_zero

Conversation

@mattsu2020
Copy link
Copy Markdown
Contributor

  • Replaced line-by-line processing with byte stream processing for better performance
  • Added ExpandState struct to track column position and line initialization state
  • Implemented UTF-8 validation with configurable incomplete UTF-8 handling
  • Added READ_BUF_SIZE constant for consistent buffer sizing
  • Optimized UTF-8 character length calculation with utf8_expected_len function
  • Improved tab expansion logic with stateful column tracking
  • Enhanced error handling for incomplete UTF-8 sequences

- Replaced line-by-line processing with byte stream processing for better performance
- Added ExpandState struct to track column position and line initialization state
- Implemented UTF-8 validation with configurable incomplete UTF-8 handling
- Added READ_BUF_SIZE constant for consistent buffer sizing
- Optimized UTF-8 character length calculation with utf8_expected_len function
- Improved tab expansion logic with stateful column tracking
- Enhanced error handling for incomplete UTF-8 sequences
@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 2, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Feb 2, 2026

Merging this PR will improve performance by ×2.6

⚡ 2 improved benchmarks
✅ 282 untouched benchmarks
⏩ 38 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation expand_many_lines[100000] 96.4 ms 36.4 ms ×2.6
Simulation expand_custom_tabstops[50000] 25.1 ms 24.2 ms +3.9%

Comparing mattsu2020:expand_zero (0442f9b) with main (dbfbd0c)

Open in CodSpeed

Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

}
}

struct ExpandState {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a brief docstring to this struct ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@RenjiSann
Copy link
Copy Markdown
Collaborator

This looks like it does bring a -30% in expand's performances though, so that's the opposite of an improvement

Fixed a buffer overflow issue in the expand utility where the line buffer could exceed its maximum size limit. The fix implements proper buffer size checking and switching to stream mode when the buffer reaches the maximum line buffer size (1MB). This prevents memory issues when processing very long lines and ensures the utility continues to function correctly with large input files.
@mattsu2020 mattsu2020 marked this pull request as draft February 2, 2026 23:10
The stream mode logic was inverted, causing incorrect handling of input bytes. Fixed the conditional to properly process bytes in stream mode by extending the buffer and consuming processed data, while maintaining the existing line-based processing for non-stream mode.
…line detection

Replace manual byte-by-byte newline searching with memchr library for improved performance when processing large files. This change optimizes the expand utility's line processing by using efficient memory scanning to find newlines, reducing CPU overhead during text expansion operations.
Move memchr import to top of imports section for consistency with Rust style guidelines
Replace stream-based processing with direct buffered reads for regular files to improve performance and simplify logic. Regular files now use a single read_until loop instead of the more complex stream mode handling.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 3, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/follow-name (passes in this run but fails in the 'main' branch)

mattsu2020 and others added 5 commits February 3, 2026 17:55
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Co-authored-by: Dorian Péron <72708393+RenjiSann@users.noreply.github.com>
…te input test

This commit addresses two issues in the expand utility:

1. Optimizes UTF-8 character classification by handling ASCII characters (including tabs and backspaces) more efficiently before processing multi-byte UTF-8 sequences
2. Adds proper error handling for the infinite input test to handle permission denied errors when accessing /dev/zero and /dev/full, making the test more robust across different environments
…sion

Replace manual byte scanning with memchr_iter for ASCII tab expansion, improving performance by using optimized memory search functions instead of character-by-character iteration when processing ASCII text without backspaces.
Rename the ExpandState struct to ExpandContext to better reflect its purpose of maintaining column position and leading-space state while streaming input. This improves code readability and follows more conventional naming patterns.
@mattsu2020 mattsu2020 marked this pull request as ready for review February 3, 2026 12:09
@mattsu2020
Copy link
Copy Markdown
Contributor Author

Closing this PR
As it has been addressed in the PR below.

#10657

@mattsu2020 mattsu2020 closed this Feb 4, 2026
@mattsu2020 mattsu2020 deleted the expand_zero branch February 4, 2026 01:32
@oech3
Copy link
Copy Markdown
Contributor

oech3 commented Feb 4, 2026

But this PR has some performance++ at #10655 (comment) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants