expand: fix expand /dev/zero panic #10655
Closed
mattsu2020 wants to merge 11 commits into uutils:main
mattsu2020 (Contributor) commented Feb 2, 2026
- Replaced line-by-line processing with byte stream processing for better performance
- Added ExpandState struct to track column position and line initialization state
- Implemented UTF-8 validation with configurable incomplete UTF-8 handling
- Added READ_BUF_SIZE constant for consistent buffer sizing
- Optimized UTF-8 character length calculation with utf8_expected_len function
- Improved tab expansion logic with stateful column tracking
- Enhanced error handling for incomplete UTF-8 sequences
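For readers following the description above, here is a minimal sketch of what stateful byte-stream expansion can look like. It is not the PR's code: the `READ_BUF_SIZE` value, the single `column` field, the `process_chunk` and `expand` helpers, and the one-column-per-character width rule are all assumptions made for illustration. Only the names `ExpandState`, `READ_BUF_SIZE`, and `utf8_expected_len` come from the description.

```rust
use std::io::{self, Read, Write};

/// Assumed value; the PR names this constant but does not show its size here.
const READ_BUF_SIZE: usize = 8192;

/// Expected length of a UTF-8 sequence given its first byte, or `None`
/// for a continuation byte or a byte that can never start a sequence.
fn utf8_expected_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None,
    }
}

/// Column position carried across reads, so a chunk boundary may fall anywhere.
/// (Hypothetical layout; the real struct also tracks line-initial state.)
struct ExpandState {
    column: usize,
}

impl ExpandState {
    /// Expands tabs in one chunk. `tabstop` must be non-zero.
    fn process_chunk<W: Write>(&mut self, chunk: &[u8], tabstop: usize, out: &mut W) -> io::Result<()> {
        for &b in chunk {
            match b {
                b'\t' => {
                    // Pad with spaces up to the next tab stop.
                    let pad = tabstop - self.column % tabstop;
                    for _ in 0..pad {
                        out.write_all(b" ")?;
                    }
                    self.column += pad;
                }
                b'\n' => {
                    out.write_all(b"\n")?;
                    self.column = 0;
                }
                0x08 => {
                    // Backspace moves the column back, never below zero.
                    out.write_all(&[b])?;
                    self.column = self.column.saturating_sub(1);
                }
                _ => {
                    out.write_all(&[b])?;
                    // Simplification: only a valid leading byte advances the
                    // column; continuation and invalid bytes add no width.
                    if utf8_expected_len(b).is_some() {
                        self.column += 1;
                    }
                }
            }
        }
        Ok(())
    }
}

/// Reads fixed-size chunks so memory stays bounded even with no newline in the input.
fn expand<R: Read, W: Write>(mut input: R, out: &mut W, tabstop: usize) -> io::Result<()> {
    let mut state = ExpandState { column: 0 };
    let mut buf = [0u8; READ_BUF_SIZE];
    loop {
        match input.read(&mut buf) {
            Ok(0) => return Ok(()),
            Ok(n) => state.process_chunk(&buf[..n], tabstop, out)?,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Because the column lives in the state rather than in a per-line buffer, an input such as /dev/zero that never produces a newline is processed chunk by chunk instead of being accumulated.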
GNU testsuite comparison:
Merging this PR will improve performance by ×2.6
RenjiSann reviewed Feb 2, 2026
    }
}

struct ExpandState {
Collaborator
Can you add a brief docstring to this struct?
RenjiSann reviewed Feb 2, 2026
Collaborator
This looks like it brings a -30% change in expand's performance, though, so that's the opposite of an improvement.
Fixed a buffer overflow issue in the expand utility where the line buffer could exceed its maximum size limit. The fix implements proper buffer size checking and switching to stream mode when the buffer reaches the maximum line buffer size (1MB). This prevents memory issues when processing very long lines and ensures the utility continues to function correctly with large input files.
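To illustrate the idea in this commit message, here is a minimal sketch of capping a line buffer and falling back to streaming once the cap is reached. The function name `copy_bounded`, the 4096-byte read size, and the pass-through "processing" are assumptions for illustration; only the 1MB limit comes from the message above.

```rust
use std::io::{self, Read, Write};

/// 1MB cap, as stated in the commit message.
const MAX_LINE_BUF: usize = 1024 * 1024;

/// Copies input to output while holding at most roughly `max_line_buf` bytes
/// of an unterminated line. Returns how many times the cap forced a flush.
/// (Hypothetical helper; real tab expansion would replace the plain writes.)
fn copy_bounded<R: Read, W: Write>(mut input: R, out: &mut W, max_line_buf: usize) -> io::Result<usize> {
    let mut read_buf = [0u8; 4096];
    let mut line_buf: Vec<u8> = Vec::new();
    let mut forced = 0;
    loop {
        let n = match input.read(&mut read_buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        line_buf.extend_from_slice(&read_buf[..n]);

        // Line mode: emit everything up to and including the last newline.
        if let Some(last_nl) = line_buf.iter().rposition(|&b| b == b'\n') {
            out.write_all(&line_buf[..=last_nl])?;
            line_buf.drain(..=last_nl);
        }

        // Stream mode: the partial line hit the cap, so flush it instead of growing.
        if line_buf.len() >= max_line_buf {
            out.write_all(&line_buf)?;
            line_buf.clear();
            forced += 1;
        }
    }
    // Emit any trailing partial line at end of input.
    out.write_all(&line_buf)?;
    Ok(forced)
}
```

With an input that never contains a newline, such as a bounded slice of zero bytes, the buffer is flushed each time it reaches the cap rather than growing without limit.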
The stream mode logic was inverted, causing incorrect handling of input bytes. Fixed the conditional to properly process bytes in stream mode by extending the buffer and consuming processed data, while maintaining the existing line-based processing for non-stream mode.
…line detection
Replace manual byte-by-byte newline searching with memchr library for improved performance when processing large files. This change optimizes the expand utility's line processing by using efficient memory scanning to find newlines, reducing CPU overhead during text expansion operations.
Move memchr import to top of imports section for consistency with Rust style guidelines
Replace stream-based processing with direct buffered reads for regular files to improve performance and simplify logic. Regular files now use a single read_until loop instead of the more complex stream mode handling.
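As a rough illustration of the single `read_until` loop mentioned here, the sketch below reuses one buffer per line over any `BufRead` source. The function name `copy_lines` is hypothetical, and the plain write stands in for the per-line tab expansion, which is not shown.

```rust
use std::io::{self, BufRead, Write};

/// Reads one line at a time with `read_until`, reusing a single buffer.
/// Returns the number of lines seen (a final line without '\n' counts too).
fn copy_lines<R: BufRead, W: Write>(mut input: R, out: &mut W) -> io::Result<usize> {
    let mut line = Vec::new();
    let mut count = 0;
    loop {
        line.clear();
        // `read_until` returns 0 only at end of input.
        if input.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        // Placeholder for per-line expansion.
        out.write_all(&line)?;
        count += 1;
    }
    Ok(count)
}
```

This shape suits regular files, where lines are finite; it offers no bound on line length, which is why an unbounded source needs the capped or streaming path instead.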
sylvestre reviewed Feb 3, 2026
Co-authored-by: Sylvestre Ledru <sylvestre@debian.org>
Co-authored-by: Dorian Péron <72708393+RenjiSann@users.noreply.github.com>
…te input test
This commit addresses two issues in the expand utility:
1. Optimizes UTF-8 character classification by handling ASCII characters (including tabs and backspaces) more efficiently before processing multi-byte UTF-8 sequences
2. Adds proper error handling for the infinite input test to handle permission denied errors when accessing /dev/zero and /dev/full, making the test more robust across different environments
…sion
Replace manual byte scanning with memchr_iter for ASCII tab expansion, improving performance by using optimized memory search functions instead of character-by-character iteration when processing ASCII text without backspaces.
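The sketch below shows the general shape of search-based tab expansion: find each tab, copy the run before it in one write, then pad. To stay free of external crates it uses the standard library's `position` as a stand-in for `memchr_iter`, so it illustrates the control flow rather than the speedup. The function name `expand_ascii_run` and its signature are hypothetical, and it assumes ASCII input with no newlines or backspaces.

```rust
use std::io::{self, Write};

/// Expands tabs in an ASCII segment that contains no newline or backspace.
/// Starts at `column` and returns the column after the segment.
/// `tabstop` must be non-zero.
fn expand_ascii_run<W: Write>(
    segment: &[u8],
    tabstop: usize,
    mut column: usize,
    out: &mut W,
) -> io::Result<usize> {
    let mut rest = segment;
    // Stand-in for a memchr-style search: locate the next tab.
    while let Some(pos) = rest.iter().position(|&b| b == b'\t') {
        // Copy the whole run before the tab in a single write.
        out.write_all(&rest[..pos])?;
        column += pos;

        // Pad with spaces up to the next tab stop.
        let pad = tabstop - column % tabstop;
        for _ in 0..pad {
            out.write_all(b" ")?;
        }
        column += pad;

        rest = &rest[pos + 1..];
    }
    // Copy whatever follows the last tab.
    out.write_all(rest)?;
    Ok(column + rest.len())
}
```

For example, expanding `ab\tc\td` with a tab stop of 4 from column 0 writes `ab`, two spaces, `c`, three spaces, and `d`, ending at column 9.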
Rename the ExpandState struct to ExpandContext to better reflect its purpose of maintaining column position and leading-space state while streaming input. This improves code readability and follows more conventional naming patterns.
Contributor
Author
Closing this PR
Contributor
But this PR has some performance improvements at #10655 (comment).