
@RenjiSann
Collaborator

This PR brings initial support for locale-aware quoting.
Basically, the code fetches the locale settings (for now it looks at `LC_ALL` and `LC_COLLATE`; this may need fixing in the future).

If the locale name has a second part (after the dot), the code parses it to deduce the encoding: `C.UTF-8` and `fr_FR.UTF-8` are UTF-8 encoded, while `en_US` and `C` default to plain ASCII (they should use ISO-8859-1, but that encoding is not supported yet).

Disclaimer: right now, only the UTF-8 and ASCII encodings are supported. Support for other encodings will be discussed separately, as it may be harder to implement.
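The encoding detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `Encoding` enum and the `encoding_from_locale` and `current_encoding` functions are hypothetical names. The sketch checks `LC_ALL`, then `LC_COLLATE`, as the description says, inspects only the codeset after the first `.` (ignoring any `@modifier`), accepts both the `UTF-8` and `utf8` spellings case-insensitively (an assumption), and falls back to ASCII when there is no codeset or an unsupported one such as ISO-8859-1.

```rust
use std::env;

/// Hypothetical encoding type mirroring the two encodings the PR supports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Encoding {
    Ascii,
    Utf8,
}

/// Hypothetical helper: deduce the encoding from a locale name such as
/// `fr_FR.UTF-8` or `C`. Only the codeset between the first '.' and an
/// optional '@' modifier is inspected.
fn encoding_from_locale(locale: &str) -> Encoding {
    let codeset = locale
        .split_once('.')
        .map(|(_, rest)| rest.split('@').next().unwrap_or(rest))
        .unwrap_or("");
    match codeset.to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => Encoding::Utf8,
        // No codeset, or an unsupported one (e.g. ISO-8859-1): fall back to ASCII.
        _ => Encoding::Ascii,
    }
}

/// Hypothetical lookup following the PR description: the first non-empty
/// value among LC_ALL and LC_COLLATE wins; otherwise default to ASCII.
fn current_encoding() -> Encoding {
    ["LC_ALL", "LC_COLLATE"]
        .iter()
        .filter_map(|name| env::var(name).ok())
        .find(|value| !value.is_empty())
        .map(|value| encoding_from_locale(&value))
        .unwrap_or(Encoding::Ascii)
}
```

With this sketch, `encoding_from_locale("fr_FR.UTF-8")` yields `Encoding::Utf8`, while `"en_US"` and `"C"` yield `Encoding::Ascii`. POSIX assigns character encoding to the `LC_CTYPE` category (with `LANG` as the last fallback), so a fuller version would likely consult those variables as well.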

To do this, I heavily refactored the `quoting_style` code:

  1. Moved the code to a subfolder to split things up and keep everything readable.
  2. Introduced a `Quoter` trait to factorize the interface of all quoting mechanisms:

     ```rust
     /// Common interface of quoting mechanisms.
     trait Quoter {
         /// Push a valid character.
         fn push_char(&mut self, input: char);

         /// Push a sequence of valid characters.
         fn push_str(&mut self, input: &str) {
             for c in input.chars() {
                 self.push_char(c);
             }
         }

         /// Push a contiguous slice of data that is invalid with respect to
         /// the encoding used to decode the stream.
         fn push_invalid(&mut self, input: &[u8]);

         /// Apply post-processing on the constructed buffer and return it.
         fn finalize(self: Box<Self>) -> Vec<u8>;
     }
     ```

  3. Used the trait to process the input through `push_char()` or `push_invalid()`, depending on the encoding.
  4. Patched the tests to accommodate encoding-aware checks.
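To illustrate how the trait can drive the processing, here is a self-contained sketch that reproduces the `Quoter` trait and adds a hypothetical `EscapeQuoter` and a hypothetical `quote` driver. The escaping rule (non-control characters copied as-is, control characters and invalid bytes written as `\ooo` octal escapes) is an assumption for illustration, not one of the actual quoting styles. In UTF-8 mode the driver uses `std::str::from_utf8` together with `Utf8Error::valid_up_to()` and `error_len()` to hand valid runs to `push_str` and invalid byte runs to `push_invalid`; in ASCII mode, every run of bytes at or above 0x80 is treated as invalid.

```rust
/// The `Quoter` trait from the PR description (doc comments trimmed).
trait Quoter {
    fn push_char(&mut self, input: char);
    fn push_str(&mut self, input: &str) {
        for c in input.chars() {
            self.push_char(c);
        }
    }
    fn push_invalid(&mut self, input: &[u8]);
    fn finalize(self: Box<Self>) -> Vec<u8>;
}

/// Hypothetical encoding type mirroring the two encodings the PR supports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Encoding {
    Ascii,
    Utf8,
}

/// Hypothetical quoter for this sketch only: non-control characters are
/// copied as-is; control characters and invalid bytes become `\ooo` escapes.
struct EscapeQuoter {
    buf: Vec<u8>,
}

impl Quoter for EscapeQuoter {
    fn push_char(&mut self, input: char) {
        let mut tmp = [0u8; 4];
        let len = input.encode_utf8(&mut tmp).len();
        let bytes = &tmp[..len];
        if input.is_control() {
            // Reuse the byte-escaping path for control characters.
            self.push_invalid(bytes);
        } else {
            self.buf.extend_from_slice(bytes);
        }
    }

    fn push_invalid(&mut self, input: &[u8]) {
        for &b in input {
            self.buf.extend_from_slice(format!("\\{:03o}", b).as_bytes());
        }
    }

    fn finalize(self: Box<Self>) -> Vec<u8> {
        self.buf
    }
}

/// Hypothetical driver: split `input` into valid characters and invalid byte
/// runs according to `encoding`, feed them to `quoter`, and return its output.
fn quote(input: &[u8], encoding: Encoding, mut quoter: Box<dyn Quoter>) -> Vec<u8> {
    match encoding {
        Encoding::Ascii => {
            let mut i = 0;
            while i < input.len() {
                if input[i].is_ascii() {
                    quoter.push_char(input[i] as char);
                    i += 1;
                } else {
                    // Group consecutive non-ASCII bytes into one invalid slice.
                    let start = i;
                    while i < input.len() && !input[i].is_ascii() {
                        i += 1;
                    }
                    quoter.push_invalid(&input[start..i]);
                }
            }
        }
        Encoding::Utf8 => {
            let mut rest = input;
            while !rest.is_empty() {
                match std::str::from_utf8(rest) {
                    Ok(valid) => {
                        quoter.push_str(valid);
                        break;
                    }
                    Err(e) => {
                        let (valid, after) = rest.split_at(e.valid_up_to());
                        // The prefix up to valid_up_to() is guaranteed valid UTF-8.
                        quoter.push_str(std::str::from_utf8(valid).unwrap());
                        // error_len() is None when the input ends mid-sequence.
                        let bad = e.error_len().unwrap_or(after.len());
                        quoter.push_invalid(&after[..bad]);
                        rest = &after[bad..];
                    }
                }
            }
        }
    }
    quoter.finalize()
}
```

With this sketch, the bytes of `"é\n"` come out as `é\012` in UTF-8 mode but as `\303\251\012` in ASCII mode, because the two bytes of `é` are not valid ASCII and are routed through `push_invalid`.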

@RenjiSann RenjiSann requested a review from sylvestre June 14, 2025 15:21
@RenjiSann RenjiSann force-pushed the locale-aware-quoting branch 2 times, most recently from 98d25ad to 9715f06 on June 14, 2025 15:24
@RenjiSann
Collaborator Author

Force-pushed to add license headers and reformat the code.

@RenjiSann RenjiSann force-pushed the locale-aware-quoting branch 4 times, most recently from 48554a2 to 820134e on June 14, 2025 15:53
@RenjiSann RenjiSann marked this pull request as draft June 14, 2025 15:54
@RenjiSann RenjiSann force-pushed the locale-aware-quoting branch 3 times, most recently from f104f7d to 320aae1 on June 14, 2025 16:14
@RenjiSann RenjiSann marked this pull request as ready for review June 14, 2025 16:32
@RenjiSann RenjiSann added the labels "J - Locale" (locale related issue) and "J - Encoding" (encoding (UTF-8, UTF-16) related issue) on Jun 14, 2025
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/printf/printf-quote is no longer failing!

@RenjiSann
Collaborator Author

@RenjiSann RenjiSann force-pushed the locale-aware-quoting branch 2 times, most recently from 6513a29 to 5dbd1ef on June 14, 2025 23:05
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/printf/printf-quote is no longer failing!

@sylvestre sylvestre force-pushed the locale-aware-quoting branch from 5dbd1ef to 9ad4f01 on June 24, 2025 08:42
@github-actions

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/printf/printf-quote is no longer failing!

@RenjiSann RenjiSann force-pushed the locale-aware-quoting branch from 9ad4f01 to 5237962 on June 24, 2025 09:49
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/printf/printf-quote is no longer failing!

@RenjiSann RenjiSann force-pushed the locale-aware-quoting branch from 1921183 to 3fe7f94 on June 24, 2025 22:50
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/printf/printf-quote is no longer failing!

@sylvestre sylvestre merged commit 2b5dfe6 into uutils:main Jun 25, 2025
76 checks passed
Contributor


@RenjiSann is it intentional that this file is in `test` and not somewhere in `tests`?

Collaborator Author


Oops, that's a leftover from me testing stuff :/
I will open a pull request to revert it.


3 participants