Missing support for [[:space:]] match group in the ignore crate #2962

Open
1 task done
weiznich opened this issue Jan 6, 2025 · 6 comments · May be fixed by #2963
Labels
enhancement An enhancement to the functionality of the software.

Comments

@weiznich

weiznich commented Jan 6, 2025

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ignore = "0.4.23"

How did you install ripgrep?

Cargo

What operating system are you using ripgrep on?

Fedora

NAME="Fedora Linux"
VERSION="41 (Workstation Edition)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=41
VERSION_CODENAME=""
PLATFORM_ID="platform:f41"
PRETTY_NAME="Fedora Linux 41 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:41"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f41/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=41
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=41
SUPPORT_END=2025-12-15
VARIANT="Workstation Edition"
VARIANT_ID=workstation

Describe your bug.

The ignore crate fails to handle certain character classes as part of its matcher implementation. I noticed this for [[:space:]], which is contained in a local .gitattributes file that I am trying to parse via the ignore crate to teach [jj](https://github.com/jj-vcs/jj/) to ignore git-lfs files. Git itself [documents the pattern syntax](https://git-scm.com/docs/gitattributes) as being the same (apart from minor restrictions) as that of .gitignore files, so I tried to use the ignore crate for this.

What are the steps to reproduce the behavior?

Run the following code and see the assertion fail:

    fn main() {
        let mut ignore_builder = ignore::gitignore::GitignoreBuilder::new("");

        // Pattern using [[:space:]] where the file name contains spaces.
        ignore_builder.add_line(None, "giga_las/samples/MOAT[[:space:]]HOUSE[[:space:]]FARM[[:space:]]BH_Raw1[[:space:]](Disks).las").unwrap();
        // Plain literal pattern, used as a control.
        ignore_builder
            .add_line(None, "giga_las/samples/MOAT_HOUSE_FARM_BH_Raw1_(Disks).las")
            .unwrap();

        let ignore = ignore_builder.build().unwrap();

        // Control: the literal pattern matches the literal path.
        assert!(matches!(
            ignore.matched(
                "giga_las/samples/MOAT_HOUSE_FARM_BH_Raw1_(Disks).las",
                false
            ),
            ignore::Match::Ignore(_)
        ));

        // This is the assertion that fails: the path with spaces is not matched.
        assert!(
            matches!(
                ignore.matched(
                    "giga_las/samples/MOAT HOUSE FARM BH Raw1 (Disks).las",
                    false
                ),
                ignore::Match::Ignore(_)
            ),
            "Did not match, did not satisfy the [[:space:]] matchers"
        );
    }

What is the actual behavior?

The second assertion fails: the path containing spaces is not matched by the [[:space:]] pattern.

What is the expected behavior?

The assertion passes. See the character class tests from the git repository itself for further examples: https://github.com/git/git/blob/8d8387116ae8c3e73f6184471f0c46edbd2c7601/t/t3070-wildmatch.sh#L144

@BurntSushi
Owner

The only [[:space:]] I see on the gitattributes docs is in a regex, not a glob.

Now, the tests you link do seem to suggest that [[:space:]] and the like are supported in globs as well, but I can't tell for sure.

If git supports this syntax, then I'm probably open to supporting it as well. But I probably won't be adding it any time soon.

@BurntSushi
Owner

It might help if you can find some git docs for the specific glob pattern syntax that is supported.

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label Jan 6, 2025
@weiznich
Author

weiznich commented Jan 6, 2025

I've not found a documentation entry for this, but a quick test with git 2.47.1 using the following commands indicates that this also works for .gitignore files:

echo "/foo[[:space:]]bar.txt" >> .gitignore
git add .gitignore
git commit -m "Add [[:space:]] matcher"
touch foo\ bar.txt
git status
# foo\ bar.txt is not listed by git status

@BurntSushi
Owner

Blech. Glob implementations are truly the wild west. I don't think I've ever seen that syntax in a glob before.

@okdana
Contributor

okdana commented Jan 6, 2025

it's specified in posix that shell patterns, fnmatch(3), etc. support character classes the same as in regular expressions:

A <left-square-bracket> shall introduce a bracket expression if the characters following it meet the requirements for bracket expressions stated in XBD 9.3.5 RE Bracket Expression

bash and zsh support for character classes is described here:

the gitignore documentation doesn't explicitly say anything about it, but it strongly implies that it uses fnmatch(3):

An asterisk "*" matches anything except a slash. The character "?" matches any one character except "/". The range notation, e.g. [a-zA-Z], can be used to match one of the characters in a range. See fnmatch(3) and the FNM_PATHNAME flag for a more detailed description.

it doesn't, though. it uses a modified version of rsync's wildmatch(): https://github.com/git/git/blob/master/wildmatch.c

which i guess is good because otherwise it would be at the mercy of platform-specific inconsistencies like this (from freebsd and macos fnmatch(3)):

The current implementation of the fnmatch() function does not conform to IEEE Std 1003.2 (“POSIX.2”). Collating symbol expressions, equivalence class expressions and character class expressions are not supported.
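
To make the bracket expressions discussed above concrete, here is a minimal sketch of how a matcher might test one character against the body of a bracket expression such as `[[:space:]]`. It is an illustration only: it is not git's wildmatch.c and not code from the ignore or globset crates, the function names `posix_class_contains` and `bracket_matches` are hypothetical, and it assumes ASCII-only classes with no collating symbols, equivalence classes, or backslash escapes.

```rust
/// Hypothetical helper: is `c` in the named POSIX class?
/// Returns None for an unknown class name. ASCII-only by assumption.
fn posix_class_contains(name: &str, c: char) -> Option<bool> {
    Some(match name {
        "alpha" => c.is_ascii_alphabetic(),
        "digit" => c.is_ascii_digit(),
        "alnum" => c.is_ascii_alphanumeric(),
        "upper" => c.is_ascii_uppercase(),
        "lower" => c.is_ascii_lowercase(),
        "punct" => c.is_ascii_punctuation(),
        "xdigit" => c.is_ascii_hexdigit(),
        "blank" => c == ' ' || c == '\t',
        // C-locale isspace(): space, \t, \n, vertical tab, form feed, \r.
        "space" => matches!(c, ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r'),
        _ => return None,
    })
}

/// Hypothetical helper: `body` is the text between the outer `[` and `]`,
/// e.g. "[:space:]_" for the glob fragment `[[:space:]_]`.
fn bracket_matches(body: &str, c: char) -> bool {
    let chars: Vec<char> = body.chars().collect();
    // A leading `!` or `^` negates the whole expression.
    let (negated, mut i) = match chars.first() {
        Some('!') | Some('^') => (true, 1),
        _ => (false, 0),
    };
    let mut found = false;
    while i < chars.len() {
        // `[:name:]` names a character class.
        if chars[i] == '[' && chars.get(i + 1) == Some(&':') {
            let rest = &chars[i + 2..];
            if let Some(len) = rest.windows(2).position(|w| w[0] == ':' && w[1] == ']') {
                let name: String = rest[..len].iter().collect();
                if let Some(hit) = posix_class_contains(&name, c) {
                    found |= hit;
                    i += len + 4; // skip "[:" + name + ":]"
                    continue;
                }
            }
        }
        // `a-z` is a range; anything else is a literal character.
        if i + 2 < chars.len() && chars[i + 1] == '-' {
            found |= chars[i] <= c && c <= chars[i + 2];
            i += 3;
        } else {
            found |= chars[i] == c;
            i += 1;
        }
    }
    found != negated
}
```

With these definitions, `bracket_matches("[:space:]", ' ')` is true and `bracket_matches("[:space:]", '_')` is false, which is the distinction the failing assertion in the report depends on.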

@BurntSushi BurntSushi added enhancement An enhancement to the functionality of the software. and removed question An issue that is lacking clarity on one or more points. labels Jan 6, 2025
@BurntSushi
Owner

Fair enough. I'm fine with adding stuff like this, but I draw the line at locale related shenanigans.

weiznich added a commit to weiznich/ripgrep that referenced this issue Jan 7, 2025
This commit adds support for character classes to the glob matching
implemented by globset. It just translates every class to the
corresponding regex class.

Fixes BurntSushi#2962
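
The approach described in the commit message (translating each class to the corresponding regex class) can be sketched roughly as follows. This is not the code from the linked pull request and not globset's actual translation: `glob_to_regex` and `bracket_end` are hypothetical names, and the sketch assumes a tiny glob subset (`*`, `?`, bracket expressions) with no `**`, no backslash escapes, and no escaping inside brackets. It relies on the fact that the Rust regex syntax accepts ASCII classes such as `[[:space:]]` inside a bracketed class, so the class can be passed through unchanged.

```rust
/// Hypothetical helper: index of the `]` closing the bracket expression
/// that opens at `start`, skipping over any `[:name:]` class inside it.
fn bracket_end(chars: &[char], start: usize) -> Option<usize> {
    let mut i = start + 1;
    while i < chars.len() {
        if chars[i] == '[' && chars.get(i + 1) == Some(&':') {
            // Jump past the `:]` that ends the class name.
            let rest = &chars[i + 2..];
            let close = rest.windows(2).position(|w| w[0] == ':' && w[1] == ']')?;
            i += close + 4;
        } else if chars[i] == ']' && i > start + 1 {
            return Some(i);
        } else {
            i += 1;
        }
    }
    None
}

/// Hypothetical sketch: translate a small glob subset into a regex string.
fn glob_to_regex(glob: &str) -> String {
    let chars: Vec<char> = glob.chars().collect();
    let mut re = String::from("^");
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            // Wildcards never cross a path separator.
            '*' => re.push_str("[^/]*"),
            '?' => re.push_str("[^/]"),
            '[' => match bracket_end(&chars, i) {
                Some(end) => {
                    re.push('[');
                    let mut body = &chars[i + 1..end];
                    // Glob negation `!` becomes regex negation `^`.
                    if body.first() == Some(&'!') {
                        re.push('^');
                        body = &body[1..];
                    }
                    // `[:space:]` and friends are copied through as-is.
                    re.extend(body);
                    re.push(']');
                    i = end;
                }
                // No closing bracket: treat `[` as a literal.
                None => re.push_str("\\["),
            },
            c => {
                // Escape characters that are special in a regex.
                if matches!(c, '.' | '+' | '(' | ')' | '|' | '^' | '$' | '{' | '}' | '\\' | ']') {
                    re.push('\\');
                }
                re.push(c);
            }
        }
        i += 1;
    }
    re.push('$');
    re
}
```

Under these assumptions, the pattern from the git test earlier in the thread, `/foo[[:space:]]bar.txt`, translates to the regex `^/foo[[:space:]]bar\.txt$`.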
@weiznich weiznich linked a pull request Jan 7, 2025 that will close this issue