Support single character wildcard in Microsoft.Extensions.FileSystemGlobbing #82406

Open
Gnbrkm41 opened this issue Feb 20, 2023 · 2 comments

Comments

@Gnbrkm41
Contributor

Gnbrkm41 commented Feb 20, 2023

Background and motivation

Currently, Microsoft.Extensions.FileSystemGlobbing only supports a wildcard that matches zero or more characters - there is no wildcard that matches exactly one character.

This is a problem if you want to match files that have a certain character in a certain position. The particular problem I want to solve is that I have a bunch of files named with an arbitrary n-digit ID, and I only want to match files that have '10' in the 5th and 6th positions. For example:

10000000
01000000
00100000
00010000
00001000
00000100

Here I would only want the 5th entry (00001000) to match. With the current implementation, the closest you can get is to specify *10* as the pattern, but that returns all the other entries as well, so you need extra code to filter out what you actually want. This is annoying, as I'd rather not write extra bits of code (or have to look for other libraries when something really close is already there).
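For reference, the extra filtering mentioned above could look roughly like this. It is only a sketch: it assumes the in-memory `Match(IEnumerable<string>)` extension method from `MatcherExtensions`, reuses the six sample names, and `HasTenAtFifth` is a made-up helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.Extensions.FileSystemGlobbing;

// Hypothetical helper: true when "10" sits at the 5th and 6th characters.
static bool HasTenAtFifth(string name) =>
    name.Length >= 6 && name[4] == '1' && name[5] == '0';

// Sample names from the list above; the in-memory overload needs no real files.
string[] names =
{
    "10000000", "01000000", "00100000",
    "00010000", "00001000", "00000100",
};

var matcher = new Matcher();
matcher.AddInclude("*10*"); // closest pattern available today; too broad on its own

IEnumerable<string> wanted = matcher
    .Match(names)                            // match the names in memory
    .Files
    .Select(f => Path.GetFileName(f.Path))   // keep just the file name
    .Where(HasTenAtFifth);                   // the "extra code" a '?' wildcard would replace

Console.WriteLine(string.Join(", ", wanted));
```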

It honestly surprised me that such functionality does not exist - even the most basic Command Prompt in Windows supports ? as something that matches a single character, and there are a few APIs in .NET that do something similar, such as DirectoryInfo.EnumerateFiles. Wikipedia, as another example, lists ? as one of "the most common wildcards".

API Proposal

There are no changes to the API surface (as in new overloads / added attributes etc). Rather, this involves a change in how Microsoft.Extensions.FileSystemGlobbing.Matcher interprets the include / exclude patterns added by the AddInclude and AddExclude methods, by adding a new specially treated character.

Currently, it appears that only /, * and ** are treated specially when interpreting the patterns. My proposal is to add ? to the list of specially treated characters - when inside a pattern, it would mean "match any single character" (exactly one; it would not match zero characters).

Why question mark: Most (if not all?) tools that accept glob patterns use ? as the single-character wildcard. Furthermore, it cannot be used in a file name on Windows, which makes it a good candidate given that changing how patterns are recognised would otherwise be a breaking change.
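To make the intended semantics concrete, here is a minimal sketch that uses only `System.Text.RegularExpressions`. `SegmentToRegex` is a hypothetical helper, not part of FileSystemGlobbing and not a suggestion for how the library should implement this. It assumes a pattern has to match a whole path segment.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: translates one path segment where '*' means
// "zero or more characters" and '?' means "exactly one character".
static Regex SegmentToRegex(string pattern)
{
    // Regex.Escape turns '*' into "\*" and '?' into "\?";
    // swap those escaped forms for the matching regex constructs.
    string body = Regex.Escape(pattern)
        .Replace(@"\*", "[^/]*")
        .Replace(@"\?", "[^/]");
    return new Regex("^" + body + @"\z", RegexOptions.CultureInvariant);
}

Regex rx = SegmentToRegex("????10??");
Console.WriteLine(rx.IsMatch("00001000")); // True
Console.WriteLine(rx.IsMatch("00010000")); // False: "10" is one position too early
Console.WriteLine(rx.IsMatch("0000100"));  // False: one character short
```

Under that whole-segment assumption, the eight-digit sample names need trailing wildcards (`????10??` or `????10*`); a bare `????10` would only match six-character names.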

API Usage

The usage for making the above example work:
matcher.AddInclude("????10")

Alternative Designs

While not strictly in scope of this issue, globbing tools / libraries sometimes have other extra features ("metacharacters") when it comes to globbing, such as:

[abc] (matches any one of the characters a, b, or c)
[a-z] (matches any character in the a-z range)
[!a-z] (matches any character not in the a-z range)
*.{txt,json} (matches any file that has an extension of txt or json)
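Of these, the brace alternation can already be approximated with one include per alternative. The sketch below assumes the in-memory `Match(IEnumerable<string>)` extension method, and the file names are made up. As far as I can tell there is no comparable stand-in for the character classes.

```csharp
using System;
using Microsoft.Extensions.FileSystemGlobbing;

var matcher = new Matcher();
// Stand-in for the brace pattern "*.{txt,json}": one include per alternative.
matcher.AddInclude("*.txt");
matcher.AddInclude("*.json");

// Made-up names, matched in memory so nothing has to exist on disk.
string[] names = { "notes.txt", "settings.json", "image.png" };

foreach (var file in matcher.Match(names).Files)
{
    Console.WriteLine(file.Path);
}
```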

Risks

On *nix OSes, it appears that ? is totally valid in file names, so this would be a breaking change. However, that problem also exists with * and I'm not sure how common of a problem that would be, both in file names with question marks & usages where people try to match literal ? with FileSystemGlobbing.

One possible way to work around this would be to add ways to escape asterisks by doing \\* - \\ won't match anything anyway. Or we could make \\* the wildcard... but it's ugly and super counterintuitive.

If we're really worried about this breaking change... we could alternatively provide an overload that accepts an enum flag for selecting matching modes. As an added benefit, this would allow us to expand the globbing library much further without breaking existing code, for example with the features listed in Alternative Designs. I might open a separate issue about exactly that. However, if possible, I do believe ? should be supported out of the box without passing any extra flags.

@Gnbrkm41 Gnbrkm41 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 20, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 20, 2023
@ghost

ghost commented Feb 20, 2023

Tagging subscribers to this area: @dotnet/area-extensions-filesystem
See info in area-owners.md if you want to be subscribed.

Author: Gnbrkm41
Assignees: -
Labels:

api-suggestion, untriaged, area-Extensions-FileSystem

Milestone: -

@jozkee
Member

jozkee commented Feb 21, 2023

This should be considered for #21362.

@jozkee jozkee changed the title from "[API Proposal]: Support single character wildcard in Microsoft.Extensions.FileSystemGlobbing" to "Support single character wildcard in Microsoft.Extensions.FileSystemGlobbing" Feb 21, 2023
@jozkee jozkee removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 21, 2023
@jozkee jozkee added this to the Future milestone Feb 21, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Feb 21, 2023