Support single character wildcard in Microsoft.Extensions.FileSystemGlobbing #82406
Tagging subscribers to this area: @dotnet/area-extensions-filesystem
This should be considered for #21362.
Background and motivation
Currently, Microsoft.Extensions.FileSystemGlobbing only supports wildcards that match zero or more characters (`*` within a path segment, `**` across directories); there is no wildcard that matches exactly one character. This is a problem if you want to match files that have a certain character in a certain position. The particular problem I want to solve is that I have a bunch of files named with an arbitrary n-digit ID, and I only want to match files that have `10` in the 5th and 6th positions, for example:

Here I would only want the 5th entry (`000010`) to match. With the current implementation, however, the closest you can get is the pattern `*10*`, which also returns all the other entries, so you need extra code to filter out what you actually want. This is annoying: I'd rather not write that extra code, or go looking for another library when something this close already exists.

It honestly surprised me that this functionality does not exist. Even the most basic Windows Command Prompt supports `?` as a wildcard that matches a single character, and a few .NET APIs do something similar, such as `DirectoryInfo.EnumerateFiles`. Wikipedia, as another example, lists `?` as one of "the most common wildcards".

API Proposal
There are no changes to the API surface (no new overloads, attributes, etc.). Instead, this changes how `Microsoft.Extensions.FileSystemGlobbing.Matcher` interprets the include and exclude patterns added through its `AddInclude` and `AddExclude` methods, by adding one more specially treated character.

Currently, it appears that only `/`, `*` and `**` are treated specially when interpreting patterns. My proposal is to add `?` to that list: inside a pattern, it would mean "match any single character" (exactly one character, never zero).

Why a question mark: most (if not all?) tools that accept glob patterns use `?` as the single-character wildcard. Furthermore, `?` cannot be used in a file name on Windows, which makes it a good candidate: changing how patterns are interpreted is a breaking change, and using a character that cannot appear in Windows file names keeps that breakage to a minimum.

API Usage
To make the example above work, the usage would be:
matcher.AddInclude("????10")
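Until `Matcher` understands `?`, the same position check can be expressed with an API the base class library already ships. The sketch below assumes .NET Core 2.1 or later and uses `System.IO.Enumeration.FileSystemName.MatchesSimpleExpression`, which treats `?` as exactly one character and `*` as zero or more. `SingleCharWildcardDemo`, `FilterByPattern` and the sample IDs are hypothetical, and the helper only matches bare file names, not relative paths, so it is a stopgap rather than a replacement for `Matcher`.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Enumeration;
using System.Linq;

public static class SingleCharWildcardDemo
{
    // Hypothetical helper: keep only the names that match a simple wildcard expression.
    // MatchesSimpleExpression treats '?' as exactly one character and '*' as zero or more;
    // its ignoreCase parameter defaults to true.
    public static List<string> FilterByPattern(IEnumerable<string> names, string pattern) =>
        names.Where(name => FileSystemName.MatchesSimpleExpression(pattern, name)).ToList();

    public static void Main()
    {
        // Hypothetical six-digit IDs; only "000010" has "10" in positions 5 and 6.
        var ids = new[] { "100000", "010000", "001000", "000100", "000010", "000001" };

        foreach (var id in FilterByPattern(ids, "????10"))
            Console.WriteLine(id); // prints only 000010
    }
}
```

Swapping the pattern for `*10*` keeps every sample ID that contains `10` anywhere, which is the over-matching described in the motivation.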
Alternative Designs
While not strictly in scope for this issue, globbing tools and libraries sometimes support other metacharacters, such as:

- `[abc]` (matches any one of the characters a, b or c)
- `[a-z]` (matches any character in the range a-z)
- `[!a-z]` (matches any character not in the range a-z)
- `*.{txt,json}` (matches any file with a txt or json extension)

Risks
On *nix operating systems, `?` appears to be perfectly valid in file names, so this would be a breaking change there. However, the same problem already exists with `*`, and I'm not sure how common it would be in practice, both for file names that contain question marks and for code that tries to match a literal `?` with FileSystemGlobbing.

One possible workaround would be to add a way to escape asterisks, e.g. `\\*`, since `\\` won't match anything anyway. Or we could make `\\*` the wildcard... but that's ugly and very counterintuitive.

If we're really worried about this breaking change, we could instead provide an overload that accepts an enum flag for selecting a matching mode. As an added benefit, this would let us extend the globbing library much further without breaking existing code, for example with the features listed under Alternative Designs. I might open a separate issue about exactly that; however, if possible, I do believe `?` should be supported out of the box without passing any other flags.
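The opt-in matching-mode idea could look roughly like the sketch below. Everything in it is hypothetical: `GlobFeatures`, `SegmentGlob` and `IsMatch` are invented names, not part of Microsoft.Extensions.FileSystemGlobbing. It matches a single file-name segment only, compares ordinally and case-sensitively, and leaves out `**` and `{txt,json}` alternation. It shows how `?`, bracket classes and backslash escaping could each be switched on separately, while `GlobFeatures.None` treats `?`, `[` and `\` as literal characters so that only `*` is special.

```csharp
using System;

// Hypothetical opt-in switches; nothing here exists in Microsoft.Extensions.FileSystemGlobbing.
[Flags]
public enum GlobFeatures
{
    None = 0,
    SingleCharWildcard = 1, // '?' matches exactly one character
    CharacterClasses = 2,   // [abc], [a-z] and [!a-z]
    Escaping = 4,           // '\' makes the next pattern character literal
}

public static class SegmentGlob
{
    // Matches a single path segment (no '/') against a pattern.
    // '*' is always a wildcard for zero or more characters, as it is today.
    public static bool IsMatch(string pattern, string name, GlobFeatures features) =>
        Match(pattern, 0, name, 0, features);

    private static bool Match(string p, int pi, string s, int si, GlobFeatures f)
    {
        while (pi < p.Length)
        {
            char c = p[pi];
            if (c == '*')
            {
                // Backtracking: let '*' absorb 0, 1, 2, ... characters until the rest matches.
                for (int k = si; k <= s.Length; k++)
                    if (Match(p, pi + 1, s, k, f)) return true;
                return false;
            }

            // Every other token consumes exactly one character of the name.
            if (si >= s.Length) return false;

            int close = c == '[' ? p.IndexOf(']', pi + 1) : -1;
            if (c == '?' && f.HasFlag(GlobFeatures.SingleCharWildcard))
            {
                pi++; // any single character, never zero
            }
            else if (close > pi + 1 && f.HasFlag(GlobFeatures.CharacterClasses))
            {
                if (!ClassContains(p.Substring(pi + 1, close - pi - 1), s[si])) return false;
                pi = close + 1;
            }
            else if (c == '\\' && pi + 1 < p.Length && f.HasFlag(GlobFeatures.Escaping))
            {
                if (p[pi + 1] != s[si]) return false; // escaped character is compared literally
                pi += 2;
            }
            else
            {
                if (c != s[si]) return false; // plain literal, ordinal and case-sensitive
                pi++;
            }
            si++;
        }
        return si == s.Length;
    }

    // Evaluates a non-empty bracket body such as "abc", "a-z" or "!a-z" against one character.
    private static bool ClassContains(string body, char ch)
    {
        bool negate = body[0] == '!';
        bool found = false;
        for (int i = negate ? 1 : 0; i < body.Length; )
        {
            if (i + 2 < body.Length && body[i + 1] == '-')
            {
                found |= ch >= body[i] && ch <= body[i + 2]; // range such as a-z
                i += 3;
            }
            else
            {
                found |= ch == body[i]; // single listed character
                i++;
            }
        }
        return found != negate;
    }
}
```

Flags can be combined, e.g. `GlobFeatures.SingleCharWildcard | GlobFeatures.Escaping`, and with escaping enabled the pattern `@"\*.txt"` matches only a file literally named `*.txt`. The recursive handling of `*` is the simplest correct approach but can backtrack heavily on patterns with many stars; a production matcher would more likely use an iterative or automaton-based algorithm.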