-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
glob support in core #38
Comments
I do think this is a good idea but as you point out glob does have 6 dependencies, including |
globs are not supported by bash v5 on Fedora 30. Running As for pulling glob into the core I'm not sure. I think it might be more appropriate to first ask if |
Glob support would be great precisely because it’s not consistent across platforms, but is needed for things like git/npm/eslint ignore, eslint overrides, etc. |
As often as globbing is used, and since most developers really don't realize how many edge cases and platform-specific issues need to be considered to produce a consistent experience, IMHO it would be great to add native support. If Node.js does decide to include this functionality, I would be willing to put some time into helping (assuming that's of any interest). Either way, my recommendation is that you look at the matching and fs operations as completely separate chunks of logic.
If it's decided that this functionality should be re-written directly in Node.js, I'd be happy to take a stab at contributing that logic, starting with just the matching. I think @isaacs has more experience and knowledge of the fs logic and cross-platform considerations than I do. But I'd be willing to help with that as well. If you decide to use external libs for any of this, I would of course also be willing and interested in collaborating on making any adjustments, improvements, or changes to our libs that are necessary to make them meet your requirements. |
Let's not throw subjective shade at other packages, though. |
sorry just saw your comment @ljharb (after I did thumbs up) |
We're meeting tomorrow and this will be on the agenda. To everyone offering comments: please attend if you're available! |
@sindresorhus It sounds like you'd be in support of pulling fast-glob into core, then? |
@boneskull I'd be in support of that decision too :) |
@ljharb It was not my intention to "throw a subjective shade". I'll be better at justifying my opinions in the future. I just wanted to mention an alternative. Here are benchmarks comparing By "better maintained", I meant more actively maintained. Poor word use on my part. For context, I built |
Yes |
OK, we discussed this at today's meeting
gimme feedback |
I would be thrilled to hand That being said, there are a few rough edges that ought to be polished down first, so here I go brutally trashing my own work :) The sync/async management is (overy) clever. It works fairly well, but it's not my favorite approach. Since it somewhat predates proper Classes in JavaScript, glob uses a different approach than most of my other more recent libraries that walk over trees. (Eg, tar, npm-packlist, etc.) The code is a bit brittle and hard to work with. It's also very (overly) optimized for performance, and meticulously designed to match Bash 4's glob behavior. None of which would in itself be a problem, if glob was actually a good cross-platform general-purpose file system globber. We could call it done and let it age gracefully. However, it has some issues with UNC paths on Windows, and lacks proper promise support, both of which are very challenging to fix in its current state. I'd love to give it a rewrite, and may someday, but honestly, if it was a priority, I probably would've done it by now. Maybe the best approach is to use it as a reference implementation, but rewrite it, fixing all the bugs and preserving the tests and actual glob resolution behavior. It works well enough for a lot of people right now (including me!), but caveat emptor: if you pull it in as-is, you might be welcoming a very stinky camel into your tent. Make sure it's not too hard to replace, if you do so. |
"Better maintained" is actually a fair and accurate assessment. No shade taken :) |
I'm working to document this initiative, so here's a rough draft of what I have, including a comparison of existing solutions. If I'm wrong here, please say so. I'll send a PR after it's had a bit of review, but wanted to keep everything in this issue for now. Globbing in Node.js coreSee nodejs/tooling#38. Motivation
Globs (henceforce capitalized) are natively supported in shells like Bash, which means you can do this, which will recursively find files matching $ some-cli-app **/*.foo.js Bash expands the above before execution of But we get in to trouble when our scripts (e.g., in To support Globs in a portable way, any CLI app would need to pull in glob or a similar module, and pass the Glob as a (double-)quoted string in the arguments: $ some-cli-app "**/*.foo.js" The utility of glob and others like it has been well-established, and it's used in enough CLI apps that the functionality belongs in Node.js core. RoadmapThe first thing that needed is a matching implementation. A Glob implementation (which uses the matcher to compare filenames) would then follow. We expect the new function(s) would be added to the path module. That said, we must be careful to not conflate filepaths and Globs, because they are not the same. Matching ImplementationAn underlying matching implementation is a prerequisite to Glob support in Node.js core. To determine what we should be implementing, we'll compare existing solutions.
Comparison of Prior ArtWe'll compare the featureset of the following:
We've identified the latter two as being most popular packages which provide the underlying matcher to a Glob implementation. We will consider the following relevant features:
For purposes of this best-effort comparison, we'll consider Bash's full capabilities instead of its default capabilities. All comparisons should reflect the latest version (as of this writing) of the respective software. If a user can toggle the specific behavior, that's noted with "Flag."
|
My key takeaway: if you expected the implementation to be simple, you will be disappointed. 😄 I don't think we can get away with not offering a handful of flags, either... |
Correct, micromatch doesn't support comments.
Indeed. I recommend also testing nested braces and extglobs, as well as braces and extglobs with nested negation patterns. Nested patterns are used more often than one might guess. Regarding bash, both minimatch and micromatch have (intentionally) different behavior in some cases. Also, it's important to note that |
Another thought.... joining paths to globs is not trivial, but it's totally possible to do it correctly, consistently, in a cross-platform compatible way. |
In your opinion, does micromatch do this correctly?
It seems it's implemented in its internal wildmatch.c. There's a JS port of wildmatch but I can't vouch for its accuracy. It also hasn't been published in 5 years. |
@jonschlinkert I'm curious. If you were going to pare down micromatch to its essence (there are a lot of options, as you'll probably agree), what would you keep? What would you drop? (Since we're aiming to determine our "MVP", as it were) |
IMO, in our implementation, I could definitely do without:
|
pls note that on Friday we have a meeting and everybody is invited. bring a +1 if you want |
Same question as in #19: what can node core do better than userland? I'm not opposed, but I do think node core can aim higher than "users won't have to install glob" and that those goals should be more clearly formulated before diving into specifics. |
Micromatch doesn't do any kind of joining, it's all left to the user to figure this out - same as minimatch and node-glob. However, I did start implementing this behavior somewhere, and I've given examples in many issues over the past few years, and I'd be happy to discuss over code when the time comes. In essence, it boils down to ensuring that all slashes are converted to posix slashes in a non-glob file path before converting it to a glob pattern. There are other nuances that would warrant a longer discussion.
My first reaction is to drop brace expansion, like you mentioned. Especially since globs with braces can be expanded before passing them to the matcher. (as a side note micromatch expands Overall, as you think about which features to include or drop, I think the most important consideration is how globs are used. With CLI tools like eslint, end-users don't have the ability to pass options to the matcher, or customize behavior by, for example, expanding braces beforehand. Removing a bunch of features or options will just make globbing useless or painful to a lot of end-users, or cause implementors to find hacky workarounds. I found this out the hard way over the past few years. Globbing is hard to do correctly, and implementors always seem to underestimate the number of edge cases their users will experience with it (especially because of Windows paths, and the general misconception that globs are just fancy file paths). This isn't an argument for not having native glob support in Node, it's the opposite. I think we can smooth this out and create consistency for users. If we look at this from 20,000 feet, what we think of as "globs" in Node.js is actually a combination of several different matching concepts, some from shell expansion, others from elsewhere:
Beyond features, there are a handful of things to consider in determining behavior:
I do. I really, really do. lol. When I created the parser for the latest micromatch, I had to make decisions about how the potentially-conflicting rules of each matching concept should work together. Instead, I avoided making those decisions by allowing the user to do it via options. If micromatch is pulled into Node, I would be more than happy to reduce options or hard-card certain behavior. This is not expected of course. |
Almost forgot, picomatch is the matcher behind micromatch. (edit: I initially said that picomatch doesn't have brace support, but I forgot that it does in the latest release. However, micromatch adds support for more complex brace expansion features). |
@vweevers That depends what you mean by "better," I suppose. Node.js doesn't provide a "better" way than userland to build services out-of-the-box; but its Another question to ask is "what's essential?" Recursive mkdir and rmdir--as well as argument parsing in #19--are essential for tools which operate on files, which is quite a lot of them. So, glob support is essential, because:
|
FWIW, Python's glob support is pretty barebones (it looks like they added globstar support in v3.5). I haven't used it myself, so don't know how Python tooling authors are getting along with it, but I'd sure like to know. |
What about modules like
The one exception could be to work around the fact that JSON does not support comments. Completely ignoring any pattern matching {
"files": [
"**/*.js",
"# reason for not publishing file.js",
"!file.js"
]
} I know this could be accomplished by pre-processing the array before passing to the matcher function but the idea here is consistency, for globs to eventually work the same everywhere. I don't see any value in supporting inline comments, for example
If this is supported should it automatically expand range as per the example at https://www.npmjs.com/package/micromatch#optionsexpandrange? Seems confusing that |
Agreed. For users, it's much more difficult to exclude unwanted dotfiles when they are included by default, than the other way around. Also, every globbing implementation I'm aware of excludes dotfiles by default. In bash, for example, you need to set It's also really common for users (or implementors) to use patterns like
Agreed. I've been planning to change that to the default. Thanks for bringing it up.
That depends on what you mean by bare bones lol. It supports globstars, ranges like |
If we move forward on this, I'd suggest using The reason is that "globbing" has been extended and implemented in a variety of different ways, and no two implementations are going to be 100% identical. There is no specification that covers all the cases users expect. So, it's not strictly accurate that all "posix shells" implement "globbing" (at least for some values of "posix shell" and "globbing"). @coreyfarrell |
Also: note that ignore-file style globbing is somewhat different from shell style globbing (though there are similarities). See https://github.com/npm/ignore-walk |
We discussed this in the meeting today.
Looking at it holistically, the result is that those of us in the meeting felt like this is biting off more than we can chew. Personally, my excitement has dimmed to the point of not wanting to champion the effort (#19 will be taxing enough, and I do want to help see that one forward). If anyone else here wants to take up the cause, nobody else here is stopping you--and I think we're happy to work with anybody who does want to move it forward--but it didn't sound like any of those in attendance have the requisite, uh, gumption. |
going to close this. somebody reopen if they want to champion. |
I expect that I'll have some time to pick this up and champion it after npm v7 and tap v15 both ship. Glob's next on my list. Probably near the end of 2020, or early 2021, so most likely it'll be a node v15 feature. I believe that the goal of an implementation should be to match the behavior of some specific and recent version of Bash, as it's the most widely used thing that people glob with. (But only as a reference implementation -- my explorations in the past showed that the actual code in bash tends to intermingle globbing with other string parsing features in a way that is very performant for a shell, but very bad for reusability in a platform like Node. We probably don't want to support command or environment substitution, for example, but should make it straightforward to add in userland.) |
I'm going ahead and reopening this, since it sounds like @isaacs is a potential champion -- it's fine if things take a while 👍 |
A PR introducing |
I'm curious if anyone thinks adding glob support to core would be a good idea?
Globs (globspecs? what's the right term?) are natively supported in POSIX (?) shells, which means you can do this:
This is expanded by the shell before execution of
some-cli-app
into a list of files matching the glob.But we get in to trouble when our scripts (e.g., in
package.json
) use globs, because they will not be expanded, and the scripts will fail (or otherwise exhibit unexpected behavior) on Windows.To support globs in a portable way, any CLI app would need to pull in
glob
or a similar module, and pass the glob as a quoted string in the arguments:$ some-cli-app "**/*.foo.js"
(On Windows, you must use double-quotes above, but using single-quotes in POSIX shells means env variables [e.g.,
$FOO
] will be treated as literals; env variables are another can of worms)The utility of
glob
has been well-established, and it's used in enough CLI apps that it may be a good candidate for vendoring or some other implementation. I haven't looked at theglob
source, so it's unclear whether vendoring would be the best route, but I do know it contains several other dependencies which would also need to be pulled in (right @iansu?).Thoughts?
cc @isaacs
The text was updated successfully, but these errors were encountered: