@hdgarrood
Contributor

Implement bash-style filename globbing for data-files (that is, bash with shopt -s globstar enabled). Subsumes #713, #784, #1343, #1344, #1973.

The plan:

  • * can expand to a part of any file or directory but not descend into subdirectories
  • ** is the same, except that it can match any number of levels of directories (including 0).
  • {a,b,c} can expand into a, b, or c, where a, b, and c are globs themselves. So {foo,*bar} expands to "foo" or "abar" or "bbar"...
  • "Literal" and glob patterns may be mixed to an unlimited extent in a single directory level. That is, foo/**/* is allowed, and so is foo/**/*.hs, and foo/**/Test*.hs, and foo/**/*{Test,Spec}*.hs.
  • If a file name contains one of the characters {, }, *, or \, a backslash \ is required to match it literally. E.g. the pattern a\\b matches a\b, and the pattern \e is an error (because no such escape sequence exists).
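
The escaping rule in the last bullet could be sketched roughly as below. This is only an illustration, not the code in this PR: the names Token, special and tokenise are made up here, and the comma inside {a,b,c} is treated as an ordinary character to keep the sketch short.

```haskell
-- Hypothetical token type: a literal character or a glob metacharacter.
data Token = Lit Char | Meta Char
  deriving (Eq, Show)

-- Characters that must be escaped to be matched literally (from the plan).
special :: String
special = "{}*\\"

-- Split a pattern into tokens, resolving backslash escapes.
-- Left carries an error message for an unknown or dangling escape.
tokenise :: String -> Either String [Token]
tokenise [] = Right []
tokenise ('\\' : c : rest)
  | c `elem` special = (Lit c :) <$> tokenise rest
  | otherwise        = Left ("invalid escape sequence: \\" ++ [c])
tokenise "\\" = Left "dangling backslash at end of pattern"
tokenise (c : rest)
  | c `elem` "{}*" = (Meta c :) <$> tokenise rest
  | otherwise      = (Lit c :) <$> tokenise rest
```

With this, the pattern a\\b tokenises to three literal characters (a, \, b), while \e yields a Left, matching the two examples in the bullet.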

Here's what I've done so far. It typechecks, but if I try to parse anything, it hangs. Any ideas?

@hdgarrood
Contributor Author

It's just occurred to me that it might be easier to have type Glob = [GlobPart]; data GlobPart = Literal ... | Choice ... | MatchAny | MatchAnyRecursive, both for parsing and matching. For example, there's then only one way of concatenating a bunch of glob parts, as opposed to the current representation, where you could have either Concat (Concat (Literal "x") (Literal "y")) (Literal "z") or Concat (Literal "x") (Concat (Literal "y") (Literal "z")).
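
A rough sketch of what the flat representation might look like. The constructor payloads are assumptions (they are elided above), and normalise is a hypothetical helper rather than the canonicalise function in this PR. It shows how adjacent literals collapse to a single form, including inside choices.

```haskell
type Glob = [GlobPart]

data GlobPart
  = Literal String      -- assumed payload: the literal text
  | Choice [Glob]       -- assumed payload: the alternatives of {a,b,c}
  | MatchAny            -- *
  | MatchAnyRecursive   -- **
  deriving (Eq, Show)

-- Merge adjacent literals and drop empty ones, recursing into choices,
-- so that equivalent globs end up with the same representation.
normalise :: Glob -> Glob
normalise = foldr step []
  where
    step (Literal "") acc              = acc
    step (Literal a) (Literal b : acc) = Literal (a ++ b) : acc
    step (Choice alts) acc             = Choice (map normalise alts) : acc
    step part acc                      = part : acc
```

Under these assumptions, [Literal "x", Literal "y", Literal "z"] and [Literal "xy", Literal "z"] both normalise to [Literal "xyz"], so the nesting ambiguity of Concat does not arise.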

Ensure that canonicalise operates on nested Globs within choices
rather than just the top level Glob
@hdgarrood
Contributor Author

Thoughts so far:

  • The code is based on the assumption that parses are never ambiguous - they either result in exactly one function FilePath -> Bool that indicates whether a match includes a file, or they result in an error (for, eg, an invalid escape sequence).
  • It could probably be much more efficient than it currently is. My implementation feels a bit ugly at the moment.
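
To make the first point concrete, here is a minimal backtracking sketch of turning a parsed glob into a single FilePath -> Bool. It is not the implementation in this PR: the type's payloads, the names compile and withinSegment, the use of '/' as the only separator, and the special case that lets **/ match zero directory levels are all assumptions made for illustration.

```haskell
import Data.List (stripPrefix, tails)

type Glob = [GlobPart]

data GlobPart
  = Literal String
  | Choice [Glob]
  | MatchAny
  | MatchAnyRecursive

-- Every remainder of s after dropping a prefix that contains no '/'.
withinSegment :: String -> [String]
withinSegment s = s : case s of
  (c : cs) | c /= '/' -> withinSegment cs
  _                   -> []

-- Compile a glob into a predicate on paths.
compile :: Glob -> FilePath -> Bool
compile [] s = null s
compile (Literal l : rest) s = maybe False (compile rest) (stripPrefix l s)
-- * consumes any run of characters within one directory level.
compile (MatchAny : rest) s = any (compile rest) (withinSegment s)
-- {a,b,c}: try each alternative followed by the rest of the pattern.
compile (Choice alts : rest) s = any (\alt -> compile (alt ++ rest) s) alts
-- ** consumes any run of characters, including separators.
compile (MatchAnyRecursive : rest) s =
  any (compile rest) (tails s) || zeroLevels
  where
    -- Assumed rule: "**/" may also match nothing at all.
    zeroLevels = case rest of
      (Literal ('/' : l) : rest') -> compile (Literal l : rest') s
      _                           -> False
```

For example, with foo/**/*.hs represented as [Literal "foo/", MatchAnyRecursive, Literal "/", MatchAny, Literal ".hs"], the predicate accepts foo/x.hs and foo/a/b/x.hs and rejects foo/x.txt. Backtracking like this is simple but can be slow on patterns with many wildcards, which fits the efficiency concern in the second point.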

Any feedback is very welcome.

@hdgarrood
Contributor Author

Is it ok to follow symlinks when expanding a pattern?

@hdgarrood
Contributor Author

Also presumably this should apply to all fields of type filename list - that is:

  • license-files
  • data-files
  • extra-source-files
  • extra-doc-files
  • extra-tmp-files
  • includes
  • install-includes
  • c-sources

@tibbe
Member

tibbe commented Jun 30, 2014

It might be worth looking at existing implementations of glob patterns, Haskell or otherwise, for inspiration. One example: https://hackage.haskell.org/package/Glob

@hdgarrood
Contributor Author

oh yeah, thanks, I forgot about that library.

Looking at the documentation, the people who wrote that seem to know what they're doing, certainly more than I do. Is there a policy on what packages Cabal is allowed to depend on? I guess you want to keep that list as small as possible?

@tibbe
Member

tibbe commented Jun 30, 2014

Right now we can only depend on packages that ship with GHC, as Cabal ships with GHC. We're trying to fix that issue in the future (by not having GHC depend on Cabal), but in the meantime perhaps we could just include the minimal amount of code to support globbing in Cabal itself.

@23Skidoo
Member

> Right now we can only depend on packages that ship with GHC, as Cabal ships with GHC.

To clarify: this refers only to the Cabal library itself, not to cabal-install.

@hdgarrood
Contributor Author

Ok, great - thanks - I'll continue working on this when I get a moment.

@ezyang
Contributor

ezyang commented Aug 27, 2014

OK, I'm going to close this PR for now, please reopen when you have more!

@ezyang closed this Aug 27, 2014

@mietek
Contributor

mietek commented Nov 26, 2014

Looking forward to this.

@hdgarrood
Contributor Author

I found some time! I have more code, but I'm not able to reopen this.

@hdgarrood
Contributor Author

Also it seems like https://github.com/haskell/cabal/blob/master/Cabal/Distribution/Simple/Utils.hs#L754
should instead have return [dir </> filepath'] so that the source directory is correct? Can anyone confirm this?

@23Skidoo
Member

23Skidoo commented Apr 4, 2015

> Also it seems like https://github.com/haskell/cabal/blob/master/Cabal/Distribution/Simple/Utils.hs#L754
> should instead have return [dir </> filepath'] so that the source directory is correct? Can anyone confirm this?

No, it looks like it searches the directory dir, but returns the relative path(s). The function could be better documented, though.
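
To illustrate the distinction, here is a small sketch of that pattern: look inside dir, but hand back paths relative to it. This is not the code in Utils.hs; matchInDir and its predicate argument are hypothetical, and only getDirectoryContents, doesFileExist and </> are real library functions.

```haskell
import Control.Monad (filterM)
import System.Directory (doesFileExist, getDirectoryContents)
import System.FilePath ((</>))

-- Hypothetical helper: list the regular files directly under dir whose
-- names satisfy p. Results are relative to dir, not prefixed with it.
matchInDir :: FilePath -> (FilePath -> Bool) -> IO [FilePath]
matchInDir dir p = do
  names <- getDirectoryContents dir
  -- The existence check needs dir </> name, but the bare name is returned.
  filterM (\name -> doesFileExist (dir </> name)) (filter p names)
```

A caller that wants the full source path would then prepend dir itself, which keeps the returned names usable relative to a different destination directory as well.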

@bgamari
Contributor

bgamari commented Aug 2, 2016

What is the status of this?

@mietek
Contributor

mietek commented Aug 3, 2016

Dead by bikeshedding in #2522 (comment).
