Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pkg_files.strip_prefix more intuitive #492

Closed
wants to merge 11 commits into from

Conversation

nacl
Copy link
Collaborator

@nacl nacl commented Dec 19, 2021

NOTE: This is currently a work-in-progress that I had some spare time to hack on. It's mostly functional, but among other things, I don't yet have:

  • The migrator functional outside of rules_pkg itself in a clean manner. Might be good enough as-is.
  • The right name for srcs_strip_prefix; I chose local_strip_prefix instead for some reason. Easy enough to change.
  • Manual cleanup done to rules_pkg itself
  • A more interesting commit message

I will likely force-push this to consolidate it down to fewer, easier to understand commits. Probably one each for the change, migrator, and migrator application.

The most interesting parts are the new files in migration/ and the updates to pkg/mappings.bzl itself. Code changes implemented are everything in #354 (comment).

For the migration scriptage chose to use libcst instead of anything starlark-specific because the tooling is currently limited. I (or at least a month ago) was not able to find any sort of starlark tooling that let me manipulate the AST that was not in buildifier itself. buildozer , while largely effective, doesn't at this time seem to have a good way to handle in-rule macros like strip_prefix on the command-line.

libcst has a robust query system and allows for preservation of whitespace so that users migrating do not have to have buildifier-style reformatting done to the output. Users can run buildifier after the fact if they want.

Fixes #354.

@nacl nacl changed the title Make strip_prefix more intuitive Make pkg_files.strip_prefix more intuitive Dec 19, 2021
@aiuto
Copy link
Collaborator

aiuto commented Feb 3, 2022

Is there any way we can move this along without being blocked on a rewriter?
Perhaps

  • make a new attrbute strip that has the new strip_prefix behavior
  • keep strip_prefix too, but do not allow both
  • eventually add a deprecation warning
  • a release or 2 after the migration tool works we drop the old strip_prefix

@nacl
Copy link
Collaborator Author

nacl commented Feb 3, 2022

Is there any way we can move this along without being blocked on a rewriter? Perhaps

* make a new attrbute `strip` that has the new strip_prefix behavior

* keep strip_prefix too, but do not allow both

* eventually add a deprecation warning

* a release or 2 after the migration tool works we drop the old strip_prefix

I do not believe that asking users to do migrations like this is burdensome, especially when the migration script works (it does, albeit not as quickly as I would like).

That being said, we could have both strip_prefix and srcs_strip_prefix exist at the same time. We do not need a new strip attribute for this. I would not be opposed to this.

This leads to a broader question: what are our compatibility guarantees? How could Bazel be changed to facilitate them more easily, as opposed to writing one-off wrapper macros?

@aiuto
Copy link
Collaborator

aiuto commented Feb 10, 2022

I don't see how this is more intuitive.
We still only have a single strip prefix (local_strip_prefix) and uses still have to use the strip_prefix.from_package() or strip_prefix.from_root(). Back at the meeting in June we decided to not have from_package() that was always to be done, and to have a distinct flag for flatten.

@nacl
Copy link
Collaborator Author

nacl commented Feb 14, 2022

(Apparently I edited your message instead of writing a new one. No idea how I did that, my bad).

The most recent time we came to consensus on this was in our meeting on 2021-12-06. I have implemented pretty much what was desired there, with some naming goofs.

The increase in intuition is that strip_prefix now does the less surprising thing of assuming package-relative paths by default, and that srcs_strip_prefix (which it should be named, not local_strip_prefix like I did here). strip_prefix.flatten() makes more sense than strip_prefix.files_only().

The problem with the flatten flag IMO is that it does surprising things when combined with srcs_strip_prefix. You shouldn't be able to provide both a the same time, and it complicates the interface a bit.

While it still requires users to look at documentation, the strip_prefix pseudomodule makes for better reading than something like "/foo/bar" or "foo/bar". To what do they apply? With the pseudomodule at least, there's a higher chance of the difference being visually apparent. Using a string makes plenty of sense if we don't need to support the from_root style of stripping, but we do have valid use cases for this.

@aiuto
Copy link
Collaborator

aiuto commented Feb 15, 2022

I guess a problem for me is that I can't use the strip_prefix.foo() functions when I import into Google, because that won't match the semantics of some widely used internal rules and that will confuse my user base. I'm going to have to wrap the rules in macros that use pure string values, even if it means multiple ones.

FWIW. The current model we have moved to is
strip_prefix = '.' is flatten
strip_prefix = 'foo' is strip from package. srcs = glob([foo/**.h] ... strips the foo
strip_prefix = '%workspace%/foo strips from root.

FWIW... the patterns of usage I see in my code base are
strip_prefix='%workspace%/...' only about 200 times
strip_prefix='.' about 1000 times
strip_prefix='foo' about 45K times
out of ~80K places where this could be used.
=> I have about 35K instances using the default of strip_prefix='.' (flatten).

So I am going to have a lot of headwind against strip_prefix='string' not being the normal behavior.

But those numbers to inform us as to live usage patterns. For example, the disuse of %workspace% stripping confirms our expectation that few people ever want the full path to their package in the file names of the things they are packaging. Project layout != product layout.

@nacl
Copy link
Collaborator Author

nacl commented Feb 22, 2022

So I am going to have a lot of headwind against strip_prefix='string' not being the normal behavior.

I'm afraid I have the inverse problem. My codebase exclusively relies on the pseudomodule. I don't explicitly need it to be used, and a transformation would not be difficult. I also don't think my users would mind all that much (see below).

That being said, moving back to a string directly here feels like a step backward: from "this is saying what it is doing" to "this is the magic incantation to do this thing".

Regardless, something that looks nice is not necessarily a requirement. Empirically, as long as there is convenient a way to specify the strip_prefix, users will do what is needed regardless of how pretty it looks. Even if it looks ugly, users can just look at the documentation.

Also empirically, users will avoid looking at the documentation and even understanding what's going on in favor of looking at nearby examples and copying/pasting relevant code segments that look correct enough.

I guess the string might be easier to intuit in this case, but it limits our flexibility. The whole point of the pseudomodule (due to my interpretation of the original specification in #36) was that people didn't know what the string meant by looking at it, and that implementing an abstraction that just calls out what it's doing

I guess I don't mind going the string route here, but it doesn't feel right to me.

But those numbers to inform us as to live usage patterns. For example, the disuse of %workspace% stripping confirms our expectation that few people ever want the full path to their package in the file names of the things they are packaging. Project layout != product layout.

Fair point, but while the %workspace% usage is minimal, it, or something like it is still valuable to have. We both have numerous uses for it.

@aiuto
Copy link
Collaborator

aiuto commented Feb 23, 2022

Regardless, something that looks nice is not necessarily a requirement. Empirically, as long as there is convenient a way to specify the strip_prefix, users will do what is needed regardless of how pretty it looks. Even if it looks ugly, users can just look at the documentation.

Actually, I think looking nice, or at least being obvious to understand is a key requirement. People don't read documentation unless the default behavior is not doing what they want. And the default behavior should do the most commonly requrested thing, based on past experience. I think the default, without any strip_prefix, should be unflattened file names without the path to the package. That is //a/b:c should be c and //a/b:d/e should be d/e.
Then you can add flattening. And you can do something else special to get the /a/b back in there.

Now we get to the question of if /a/b ... flattened to c,e is ever an option we should provide? If so, then
it would be hard to do that if flatten were not a distinct attribute.

@nacl
Copy link
Collaborator Author

nacl commented Feb 23, 2022

Actually, I think looking nice, or at least being obvious to understand is a key requirement. People don't read documentation unless the default behavior is not doing what they want. And the default behavior should do the most commonly requrested thing, based on past experience. I think the default, without any strip_prefix, should be unflattened file names without the path to the package. That is //a/b:c should be c and //a/b:d/e should be d/e.

We are still in agreement about this, but I think we're talking around each other and have rather different visions and somewhat different requirements for this interface.

Here are what I believe to be the remaining questions. I'll leave my answers here:

What should this attribute be named?

srcs_strip_prefix, as discussed.

How should its contents be specified?

A macro that creates a string value that represents how the prefix stripping should occur.

Should flattening even be an option?

Yes. It's a convenient shorthand for "remove everything" without having to know what it is. This is especially useful for filegroups.

A good compromise, perhaps, would be to consider an option where flattening is banned if all files don't start with the same directory prefix, but I don't know what sort of ripple effects this might have. It could make updating the tree I manage (which relies heavily on flattening) difficult. I'm about to finish a project that will get our tree up to an instance of rules_pkg from December 2021, and I'm not going to be able to do it again for some time.

If so, how should flattening be specified?

An additional value to the srcs_strip_prefix macro.

I don't really like the idea of making a new distinct flatten attribute. It seems that the interactions with (srcs_)strip_prefix could be potentially confusing to users.

@nacl
Copy link
Collaborator Author

nacl commented Oct 31, 2022

@aiuto I'm looking into this again. Given the points in my previous (very old) message, do you have additional comments or concerns?

@aiuto
Copy link
Collaborator

aiuto commented Oct 31, 2022 via email

@nacl
Copy link
Collaborator Author

nacl commented Jan 5, 2023

I have this about ready to go, but I'll make a new PR to keep the flow cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pkg_files(strip_prefix) behavior is confusing w.r.t. flattening a tree of files
2 participants