Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

question about file ordering in pkg_tar #828

Open
lukedirtwalker opened this issue Mar 7, 2024 · 9 comments
Open

question about file ordering in pkg_tar #828

lukedirtwalker opened this issue Mar 7, 2024 · 9 comments
Labels
feature-request p4 An idea that we are not considering working on at this time.

Comments

@lukedirtwalker
Copy link

I'm wondering about the file order in pkg_tar. When I create a normal tar file via the linux command line the order of files in the tar is like specified on the command line:

touch {a,b}.txt; tar -cvf test.tar {b,a}.txt;  tar -tvf test.tar
# output
-rw-rw-r-- luke/luke         0 2024-03-07 08:29 b.txt
-rw-rw-r-- luke/luke         0 2024-03-07 08:29 a.txt

However if I do something similar with rules_pkg this doesn't apply:

# BUILD.bazel
load("@rules_pkg//:pkg.bzl", "pkg_tar")
pkg_tar(name = "a")
pkg_tar(name = "b")
pkg_tar(
    name = "test",
    srcs = [":b", ":a"],
)

# building and testing:
bazel build //:test; tar -tvf bazel-bin/test.tar
-r-xr-xr-x 0/0           10240 2000-01-01 01:00 a.tar
-r-xr-xr-x 0/0           10240 2000-01-01 01:00 b.tar

Is there any way to make this ordered the same as the srcs are specified?

One workaround I found is to pack a and b into an extra tar and then specify via deps:

pkg_tar(name = "a")
pkg_tar(name = "b")
pkg_tar(name = "a_tar", srcs = ["a"])
pkg_tar(name = "b_tar", srcs = ["b"])
pkg_tar(
    name = "test2",
    deps = [":b_tar", ":a_tar"],
)

# building and testing:
bazel build //:test2; tar -tvf bazel-bin/test2.tar
-r-xr-xr-x 0/0           10240 2000-01-01 01:00 b.tar
-r-xr-xr-x 0/0           10240 2000-01-01 01:00 a.tar

however this seems a bit cumbersome.

I assume the order gets lost here when we assign to the map

mapping_context.content_map[dest] = _DestFile(

?

is that the expected/desired behavior?

@aiuto
Copy link
Collaborator

aiuto commented Mar 7, 2024

The intended behavior is that files are sorted by final location path rather than ordered by srcs.

Your example of using deps looks like a bug. In that case it should have unpacked the tarballs rather than including the entire file intact. But, let's not get distracted by that. Let's assume it was fixed and the files come out as b, a. The reason they do that is not from a true requirement. It is a side effect of the easiest implementation. While writing the result tarball, we open up b.tar and read individual files in and place them. It's fast, but has some footguns if a.tar and b.tar have intermixed file paths. If I had infinite time, I would like to build a pkg_untar() rule that let you specify a tarball as input, provide filtering and path remapping, and have that as input to pkg_tar. That could provide high performance with more flexibility. But that is a different feature request than yours.

We could make an FR for treating srcs in order via another attribute. There might even be someone willing to build it. It should not be that hard, but it does require some strict definition about the corner cases. There are several potential ones around auto-creating directories for something early in srcs when an explicit create with an owner and perms happens later. We could start with it failing hard if that happens. You could work around it by pre-creating them with pkg_dirs earlier in the list. In the more complex case, we would have yet another option to have it ignore the problem.

@aiuto aiuto added the p4 An idea that we are not considering working on at this time. label Mar 7, 2024
@lukedirtwalker
Copy link
Author

Your example of using deps looks like a bug. In that case it should have unpacked the tarballs rather than including the entire file intact.

Note that I do double taring (just because I was to lazy to use genrule or assume external files are around), I think it's therefore correct that the final result contains tars.

In anycase I would consider to invest some time to build that feature (putting srcs in order). Would you have some ideas on how to name the attribute? I could draft a PR.

@aiuto
Copy link
Collaborator

aiuto commented Mar 8, 2024

Note that I do double taring ...
Oh. I see what you did. That's a confusing sample.

Would you have some ideas on how to name the attribute?
Maybe emit_srcs_in_list_order?

FWIW, you are going to have to fight with the rest of the ecosystem on that. Many projects use buildifier, which will alpha sort label lists like srcs. My hunch is that eventually you will find yourself modifying buildifier to suppress the sort on pkg_tar.

@lukedirtwalker
Copy link
Author

Ah good point about buildifier. We use that as well. Hmm I'll have to think more about this.

@lukedirtwalker
Copy link
Author

So I wonder if we should add an srcs_ordered attribute:

pkg_tar(
    name = "test-tar-ordered",
    srcs = [
        "//tests:testdata/config",
        "//tests:testdata/hello.txt",
        "//tests:testdata/loremipsum.txt",
    ],
    srcs_ordered = {
        "1": "//tests:testdata/hello.txt",
        "2": "//tests:testdata/loremipsum.txt",
        "3": "//tests:testdata/config",
    },
)

The semantics would be that the keys determine the order of the values. And the values need to match the srcs attribute (this would be checked).
Would that make sense? That would be compatible with buildifier.

I'm not 100% how to integrate this in the _MappingContext, but probably this would also need some order notation added and it would need to be handled correctly in the tar binary.

@aiuto
Copy link
Collaborator

aiuto commented Mar 15, 2024

Please no. Unless you want to write an extensive specification for how the two will merge and how they are sorted. E.g. what is the output order of "1", "11", "2"? Obviously unchanged, because then you can insert a new thing in the middle of an existing list. Right? Oh, someone expects it strict numeric, so they try "1", "1.5", "2" and then all the code assuming ints breaks. Sorting is very easy, yet very hard do do in a way that seems right to everyone.

@lukedirtwalker
Copy link
Author

Fair enough. I see that's problematic. I wonder how we could approach this that is buildifier friendly and not weird in some other way.

@lukedirtwalker
Copy link
Author

Ah actually adding a # do not sort comment on the arguments prevents buildifier from sorting the lists. So we can simply have srcs with the comment # do not sort and a property emit_srcs_in_list_order = True, like so:

pkg_tar(
    name = "test-tar-ordered",
    srcs = [
       # do not sort
        "//tests:testdata/hello.txt",
        "//tests:testdata/loremipsum.txt",
        "//tests:testdata/config",
    ],
    emit_srcs_in_list_order = True,
)

Sorry I wasn't aware that it was that easy to disable the buildifier sorting, that's why I proposed the weird dict approach 🙈

@aiuto
Copy link
Collaborator

aiuto commented Mar 21, 2024

Yup. # do not sort is a viable tool for people needing this. I was never a fan of buildifier sorting things by default. Lists of things belong in the order people specified them. Text editors can sort if you want it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request p4 An idea that we are not considering working on at this time.
Projects
None yet
Development

No branches or pull requests

2 participants