Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support duplicate file paths in tar archives #850

Merged
merged 1 commit into from
May 29, 2024

Conversation

eejayes
Copy link
Contributor

@eejayes eejayes commented Apr 11, 2024

Duplicate path entries are made possible within tar archives, as per feature request described in #849.

RELNOTES: Duplicate path entry support within tar archives

Copy link

google-cla bot commented Apr 11, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@eejayes
Copy link
Contributor Author

eejayes commented Apr 11, 2024

I will await design feedback before updating tests.

@aiuto
Copy link
Collaborator

aiuto commented Apr 11, 2024

Let's work out behavior in the issue before continuing with this.

@aiuto
Copy link
Collaborator

aiuto commented Apr 24, 2024

I think the discussion in #849 came up with a good proposal. Whenever you're ready.

@aiuto aiuto added the P1 An issue that must be resolved. Must have an assignee label Apr 24, 2024
@eejayes
Copy link
Contributor Author

eejayes commented Apr 29, 2024 via email

@eejayes eejayes requested a review from jylinv0 as a code owner May 2, 2024 07:29
@eejayes eejayes force-pushed the duplicate_file_paths branch 2 times, most recently from a2cbc88 to b49f33f Compare May 2, 2024 08:38
Duplicate path entries are made possible within tar archives as
discussed in feature request bazelbuild#849. This includes an interaction with
create parents, where the only logical scenario which would require
inference of a parent directory is when one does not already exist.
This is because allowance of duplicates is only useful when explicit
paths are declared.

RELNOTES: Duplicate path entries supported within tar archives
@eejayes
Copy link
Contributor Author

eejayes commented May 2, 2024

The commit above provides pkg_tar with support for duplicate file paths. Tests added show that:

  • an externally defined archive that is propagated through pkg_tar simply has it's structure preserved
  • adding a path via srcs which conflicts with a path within an archive in deps results in duplicates
  • adding two archives via deps which have conflicting paths results in duplicates

In the pre-existing code, deps are added after srcs. Thus for paths appearing both in deps and srcs, it is the one in deps which will appear later in the archive list and will thus prevail when an archive is unpacked. This conflicts with common command line tar tools where:

  1. users may specify which archive is subject to modification during an archive concatenation operation
  2. the archive is implicitly the modification subject in an operation combining a file and archive

It would be straight-forward to adapt the code to the second case. The first case would however require a mechanism to specify which archive (i.e. dep) is the modification subject, i.e. a sequence of steps of an archive's construction. The proposed iteration does not solve these. Instead it requires that multiple sequential pkg_tar operations be declared when override behavior for path conflicts between any pair of deps or srcs is desired.

The purpose of duplicate support is to allow evolution of artifacts with implicit override behavior consistent with typical tar utilities. Since files can never be artifacts and instead serve to specify incremental change, srcs conflicts which could exist within the scope of a single evolutionary step as defined by a pkg_tar instance should be solved prior to it's declaration. Thus adding duplicates to srcs should be an error.

Outside of the discussion in issue #849, there were some unforeseen interactions with create_parents. Since duplicates are not allowed in the current state of the code, a duplicate of a pre-existing directory will not be created if create_parents is specified. However with the support of allow_duplicates_from_deps, an additional directory with the arbitrary default permissions could conceivably be created. Default permissions represent a guess made in the absence of better information. Since pre-existing directory permissions represent information that is either equally good if originating from a create_parents operation, or better in the case that a user has declared them, a duplicate should not result from a create_parents operation even if allow_duplicates_from_deps is set.

Future improvement considerations are:

  1. Reversal of file and tar addition ordering to address (2) above
  2. Sequencing mechanism for tar construction (1) above
  3. Adding duplicates file paths to srcs is an error.

@aiuto
Copy link
Collaborator

aiuto commented May 6, 2024

I can look at this later today.

Copy link
Collaborator

@aiuto aiuto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slowness. Things got busy here.

@aiuto aiuto merged commit 2247f5d into bazelbuild:main May 29, 2024
2 checks passed
@eejayes eejayes deleted the duplicate_file_paths branch May 31, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 An issue that must be resolved. Must have an assignee
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants