Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip_if_exists interaction with deleted files on update #1718

Closed
lkubb opened this issue Jul 31, 2024 · 7 comments
Closed

skip_if_exists interaction with deleted files on update #1718

lkubb opened this issue Jul 31, 2024 · 7 comments
Labels
bug triage Trying to make sure if this is valid or not

Comments

@lkubb
Copy link
Contributor

lkubb commented Jul 31, 2024

Describe the problem

When a path that is affected by skip_if_exists is deleted in a generated project, an update regenerates it anyways.

Template

https://github.com/salt-extensions/salt-extension-copier (sadly requires --trust, but this could be reproduced easily with a simple demo template)

To Reproduce

  1. copier copy --trust --defaults --data project_name=foo --data author='Foo Bar' --data author_email='[email protected]' https://github.com/salt-extensions/salt-extension-copier foo
  2. cd foo; git init && git add . && git commit -m "initial commit"
  3. rm tests/functional/modules/test_foo.py && git commit -am rm
  4. copier update --trust --skip-answered
  5. git status [shows new, untracked tests/functional/modules/test_foo.py]

Logs

No response

Expected behavior

The path to stay absent if it was deleted after project generation, as it's usually the case.

Screenshots/screencasts/logs

No response

Operating system

macOS

Operating system distribution and version

14.6

Copier version

copier 9.1.0

Python version

CPython 3.11

Installation method

pipx+pypi

Additional context

The following code recreates the already deleted file:

copier/copier/main.py

Lines 954 to 963 in 6fd3ad4

# Do a normal update in final destination
with replace(
self,
# Files can change due to the historical diff, and those
# changes are not detected in this process, so it's better to
# say nothing than lie.
# TODO
quiet=True,
) as current_worker:
current_worker.run_copy()

The diff contains the deleted file, it would be removed usually:

copier/copier/main.py

Lines 1006 to 1010 in 6fd3ad4

diff_cmd = git[
"diff-tree", f"--unified={self.context_lines}", "HEAD...dst_copy"
]
try:
diff = diff_cmd("--inter-hunk-context=-1")

But git apply is told to ignore changes to it, hence it remains:

copier/copier/main.py

Lines 1029 to 1033 in 6fd3ad4

for skip_pattern in chain(
self.skip_if_exists, self.template.skip_if_exists, extra_exclude
):
apply_cmd = apply_cmd["--exclude", skip_pattern]
(apply_cmd << diff)(retcode=None)

A solution could be to:

  1. Create another diff between the old copy and the real dest (edit: dest copy, the code changed a bit between 9.1 and current HEAD) with --diff-filter=D
  2. When applying the changes, first run git apply --include <skip_if_exists-path-1> [...] with the second diff on stdin (only if there are any skip_if_exists paths)
  3. Then continue as usual.
@lkubb lkubb added bug triage Trying to make sure if this is valid or not labels Jul 31, 2024
@sisp
Copy link
Member

sisp commented Aug 4, 2024

Thanks for raising this problem, @lkubb! 🙏

I left a comment in your PR with an alternative implementation idea. But I've also been thinking about the meaning and expected behavior of skip_if_exists. I think the original intention was to skip updating an existing file, e.g. because the generated content is non-determinstic. Perhaps also avoiding to overwrite an existing file during both copy and update is conceivable when generating a subproject (i.e., applying multiple templates to the same directory). If we never re-create a user-deleted file which is listed in skip_if_exists, then it'll be impossible to recover from this deletion using Copier, i.e. Copier will never re-create it. I wonder whether that's good behavior.

If you want to delete a file that is listed in skip_if_exists, how about passing the --exclude flag to the copier {copy,update} ... command instead to avoid generating it in the first place? This value is not recorded in the answers file though, perhaps something we should consider; otherwise, you need to remember to pass the same list of path patterns to --exclude on every update. If we recorded these path patterns, we might need a way to remove some of them again later, perhaps using a new --include flag.

WDYT?

@lkubb
Copy link
Contributor Author

lkubb commented Aug 4, 2024

I left a comment in your PR with an alternative implementation idea.

@sisp Thank you! I'm currently looking into your proposal.

But I've also been thinking about the meaning and expected behavior of skip_if_exists. I think the original intention was to skip updating an existing file, e.g. because the generated content is non-determinstic.

This would be one example. Another one, which the above-mentioned template intends to do, is to generate example code which should always be amended in the final project. skip_if_exists then prevents update conflicts when the template's examples are changed.

If we never re-create a user-deleted file which is listed in skip_if_exists, then it'll be impossible to recover from this deletion using Copier, i.e. Copier will never re-create it. I wonder whether that's good behavior.

I'm only proposing to align the update behavior between paths that are listed in skip_if_exists with those that are not. Both can still be recovered using copier recopy.

Edit: Oh, I think there is a misunderstanding on my part here. When a file was deleted in the rendered project and the corresponding file in the template is updated, it is recreated regardless. Having used cruft before, this is very unexpected. I have to think about the implications further, will edit this comment later. #1721

If you want to delete a file that is listed in skip_if_exists, how about passing the --exclude flag to the copier {copy,update} ... command instead to avoid generating it in the first place?

This would break the example file use case I described above.

This value is not recorded in the answers file though, perhaps something we should consider; otherwise, you need to remember to pass the same list of path patterns to --exclude on every update. If we recorded these path patterns, we might need a way to remove some of them again later, perhaps using a new --include flag.

I like this proposal in general since it is project-specific configuration, just not as a workaround for this behavior. Copier not remembering those after project generation is something that came up while writing Renovate support for Copier (renovatebot/renovate#25538 (comment), see details at the bottom).

@yajo
Copy link
Member

yajo commented Aug 7, 2024

I'm not sure this is a bug. ˋskip_if_existsˋ shouldn't skip if it doesn't exist (obviously). If it did, that'd be a bug.

One use case is when the data you supply is just a foolish bootstrapping template that you expect the user will change a lot and those changes won't have any importance for the template itself.

Another use case is when dumping out secrets. Those questions won't be stored on copier-answers, so the next update won't know the previous one's secrets. Usually these files are written in a non-git-tracked path. If they exist, you probably already wrote those secrets, so we just skip them.

What you really want is ˋexcludeˋ, which really excludes the thing. Or, probably, just don't add it to ˋskip_if_existsˋ, and IIUC copier won't recreate it unless changed in the template.

@lkubb
Copy link
Contributor Author

lkubb commented Aug 7, 2024

I'm not sure this is a bug. ˋskip_if_existsˋ shouldn't skip if it doesn't exist (obviously). If it did, that'd be a bug.

I agree for the copy/recopy commands. An update has different semantics since its intention is to apply custom changes on top of a newer (or the same) base. I would expect custom deletions to be honored as well. Mind that this is the case for files not listed in skip_if_exists (at least until the corresponding template changes).

Which problems does the current behavior of always recreating the file on update solve (if it's intentional)? And why should it be different for files that are not listed there? (Edit: If this is intentional, #1725 is what I would have expected from skip_if_exists. It should then be clarified that skip_if_exists really means: Ensure the file is always there.)

One use case is when the data you supply is just a foolish bootstrapping template that you expect the user will change a lot and those changes won't have any importance for the template itself.

In essence, this is why this behavior stumped me. I'm not saying the mentioned template is just a "foolish" bootstrapping template, but it has these bootstrapping parts. Different Salt module types have different predefined required methods (they are not changing). It helps to have an initial template that can be filled by users, but they can add more than one module of the same type or name it differently.

What you really want is ˋexcludeˋ, which really excludes the thing. Or, probably, just don't add it to ˋskip_if_existsˋ, and IIUC copier won't recreate it unless changed in the template.

Not adding it to skip_if_exists would help until an example file changes for some reason, correct. Then it causes unnecessary merge conflicts (if changed in the project) or is recreated silently (if deleted in the project).

When examples can have small changes from time to time, there are several issues with exclude:

  • Copier does not remember exclude in .copier-answers.yml, so each copier update invocation needs it.
  • I need to teach users of the template about having to exclude the files during updates (but not initial the copy) if they want to get rid of them.
  • I can't use the newly released RenovateBot support for Copier since it does not support passing exclude.

@yajo
Copy link
Member

yajo commented Aug 11, 2024

So, the whole point of this is: what should copier do if one file listed to skip if exists happens to be deleted in the subproject but changed in the template?

Should it behave like if it were listed in exclude? (What you propose) Or should it behave as a merge conflict and write the file? (Current behavior)

IMHO current behavior is correct because the option is called "skip if exists", not "skip if updating". The fact is it doesn't exist, so it shouldn't be skipped. This behavior is the same for any other file under the same circumstances.

However then the next question is: how to give you the power to have a "skip if updating"? 😉

I think we could simply add a _copier_conf.operation variable that contains copy, recopy or update. Then you can use it to template _exclude and effectively "skip if updating". Does that look good to you? 😊

@yajo
Copy link
Member

yajo commented Aug 12, 2024

Closing as duplicate of #1725.

@yajo yajo closed this as not planned Won't fix, can't repro, duplicate, stale Aug 12, 2024
@lkubb
Copy link
Contributor Author

lkubb commented Aug 12, 2024

Thanks, your proposal would provide a way to do what I initially wanted.

Just some comments:

Should it behave like if it were listed in exclude? (What you propose)

My proposal was to make the update behavior of deleted files consistent. Additionally, I think all project-deleted files should be excluded during update (#1721) (or there should be a setting to enable this behavior), but I can see the case for a merge conflict as well.

The current update behavior is as follows:

not changed in template changed in template
skip_if_exists recreate, no merge conflict recreate, no merge conflict
not skip_if_exists stays absent recreate, no merge conflict

Or should it behave as a merge conflict and write the file? (Current behavior)

The current behavior does not report a merge conflict though.

IMHO current behavior is correct because the option is called "skip if exists", not "skip if updating".

I agree with the linguistic logic, but imho it's surprising that a switch named "skip if exists" increases the likelihood of a file to be recreated. This should be documented. I'm still not sure which problem this behavior solves though, the code seemed like it was unintentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug triage Trying to make sure if this is valid or not
Projects
None yet
Development

No branches or pull requests

3 participants