Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] RFC: Patch dependencies using unidiff patchfiles #3177

Closed
wants to merge 5 commits into from

Conversation

TotalKrill
Copy link

@TotalKrill TotalKrill commented Sep 20, 2021

This being my first RFC for rust, I got some feedback that I should open a PR, even though the PR and the RFC is not completed.

rendered

Feedback on the RFC itself, the writing and the process are all welcome. I find it easier to discuss around a PR ( thanks to githubs discussion system around them ) than trying to gather feedback on my fork.

I realise that patchfiles are an incredible powerful tool, and the usecase I am presenting here might not be the optimal representation of this feature. But it was the solution I wanted when encountering this problem for the N-th time, where N > 3.

This RFC draft is based upon discussions in: rust-lang/cargo#4648 and rust-lang/cargo#3830

Clarified more in the examples, as well as some language fixes. 
Made the Quality of life improvements a bit more clear
Filled in missing fields
Preparing for removal of the question text
@TotalKrill
Copy link
Author

I feel the Reference-level explanation is lacking, and would like it if someone helped out here

@bjorn3
Copy link
Member

bjorn3 commented Sep 24, 2021

I think the patch file should be specified in the [patch] section too, not [dependencies].

@pickfire
Copy link
Contributor

@TotalKrill can you please add a rendered link like other rfc pull requests?

@TotalKrill
Copy link
Author

@bjorn3 Generally I am a bit unhappy with the [patches] section, I feel it is hard to direct the patch to the specific dependency one would want changed.

@roblabla
Copy link

This should definitely go in the [patch] section IMO. When working on workspaces, the [patch] section is centralized at the workspace level, ensuring all the members of the workspace use the same patched version of the package. I feel like this is super important. Furthermore, I feel that the proposed patchfile feature would actually fix some of the UX problems that the patch section has WRT it being nitpicky about the version it patches.


FWIW, I have a slightly different motivation for such a feature. At work, our codebase has a bunch of crates in the [patch] section (around 20 currently) containing various bugfixes and whatnot that are in the progress of being upstreamed. We use github forks to maintain our patches, but the workflow around this is complicated, especially with regards to the long-term buildability of our project.

To ensure we can always rebuild old versions of our codebase, we need to ensure that the pinned hashes of our patches stay alive forever, which means making sure we never push --force those branches, and creating a new branch whenever we need to update the crate version.

The only current alternative to our current workflow is to vendor those patched dependencies, but this has its own set of downsides. Having patchfiles would be ideal for us: it'd trivially ensure old versions are reproducible, as we'd store the patchfiles in the same repo as the rest of the project.

@TotalKrill
Copy link
Author

@roblabla That is very valid points, and we are in the same boat basically. If you go to my branch, I think you could expand the motivation text for this with those points, and also making the note about the workspace case, which I had forgotten, and Ill merge them and this RFC get that update?

@ehuss ehuss added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Oct 5, 2021
@alexcrichton
Copy link
Member

Thanks for taking the time to write this up, personally I think this would be a neat feature to add to Cargo.

One of my main questions about a feature like this is how will the patch files get generated? The patch file format is relatively standard so I'm not worried about specifying something like that, but this would be one of Cargo's first features where you can't use Cargo for the whole feature, but rather it has to be supplemented with something else like git. I think care needs to be given to how we're going to document how users generated patch files and how such a method works with Cargo. For example if we say "just use git" then Windows users without git installed will have no recourse to use this feature. That's a tradeoff that can be made, but in my opinion it needs to be acknowledged.

Another question about how to actually generated patch files is where is the source coming from to actually generate a diff against? There seems to be two primary sources to patch, one being git and one being crates.io. For crates.io crates there's not really an official method to actually check out and view the source of any particular version. For git crates you can also run into the problem where the revision used in the lock file is not easily looked up to get a checkout elsewhere to generate a patch against. I think this basically boils down to that it's actually pretty involved to check out the precisely correct version of the source code to generate a patch against, and I think there's a good deal of risk of generating a patch against a slightly wrong version of the source.

I would agree with other folks here as well that this information most likely should live in the [patch] section. Changing a dependency's source is a global decision that needs to affect the entire crate graph, and tweaks to the crate graph typically happen in the [patch] section today.

Finally one other thing I'd like to discuss here is how patches are actually applied. While I think that there's some leeway of generating patches can require something like git, I personally don't feel that there's as much leeway with the actual application of the patch itself. To fit well in Cargo I believe this functionality would need to be baked in to Cargo itself. Previous PRs to Cargo have called out to the git CLI, but I personally don't think that's a solution we can land. Otherwise though I don't actually know how to apply patches with just Rust code (or perhaps a C library).

In any case personally I think there's still somewhat fundamental design decisions to sift through and consider before more leaf concerns like the text of the RFC are considered. I think that these kinds of the decisions end up being the meat of the RFC and it sort of writes itself once all of this is documented.

@TotalKrill
Copy link
Author

Thank you @alexcrichton for taking the time to review and come with feedback regarding this feature

I wholeheartedly agree that these design decisions needs to be acknowledged and solved before the RFC gets finalized, and my idea was that this PR is the discussion forum for that, I believe it makes it easy for people to find and contribute.

One of my main questions about a feature like this is how will the patch files get generated? ... but this would be one of Cargo's first features where you can't use Cargo for the whole feature, but rather it has to be supplemented with something else like git

Not sure I agree here though, I will liken it to the usage and creation of source code. Cargo can use source code and make it compile, but it cannot create it. But yes, as one of the implementation steps the documentation should be updated to acknowledge this. "it is recommended to use git" would be an acceptable way to do it IMO. But as future work, we could suggest a cargo tool that would generate these patchfiles.

Another question about how to actually generated patch files is where is the source coming from to actually generate a diff against?

This is a huge hassle indeed, determining the exact version of a source code when using crates.io, I would argue that this is actually out of scope for this RFC since I believe solutions here are more into the workflow for obtaining the source code, while I would like to put the scope touched subjects of this RFC between "generating patch file <-> getting patchfile applied"

A lot of the dangers of applying the patchfiles to the "wrong source" are actually handled in the format itself, so a patch will either apply cleanly ( the source code is very likely the one we wanted to patch ), or with errors since the lines to remove, add or the span of said changes along with the contextual lines are not correct, and this should in my opinion refuse to compile, the same way that errors in the dependencies will not even start the compilation.

There could be another option where the [patch] section entry can actually contain a resulting hash of the file before or/and after the patch has been applied. This would however generate quite a workflow so I would argue against it.

I would agree with other folks here as well that this information most likely should live in the [patch] section. ...

I Agree, will change this.

Finally one other thing I'd like to discuss here is how patches are actually applied ... Otherwise though I don't actually know how to apply patches with just Rust code (or perhaps a C library).

Looking at the unidiff standard, I believe that this patching functionality is actually quite trivial to implement. I would suggest rolling our own patch apply library for this. Rolling our own would additionally allow us to set the strictness level of applying patches.

in short

  • recommend git to create patchfiles in documentation, otherwise google/bing/duckduckgo can help other users
  • patchfiles only apply cleanly to very similar files, thus if the source differs to much, compilation should fail, and the thus using crates.io even without being 100% sure about which code is being downloaded should be acceptable
  • I agree that the information should live in the [patch] section
  • Rolling our own patch application library or reusing some rust-one is what I would suggest, the unidiff standard looks quite simple, it would also help control compilation failure by setting how strict patch application should be.

Moved the new entry into the [patch] section, this added the requirement of a version key to specify on top of which software a patch should be applied
@TotalKrill
Copy link
Author

TotalKrill commented Oct 6, 2021

I updated the RFC to move it into the [patch] section, that in turn added the requirement of a version key into said section, and I added some logic to handle that in a way that would not break how it is utilized today, as well as a suggestion for future work that could extend the usage of that new key

Also I edited the reference level explanation to be more explicit and motivate some design decisions I believe would be important

@TotalKrill TotalKrill changed the title [Help wanted] First draft of patchfile RFC [Draft] RFC: Patch dependencies using unidiff patchfiles Oct 6, 2021
@nagisa
Copy link
Member

nagisa commented Oct 6, 2021

The RFC mentions unidiff as being a standardized format in some form or shape. You also mentioned “Looking at the unidiff standard” in one of your earlier comments. Is there a canonical description of this format anywhere? And if so, could it be linked from the RFC?

@TotalKrill
Copy link
Author

TotalKrill commented Oct 7, 2021

@nagisa You are correct, I did mention the unidiff standard, but that was massively overselling it on my part. What I meant by those words, is that I was looking at the "detailed specifcation" found at GNU, and generating and looking at some diffs made by git on a project I had lying around.

These two resources were what I was refering to, and the format is indeed quite simple, even if the "detailed" specification is not very detailed..

https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html
https://www.artima.com/weblogs/viewpost.jsp?thread=164293

@bjorn3
Copy link
Member

bjorn3 commented Oct 7, 2021

Can you mention that patching Cargo.toml is not supported? The Cargo.toml in a crate uploaded to crates.io is autogenerated and not the original one the user wrote. The original file stays as Cargo.toml.orig, but it may no longer be possible to create the autogenerated Cargo.toml from it even if it isn't patched.

@TotalKrill
Copy link
Author

@bjorn3 I had no idea, this would invalidate much of what I have written in the RFC...

@bjorn3
Copy link
Member

bjorn3 commented Oct 7, 2021

I think patching the generated Cargo.toml may be possible, but it would require the user to look at the generated Cargo.toml in the .crate file.

@alexcrichton
Copy link
Member

I'm not sure if I entirely agree that we can wave away concerns about generating patch files with "well if the patch doesn't apply cleanly that's an error". There is no supported way really to actually get the source of a crates.io crate. You can't use git repositories since Cargo or cargo package may move files around or drop some files. There's basically no way to patch a crates.io crate and if you're patching a git dependency you still have to know the exact revision, not just the branch/etc, to know what to patch.

I'm happy to be proven wrong here, but I don't think that this feature would be a great fit for Cargo if all the documentation says is "if you hand Cargo a patch it'll apply it" without actually providing any guidance at all to actually generate said patch.

@roblabla
Copy link

roblabla commented Oct 7, 2021

There is no supported way really to actually get the source of a crates.io crate.

I mean, there are supported ways for 3rd party tooling to acquire it? Afaik the crates.io registry protocol is documented and stable, and there are existing cargo subcommands to download them. Wouldn't guiding users towards those work?

Related issue: rust-lang/cargo#1861


[patch.crates-io]
foo = { path = "../path/to/foo" }
bar = { version = "2.0", patchfiles = ["patches/bar-update-foo-dependency.patch"] }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bar = { version = "2.0", patchfiles = ["patches/bar-update-foo-dependency.patch"] }
bar = { version = "=2.0", patchfiles = ["patches/bar-update-foo-dependency.patch"] }

I'd suggest requiring that anything with a patchfiles entry needs to have an exact version specifier, to help ensure that the source being patched is the same as the source the patch was generated against.

On @alexcrichton's "source of truth for patch generation" question, I think we can give some guidance in documentation for how a patch may be generated against a particular crate's source, but at the end of the day it's up to the user to generate the patch that they need - requiring either exact version number specifiers, or exact git commit version specifiers, helps us to make this easier to describe and enact.

Copy link
Author

@TotalKrill TotalKrill Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing!

Worrying to much about which source a patch would apply to would severly limit the value of allowing patchfiles in my opinion.

Patchfiles as I have used them has always been more of an "apply this change to code that looks like this" compared to the "apply this change to this exact version and edition of a software package"

If one would want to apply the patch to an exact version of a source code or package, I would not want to stop them from specifing the exact git hash or force the patch to apply to only a specific version from crates.io.

But forcing all patchfile users to always specify exactly which software package or version I want to apply this change to just seems rude and would severly limit the usefulness of allowing patchfiles in the first case.

As a real life example of this: https://github.com/rust-windowing/winit/pull/2025/files

I would like to have that addition as a patchfile, instead of as now, patching winit to use the one from my git branch, that I have to actually rebase with every change. That way I could be using the master of the original git branch, with that patch applied on top of whatever changes the owner(s) of the repository are deciding to merge or not.

@TotalKrill
Copy link
Author

TotalKrill commented Jan 19, 2022

.... There's basically no way to patch a crates.io crate and if you're patching a git dependency you still have to know the exact revision, not just the branch/etc, to know what to patch ....

Long time since this was last discussed, had hoped some more interest would have gathered for this, but patching using patchfiles doesnt require any knowledge about git or anything in the usual case, it requires knowledge about where the file to patch is located, which lines, and some optionally configurable settings that can define that the previous lines should be something, or what the lines following the change should be.

Basically a patchfile is doing similar to what the changelog in merge requests does, it applies changes to files that looks a certain way. In no way does that require intricate knowledge of what git revision or the exact code it acts on.

If the files the patchfile wants to change does not look like the patchfile specifies ( or the files does not exist ) then a clear error should be thrown during compilation. something along the lines of "patch XXXXX does not apply: {filename does not exist, patchfile XXX does not exist, lines 119-120 in filename does not match }"

@da-x
Copy link
Member

da-x commented Feb 27, 2022

For prior art, here an equivalent functionality from NodeJS packaging context: patch-package. It also uses unidiff files. Seems it is widely used (6k stars), and we can learn from its implementation and how to teach users the usage of this new functionality.

@TotalKrill
Copy link
Author

Here is a tool that has sprung up allowing patching.

https://crates.io/crates/patch

I would like some kind of feedback on how to get this RFC moving towards a decision.

@roblabla
Copy link

I've been thinking about this RFC, (which would really help us at work) and I think something we could do to push this forward is to experiment with this workflow out-of-tree, by (ab)using cargo vendor to build the dependencies. E.G. you could have a cargo plugin crate (say, cargo build-patched) that:

  1. Calls cargo vendor to get all the deps in one place
  2. Applies the patchfiles to the patched crates
  3. Runs cargo build with a config override to build with the patched vendor directory.

It wouldn't be ideal (especially the whole "needs a custom build command" bit), but it'd allow us to work out the details of how the patches are generated and applied.

Comment on lines +212 to +213
the dependency declaration, if a patch does not apply cleanly, it would result in an compilation error
that shows which patch that did not apply cleanly to which library. This would make sure that patchfiles are not to dangerous.
Copy link
Member

@joshtriplett joshtriplett Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the dependency declaration, if a patch does not apply cleanly, it would result in an compilation error
that shows which patch that did not apply cleanly to which library. This would make sure that patchfiles are not to dangerous.
the dependency declaration, if a patch has any "fuzz" (mismatched context lines), it would result in an compilation error
that shows which patch that did not apply cleanly to which library. This would make sure that patchfiles are not too dangerous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshtriplett imho it should also specify other mismatches too otherwise it'd technically be legal to not fail if a patch deletes lines that don't match since they're not context lines

@joshtriplett
Copy link
Member

joshtriplett commented Nov 29, 2022

I'd love to see this happen. A few bits of feedback:

  • This needs to specify the patch format.
  • I think this should support git-format-patch patches, which provides some helpful extensions such as file moves/copies and patching binary files.
  • Per my other comment, this should require that there be no fuzz.
  • This is absolutely bikeshedding, but: patches rather than patchfiles, please? 🙂

@cptpiepmatz
Copy link

What is the state on this?

@ambasta
Copy link

ambasta commented Nov 3, 2023

How would this work if the patchfile patches the dependency Cargo.toml?

For example, adding feature flags to dependency that it doesn't expose. In this case, dependency resolution would fail to find a package with added features, which would need the patchfile to be applied before dependency resolution, which would in turn require dependency resolution.

@ehuss
Copy link
Contributor

ehuss commented Dec 11, 2023

I'm going to propose to postpone this RFC. The Cargo team agrees that this seems like it would be useful and a good idea. However, there are some details to work out, and this RFC has stalled somewhat. Primarily, there isn't anyone on the team who has the capacity at this time to champion this feature.

We would be interested in possibly seeing this land as an experimental feature, where we can explore options for the solution. This would be with the intent that an RFC would follow up before stabilization. If anyone is interested in working on that, feel free to reach out to the Cargo team on #t-cargo on Zulip, or on the issue rust-lang/cargo#4648, or come to Office hours.

@rfcbot fcp postpone

@rfcbot
Copy link
Collaborator

rfcbot commented Dec 11, 2023

Team member @ehuss has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Dec 11, 2023
@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jan 4, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Jan 4, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jan 4, 2024
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jan 14, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Jan 14, 2024

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This is now postponed.

@rfcbot rfcbot added to-announce postponed RFCs that have been postponed and may be revisited at a later time. and removed disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Jan 14, 2024
@rfcbot rfcbot closed this Jan 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-cargo Relevant to the Cargo team, which will review and decide on the RFC. to-announce
Projects
Archived in project
Status: Final Comment
Development

Successfully merging this pull request may close these issues.