Look into ways to decouple some crate metadata from Cargo.toml
#3167
Comments
I would still like the non-versioned information to be managed through cargo, not through a crates.io-specific interface. (Although it's fine if crates.io also has a way to manage that as well). This would allow that information to be easily updated by CI/CD pipelines, and could be managed in a way that is not tied specifically to crates.io. This is why I suggested putting that information in a separate file, which is also read by |
Random thought: the information could also be written in Not sure whether this idea holds its weight, though, but it'd avoid most if not all the issues around deprecating the authors field I saw discussed in the RFC there. |
Currently crates.io cleans Cargo.toml with Cargo.toml.orig in a separate file. What about changing them to Cargo-version.toml and Cargo-package.toml in two separate files? |
This is done by cargo actually.
I would also like to provide a stable API and maybe a |
Please keep alternative registries in mind. If the solution is crates-io-specific, then other registries will struggle to manage this data. This data could theoretically be added to the git-based registry index, but the amount of extra data could be painful. However, the planned HTTP-based registry protocol could accommodate extra data. rust-lang/rfcs#2789 |
Regarding the |
It might help to think of this abstractly. We've got some data that is:
I think this is roughly how we're thinking of it at the moment. However, this isn't expressive enough to effectively represent all the information we want. There are other properties to consider:
(This list might not be exhaustive.) I've had a go at placing some things into this framework:
I haven't described indirection well enough here, because it has three levels of potential mutability:
Are there any glaring holes in this framework? |
Hmm… Really, we only need to keep things that affect builds immutable:
We'd probably need guarantees from Though comments are most likely to contain metadata that should logically be mutable, so maybe we don't even need to go that far. |
Everything downloaded from crates.io as part of a build must be immutable, even if it doesn't affect the final result. I would not use a registry that does anything else. It's fine for there to be additional files/metadata that are "mutable" (although mutability is a lie anyway...) but that should not be downloaded when I run When you publish a crate you are publishing a snapshot for people to depend on. You don't get to later change that snapshot, at all, that's what "publishing" means. If you publish a book, and make a mistake, you can't just locate all the copies and update them (and legally you would have no right to). All you can do is publish a new revised copy. |
Does Cargo necessarily have to download the original source code for dependencies? I can imagine a The main reason I raised this is that many licenses recommend adding a snippet of license text at the top of the source file. That, and other practices of embedding metadata in source files, puts a limit on how much we can do this. Is the documentation logically part of the program? I could make arguments either way – consistency with |
Digital distribution is different from physical distribution in some pretty fundamental ways, in that you can make emendations at any time so that all future distributions of the product reflect those changes. And it's absurd to assert that changing ancillary metadata necessitates a new publication when there are no substantial changes to the work. What I'm proposing, in effect, is that crate-wide metadata be stapled to the package when it is downloaded, rather than requiring packages to contain a fixed version of that metadata that is then immutable. |
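To make the "stapling" idea concrete, here is a minimal sketch, not a description of how crates.io or cargo works. It assumes a registry that stores the published snapshot and the crate-wide metadata separately and combines them only when serving a download. The names `VersionArtifact`, `CrateMetadata`, and `staple` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VersionArtifact:
    """Immutable snapshot published for one crate version (hypothetical type)."""
    name: str
    version: str
    checksum: str  # hex digest of the packaged contents


@dataclass
class CrateMetadata:
    """Crate-wide fields an owner could edit after publishing (hypothetical type)."""
    fields: dict = field(default_factory=dict)


def staple(artifact: VersionArtifact, metadata: CrateMetadata) -> dict:
    """Combine the fixed artifact with whatever crate-wide metadata is current."""
    return {
        "name": artifact.name,
        "version": artifact.version,
        "checksum": artifact.checksum,
        # Copy the fields so the result is a snapshot taken at download time.
        "metadata": dict(metadata.fields),
    }
```

In this sketch the checksum covers only the artifact, so editing the metadata changes what is stapled on but not the artifact itself. Whether that split is acceptable to consumers is exactly what the rest of this thread debates.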
I understand what you're proposing. I'm saying that, as a consumer of a crate, I did not agree to download your new ancillary metadata. When I add a dependency to my lockfile, I'm choosing to add it as it was. You don't get to decide in future that actually I wanted something else, even if the change should not affect the output of the build. crates.io serves both crate publishers and crate consumers, it doesn't just serve the publishers. |
That ancillary metadata is already present in existing crates; moving it to a location where it can be more readily modified does not actually change what you're downloading, nor should it require additional network requests. |
It's not about additional network requests, it's about the content of what I'm downloading. At the moment, if I add a crate to a lockfile, then I know what will be downloaded for that crate. It's never going to change unless I update the lockfile. That's the one thing I require as a crate consumer from a registry. |
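The guarantee described here can be illustrated with a short sketch. `Cargo.lock` records a `checksum` for registry packages, which is a SHA-256 digest of the downloaded `.crate` file. The function name `matches_locked_checksum` is hypothetical, and this is an illustration of the check rather than cargo's actual code.

```python
import hashlib


def matches_locked_checksum(crate_bytes: bytes, locked_checksum: str) -> bool:
    """Return True if the downloaded bytes hash to the digest pinned in the lockfile."""
    actual = hashlib.sha256(crate_bytes).hexdigest()
    # Hex digests are case-insensitive, so normalise before comparing.
    return actual == locked_checksum.lower()
```

Under this check, any change to the downloaded bytes, including a comment-only or metadata-only change inside the package, produces a different digest and fails verification.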
Surely the “What you're downloading” is actually somewhat nebulous already. It's not the TLS or HTTPS, it's not the file attributes – it's already some subset of what you end up with. Everything's a trade-off:
Focusing on only the first of these questions is, imo, a mistake – especially if you're assuming guarantees that aren't actually there, just because you haven't encountered a violation of those guarantees yet. (I don't actually know where these guarantees are currently documented, if they even are.) |
No, it's something the registry gets to decide. I'm saying I would not use crates from a registry that does not guarantee this.
My personal preference would be for crate contents to be completely immutable unless compelled by law to be removed, in which case they are deleted. If the crates.io team want to permit crate deletion in more circumstances, then that's their prerogative: as long as it continues to be exceedingly rare then I have no real issue with that, although I think it won't scale, and it will be difficult to enforce rules consistently. Changing crate contents is a different thing entirely: I would rather it never happen, but if it has to happen then it should be an extreme circumstance, and the change should be one made by the crates.io team, it shouldn't be something the crate author can do. (By "crate contents" I am specifically referring to everything that is downloaded from the registry for that crate version when you run
Not really: the part that the crate author has control over is very well defined. In using crates.io I'm trusting the crates.io team and involved protocols. That doesn't mean I trust every crate author, and even if I trusted them I wouldn't trust that their API tokens will never be compromised in the future. |
I don't think mutable crate metadata needs to be downloaded by Still, I'd love to better understand @Diggsey's concerns around |
Well, crates.io already doesn't. Realistically, only a registry you control can guarantee this, where “in perpetuity” means “for as long as you need it”.
“Always X, unless Y” isn't an elegant system. How much will break when the first fairly popular crate has illegal numbers snuck into the comments by some developer protesting something, and versions 0.2.3 to 1.4.5 get DMCA'd? Or, more realistically, if somebody copied sections of an ISO standard's text to describe the algorithm they were implementing, or if a chunk of text was grabbed from Wikipedia but the author forgot to add attribution… I completely understand wanting some guarantee as to when a version is identical (for which there's already a I think I can justify treating comments as somewhat metadata-like, though I'm not sure whether exposing a second “ |
Not sure where you're going with this: unless we build crates.io on a blockchain, then legally crates.io can be compelled to remove crates and there's nothing we can do about that. The fact that we still have to remove crates in some cases is not an argument to remove them in more cases, or an argument to allow them to be modified. Are you suggesting that, by modifying a crate, we can remove the offending content without breaking builds? Perhaps, but having modifiable metadata does not help unless the offending content happens to be in that metadata, so are you now suggesting that the whole crate should be modifiable?
There are a ton of issues here, to list a few:
And all of this to solve a non-existent problem.
|
I'm suggesting that there are cases where it's useful to view everything that doesn't affect the build as modifiable metadata. (Even crate names, though the cases where that's something you can actually change are very, very rare.) Some are more convoluted than others, but I can make a case for all the ones I've thought of, and the chances of none of this coming up in practice are pretty slim. (Zero, if you count historical precedent, though I did use the future tense.) I'm not saying it's a good idea for all cases (and hence, I'm not saying that all metadata should be modifiable by the crate author) but I think that something that actually happens – and that might be legally mandated even if everyone agrees that not doing it is more important than the (very important) reasons it currently happens – shouldn't be a tacked-on “kinda breaks stuff but we ignore it” aspect of the registry system.
The timestamps also affect the build, so no new problem is introduced. (And a crate version going missing entirely – which mutable metadata is in part a solution to – would affect the build a lot more.)
What if the build script for the crate contains a back-door which is only triggered after a change to the system clock? My advice: don't put malware on your computer. Anyone with access to the registry can already do these. (What if the registry gets cracked?) Again, no new problem is introduced.
Good points! The biggest difficulty, as you (and others) have pointed out, is build scripts. No thanks to those nasty mathematicians (Gödel, Tarski, Church, Turing, etc.), it's not actually possible, in the general case, to know what a build script is doing. Fortunately for us, though, we don't need to consider the general case: most build scripts will call For crates without build scripts, I'm guessing that non-doc comments and Getting the definition of “changes that do not affect the build” right is already really important – there are many places in the toolchain where non-determinism can crop up as-is, so the changes I'm proposing wouldn't be that special in that regard. It's just occurred to me that debug symbols are part of the build, so line numbers and internal variable names are probably not up for grabs – but I see no reason that modifying comments shouldn't be okay, unless
If it were non-existent, we wouldn't be talking about it. |
|
I would like to avoid putting malware on my computer, but your proposal is making that difficult by breaking the link between crate versions and their contents. This is something which
The biggest difficulty is the sheer amount of unnecessary complexity that this would add.
Well nobody's actually given a reason to do this, other than an emotional aversion to publishing new crate versions. Maybe you can explain why information that is mutable needs to be downloaded by Pros:
Cons:
TBH, even if none of the cons existed, being able to change comments under these specific rules is still an anti-feature, because having a feature that only works in such narrow circumstances is a footgun by itself. |
Up to now, author information has been attached to crate releases in a way which makes changes to that field require a new release. The problem with this approach is that name changes (for various cultural reasons) or address changes (for other reasons) mean that ensuring that all releases have correct author information requires making a new release and does nothing about existing releases. I want to decouple this in a way that allows cargo to (continue to?) be able to provide such information. I fully recognize that there are other approaches (e.g. fetching such information on demand), but on e.g. a metered connection I don't want tools making network requests without my knowledge. |
Requirements of the licenses, I think. Though it doesn't have to be associated with the crate contents (if you ignore my “comments” proposal), and so wouldn't have to be “part of the version”; it could be part of the registry instead (or something else entirely). Apart from that, mere backwards-compatibility is the only reason for it. It's information that was available via the
Yeah. I have no idea how much relies on those, so that's probably enough to ignore my comment proposal. (Though decoupling other metadata from
But it's a feature that just so happens to line up with nearly all the reasons for deleting (rather than yanking) a crate version. Deleting crate versions is also a footgun. |
Side-stepping this issue: can anyone think of a way to completely separate metadata from the crate versions? (Preferably in a way that lets us have mutable crate metadata and correctable version metadata.) |
Thanks for the examples @Diggsey, I now agree that In general I think that changing parts of the source code is out of scope for this proposal, and in my personal opinion that's not a feature I'd like to see implemented. Can we focus on how to provide mutable access to the metadata mentioned by Justin in the issue body instead? |
I'd be OK with some kind of "what Crates.io displays may be newer information, similar to staging stuff in git, and it will reject Compromising that feels like setting yourself on the road to people maintaining their own Crates.io proxy-caches to restore that property and possibly sabotaging efforts to assure enterprise users that the purpose of their internal processes relates to availability/uptime and auditing procedures/processes/workflow, not stability of contents. |
Could this also include deprecation information a la nuget.org and npmjs.org (and related tools)? That is, a package in its entirety could be marked deprecated (with an optional array of packages that replace it, since a package may be refactored into multiple) or a range of versions e.g., a 1.x is deprecated and only 2.x is supported. We are looking to deprecate a bunch of packages in the not-so-distant future and many will have supported replacements. In most other languages we support, we have either direct support from package managers (the site and tools, though I agree with the statement above that it's not critical for crates.io to have UI for this just yet) or a work around like adding a deprecation error message in Python's More context may be found in https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Deprecating.20crates/near/434808268. |
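As a rough sketch of the shape such deprecation information could take, here is a minimal record, assuming the registry stores a message, an optional list of replacement packages, and an optional deprecated major version. The `Deprecation` type and its fields are hypothetical and do not describe any existing crates.io, nuget.org, or npmjs.org schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Deprecation:
    """Hypothetical deprecation record for a crate or one of its major versions."""
    message: str
    # A crate may be refactored into several crates, so allow more than one.
    replacements: List[str] = field(default_factory=list)
    # Deprecated major version; None means the whole crate is deprecated.
    major: Optional[int] = None

    def applies_to(self, version: str) -> bool:
        """Return True if the given MAJOR.MINOR.PATCH version is deprecated."""
        if self.major is None:
            return True
        return int(version.split(".")[0]) == self.major
```

For example, `Deprecation("only 2.x is supported", major=1)` would cover every 1.x release, while leaving `major` unset marks the package as deprecated in its entirety. A real design would likely need full version ranges rather than a single major number.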
An outline of our current scheme for crate and version level metadata (with `quoted` fields coming from `Cargo.toml`):
- `description`
- `homepage`
- `documentation`
- `repository`
- `categories`
- `keywords`
- `badges` (largely deprecated)
- `features`
- `license`
- `authors`
With the exception of `authors`, all of the version specific metadata is clearly immutably tied to the specific crate that was published under that name and version. For the crate-wide metadata, it can be annoying to be required to publish a new version to update these fields.
It would be nice if the crate-wide metadata was managed through the web interface (Edit: and eventually a stable API to support `cargo` integration and other tooling), instead of via `Cargo.toml`.
Suggestions
For optional fields, we can probably do something like:
`Cargo.toml`, then we overwrite the value and return the previous contents in a warning to the user. (If this was a mistake, then the user has to publish a new crate that is otherwise a no-op. This is a bit unfortunate, but seems better than forcing a new publish to change any piece of metadata.)
The only required crate-wide field is `description`. I don't see changing this as high-priority, so we can focus on the optional fields.
A few additional notes
We've had bugs in the past where publishing `0.2.1` and backporting to `0.1.12` would overwrite metadata unexpectedly if they were published in that order. Off the top of my head, I don't know exactly how we handle pre-releases and metadata updates. My point is, saving the metadata from the "right" release isn't entirely straightforward and might cause unexpected results. It would be nice if we could reduce the user impact of this complexity.
The `documentation` field is a bit interesting, because there is no way for a crate version to link directly to version specific information. We fix up any docs.rs links automatically in the frontend so that they point to the correct version specific documentation (matching the behavior for when a value is not provided). This means that users can either have automatic version-specific links to docs.rs, or one link to their most recent 3rd party documentation for all their releases. (Edit: I think we could improve this field by supporting some type of `{$version}` placeholder where crates.io would automatically customize the URL for each release but it would only need to be set once globally for the crate and could be easily modified if a project's 3rd party documentation moves.)
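The `{$version}` placeholder idea could look roughly like the sketch below. It assumes the crate-wide `documentation` value is stored once as a template and expanded per release when a page is rendered. The function name `docs_url_for` and the example URL are hypothetical, and crates.io does not necessarily implement anything like this.

```python
def docs_url_for(template: str, version: str) -> str:
    """Substitute a release version into a crate-wide documentation URL template."""
    # A template without the placeholder is returned unchanged, so existing
    # single-link documentation values would keep working.
    return template.replace("{$version}", version)
```

With a template such as `https://docs.example.org/demo/{$version}/`, each release page would get its own link, and moving the documentation would mean editing one crate-wide value rather than publishing a new version.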