Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Nested Cargo packages #3452

Open
wants to merge 34 commits into
base: master
Choose a base branch
from
Open

Conversation

kpreid
Copy link

@kpreid kpreid commented Jul 1, 2023

Rendered

This is my first Rust RFC. The idea was previously discussed on IRLO (Private nested Cargo packages), and the feedback was generally positive except for confusion about exactly what was being changed, which I believe I have now precisely defined in this RFC draft.

rust-lang/cargo#2203

@epage epage added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Jul 1, 2023
Comment on lines 20 to 24
Currently, developers must publish these packages separately. This has several disadvantages (see the [Rationale](#rationale-and-alternatives) section for further details):

* Clutters the public view of the registry with packages not intended to be usable on their own, and which may even become obsolete as internal architecture changes.
* Requires multiple `cargo publish` operations (this could be fixed with bulk publication) and writing public metadata for each package.
* Can result in semver violations and thus compilation failures, due to the developer not thinking about semver compatibility within the group.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While some of these indirectly touch on it, one I'd explicitly add is sheer boilerplate.

In working on #3424, one of the things I've noticed is the commentary from people who are looking to further drop boilerplate. This also came up in a recent blogpost and HN discussion of it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal will still require Cargo.tomls for each nested package. What boilerplate do you see removing (besides e.g. explanatory README.md files)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any of the standard manifest fields that crates.io requires. Granted workspace inheritance helps with those (which will automatically be used in cargo new in 1.72) but much nicer if we can just leave them out

Combine that with "cargo script" (if we support [lib] packages) and you might not even need manifests (even ones embedded in source)

Copy link
Author

@kpreid kpreid Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a definitive statement of exactly which fields crates.io requires to compare to when discussing boilerplate reduction. https://doc.rust-lang.org/cargo/reference/publishing.html#before-publishing-a-new-crate implies it is one, but isn't really (e.g. homepage is not mandatory).

text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
@epage
Copy link
Contributor

epage commented Jul 1, 2023

Thanks @kpreid for taking the time to write this up!

text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
The following changes must be made across Cargo and `crates.io`:

* **Manifest schema**
* The Cargo manifest now allows `"nested"` as a value for the `package.publish` key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is already a term in cargo, I actually lean towards "vendor"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the similarity, but vendoring normally means making a copy of a package that is available by other means, and one of the design goes here is to discourage any such copies existing (because they are likely to be accidental, and if they aren't, then they may create the same kinds of problems as multiple major versions do). I think reusing the term would create more confusion than it avoids.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vendoring normally means making a copy of a package that is available by other means, an

The perspective I was using when I came up with "vendor" was that instead of getting a dependency through the registry, we are copying it into our package. Its not vendored within the repo but in the .crate file.

This also ties into whether we should generalize this across dependency sources at which point it feels like it becomes even more applicable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think that vendoring is the right term, especially as one of the things that came up during my review of prior art is the concept of using nested-packages-or-whatever for vendoring, e.g. to fix a bug before upstream accepts the patch — I think these need to be kept distinct ideas.

That being considered, what do we need to do here with the RFC text to resolve this thread? Should there be an unresolved question for terminology, or can we just proceed as-is?

Copy link

@sam0x17 sam0x17 Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "vendoring" isn't all-inclusive of the situations where one might use this. Usually vendoring refers to "taking some third party dependency's entire source code and jamming it into to some vendor/whatever directory (or sometimes via a git submodule) for reasons". I don't think it makes sense to use this term in scenarios where the nested crate isn't third party, which I think will actually be most of the time with this feature, so "vendoring" at best will be a misnomer, and at worst might also carry some negative connotations for people recalling some really crazy repo setups.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any "seconding" of support for the "vendor" name, so I'm going to keep "nested" and resolve this. (Of course, there might be some third or fourth better idea to be found…)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least keeping this unresolved to centralize any name bikeshedding conversations

text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
@repi repi mentioned this pull request Jul 7, 2023
35 tasks
text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Show resolved Hide resolved
@kpreid
Copy link
Author

kpreid commented Jul 26, 2023

Progress on the RFC has been stalled mainly because I have been pondering how to specify the behavior given the premise that inclusion should be based on the presence of dependencies, not just directory nesting (particularly to permit nested “utility” packages which, in the workspace they are developed, are not directory children of the published package, because they are shared between multiple published packages).

I'm currently thinking that:

  • Each dependencies entry must explicitly request nesting of the named package
  • If the nested package is within the parent's directory, keep it there. If not, put it in a designated location, perhaps ./.cargo-package/packages/ (the ./.cargo-package/ parent directory being reserved for future extensions in case something like this comes up again).
  • Sub-packages declaring package.publish = "nested" is still required, to prevent mistakes. But there might be e.g. a flag to cargo publish to override that requirement.
  • Sketch out some means by which "public binary packages" could be achieved, to make sure that we are not making it difficult to add later. Since such packages will often depend on the parent package, not vice versa, they will need explicit manifest information pointing at them, unlike the current version.
    • Perhaps something like package.include-packages = ["..."], and if you specify it then it has to be complete (i.e. it is an error if a dependency entry asks for nesting but include-packages does not mention that package.

I think that covers most of the necessary ground, but I still need to write the corresponding RFC text.

@epage
Copy link
Contributor

epage commented Jul 28, 2023

Huh, just found #2224. @kpreid I'd recommend going through that old RFC discussion to see what is relevant or not.

@sam0x17
Copy link

sam0x17 commented Aug 18, 2023

This would dramatically improve the current state of publishing and managing proc macro crates, and just would be a huge QOL improvement for large-workspace projects, of which there are many.

text/0000-nested-publish.md Outdated Show resolved Hide resolved
@@ -119,9 +119,10 @@ Then you can `cargo publish` from within the parent directory `foo/`, and this w

Two new possible values are added to the manifest.

* The `package.publish` field allows `"nested"` as a value, in addition to existing `false` and `true`. This value affects `cargo publish` and nested dependencies as described below.
* The `package.publish` or `workspace.package.publish` field allows `"nested"` as a value, in addition to existing `false` and `true`. This value affects `cargo publish` and nested dependencies as described below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings on allowing nested to be inherited. In part, I feel like its a weird default for the full workspace but I also appreciate leaving the individual building blocks and letting users decide what to do.

text/0000-nested-publish.md Outdated Show resolved Hide resolved
text/0000-nested-publish.md Outdated Show resolved Hide resolved
@epage
Copy link
Contributor

epage commented Mar 18, 2024

Trying to raise visibility of this on the respective teams so we can see what more problems we can uncover.

Comment on lines +265 to +267
* Instead of introducing `package.publish.nested = true`, we could only require that dependencies be declared as nested. The disadvantages of this are:
* May unintentionally duplicate published code between a standalone published package and a nested package
* Does not make both ends of the relationship explicit to readers of the code.
Copy link
Contributor

@epage epage Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason: we want nested packages to have an explicit opt-in is we don't want a happy path for forking other people's packages (if we expanded this feature to more than path-dependencies). We should instead encourage people to be working with upstream and putting the end-user in charge of what versions get selected.

Comment on lines +16 to +18
* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the value of this RFC, and these pain points are truely painful. I am a bit not comfortable about the complexity exposed to Cargo users. With the upcoming public/private dependencies in Edition 2024. The situation becomes way awkward.

# in a `foo` package
foo-priv-types = { path = "priv-types", public = false, publish = "nested" }
foo-core= { version = "0.1", path = "core", public = true }
foo-util = { version = "0.1", path = "util", public = false }
foo-derive = { path = "derive", public = true, publish = "nested" }

The above example is very likely to happen, but it not immediately clear the mixed meaning of public and nested.

  • public = false + nested
    • types are not exposed (private) in public API, and that package is published as a private module
    • 👍🏾 make sense
  • public = true + version
    • types are exposed in public API, and that package is published separately
    • 👍🏾 make sense
  • public = false + version
    • types are not exposed in public API, and that package is published separately
    • 🤔 seems awkward; this RFC addresses it
  • public + nested
    • types are exposed in public API but that package is published as it is a private module
    • 🤔 looks a bit more awkward; this RFC addresses it,

I may have over-complicated the situation, but it indeed introduces cognitive overhead to understand when combining different concept together. I don't know how complex inline-module would be, but that might be a chance to changing to compilation unit from crate to module (don't bash on my head, just an idea).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No need to say when open namespace comes and joins the party. While it's a pretty independent feature, the learning curve doesn't look too good when everything gathers…)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, the big concern with public + nested is when two workspace members do that for the same dependency. Locally, they will be interchangeable. When published, they will not. This delays testing and could confuse users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor question: If a Git repository contains a package with nested packages, can the other package depends on any of those nested packages as a Git dependency? Currently Git dependency searches packages whose name matches recursively inside the repository.

* It is not possible to publish a bug fix to a nested package without republishing the entire parent package; this is the cost we pay for the benefit of not needing to take care with versioning for nested packages.

* Suppose `foo` has a nested package `foo-core`. Multiple major versions of `foo` cannot share the same instance of `foo-core` as they could if `foo-core` were separately published and the `foo`s depended on the same version of `foo-core`. Thus, choosing nested publishing may lead to type incompatibilities (and greater compile times) that would not occur if the same libraries had been separately published.
* If this situation comes up, it can be recovered from by newly publishing `foo-core` separately (as would have been done if nested publishing were not used) and using the [semver trick](https://github.com/dtolnay/semver-trick) to maintain compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semver trick does solve this problem, but only when those package bumps their MSRV, so this feature won't get adopted until some point. Some popular crates using proc-macros hold a relatively conservative MSRV. The adoption rate might be mild annoying in the short-mid term.
(Granted, other alternatives would have the same issue)

* If this situation comes up, it can be recovered from by newly publishing `foo-core` separately (as would have been done if nested publishing were not used) and using the [semver trick](https://github.com/dtolnay/semver-trick) to maintain compatibility.

* Support for duplicative nested publishing (that is, nested packages that are nested within more than one parent package) has the following consequences:
* May increase the amount of source code duplicated between different published packages, increasing download sizes and compilation time. It's currently possible to duplicate code into multiple packages via symlinks, but this would make it an “official feature”.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with this, as a side effect, this might provide an easier route for people to vendor code more than it was. The Rust ecosystem is different from 5 years ago, but the initial idea of Cargo/crates.io is to encourage everybody to publish and share codes. And whenever possible, play nice and help fix bugs in upstream.

Again, Rust is not Rust anymore. The old rules might not suitable nowadays since enterprise and other large projects starts using it more. Regardless, the potential consequence of paradigm shift is something we need to be aware of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Along with this, as a side effect, this might provide an easier route for people to vendor code more than it was.

At first I wasn't too concerned about it but then I thought of what insta had to do to vendor yaml-rust. They have to copy the code in, merge any dependencies, and update any crate:: relative references.

With this feature, its copy the package directory in, mark it as nested, and you are done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over #2224, two of the participants specifically called out wanting to use it for private forks :/

Comment on lines +14 to +18
There are a number of reasons why a Rust developer currently may feel the need to create multiple library crates, and therefore multiple Cargo packages (since one package contains at most one library crate). These multiple libraries could be:

* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another use case is that this provides another way for us to break dependency cycles that involve dev-dependencies.

Currently, the solution involves dropping the dependency on publish (by not specifying a version). This lacked discovery so by default cargo add does it for all path dev-dependencies. This negatively impacts crater because it means that any packages with dev-dependency cycles or where cargo add was used to add a path dev-dependency, we lose out on a lot of testing with crater.

With this feature, we can instead nest the path dev-dependency.


If `package.publish` is a table, then `package.publish.registries` defaults to `false`, regardless of the value or presence of `package.publish.nested`.

Note: This dual-publishing-mode functionality is permitted mainly to keep the functionality composable/orthogonal. We hope that in most cases, packages are either published nested exactly once, or to a registry alone, to avoid duplicating code in the registry and compiling it redundantly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A use case to call out for this is breaking cycles that come about from dev-dependencies.


* This increases the number of differences between “Cargo package (on disk)” from “Cargo package (that may be published in a registry, or downloaded as a unit)” in a way which may be confusing; it would be good if we have different words for these two entities, but we don't.

* It is not possible to publish a bug fix to a nested package without republishing the entire parent package; this is the cost we pay for the benefit of not needing to take care with versioning for nested packages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, for the proc-macro case, = version requirements are frequently used and so both would need to be published anyways in that case.


This check is intended only to prevent accidents (such as vendoring a third-party package without considering the implications of redistributing it). It is always valid to omit `package.license` from the nested package, thus making no machine-readable claims about its licensing.

It is an error for a nested package to have the same package name as the parent package or any other nested package with the same parent package. This is validated by all Cargo operations that would generate or read a lockfile. Rationale: This should ensure that whenever a nested package must be named, such as in an `.crate` archive, potentially in lock files, and potentially in Cargo user interface, the pair of (parent package name, nested package name) is sufficient to uniquely identify the package.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just leave this as a conflict validation check during publish? That means one less thing to have to validate on every command and this would allow nested packages in nested directories to not be subject to this limitation


The package index, and the `crates.io` user interface, do not explicitly represent nested packages; the package is presented as if it were a single package:

* Nested packages’ dependencies are flattened into the listed dependencies of the parent package.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned that this seems like it could be a fairly complex operation to flatten dependencies. For example, how does it handle name collisions? How does it handle fields that aren't normally unified, like the public field? I'm not quite sure I see how this could work.

If the intent is to have the resolver not know about nested packages, it seems like that would make it difficult for other parts of cargo to know about those nested packages since they are all driven by the resolver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another field that can't be readily merged is links. Is it an error to have more than one links in the tree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't hide this from the resolver, that opens up other areas of complexity

iiuc

  • we would need a way for source ids to refer to these packages
  • we would need a way to include these entries in the Index without conficting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By merging dependencies we probably have to mean that the index will list all of the dependencies from the merged list. I.E. the list of dependencies will have more than one entry for foo. Rather than us merging all references to foo into one dependency. I Think, the resolver already has to deal with index entries with duplicate package names, for target specific dependencies and rename dependencies.

If we actually want to merge into one dependency entry, semver constraints are going to make life difficult. >=1.0.0-alpha, <3.0.0 is not the merger of >=1.0.0-alpha with <3.0.0. (1.0.0-alpha matches the first two but not the third one).

@ehuss
Copy link
Contributor

ehuss commented Apr 2, 2024

One drawback that I think would be good to include is the risk around having publicly exposed shared types be incompatible if a dependency is built multiple times.

For example: If I have three packages, "a", "b", and "shared". There is a dependency from a→shared and b→shared, and I publish a and b with shared being nested in both, and a and b expose types from shared, then they cannot be used together. That might not be evident even after publishing, since publishing won't validate that a and b work together. Also, the error message from rustc could be fairly difficult to understand.


* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A logically singular library broken into multiple parts to speed up compilation.

Similarly, a bin might want to split out a lib for local development and testing but not consider it public and not offer semver guarantees for the lib. cargo-edit and cargo-release are like this.

* When packages are implementation details, it makes a permanent mark on the `crates.io` registry even if the implementation of the parent package stops needing that particular subdivision. By allowing sub-packages we can allow package authors to create whatever sub-packages they imagine might be useful, and delete them in later versions with no consequences.
* It is possible to depend on a published package that is intended as an implementation detail. Ideally, library authors would document this clearly and library users would obey the documentation, but that doesn't always happen. By allowing nested packages, we introduce a simple “visibility” system that is useful in the same way that `pub` and `pub(crate)` are useful within Rust crates.

## Alternatives to nested packages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is to add a form of access control related to open namespaces so you can have published packages with APIs scoped to the namespace

  • Perform this access control at the cargo/crates.io level by allowing a published package to be marked as private, disallowing any package to depend on it unless its participating in the namespace.
    • However, namespace membership control is only enforced at the registry level, so for any local development, you can access the private packages by putting yourself in the namespace
  • Language level support with a pub(namespace) (name to be bike-shedded)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, neither helps with the publishing overhead.


* We could choose to explicitly prohibit nested packages from specifying a `package.version`, to avoid giving the misleading impression that it means anything. This would be notably stricter than the current meaning of absent `package.version` as of Cargo 1.75, which is that it is completely equivalent to `version = "0.0.0"`. It would also prohibit having a package that is both nested and published to a registry, if that is desired.

# Future possibilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined with bin deps, we could allow delegating build scripts to a nested package, allowing a more complete environment for its development.


There are a number of reasons why a Rust developer currently may feel the need to create multiple library crates, and therefore multiple Cargo packages (since one package contains at most one library crate). These multiple libraries could be:

* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should -sys packages be a motivation with using this being a recommended path or should we discourage using this with -sys

This was talked about a little at #2224 (comment)

@epage
Copy link
Contributor

epage commented Apr 2, 2024

We discussed this further in today's cargo team meeting. The above comments opened represent some of the discussion.

Several areas of concern were focused on:

The original appeal of this solution is that is relatively trivial in thought as path dependencies exist. The problem is when getting into the details, particularly dependency resolution / the Index. On top of that, crates.io would likely want to duplicate the Index Summary generation from the .crate file because they want to work with one source of truth, rather than multiple (and its unclear if we can flatten in just Cargo.toml).

This also adds more design choices to the user, making it harder for them to navigate the choice of what tools to use which would include

  • workspaces
  • packages
  • build-targets / crates within a package
  • open namespaces
  • public / private packages
  • nested packages

While this helps to streamline people's workflows by reducing publishing overhead, there isn't any inherent blockers from people solving this today. The closest is likely in communicating semver guarantees and there is interest in something like pub(namespace) as a way solving the problem. For the friction in publishing these packages, we should likely work to improve that larger problem. In several workflows discussed, it seemed likely that a workspace would have more than one non-nested package, meaning workspace workflow improvements are still necessary.

All of that said, the unease wasn't enough to block this proposal at this stage but we would be open to this being explored further to see how much these concerns are addressed or validated when this is put into practice.

As for next steps, we didn't fully resolve that. The RFC affects two teams with crates.io mostly being involved for the Index side of things and would generally be implemented after a go/no-go from the above exploration. We didn't get to the point of discussing whether we'd want to push this RFC forward, experiment with it, and possibly reject it after approval if the concerns were shown to be blocking, or if we'd want to authorize an experiment to be done as part of the writing of this RFC, much like cargo-script.

The big risk would be to where you put your time as this would likely be a big experiment with enough uncertainty around it that the odds of it being stabilized are iffy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action from the author. T-cargo Relevant to the Cargo team, which will review and decide on the RFC. T-crates-io Relevant to the crates.io team, which will review and decide on the RFC.
Projects
Status: RFC needs review
Status: Unreviewed
Development

Successfully merging this pull request may close these issues.

None yet