RFC: Add workspaces to Cargo #1525

Merged on May 4, 2016 (10 commits)
345 changes: 345 additions & 0 deletions text/0000-cargo-workspace.md
@@ -0,0 +1,345 @@
- Feature Name: N/A
- Start Date: 2015-09-15
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Improve Cargo's story around multi-crate single-repo project management by
introducing the concept of workspaces. All packages in a workspace will share
`Cargo.lock` and an output directory for artifacts.

# Motivation

A common method to organize a multi-crate project is to have one
repository which contains all of the crates. Each crate has a corresponding
subdirectory along with a `Cargo.toml` describing how to build it. There are a
number of downsides to this approach, however:

* Each sub-crate will have its own `Cargo.lock`, so it's difficult to ensure
that the entire project is using the same version of all dependencies. This is
desired as the main crate (often a binary) is often the one that has the
`Cargo.lock` "which counts", but it needs to be kept in sync with all
dependencies.

* When building or testing sub-crates, all dependencies will be recompiled as
the target directory will be changing as you move around the source tree. This
can be overridden with `build.target-dir` or `CARGO_TARGET_DIR`, but this
isn't always convenient to set.

Solving these two problems should help ease the development of large Rust
projects by ensuring that all dependencies remain in sync and builds by default
use already-built artifacts if available.

# Detailed design

Cargo will grow the concept of a **workspace** for managing repositories of
multiple crates. Workspaces will then have the properties:

* A workspace can contain multiple local crates.

Could "multiple" perhaps better be replaced with "zero or more" (if a workspace with zero local crates is not forbidden)?

* Each workspace will have a root.
* Whenever any crate in the workspace is compiled, output will be placed in the
`target` directory next to the root.

Should we worry about name collision?

Member Author

Not any more so than we already have today.

* One `Cargo.lock` for the entire workspace will reside next to the workspace
root and encompass the dependencies (and dev-dependencies) for all packages
in the workspace.
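
The shared-location properties above can be sketched in a few lines of Rust. This is only an illustration: `WorkspaceLayout` and `workspace_layout` are hypothetical names, not part of Cargo, and the sketch ignores overrides such as `build.target-dir` or `CARGO_TARGET_DIR`.

```rust
use std::path::{Path, PathBuf};

/// Shared locations for a workspace, derived only from its root directory.
/// (Hypothetical helper for illustration; not part of Cargo's API.)
pub struct WorkspaceLayout {
    pub target_dir: PathBuf,
    pub lock_file: PathBuf,
}

/// Every member of the workspace resolves to the same two paths,
/// regardless of which member directory a command is run from.
pub fn workspace_layout(root: &Path) -> WorkspaceLayout {
    WorkspaceLayout {
        target_dir: root.join("target"),
        lock_file: root.join("Cargo.lock"),
    }
}
```

Because both paths depend only on the root, building any member reuses the same artifacts and the same lock file.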

With workspaces, Cargo can now solve the problems set forth in the motivation
section. Next, however, workspaces need to be defined. In the spirit of much of
the rest of Cargo's configuration today this will largely be automatic for
conventional project layouts but will have explicit controls for configuration.

### New manifest keys

First, let's look at the new manifest keys which will be added to `Cargo.toml`:

```toml
[workspace]
members = ["relative/path/to/child1", "../child2"]

# or ...

[package]
workspace = "../foo"
```

The root of a workspace, indicated by the presence of `[workspace]`, is
responsible for defining the entire workspace (listing all members).

listing all members

I'd prefer to have "implicitly or explicitly" here

This example here means that two extra crates will members of the workspace

Typo: "will members" is probably supposed to say "will be members"

(which also includes the root).

The `package.workspace` key is used to point at a workspace root. For
example this Cargo.toml indicates that the Cargo.toml in `../foo` is the
workspace root that this package is a member of.

These keys are mutually exclusive when applied in `Cargo.toml`. A crate may
*either* specify `package.workspace` or specify `[workspace]`. That is, a
crate cannot both be a root in a workspace (contain `[workspace]`) and also be
member of another workspace (contain `package.workspace`).

"be a member" reads more easily to me than "be member"


### "Virtual" `Cargo.toml`

A good number of projects do not necessarily have a "root `Cargo.toml`" which is
an appropriate root for a workspace. To accommodate these projects and allow for
the output of a workspace to be configured regardless of where crates are
located, Cargo will now allow for "virtual manifest" files. These manifests will
currently **only** contain the `[workspace]` table and will notably be lacking
a `[project]` or `[package]` top level key.

What was [project] ?

Member Author

An alias for [package] with planned different semantics that never got off the ground.


Cargo will for the time being disallow many commands against a virtual manifest,
for example `cargo build` will be rejected. Arguments that take a package,
however, such as `cargo test -p foo` will be allowed. Workspaces can eventually
get extended with `--all` flags so in a workspace root you could execute

@nodakai nodakai Apr 24, 2016

What are we achieving by requiring --all for "virtual" projects? A user will have to first look inside the top Cargo.toml and see if it's "virtual" or not for a successful build of a given Cargo project.

Member Author

I don't think I understand the question here. The point of this is to say that cargo build doesn't work, not that --all is being added.

In other words, why should cargo build not work for a "virtual workspace" ? My point is, you are suggesting to have two types of Cargo.toml; one accepts cargo build and one rejects cargo build. This can be a source of confusion.

Member Author

Mostly erring on the side of being conservative, we can always alter the meaning of cargo build on a virtual crate to be an alias for cargo build --all later.

`cargo build --all` to compile all crates.
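
The conservative rule for virtual manifests could be modeled roughly as follows. The `Manifest` struct and `check_command` function are hypothetical stand-ins for illustration, and the error text is invented; the sketch only captures "reject unless a package is named".

```rust
/// Hypothetical model of a manifest: a virtual manifest has `[workspace]`
/// but no `[package]` (or `[project]`) table.
pub struct Manifest {
    pub has_package: bool,
    pub has_workspace: bool,
}

impl Manifest {
    pub fn is_virtual(&self) -> bool {
        self.has_workspace && !self.has_package
    }
}

/// Decide whether a command may run against `manifest`.
/// `package` models an explicit `-p <name>` argument.
pub fn check_command(manifest: &Manifest, package: Option<&str>) -> Result<(), String> {
    if manifest.is_virtual() && package.is_none() {
        return Err("virtual manifest: select a package with `-p <name>`".to_string());
    }
    Ok(())
}
```

Under this model `cargo build` at a virtual root is rejected while `cargo test -p foo` passes, matching the behavior described above.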

### Validating a workspace

A workspace is valid if these two properties hold:

1. A workspace has only one root crate (that with `[workspace]` in
`Cargo.toml`).
2. All workspace crates defined in `workspace.members` point back to the

I'd prefer to have "explicitly or implicitly" for workspace.members and package.workspace for the sake of clarification.

workspace root with `package.workspace`.

While the restriction of one-root-per workspace may make sense, the restriction
of crates pointing back to the root may not. If, however, this restriction were
not in place then the set of crates in a workspace may differ depending on
which crate it was viewed from. For example if workspace root A includes B then
it will think B is in A's workspace. If, however, B does ont point back to A,

Typo: ont should be not.

(Please tell me if I'm not allowed to point out typos here)

Member Author

Thanks!

then B would not think that A was in its workspace. This would in turn cause the
set of crates in each workspace to be different, futher causing `Cargo.lock` to

Typo: "futher" is missing an "r" ("further")

get out of sync if it were allowed. By ensuring that all crates have edges to
each other in a workspace Cargo can prevent this situation and guarantee robust
builds no matter where they're executed in the workspace.

To alleviate misconfiguration Cargo will emit an error if the two properties
above hold for any crate attempting to be part of a workspace. For example, if

Is this supposed to say that Cargo will emit an error if the two properties do not hold?

I think it says no crates other than "the" root should satisfy the conditions in a workspace.

Member Author

@dcuddeback oops, indeed!

the `package.workspace` key is specified, but the crate is not a workspace root
or doesn't point back to the original crate an error is emitted.
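
As a rough sketch of the validation described in this section, the check below rejects a workspace when the root lacks `[workspace]`, when a manifest sets both keys, when a member is itself a root, or when a member does not point back to the root. The `Manifest` type and `validate` function are hypothetical, and the sketch assumes implicit keys have already been resolved to explicit values.

```rust
use std::collections::HashMap;

/// Hypothetical, already-resolved view of one `Cargo.toml`.
#[derive(Default)]
pub struct Manifest {
    /// `Some` when the manifest has a `[workspace]` table.
    pub members: Option<Vec<String>>,
    /// Resolved target of `package.workspace`, if any.
    pub workspace: Option<String>,
}

/// Check the workspace rooted at `root`. Crates are identified by a
/// normalized name or path string used as the map key.
pub fn validate(root: &str, manifests: &HashMap<String, Manifest>) -> Result<(), String> {
    let root_manifest = manifests
        .get(root)
        .ok_or_else(|| format!("no manifest for root `{}`", root))?;
    let members = root_manifest
        .members
        .as_ref()
        .ok_or_else(|| format!("`{}` has no [workspace] table", root))?;
    // The two keys are mutually exclusive within one manifest.
    if root_manifest.workspace.is_some() {
        return Err(format!("`{}` sets both [workspace] and package.workspace", root));
    }
    for member in members {
        let m = manifests
            .get(member)
            .ok_or_else(|| format!("no manifest for member `{}`", member))?;
        // Property 1: only one root per workspace.
        if m.members.is_some() {
            return Err(format!("member `{}` is itself a workspace root", member));
        }
        // Property 2: every member points back to the root.
        if m.workspace.as_deref() != Some(root) {
            return Err(format!("member `{}` does not point back to `{}`", member, root));
        }
    }
    Ok(())
}
```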

### Implicit relations

The combination of the `package.workspace` key and `[workspace]` table is enough
to specify any workspace in Cargo. Having to annotate all crates with a
`package.workspace` parent or a `workspace.members` list can get quite tedious,
however! To alleviate this configuration burden Cargo will allow these keys to
be implicitly defined in some situations.

The `package.workspace` can be omitted if it would only contain `../` (or some
repetition of it). That is, if the root of a workspace is hierarchically the

Consider this case:

ws0
 +- Cargo.toml  // [workspace] members = ["ws1/util"]
 +- src
 +- ws1
     +- Cargo.toml // [workspace] members = []
     +- src
     +- util
         +- Cargo.toml // package.workspace is omitted
         +- src

So, although util/../.. does point back to ws0 and the workspace ws0 contains a single root, it is invalid due to the "that is" sentence; is my understanding correct? I mean, the "that is" sentence is actually a stronger restriction than what it tries to rephrase.

Member Author

If ws0/ws1/Cargo.toml depends on ws0/ws1/util/Cargo.toml, then yes this is an invalid workspace.

Actually I found "hierarchically the first" confusing; did it mean the closest to the root directory of the system (/) or the furthest from it?

I interpreted it as "the furthest" (because it would be the "first" to be found by a repeated application of ../); then the root of the workspace ws0 is not "the first" Cargo.toml with [workspace] regardless of ws0/ws1's (lack of) dependency on ws0/ws1/util.

My question could be rephrased as: can an unrelated Cargo.toml (ws0/ws1/Cargo.toml in this case) break the validity of a workspace (ws0) that depends on implicit package.workspace ?

Member Author

Yes

first `Cargo.toml` with `[workspace]` above a crate in the filesystem, then that
crate can omit the `package.workspace` key.
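
One way to picture the implicit `package.workspace` lookup is a walk up the directory tree that stops at the first manifest containing `[workspace]`. In the sketch below, `implicit_workspace_root` is a hypothetical name, and the `workspace_dirs` set stands in for reading manifests from disk, an assumption made to keep the example self-contained.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Walk upward from `crate_dir` and return the first ancestor directory
/// whose `Cargo.toml` contains `[workspace]`.
pub fn implicit_workspace_root(
    crate_dir: &Path,
    workspace_dirs: &HashSet<PathBuf>,
) -> Option<PathBuf> {
    crate_dir
        .ancestors()
        .skip(1) // skip the crate's own directory; only look above it
        .find(|dir| workspace_dirs.contains(*dir))
        .map(Path::to_path_buf)
}
```

With roots at both `ws0` and `ws0/ws1`, a crate in `ws0/ws1/util` resolves to `ws0/ws1` because that is the nearest one. The outer `ws0` is never reached, which is why an intermediate `[workspace]` manifest can invalidate a workspace that relies on the implicit key.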

Next, a crate which specifies `[workspace]` **without a `members` key** will
transitively crawl `path` dependencies to fill in this key. This way all `path`
dependencies (and recursively their own `path` dependencies) will inherently
become the default value for `workspace.members`.
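
The transitive crawl can be sketched as a breadth-first traversal over `path` dependencies. Here `implicit_members` is a hypothetical function and `path_deps` is an assumed in-memory map from each crate to its direct `path` dependencies; a real implementation would read these from manifests.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Compute a default `workspace.members` value by following `path`
/// dependencies transitively from the root. The root itself is not listed.
pub fn implicit_members(
    root: &str,
    path_deps: &HashMap<String, Vec<String>>,
) -> BTreeSet<String> {
    let mut members = BTreeSet::new();
    let mut queue = VecDeque::new();
    queue.push_back(root.to_string());
    while let Some(current) = queue.pop_front() {
        for dep in path_deps.get(&current).into_iter().flatten() {
            // `insert` returns false for crates already seen, so cycles terminate.
            if dep.as_str() != root && members.insert(dep.clone()) {
                queue.push_back(dep.clone());
            }
        }
    }
    members
}
```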

Note that these implicit relations will be subject to the same validations
mentioned above for all of the explicit configuration as well.

### Workspaces in practice

Many Rust projects today already have `Cargo.toml` at the root of a repository,
and with the small addition of `[workspace]` in the root a workspace will be

@nodakai nodakai Apr 24, 2016

If I didn't misread the section "Validating a workspace," the workspace is not valid unless we add [package] workspace = ... to children crates. Update: This comment is invalid. I misread the "Implicit relations" section.

ready for all crates in that repository. For example:

* An FFI crate with a sub-crate for FFI bindings

```
Cargo.toml
src/
foo-sys/
  Cargo.toml
  src/
```

* A crate with multiple in-tree dependencies

```
Cargo.toml
src/
dep1/
  Cargo.toml
  src/
dep2/
  Cargo.toml
  src/
```

Some examples of layouts that will require extra configuration, along with the
configuration necessary, are:

* Trees without any root crate

```
crate1/
  Cargo.toml
  src/
crate2/
  Cargo.toml
  src/
crate3/
  Cargo.toml
  src/
```

these crates can all join the same workspace via a `Cargo.toml` file at the
root looking like:

```toml
[workspace]
members = ["crate1", "crate2", "crate3"]
```

* Trees with multiple workspaces

```
ws1/
  crate1/
    Cargo.toml
    src/
  crate2/
    Cargo.toml
    src/
ws2/
  Cargo.toml
  src/
  crate3/
    Cargo.toml
    src/
```

The two workspaces here can be configured by placing the following in the
manifests:

Is it correct to understand that the two workspaces just happen to be in the same "tree" (undefined word) and shouldn't share the common .lock file? If that is the case, this example seems to me to serve more for confusion rather than explanation and I'd rather not include it in the RFC.

Member Author

Yes this is intended to showcase two workspaces as part of the same development tree, which is what the compiler will have, for example.

My point is such a situation can arise too often, for example, when I check out two unrelated (but workspace-aware) Cargo packages from GitHub under the common directory, say, ~/dev/. I'd naturally take their independence for granted.

Member Author

I'm not sure I understand what change you'd like as a result of these comments then? The intention is that of course two independent trees can have workspaces that don't mess with one another, and the point of this example is simply to show that it can happen in one repo.


```toml
# ws1/Cargo.toml
[workspace]
members = ["crate1", "crate2"]
```

Maybe "Note that members aren't necessary ..." ?

Member Author

It is required here as the newly created manifest does not otherwise depend on the crates.

```toml
# ws2/Cargo.toml
[workspace]
```

@nodakai nodakai Apr 24, 2016

Shouldn't members be here? Update: again, the "Implicit relations" obviates them.

* Trees with non-hierarchical workspaces

```
root/
  Cargo.toml
  src/
crates/
  crate1/
    Cargo.toml
    src/
  crate2/
    Cargo.toml
    src/
```

The workspace here can be configured by placing the following in the
manifests:

```toml
# root/Cargo.toml
#
# Note that `members` aren't necessary if these are otherwise path
# dependencies.
[workspace]
members = ["../crates/crate1", "../crates/crate2"]
```

```toml
# crates/crate1/Cargo.toml
[package]
workspace = "../../root"
```

```toml
# crates/crate2/Cargo.toml
[package]
workspace = "../../root"
```

Projects like the compiler will likely need exhaustively explicit configuration.
The `rust` repo conceptually has two workspaces, the standard library and the

I understood a "workspace" as a unit for sharing the common lock file. Could you elaborate on why the compiler and the stdlib should not share the common lock file?

Member Author

It's somewhat unrelated to this RFC, unfortunately, but the gist of it is that we want crates.io deps to be part of the compiler but they do not explicitly depend on the standard library, so they need to be built in two phases.

OK, then I'd suggest you to omit the sentence. You are simply saying "a complex project is likely to require a complex hand-written configuration"

Contributor

There's another solution for that problem ;)

compiler, and these would need to be manually configured with
`workspace.members` and `package.workspace` keys amongst all crates.

### Lockfile and override interactions

One of the main features of a workspace is that only one `Cargo.lock` is
generated for the entire workspace. This lock file can be affected, however,
with both [`[replace]` overrides][replace] as well as `paths` overrides.

[replace]: https://github.com/rust-lang/cargo/pull/2385

Primarily, the `Cargo.lock` generated will not simply be the concatenation of the
lock files from each project. Instead the entire workspace will be resolved
together all at once, minimizing versions of crates used and sharing
dependencies as much as possible. For example one `path` dependency will always
have the same set of dependencies no matter which crate is being compiled.

When interacting with overrides, workspaces will be modified to only allow
`[replace]` to exist in the workspace root. This Cargo.toml will affect lock
file generation, but no other workspace members will be allowed to have a
`[replace]` directive (with an informative error message being produced).
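
The root-only rule for `[replace]` amounts to a simple check over the workspace. In this sketch `check_replace` is a hypothetical function, `crates_with_replace` is an assumed list of crates whose manifests declare `[replace]`, and the error wording is invented for illustration.

```rust
/// Reject any workspace member other than the root that declares `[replace]`.
pub fn check_replace(root: &str, crates_with_replace: &[&str]) -> Result<(), String> {
    for name in crates_with_replace {
        if *name != root {
            return Err(format!(
                "`[replace]` found in `{}`; it is only allowed in the workspace root `{}`",
                name, root
            ));
        }
    }
    Ok(())
}
```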

Finally, the `paths` overrides will be applied as usual, and they'll continue to
be applied relative to whatever crate is being compiled (not the workspace
root). These are intended for much more local testing, so no restriction of
"must be in the root" should be necessary.

Note that this change to the lockfile format is technically incompatible with
older versions of Cargo.lock, but the entire workspaces feature is also
incompatible with older versions of Cargo. This will require projects that wish
to work with workspaces and multiple versions of Cargo to check in multiple
`Cargo.lock` files, but if projects avoid workspaces then Cargo will remain
forwards and backwards compatible.

### Future Extensions

Once Cargo understands a workspace of crates, we could easily extend various
subcommands with a `--all` flag to perform tasks such as:

* Test all crates within a workspace (run all unit tests, doc tests, etc)
* Build all binaries for a set of crates within a workspace
* Publish all crates in a workspace if necessary to crates.io

Related to this use case, another possible extension is to share a version across all the crates in a workspace. Some projects act as a collection of crates that are published at the same time (e.g., foo and foo-sys) with the same version. It'd be a nice convenience to not have to update the version property in every Cargo.toml.

Actually, this could apply to authors, homepage, repository, license, and keywords properties, too, as those are likely to be the same for crates in a single-repo workspace. version just changes more often than the other properties. It would be nice if member crates could inherit these properties from the workspace root.

Member Author

Nice idea! I'll add this in here as well.


This support isn't proposed to be added in this RFC specifically, but simply to
show that workspaces can be used to solve other existing issues in Cargo.

# Drawbacks

* As proposed there is no method to disable implicit actions taken by Cargo.
It's unclear what the use case for this is, but it could in theory arise.

* No crate will implicitly benefit from workspaces after this is implemented.
Existing crates must opt-in with a `[workspace]` key somewhere at least.

# Alternatives

* The `workspace.members` key could support globs to define a number of
directories at once. For example one could imagine:

```toml
[workspace]
members = ["crates/*"]
```

as an ergonomic method of slurping up all sub-folders in the `crates` folder
as crates.

* Cargo could attempt to perform more inference of workspace members by simply
walking the entire directory tree starting at `Cargo.toml`. All children found
could implicitly be members of the workspace. Walking entire trees,
unfortunately, isn't always efficient to do and it would be unfortunate to
have to unconditionally do this.
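
To make the glob alternative concrete, here is a minimal sketch of expanding `members = ["crates/*"]`. The `expand_members` function is hypothetical, it handles only a trailing `/*` wildcard, and `crate_dirs` is an assumed list of directories known to contain a `Cargo.toml` rather than a real filesystem scan.

```rust
/// Expand `members` patterns against a list of known crate directories.
/// Entries without a trailing `/*` are passed through unchanged.
pub fn expand_members(patterns: &[&str], crate_dirs: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for pattern in patterns {
        match pattern.strip_suffix("/*") {
            Some(prefix) => {
                for dir in crate_dirs {
                    // Match direct children only: `crates/a`, not `crates/a/b`.
                    let rest = dir.strip_prefix(prefix).and_then(|r| r.strip_prefix('/'));
                    if let Some(rest) = rest {
                        if !rest.is_empty() && !rest.contains('/') {
                            out.push(dir.to_string());
                        }
                    }
                }
            }
            None => out.push(pattern.to_string()),
        }
    }
    out
}
```

Unlike the tree-walking alternative, this inspects only one directory level per pattern, so its cost stays proportional to the number of listed directories.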

# Unresolved questions

* Does this approach scale well to repositories with a large number of crates?
For example does the winapi-rs repository experience a slowdown on standard
`cargo build` as a result?