
Add support for local mirrors of registries #2361

Closed
wants to merge 9 commits into from


alexcrichton
Member

This series of commits culminates in first class support in Cargo for local mirrors of registries. This is implemented through a number of other more generic mechanisms, and extra support was added along the way. The highlights of this PR, however, are:

Source redirection

New .cargo/config keys have been added to enable replacing one source with another. This functionality is intended to be used for mirrors of the main registry or other one-to-one source correspondences. The support looks like:

# in .cargo/config
[source.crates-io]
replace-with = 'my-awesome-registry'

[source.my-awesome-registry]
registry = 'https://github.com/my-awesome/registry-index'

This configuration means that instead of using crates-io (i.e. https://github.com/rust-lang/crates.io-index), Cargo will query the my-awesome-registry source (configured to a different index here). This alternate source must be exactly the same as the crates.io index. Cargo assumes that replacement sources are exact 1:1 mirrors in this respect, and the following support is designed around that assumption.

When generating a lock file for a crate using a replacement registry, the original registry will be encoded into the lock file. For example, in the configuration above, all lock files will still mention crates.io as the registry that packages originated from. This semantically represents how crates.io is the source of truth for all crates, which holds because all replacements have a 1:1 correspondence.

Overall, this means that no matter what replacement source you're working with, you can ship your lock file to anyone else and you'll all still have verifiably reproducible builds!
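To make the redirection concrete, here is a minimal sketch of how a `replace-with` chain could be followed. The `SourceConfig` enum and `resolve` function are hypothetical illustrations, not Cargo's internal types, and the cycle check is an assumption about sensible behavior rather than something described above. The point is that fetching uses whatever concrete source the chain ends at, while the lock file keeps recording the name the lookup started from (crates-io).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model of one `[source.<name>]` table in .cargo/config.
#[derive(Debug, Clone, PartialEq)]
enum SourceConfig {
    Registry(String),      // `registry = '<index url>'`
    LocalRegistry(String), // `local-registry = '<path>'`
    ReplaceWith(String),   // `replace-with = '<other source name>'`
}

/// Follows `replace-with` links from `name` until a concrete source is found.
/// The caller downloads from the returned source but writes `name` into Cargo.lock.
fn resolve<'a>(
    sources: &'a HashMap<String, SourceConfig>,
    name: &str,
) -> Result<&'a SourceConfig, String> {
    let mut seen = HashSet::new();
    let mut current = name.to_string();
    loop {
        // Assumed safeguard: refuse replacement cycles such as a -> b -> a.
        if !seen.insert(current.clone()) {
            return Err(format!("cycle while resolving replacements for `{}`", name));
        }
        let next = match sources.get(current.as_str()) {
            None => return Err(format!("source `{}` is not defined", current)),
            Some(SourceConfig::ReplaceWith(next)) => next.clone(),
            Some(concrete) => return Ok(concrete),
        };
        current = next;
    }
}
```

With the local-registry configuration shown later in this PR, resolving `crates-io` ends at the `local-registry` entry, while Cargo.lock would still name crates.io as the origin.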

Adding sha256 checksums to the lock file

With the above support for custom registries, it's now possible for a project to download crates from any number of sources. One of Cargo's core goals is reproducible builds, and with all these new sources of information a few problematic situations could easily arise:

  1. A local replacement of crates.io could be corrupt
  2. A local replacement of crates.io could have made subtle changes to crates

In both of these cases, today Cargo would simply produce non-reproducible builds. To help assuage these concerns, Cargo will now track the sha256 checksum of all crates from registries in the lock file. From now on, whenever a Cargo.lock is generated it will contain a [metadata] section which lists the sha256 checksum of all crates in the lock file (or <none> if the sha256 checksum isn't known).

Cargo already checks registry checksums against what's actually downloaded, and Cargo will now also verify that checksums remain the same between iterations of the lock file. This means that if a local replacement registry is not in a 1:1 correspondence with crates.io, the lock file will prevent the build from progressing until the discrepancy is resolved.
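A minimal sketch of the verification rule described above, assuming the lock file's `[metadata]` value and the freshly computed sha256 digest are both available as hex strings (computing sha256 itself needs a crate outside the standard library, so it is left out). `ChecksumCheck` and `check` are hypothetical names, and a `<none>` entry is not special-cased here.

```rust
/// Outcome of comparing a lock file entry with a freshly computed checksum.
#[derive(Debug, PartialEq)]
enum ChecksumCheck {
    /// The lock file and the download agree.
    Match,
    /// The lock file has no entry for this package yet; record this value.
    Record(String),
    /// The lock file and the download disagree; the build must stop.
    Mismatch { locked: String, actual: String },
}

/// `locked` is the value found in the lock file's `[metadata]` section (if any);
/// `actual` is the sha256 hex digest of the downloaded `.crate` file.
/// A `<none>` entry is not special-cased in this sketch: it simply compares unequal.
fn check(locked: Option<&str>, actual: &str) -> ChecksumCheck {
    match locked {
        None => ChecksumCheck::Record(actual.to_string()),
        Some(l) if l == actual => ChecksumCheck::Match,
        Some(l) => ChecksumCheck::Mismatch {
            locked: l.to_string(),
            actual: actual.to_string(),
        },
    }
}
```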

Local Registries

In addition to the support above, there is now a new kind of source in Cargo, a "local registry", which is intended to hold a subset of the crates.io ecosystem for the local build of a particular project. The way to enable this looks like:

# in .cargo/config
[source.crates-io]
replace-with = 'my-awesome-registry'

[source.my-awesome-registry]
local-registry = 'path/to/my/local/registry'

This local registry is expected to have two components:

  1. A directory called index which matches the same structure as the crates.io index. The config.json file is not required here.
  2. Inside the registry directory are any number of .crate files (downloaded from crates.io). Each crate file has the name <package>-<version>.crate.
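The index directory follows the crates.io layout, in which a crate's file location depends on the length of its name. A small sketch of that mapping plus the `<package>-<version>.crate` naming, assuming crate names are non-empty ASCII and stored lowercased as on crates.io; `index_path` and `crate_file_name` are hypothetical helpers.

```rust
/// Relative path of a crate's entry inside `index/`, following the crates.io
/// layout: `1/<name>`, `2/<name>`, `3/<first char>/<name>`, else `<ab>/<cd>/<name>`.
fn index_path(name: &str) -> String {
    assert!(!name.is_empty() && name.is_ascii(), "assumes non-empty ASCII crate names");
    let name = name.to_lowercase(); // assumption: entries are stored lowercased
    match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    }
}

/// File name of a package tarball in the registry root.
fn crate_file_name(name: &str, version: &str) -> String {
    format!("{}-{}.crate", name, version)
}
```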

This local registry must currently be managed manually, but I plan on publishing and maintaining a Cargo subcommand to manage a local registry. It will have options to do things like:

  1. Sync a local registry with a Cargo.lock
  2. Add a registry package to a local registry
  3. Remove a package from a local registry
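The first option, syncing with a Cargo.lock, essentially reduces to two set differences. A hypothetical sketch (the planned subcommand's actual behavior isn't specified here): given the registry packages pinned by the lock file and the `.crate` files already present, compute what to add and what could be pruned.

```rust
use std::collections::BTreeSet;

/// A registry package as pinned in Cargo.lock: (name, version).
type Pkg = (String, String);

/// Hypothetical "sync" planner for a local registry: which `.crate` files must
/// be added (pinned but missing) and which could be removed (present but unpinned).
fn plan_sync(locked: &BTreeSet<Pkg>, present: &BTreeSet<Pkg>) -> (Vec<Pkg>, Vec<Pkg>) {
    let to_add: Vec<Pkg> = locked.difference(present).cloned().collect();
    let to_remove: Vec<Pkg> = present.difference(locked).cloned().collect();
    (to_add, to_remove)
}
```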

What's all this for?

That's quite a few new features! What's all this meant to do? Some example scenarios that this is envisioned to solve are:

  1. Supporting mirrors of crates.io in a first-class fashion. Once it's possible to spin up your own local registry, it should be easy to select a new mirror locally.
  2. Supporting round-robin mirrors; this provides an easy vector for configuring "instead of crates.io, hit the first source in this list that works".
  3. Build environments where network access is not an option. Preparing a local registry ahead of time (from a known good lock file) will be a way to ensure that all Rust dependencies are locally available.
    • Note this is intended to include use cases like Debian and Gecko
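The "first source in this list that works" behavior in scenario 2 is an ordered fallback rather than rotating load-balancing. A minimal sketch, where `first_that_works` and the source names are hypothetical and `fetch` stands in for whatever index update or download is being attempted:

```rust
/// Try each source in order and return the first successful result.
/// Errors are collected so they can all be reported if every source fails.
fn first_that_works<T, E, F>(sources: &[&str], mut fetch: F) -> Result<T, Vec<(String, E)>>
where
    F: FnMut(&str) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for &src in sources {
        match fetch(src) {
            Ok(v) => return Ok(v),
            Err(e) => errors.push((src.to_string(), e)),
        }
    }
    Err(errors)
}
```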

What's next?

Even with the new goodies here, there are more directions in which this can be expanded:

  • Support for running your own mirror of crates.io needs to be made "easy to do". For example, there should be a cargo install foo available to have everything "Just Work".
  • Replacing a source with a list of sources (attempted in round robin fashion) needs to be implemented
  • Eventually this support will be extended to the Cargo.toml file itself. For example:
    • packages should be downloadable from multiple registries
    • replacement sources should be encodable into Cargo.toml (note that these replacements, unlike the ones above, would be encoded into Cargo.lock)
    • adding multiple mirrors to a Cargo.toml should be supported
  • Implementing the subcommand above to manage local registries needs to happen (I will attend to this shortly)

@rust-highfive

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member Author

r? @wycats, we had lots of discussion offline about this

also r? @brson, I'm sure you'll also have many opinions here! I'm curious about your take specifically on the debian/Gecko use cases

@rust-highfive rust-highfive assigned wycats and unassigned huonw Feb 6, 2016
@alexcrichton
Member Author

additionally, this is targeted at closing #2111

@Kimundi
Member

Kimundi commented Feb 7, 2016

Seeing this, I wonder if it would be possible and desirable to extend cargo to support multiple different package registries at the same time, like different package repositories in stand alone package managers.

It would be useful in cases where you have a set of crates that don't belong in the central registry because they are not general purpose libraries, or where you only want to use them for internal testing and development.

    * newer Cargo implementations know how to checksum this source, but this
    older implementation does not
    * the lock file is corrupt
    ", id, id.source_id())
Member


Maybe add a short sentence telling the cargo enduser what to do when such an error occurred? (Same for the messages below)

Member Author


This is actually a bit of an interesting error case in the sense that I'm not really sure what can be done here. If Cargo never changed then this should never be seen except in the case that something is corrupt or it's an internal error. This may be able to be fixed by upgrading the Cargo in use, but that's also not necessarily guaranteed to work.

Ah and for the messages below it's kinda the same thing as well, I'm not really sure what can be done. The most likely-to-be-seen one, mismatching checksums, will likely be attributed to something outside the user's control, e.g.:

  • The replacement registry legitimately has a different crate
  • The network interfered with the download at some point

In either of these situations there's unfortunately not really a lot that can be done :(

@Manishearth
Member

Can the registry be any git repo, or does this only work for github?

    but `{}` does not

    a lock file compatible with `{}` cannot be generated in this situation
    ", orig_name, name, supports, no_support, orig_name);
Member


Five substitutions are hard to read; I'd try to use named params (or however bail!("{orig}", orig=orig_name) is called)
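Rust's `format!` (and macros that forward to it) accepts named arguments, which also lets one value be referenced twice. A sketch of the suggestion: the first half of the message is elided in the excerpt above, so the wording "cannot replace ... supports checksums" here is a hypothetical stand-in, not Cargo's actual text.

```rust
/// Builds the error message with named arguments instead of five positional `{}`s.
/// The message wording is hypothetical; only the tail matches the excerpt above.
fn replacement_error(orig_name: &str, name: &str, supports: &str, no_support: &str) -> String {
    format!(
        "cannot replace `{orig}` with `{name}`: `{supports}` supports checksums, \
         but `{no_support}` does not\n\n\
         a lock file compatible with `{orig}` cannot be generated in this situation",
        orig = orig_name, // used twice, passed once
        name = name,
        supports = supports,
        no_support = no_support,
    )
}
```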

@alexcrichton
Member Author

@Kimundi ah yeah, Cargo definitely wants to support crates coming from multiple registries! There are a few design questions that range from bike-shed to fundamentals, however, so it likely won't happen as part of this PR or for a bit. That being said, Cargo's backend has always been able to support multiple registries, and the features added here were designed with future support for multiple registries in one Cargo.toml in mind.

@Manishearth the index of a registry is required to be a git repository, but beyond that it can be hosted anywhere. I'm not actually sure how one would write a piece of code that actively required a git repository to be on github vs any other location...

@Manishearth
Member

@alexcrichton Ah, the .git-less URL made me uneasy (you can clone from a GitHub https URL that omits the .git, but without it the URL doesn't look like a generic git repo URL)

@bors
Contributor

bors commented Feb 11, 2016

☔ The latest upstream changes (presumably #2370) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton alexcrichton force-pushed the redirect-sources branch 4 times, most recently from ff9fe8a to 1fe9619 Compare February 12, 2016 20:55
@alexcrichton
Member Author

Note that I've also started prototyping the cargo local-registry subcommand to manage these local registries.

@marcbowes
Contributor

Thanks, this is awesome.

When generating a lock file for crate using a replacement registry, the original registry will be encoded into the lock file.

When you build a registry you call empty which inserts crates-io as a base source in a non-configurable way. Does this mean you always want to override crates-io if you only want to use a local registry?

The problem with this is that some tools, e.g. racer, use Cargo.lock as a mechanism to find dependency source code (https://github.com/phildawes/racer/blob/master/src/racer/cargo.rs). This would mean that if I have some package that is available in my local registry but not in crates.io, a cargo build would succeed but a racer-find-definition would not work because the Cargo.lock file would point to crates.io which would not have the crate in the local registry. Or it might have a similarly named and versioned crate, with no guarantee it is actually the same thing (e.g. my organisation publishes 'log' to our local registry which then masquerades as the crates.io 'log' to racer).

(I'm not sure if this is possible, but this also might leak private information, e.g. if the package name itself is something that should not be public.)

Hopefully that wasn't super confusing. The TL;DR, I think, is that I'm not sure if the crates-io-by-default-non-configurable decision interacts well with the replace-with feature.

@alexcrichton
Member Author

This would mean that if I have some package that is available in my local registry but not in crates.io

This sounds quite bad! In that case this is the wrong use case of replace-with because the replacement source isn't a replacement for crates.io (it has content crates.io doesn't have). This is what the checksum in the lock file is also intended for. If a different crate from crates.io is downloaded then Cargo will print an error indicating that the code wasn't the same both before and afterwards.

It is intended, however, that overriding crates.io works via overriding the crates-io source.

@bors
Contributor

bors commented Feb 16, 2016

☔ The latest upstream changes (presumably #2328) made this pull request unmergeable. Please resolve the merge conflicts.

@marcbowes
Contributor

this is the wrong use case of replace-with because the replacement source isn't a replacement for crates.io

Agreed. So, how do we use the new registry type without replacing crates.io?

@alexcrichton
Member Author

@marcbowes

So, how do we use the new registry type without replacing crates.io?

Currently that's the only vector through which this can be used, but it's planned in the future to list independent registries in Cargo.toml where you may be able to leverage this support.

@brson
Contributor

brson commented Feb 16, 2016

It seems like we should get feedback from packagers like @anguslees, @lucab, @sylvestre @fabiand @jauhien. This is the feature that makes it possible to build from local source instead of crates.io.

This alternate source must be the exact same as the crates.io index

What does this mean specifically? Surely I don't have to clone every package on crates.io in order to create a local registry. I think you mean that if a crate version is mentioned in the crate dag that it must exist in the alternate registry and have the same hash.

From the discussion it sounds like it is a requirement that all packages in any registry must be registered with crates.io. Why must that be? It's an important use case for e.g. companies to be able to host their own private registries. Does this design leave that possibility open for the future?

If this mechanism requires a .cargo/config how will Gecko use it? Should they be checking in this config file? What is the overlap between this solution and the hypothetical solution for Gecko's problems?

This hard codes the name crates.io as the registry, but there may plausibly be corporate installations where crates.io is not allowed at all. By some of the renamings here, this patch seems to be doubling down on crates.io being the one central registry, when it seems like we should be going the other direction to support broader use cases.

A directory called index which matches the same structure as the crates.io index. The config.json file is not required here.

Does this mean that when you create a local registry you must artificially subdivide the directories into two-letter prefixes like the official registry? This is a hack to avoid big directories, but it isn't a concern for local registries (if it were, the crates themselves wouldn't be stored in a single directory).

Local registries containing crate files appears undesirable to me for either the distro or Gecko use case. Distros I would imagine want to store the source in their format, not in ours; so they either need to stuff our tarball packages inside their tarball packages, or reproduce the .crate files when building the local registry from packages.

Asking shas of local crates to be identical to crates.io precludes distros deploying security fixes (or any patches at all) on their own. I imagine they will do it anyway, updating all the lockfiles, and ignoring the requirement that local registries contain the same code as on crates.io.

@marcbowes
Contributor

@brson, +1 to pretty much everything you said (for my use case - corporate build system).

@alexcrichton
Member Author

Lots to digest, thanks for the review @brson! I'll try to answer your questions
here:

This alternate source must be the exact same as the crates.io index

What does this mean specifically? [..] I think you mean that if a crate
version is mentioned in the crate dag that it must exist in the alternate
registry and have the same hash.

Ah yes what I basically mean by this is that an alternate source must be a
subset of the crates.io index. The actual crates must be exactly the same (same
hash), but you don't have to clone the entire index of course.
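The "subset with identical checksums" rule can be stated as a check over index entries. A minimal sketch, modeling each index as a hypothetical map from `(name, version)` to a sha256 hex string; any mirror entry that is missing upstream or carries a different checksum violates the rule.

```rust
use std::collections::HashMap;

/// Hypothetical index model: (name, version) -> sha256 hex digest.
type Index = HashMap<(String, String), String>;

/// Returns the mirror entries that break the "subset with identical checksums"
/// rule: packages unknown upstream, or known upstream with a different checksum.
fn non_mirror_entries(upstream: &Index, mirror: &Index) -> Vec<(String, String)> {
    let mut bad: Vec<(String, String)> = mirror
        .iter()
        .filter(|(key, sum)| upstream.get(*key) != Some(*sum))
        .map(|(key, _)| key.clone())
        .collect();
    bad.sort(); // HashMap iteration order is arbitrary; sort for stable output
    bad
}
```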

From the discussion it sounds like it is a requirement that all packages in
any registry must be registered with crates.io. Why must that be?

The key part of this is that it's replacing the crates.io source, and a
replacement is expected to be the exact same as the original source. This is why
in the Cargo.lock we encode the original source and can get away with it
(intentionally so).

It's an important use case for e.g. companies to be able to host their own
private registries. Does this design leave that possibility open for the
future?

It does indeed! The plan is to support defining registries in Cargo.toml
itself. These registries would then be encoded into the lockfile, and they could
be replaced as necessary if need be via the same mechanisms as crates.io.

If this mechanism requires a .cargo/config how will Gecko use it? Should
they be checking in this config file? What is the overlap between this
solution and the hypothetical solution for Gecko's problems?

Yes they'd either check in .cargo/config or generate it at some point. This PR
is intended to be the solution to Gecko's needs, so the overlap is 100%!

This hard codes the name crates.io as the registry, but there may plausibly
be corporate installations where crates.io is not allowed at all.

It does? Cargo is the same as today with respect to the default registry, and
otherwise this just gives a shorter name to
https://github.com/rust-lang/crates.io-index to replace the source with
another.

It is always intended for Cargo to support multiple registries in the future,
but it will always have some default which will be crates.io.

Could you clarify where you're thinking this is becoming less amenable to
other registries in the future?

A directory called index which matches the same structure as the crates.io
index. The config.json file is not required here.

Does this mean that when you create a local registry you must artificially
subdivide the directories into two letter prefixes like the official registry?

Yes, the same format is expected. It's just easier to have one format instead of
two. This is why I'm working on a local tool to manage local registries
so users don't have to think about this.

Local registries containing crate files appears undesirable to me for either
the distro or Gecko use case. Distros I would imagine want to store the
source in their format, not in ours; so they either need to stuff our tarball
packages inside their tarball packages, or reproduce the .crate files when
building the local registry from packages.

Perhaps, but after brainstorming with @wycats this seems to strike the best
balance between "offline use" and reproducibility. I don't think Debian is
going to use lock files anyway (per our previous discussions), so this point is
likely moot. We can always add a "directory source" which is just a bunch of
unpacked tarballs (this PR used to have that) if it became necessary.

The major reason this uses .crate files instead of unpacked installations is
that we're able to get a checksum. This checksum allows us to help guarantee
reproducible builds and guard against unintended changing of source code. The
source can always intentionally be changed, of course, but it needs to be
expressed in Cargo.toml, not via replacement sources.

Asking shas of local crates to be identical to crates.io precludes distros
deploying security fixes (or any patches at all) on their own. I imagine they
will do it anyway, updating all the lockfiles, and ignoring the requirement
that local registries contain the same code as on crates.io.

Yes, and this is intentional. Any security updates may indeed cause breakage,
and this needs to be acknowledged in Cargo.toml somehow. But as before, I
doubt distros will use lock files anyway because of how they have indicated they
wish to package libraries (they're already buying into "the versions we select
may break"). In this case, checksum mismatches don't matter.

This should make it easy to map packages from one source to another.
This commit implements a scheme for .cargo/config files where sources can be
redirected to other sources. The purpose of this will be to override crates.io
for a few use cases:

  * Replace it with a mirror site that is sync'd to crates.io
  * Replace it with a "directory source" or some other local source

The major feature of this redirection, however, is that none of it is encoded
into the lock file. If one source is redirected to another then it is assumed
that packages from both are exactly the same (e.g. `foo v0.0.1` is the same in
both locations). The lock file simply encodes the canonical source (e.g.
crates.io) rather than the replacement source. In the end this means that
Cargo.lock files can be generated from any replacement source and shipped to
other locations without the lockfile oscillating about where packages came from.

Eventually this support will be extended to `Cargo.toml` itself (which will be
encoded into the lock file), but that support is not implemented today. The
syntax for what was implemented today looks like:

    # .cargo/config
    [source.my-awesome-registry]
    registry = 'https://example.com/path/to/index'

    [source.crates-io]
    replace-with = 'my-awesome-registry'

Each source will have a canonical name and will be configured with the various
keys underneath it (today just 'registry' and 'directory' will be accepted). The
global `crates-io` source represents crates from the standard registry, and this
can be replaced with other mirror sources.

All tests have been modified to use this new infrastructure instead of the old
`registry.index` configuration. This configuration is now also deprecated and
will emit an unconditional warning about how it will no longer be used in the
future.

Finally, all subcommands now use this "source map" except for `cargo publish`,
which will always publish to the default registry (in this case crates.io).
This commit changes how lock files are encoded by adding checksums for each package in
the lockfile to the `[metadata]` section. The previous commit implemented the
ability to redirect sources, but the core assumption there was that a package
coming from two different locations was always the same. An inevitable case,
however, is that a source gets corrupted or, worse, ships a modified version of
a crate to introduce instability between two "mirrors".

The purpose of adding checksums will be to resolve this discrepancy. Each crate
coming from crates.io will now record its sha256 checksum in the lock file. When
a lock file already exists, the new checksum for a crate will be checked against
it, and if they differ compilation will be aborted. Currently only registry
crates will have sha256 checksums listed, all other sources do not have
checksums at this time.

The astute may notice that if the lock file format is changing, then a lock file
generated by a newer Cargo might be mangled by an older Cargo. In anticipation
of this, however, all Cargo versions published support a `[metadata]` section of
the lock file which is transparently carried forward if encountered. This means
that older Cargos compiling with a newer lock file will not verify checksums in
the lock file, but they will carry forward the checksum information and prevent
it from being removed.

There are, however, a few situations where problems may still arise:

1. If an older Cargo takes a newer lockfile (with checksums) and updates it with
   a modified `Cargo.toml` (e.g. a package was added, removed, or updated), then
   the `[metadata]` section will not be updated appropriately. This modification
   would require a newer Cargo to come in and update the checksums for such a
   modification.

2. Today Cargo can only calculate checksums for registry sources, but we may
   eventually want to support other sources like git (or just straight-up path
   sources). If future Cargo implements support for this sort of checksum, then
   it's the same problem as above where older Cargos will not know how to keep
   the checksum in sync.
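Situation 1 can be illustrated by modeling an older Cargo that copies `[metadata]` verbatim while the package set changes underneath it. A hypothetical sketch: `stale_entries` is an illustrative helper, and the metadata key format is simplified here to plain `name version` strings.

```rust
use std::collections::BTreeMap;

/// After an older Cargo has copied `[metadata]` forward unchanged, list what a
/// newer Cargo would find wrong: packages now in the lock file without a
/// checksum, and checksums left behind for packages that are no longer there.
/// Keys are simplified to "name version" strings for this sketch.
fn stale_entries(metadata: &BTreeMap<String, String>, packages: &[&str]) -> (Vec<String>, Vec<String>) {
    let missing: Vec<String> = packages
        .iter()
        .filter(|p| !metadata.contains_key(**p))
        .map(|p| p.to_string())
        .collect();
    let orphaned: Vec<String> = metadata
        .keys()
        .filter(|k| !packages.contains(&k.as_str()))
        .cloned()
        .collect();
    (missing, orphaned)
}
```

For example, if an older Cargo updates `rand 0.3.13` to `rand 0.3.14`, the carried-forward table has no checksum for the new version and a stale one for the old version.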
Add an abstraction over which the index can be updated and downloads can be
made. This is currently implemented for "remote" registries (e.g. crates.io),
but soon there will be one for "local" registries as well.
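A minimal sketch of such an abstraction as a Rust trait, with only the local flavor implemented (a remote implementation would update the git index and download `.crate` files over the network). The `RegistryData` trait and its method signatures here are simplified hypothetical stand-ins, not Cargo's actual code.

```rust
use std::path::PathBuf;

/// Hypothetical, simplified abstraction: something whose index can be
/// updated and from which `.crate` files can be obtained.
trait RegistryData {
    fn update_index(&mut self) -> Result<(), String>;
    fn download(&mut self, name: &str, version: &str) -> Result<PathBuf, String>;
}

/// The "local" flavor: the index and `.crate` files already live on disk.
struct LocalRegistry {
    root: PathBuf,
}

impl RegistryData for LocalRegistry {
    fn update_index(&mut self) -> Result<(), String> {
        // Nothing to fetch: `root/index` is managed externally.
        Ok(())
    }

    fn download(&mut self, name: &str, version: &str) -> Result<PathBuf, String> {
        // "Downloading" is just locating `<name>-<version>.crate` under the root.
        let path = self.root.join(format!("{}-{}.crate", name, version));
        if path.is_file() {
            Ok(path)
        } else {
            Err(format!("{} not found in local registry", path.display()))
        }
    }
}
```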
This flavor of registry is intended to behave very similarly to the standard
remote registry, except everything is contained locally on the filesystem
instead. There are a few components to this new flavor of registry:

1. The registry itself is rooted at a particular directory, owning all structure
   beneath it.
2. There is an `index` folder with the same structure as the crates.io index
   describing the local registry (e.g. contents, versions, checksums, etc).
3. Inside the root will also be a list of `.crate` files which correspond to
   those described in the index. All crates must be of the form
   `name-version.crate` and be the same `.crate` files from crates.io itself.

This support can currently be used via the previous implementation of source
overrides with the new type:

```toml
[source.crates-io]
replace-with = 'my-awesome-registry'

[source.my-awesome-registry]
local-registry = 'path/to/registry'
```

I will soon follow up with a tool which can be used to manage these local
registries externally.
@DanielKeep

Currently, I maintain a local mirror of the index because my internet is slow, metered, and drops out. It's enabled at the top level for everything in my Rust code folder, but I have to then switch back to the "real" crates.io for binary packages like cargo-script because of the checked-in lock files.

This sounds like it fits my case nicely; I can mirror the index locally and have Cargo pretend it's really crates.io. So I, at least, would totally use it, though I understand I'm not exactly your intended audience for this...

@bors
Contributor

bors commented Mar 17, 2016

☔ The latest upstream changes (presumably #2484) made this pull request unmergeable. Please resolve the merge conflicts.

@pwrdwnsys

@alexcrichton In terms of use case, I work in a development environment which is behind a data diode which allows files from the Internet-connected network to be pushed one-way into the isolated network. We already have Windows updates, FreeBSD patches, and anti-virus signatures coming in using this mechanism. For our Rust projects, we have been taking the libraries we really need and cloning the Github repos, changing the library names to include an internal prefix, and tar-ing up the directories, and pushing it through the diode. On the high-side of the network the repos are unpacked to a common mount point and we modify our Cargo.toml dependencies to point to them. This hack doesn't scale well with lots of libraries/dependencies, and also means we are likely out of step with the official crates.io releases. Being able to simply push a mirror/clone of the needed subset of crates through the diode into the development network would be superb.

@alexcrichton
Member Author

Closing due to inactivity, I'll resubmit in the future with a rebase if we see some more activity on this front.

@Ericson2314
Contributor

Ericson2314 commented Jul 12, 2016

Ok, ok. So registries can be used for different things, and we do need to be clear about that in Cargo.lock, but let's first nail down what those things are: mirroring and overriding.

Mirroring of course means that each mirror contains a subset of whatever it is mirroring. Mirror chains of any length should be allowed, because not allowing them is just more work. As @acrichto points out, hashes in lockfiles are not enough to guarantee mirrors are actual mirrors when those mirrors cannot be reached. But they do allow post-hoc verification when connectivity is restored: one can check that the hashes obtained from mirrors do indeed match the original. This is crucial.

Overriding is a little more subtle. At first glance, one picks an existing registry and overrides it with another: there is no subset restriction, and anything that isn't resolved by the override goes back to the original. Naturally, overrides can also be chained arbitrarily. A less obvious point is that one isn't overriding repositories, but mirror chains: an override chain is a chain of mirror chains.

A simple generalization of pkg = { version = "ver" } is pkg = { version = "ver", registry = ... }. General overriding would allow overriding packages even when a registry is specified, but perhaps we only need overriding when a registry isn't specified. If crates.io only allowed packages that didn't hardcode registries, this more rigid scheme would be practical.

In summary, let me do a new config.toml mockup. In line with http://doc.crates.io/config.html#hierarchical-structure, assume /office/project/.cargo/config, /office/.cargo/config, /.cargo/config, plus an imaginary file representing Cargo's defaults.

# the default config, lowest priority

# when no registry is specified in dep
default-registry = [ "crates-io" ]

[registry.crates-io]
# url, path, or git, just like a dependency
url = "https://crates.io/path/to/something/or/another/i/dont/know"

# in /.cargo/config

# don't use crates.io implicitly
default-registry = [ "ibm" ]

# override crates.io with company packages + crates.io forks
[registry.ibm]
url = "https://internal.ibm.com/something"

# in /office/.cargo/config

[registry.ibm-au]
url = "https://au.internal.ibm.com/something"
provides = "ibm" # mirror, must be subset of ibm

[registry.ibm-au-sidney]
url = "https://sidney.au.internal.ibm.com/something"
# need only be subset of ibm, not ibm-au too
# ibm-au is chosen for shape of mirror chain like cache levels.
provides = "ibm-au"

# in /office/project/.cargo/config

# override company registry with local forks
default-registry = [
    "project",
    "ibm",
]

# override company forks with local forks and also use crates.io
[registry.project]
local = "/secret/sauce/goes/here"

I bring this up because for my explicit stdlib deps RFC (rust-lang/rfcs#1133) we need packages without a specified registry to fall back on a "compiler-specific" registry for packages not on crates.io (more broadly, the lowest priority in the default-registry chain).

The default would become (in pseudo-TOML) something like:

default-registry = [ "crates-io", "compiler" ]

[registry.crates-io]
# url, path, or git  just like dependency  
url = "https://crates.io/path/to/something/or/another/i/dont/know"

[registry.compiler]
path = "${$CARGO_RUSTC --print sysroot}/src/rust"

@Ericson2314
Contributor

Additionally, I find it funny how often adding registries to Cargo.toml is mentioned: we basically already have them with [[replace]]. Specifically, [[replace]] (among other things) adds an anonymous registry in front of crates.io with the odd restriction that any package it includes must already exist in crates.io (and yes, it also overrides location deps). I understand that for location deps, Cargo must fetch the dep to see whether it would be overridden, but for version deps this is not necessary.

I was talking to @wycats about whether a location dep and a crates.io/registry dep should be allowed to both resolve to the same version, and I think we both agreed "probably not". The idea is that if Cargo will "merge" semver-compatible version deps to force a single crate to satisfy them (e.g. by merging required features), it should also merge semver-compatible version deps into location deps. By the same reasoning, it should complain if two different location deps refer to packages with the same name and version.

With that change, we can actually account for all of [[replace]] in terms of registry overriding. First, all location dependencies are gathered into a single imaginary registry. Second, all location deps are replaced with version deps using their resolved versions. Third, the default registry override chain becomes [ "replacements", "location-deps", everything-else... ]. This plan might give us registries and make Cargo less complex on net, a pretty good deal :).
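The override chain proposed above can be sketched as a priority lookup: the first registry in the chain that has the package wins, and later registries are never consulted for it. This is a hypothetical model of the proposal (a registry as a map from package name to versions), not how `[[replace]]` is implemented in Cargo.

```rust
use std::collections::HashMap;

/// Hypothetical registry model: package name -> available versions.
type Registry = HashMap<String, Vec<String>>;

/// Look a package up through an override chain, highest priority first.
/// Returns the name of the registry that answered and its versions.
fn lookup<'a>(chain: &[(&'a str, &Registry)], pkg: &str) -> Option<(&'a str, Vec<String>)> {
    chain
        .iter()
        .find_map(|(name, reg)| reg.get(pkg).map(|vs| (*name, vs.clone())))
}
```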

@Ericson2314
Contributor

It feels somewhat redundant to specify workspace members (by their locations) and also use path dependencies in each crate---one ends up writing the same paths many times. If the workspace members were also turned into a registry (say, with priority below location deps but above everything else, given the above plan), then crates in the workspace could refer to each other by version. The idea is that after each release, one would bump the version number to signify that they were working on the next version.

This would make workspaces way more ergonomic, not only because there is no path duplication, but also because each project's Cargo.toml is already in the right form for uploading to crates.io---it has no path dependencies and refers to the others by their to-be-uploaded versions.

@alexcrichton
Member Author

@Ericson2314 sorry but I don't think I'm 100% following your comments. I think it's rooted in how we're looking at the intended purpose of each of these pieces differently. I think it may help to first briefly recap what each piece is and what we think it's going to be used for:

  • A registry is currently intended to be very much like crates.io: a source of packages with an index that supports efficient incremental updates and full checkouts, along with efficient searching of the entire package space for fast resolution. Additionally, the source can ship *.crate files with associated integrity checks that match those found in the index.
  • A mirror isn't really a first-class concept in Cargo just yet, but the intention is that mirrors are one to one with what they distribute from some upstream registry. That is, they are not allowed to either add new packages or change the contents of existing packages. The expectation is that mirrors will only be used in situations such as crates.io going down.
  • The [replace] section is intended for overriding dependencies during development, not as a long-lived, permanent source. This method of replacement works robustly with alterations in the dependency graph and communicates to others that the replacement is required to compile a crate.
  • The distinction between Cargo.toml and .cargo as to where configuration goes is quite important. Configuration in a manifest (Cargo.toml) is that which is required to compile a crate, and configuration in .cargo is intended to be purely optional. For example, [replace], which is reflected in the lock file, is in Cargo.toml because it is required to compile the crate. On the other hand, configuration like mirrors is not in the manifest because it is not required to compile the crate (a mirror isn't the source of truth). Put another way, you commit Cargo.toml to a repo, but it's an antipattern to commit .cargo to a repo.
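To make the mirror invariant in the list above concrete, here is a hypothetical Rust sketch (not Cargo code) that models an index as a map from (name, version) to the checksum of the `.crate` file, and reports any package a mirror added or changed relative to upstream. `Index`, `Violation`, and `check_mirror` are illustrative names.

```rust
use std::collections::BTreeMap;

/// Hypothetical index model: (package name, version) -> checksum of the `.crate` file.
type Index = BTreeMap<(String, String), String>;

#[derive(Debug, PartialEq)]
enum Violation {
    /// The mirror offers a package/version that upstream does not have.
    Added { name: String, version: String },
    /// The mirror offers an upstream package/version with different contents.
    Changed { name: String, version: String },
}

/// Report every way in which `mirror` adds to or alters `upstream`.
fn check_mirror(upstream: &Index, mirror: &Index) -> Vec<Violation> {
    let mut violations = Vec::new();
    for (key, checksum) in mirror {
        let (name, version) = key.clone();
        match upstream.get(key) {
            None => violations.push(Violation::Added { name, version }),
            Some(expected) if expected != checksum => {
                violations.push(Violation::Changed { name, version })
            }
            Some(_) => {}
        }
    }
    violations
}
```

Because both kinds of violation show up as checksum disagreements, a lock file that records checksums from the source of truth is enough to catch either one at build time.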

Ok, with that in mind, I'd like to respond to a few of your points:

As @acrichto points out, hashes in lockfiles are not enough to guarantee mirrors are actual mirrors when those mirrors cannot be reached.

I think I may have miscommunicated here by accident. A checksum in the lock file should be enough to guarantee a reproducible build of a crate. If you start from crates.io (the source of truth) and then record a checksum, whenever you get a crate from a mirror you can be sure that you have the exact same source by ensuring the checksums match. So in that sense hashes in lockfiles should be enough to guarantee mirrors are actual mirrors.
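A minimal sketch of that check, assuming the lock file records one checksum per package. The `digest` function is a stand-in built on `DefaultHasher`, which is neither cryptographic nor stable across Rust releases; real Cargo lock files record SHA-256 digests. `verify_against_lock` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest for illustration only. NOT cryptographic and not stable
/// across Rust releases; Cargo records SHA-256 digests instead.
fn digest(bytes: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Accept a `.crate` file fetched from any source (crates.io or a mirror)
/// only if its digest matches the checksum recorded in the lock file.
fn verify_against_lock(name: &str, locked_checksum: &str, fetched: &[u8]) -> Result<(), String> {
    let actual = digest(fetched);
    if actual == locked_checksum {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch for `{}`: lock file has {}, source served {}",
            name, locked_checksum, actual
        ))
    }
}
```

The same comparison is what makes a modified crate (for example, one patched by a distro) fail when substituted for the crates.io original via mirroring.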

The situation you might be referring to, however, is when you first form a dependency edge and pull the dependency in from a mirror. In that case, yes, you don't actually check in with the source of truth to ensure the checksum is the same. The discrepancy will arise quickly, however, when you ship that Cargo.lock to someone else who doesn't have that mirror configuration.

Overriding is a little more subtle. At first glance, one picks an existing registry and overrides it with another

I'm not sure I fully understand this tangent because [replace] in Cargo.toml and source replacing are intended to be two orthogonal features. The [replace] section does not wholesale replace an entire source, just one selective node for overriding.

# don't use crates.io implicitly
default-registry = [ "ibm" ]

Many of these examples go against the distinction between Cargo.toml and .cargo/config described above, because this configuration is most likely required to compile a crate. We definitely want to support depending on crates from custom registries, but this will be done in Cargo.toml, not in .cargo/config. It's a planned feature, but it is relatively orthogonal to the support proposed here.

we need packages without a specified registry to fall back on a "compiler-specific" registry for packages not on crates.io

I think that trying to fit registries into this hole may be the wrong abstraction; instead, we can always just create another type of source which Cargo reads crates from.


I don't really follow how you ended up at the conclusion along the lines of "everything should be a registry" (if I'm interpreting that correctly), but it seems like it may not be entirely related to this PR or this feature, so perhaps discussion about that can happen in separate threads?

@alexcrichton
Member Author

Also, for those interested, I've rebased this PR and implemented one extra feature, located at #2857

@Ericson2314
Contributor

Ericson2314 commented Jul 12, 2016

@alexcrichton The .cargo/config vs Cargo.toml difference makes sense, and I agree mirroring is a more obvious fit. My initial concern was that something like the "compiler registry" I propose in the stdlib deps RFC is really an implementation detail, but packages wouldn't need to refer to that registry by name anyway. In fact, even if they did, perhaps a middle ground would be for packages to e.g. use { registry = "ibm", .. } deps while leaving .cargo/config to define where the ibm registry is found, analogous to how crates don't hard-code the URL for crates.io.

A better concern would be distros that need to patch some crates.io crates---it would be nice if they didn't need to lie that they are just mirroring crates.io rather than modifying it. But I don't find this too pressing.

The situation you might be referring to, however, is when you first form a dependency edge and pull the dependency in from a mirror. In that case, yes, you don't actually check in with the source of truth to ensure the checksum is the same. The discrepancy will arise quickly, however, when you ship that Cargo.lock to someone else who doesn't have that mirror configuration.

Exactly what I was referring to.

I think that trying to fit registries into this hole may be the wrong abstraction; instead, we can always just create another type of source which Cargo reads crates from.

I forgot about the Source trait. I guess reinterpret all that as: there should be a single fallback chain of registries, in the sense of http://doc.crates.io/cargo/core/registry/trait.Registry.html, from which packages are resolved. AFAIK, neither [[replace]] nor members = [..] creates such a registry.

@Ericson2314
Contributor

Some of this could probably be moved to a new thread. I'll take a look at the new version and stop commenting here.

@alexcrichton
Member Author

perhaps a middle ground would be for packages to e.g. use { registry = "ibm", .. } deps while leaving .cargo/config to define where the ibm registry is found, analogous to how crates don't hard-code

This I currently consider a non-starter because it makes .cargo/config required, and given the motivation above (.cargo isn't checked in), that doesn't jibe with the current priorities. The default registry being crates.io basically isn't ever going to change without manual configuration, although we plan to add the ability to declare other registries in Cargo.toml, from which new packages can then be pulled in.

A better concern would be distros that need to patch some crates.io crates---it would be nice if they didn't need to lie that they are just mirroring crates.io rather than modifying it.

Note that all the support here is intended to prevent exactly this situation. If a distro modifies a crate then it changes the checksum, so Cargo will refuse to replace a crates.io crate with a distro crate (using mirroring support) because the lock file will disagree. If instead the Cargo.toml explicitly indicates that it's pulling a crate from the distro rather than crates.io, however, then that's perfectly fine.

@Ericson2314
Contributor

Ericson2314 commented Jul 12, 2016

Hmm, my thought for both of those was that the developer was leaving something unspecified as a parameter. But if they want to do that, users could provide the arguments with a workspace Cargo.toml that just contains the one package. Contrary to my initial concern with leveraging Cargo.toml, this works for both libraries and binaries alike. No more interest in abusing .cargo/config from me!
