Add support local mirrors of registries #2361
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
additionally, this is targeted at closing #2111 |
Seeing this, I wonder if it would be possible and desirable to extend cargo to support multiple different package registries at the same time, like different package repositories in standalone package managers. It would be useful in cases where you have a set of crates that don't belong in the central registry because they are not general purpose libraries, or where you only want to use them for internal testing and development. |
* newer Cargo implementations know how to checksum this source, but this
  older implementation does not
* the lock file is corrupt
", id, id.source_id())
Maybe add a short sentence telling the Cargo end user what to do when such an error occurs? (Same for the messages below)
This is actually a bit of an interesting error case in the sense that I'm not really sure what can be done here. If Cargo never changed then this should never be seen except in the case that something is corrupt or it's an internal error. This may be able to be fixed by upgrading the Cargo in use, but that's also not necessarily guaranteed to work.
Ah and for the messages below it's kinda the same thing as well, I'm not really sure what can be done. The most likely-to-be-seen one, mismatching checksums, will likely be attributed to something outside the user's control, e.g.:
- The replacement registry legitimately has a different crate
- The network interfered with the download at some point
In either of these situations there's unfortunately not really a lot that can be done :(
Can the registry be any git repo, or does this only work for github? |
but `{}` does not

a lock file compatible with `{}` cannot be generated in this situation
", orig_name, name, supports, no_support, orig_name);
Five substitutions are hard to read, I'd try to use named params (or however `bail!("{orig}", orig=orig_name)` is called)
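To illustrate the suggestion, here is a minimal sketch using the standard `format!` macro with named arguments. The helper name `checksum_support_message` and the exact message wording are made up for illustration and are not Cargo's actual code; the sketch assumes `bail!` forwards its arguments to `format!`-style formatting, so the same named-argument syntax would apply there.

```rust
/// Hypothetical helper (not part of Cargo): builds an error message using
/// named format arguments instead of five positional `{}` placeholders.
fn checksum_support_message(orig_name: &str, supports: &str, no_support: &str) -> String {
    format!(
        "cannot replace `{orig}`:\n\
         `{supports}` supports checksums,\n\
         but `{no_support}` does not\n\n\
         a lock file compatible with `{orig}` cannot be generated in this situation",
        // `orig` is referenced twice in the template but only passed once.
        orig = orig_name,
        supports = supports,
        no_support = no_support,
    )
}
```

With named arguments a value that appears several times in the template is passed once, and reordering the message text cannot silently swap two substitutions.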
Force-pushed from 0f18cf6 to bacd985.
@Kimundi ah yeah Cargo definitely wants to support crates coming from multiple registries! There's a few design questions that range from bike-shed to fundamentals, however, so it likely won't happen as part of this PR for a bit. That being said, Cargo's backend has always been able to support multiple registries, and the features added here were designed with future support for multiple registries in one Cargo.toml in mind.

@Manishearth the index of a registry is required to be a git repository, but beyond that it can be hosted anywhere. I'm not actually sure how one would write a piece of code that actively required a git repository to be on github vs any other location... |
@alexcrichton Ah, the |
☔ The latest upstream changes (presumably #2370) made this pull request unmergeable. Please resolve the merge conflicts. |
Force-pushed from ff9fe8a to 1fe9619.
Note that I've also started prototyping the |
Thanks, this is awesome.
When you build a registry you call The problem with this is that some tools, e.g. racer, use Cargo.lock as a mechanism to find dependency source code (https://github.com/phildawes/racer/blob/master/src/racer/cargo.rs). This would mean that if I have some package that is available in my local registry but not in crates.io, a (I'm not sure if this is possible, but this also might leak private information, e.g. if the package name itself is something that should not be public.) Hopefully that wasn't super confusing. The TL;DR, I think, is that I'm not sure if the crates-io-by-default-non-configurable decision interacts well with the replace-with feature. |
This sounds quite bad! In that case this is the wrong use case of It is intended, however, that overriding crates.io works via overriding the |
☔ The latest upstream changes (presumably #2328) made this pull request unmergeable. Please resolve the merge conflicts. |
Agreed. So, how do we use the new registry type without replacing crates.io? |
Currently that's the only vector through which this can be used, but it's planned in the future to list independent registries in Cargo.toml where you may be able to leverage this support. |
It seems like we should get feedback from packagers like @anguslees, @lucab, @sylvestre @fabiand @jauhien. This is the feature that makes it possible to build from local source instead of crates.io.
What does this mean specifically? Surely I don't have to clone every package on crates.io in order to create a local registry. I think you mean that if a crate version is mentioned in the crate dag that it must exist in the alternate registry and have the same hash.

From the discussion it sounds like it is a requirement that all packages in any registry must be registered with crates.io. Why must that be? It's an important use case for e.g. companies to be able to host their own private registries. Does this design leave that possibility open for the future?

If this mechanism requires a

This hard codes the name crates.io as the registry, but there may plausibly be corporate installations where crates.io is not allowed at all. By some of the renamings here, this patch seems to be doubling down on crates.io being the one central registry, when it seems like we should be going the other direction to support broader use cases.
Does this mean that when you create a local registry you must artificially subdivide the directories into two letter prefixes like the official registry? This is a hack to avoid big directories, but it isn't a concern for local registries (if it was, the crates wouldn't themselves be stored in a single directory).

Local registries containing crate files appears undesirable to me for either the distro or Gecko use case. Distros I would imagine want to store the source in their format, not in ours; so they either need to stuff our tarball packages inside their tarball packages, or reproduce the

Asking shas of local crates to be identical to crates.io precludes distros deploying security fixes (or any patches at all) on their own. I imagine they will do it anyway, updating all the lockfiles, and ignoring the requirement that local registries contain the same code as on crates.io. |
@brson, +1 to pretty much everything you said (for my use case - corporate build system). |
Force-pushed from 1fe9619 to 26cc1d5.
Lots to digest, thanks for the review @brson! I'll try to answer your questions
Ah yes what I basically mean by this is that an alternate source must be a
The key part of this is that it's replacing the crates.io source, and a
It does indeed! The plan is to support defining registries in
Yes they'd either check in
It does? Cargo is the same as today with respect to the default registry, and It is always intended for Cargo to support multiple registries in the future, Could you clarify where you're thinking this is becoming less amenable to
Yes, the same format is expected. It's just easier to have one format instead of
Perhaps, but after brainstorming with @wycats this seems to strike the best The major reason this uses
Yes, and this is intentional. Any security updates may indeed cause breakage, |
Force-pushed from 26cc1d5 to f298f8b.
Should help easily mapping packages from one source to another
This commit implements a scheme for .cargo/config files where sources can be redirected to other sources. The purpose of this will be to override crates.io for a few use cases:

* Replace it with a mirror site that is sync'd to crates.io
* Replace it with a "directory source" or some other local source

The major feature of this redirection, however, is that none of it is encoded into the lock file. If one source is redirected to another then it is assumed that packages from both are exactly the same (e.g. `foo v0.0.1` is the same in both locations). The lock file simply encodes the canonical source (e.g. crates.io) rather than the replacement source. In the end this means that Cargo.lock files can be generated from any replacement source and shipped to other locations without the lockfile oscillating about where packages came from.

Eventually this support will be extended to `Cargo.toml` itself (which will be encoded into the lock file), but that support is not implemented today. The syntax for what was implemented today looks like:

    # .cargo/config
    [source.my-awesome-registry]
    registry = 'https://example.com/path/to/index'

    [source.crates-io]
    replace-with = 'my-awesome-registry'

Each source will have a canonical name and will be configured with the various keys underneath it (today just 'registry' and 'directory' will be accepted). The global `crates-io` source represents crates from the standard registry, and this can be replaced with other mirror sources.

All tests have been modified to use this new infrastructure instead of the old `registry.index` configuration. This configuration is now also deprecated and will emit an unconditional warning about how it will no longer be used in the future.

Finally, all subcommands now use this "source map" except for `cargo publish`, which will always publish to the default registry (in this case crates.io).
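The redirection described in this commit message can be pictured with a small sketch. This is not Cargo's implementation: `SourceConfig` and `resolve_source` are hypothetical names, the model keeps only the `replace-with` key, and following a chain of several replacements (with cycle detection) is an assumption made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified model of one `[source.<name>]` table.
#[derive(Debug, Default, Clone)]
struct SourceConfig {
    /// Value of the `replace-with` key, if present.
    replace_with: Option<String>,
}

/// Follows `replace-with` links starting at `name` and returns the name of
/// the source that would actually be queried. The canonical name passed in
/// (e.g. `crates-io`) is what the lock file would keep recording.
fn resolve_source<'a>(
    sources: &'a HashMap<String, SourceConfig>,
    name: &'a str,
) -> Result<&'a str, String> {
    let mut seen = HashSet::new();
    let mut current = name;
    loop {
        // Visiting the same source twice means the replacements form a cycle.
        if !seen.insert(current) {
            return Err(format!("cycle detected while replacing source `{}`", name));
        }
        let cfg = sources
            .get(current)
            .ok_or_else(|| format!("source `{}` is not defined", current))?;
        match &cfg.replace_with {
            Some(next) => current = next.as_str(),
            None => return Ok(current),
        }
    }
}
```

With the configuration from the commit message, resolving `crates-io` in this model yields `my-awesome-registry`, while the caller still holds the canonical name for anything written to the lock file.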
This commit changes how lock files are encoded by adding checksums for each package in the lockfile to the `[metadata]` section. The previous commit implemented the ability to redirect sources, but the core assumption there was that a package coming from two different locations was always the same. An inevitable case, however, is that a source gets corrupted or, worse, ships a modified version of a crate to introduce instability between two "mirrors". The purpose of adding checksums will be to resolve this discrepancy.

Each crate coming from crates.io will now record its sha256 checksum in the lock file. When a lock file already exists, the new checksum for a crate will be checked against it, and if they differ compilation will be aborted. Currently only registry crates will have sha256 checksums listed, all other sources do not have checksums at this time.

The astute may notice that if the lock file format is changing, then a lock file generated by a newer Cargo might be mangled by an older Cargo. In anticipation of this, however, all Cargo versions published support a `[metadata]` section of the lock file which is transparently carried forward if encountered. This means that older Cargos compiling with a newer lock file will not verify checksums in the lock file, but they will carry forward the checksum information and prevent it from being removed.

There are, however, a few situations where problems may still arise:

1. If an older Cargo takes a newer lockfile (with checksums) and updates it with a modified `Cargo.toml` (e.g. a package was added, removed, or updated), then the `[metadata]` section will not be updated appropriately. This modification would require a newer Cargo to come in and update the checksums for such a modification.
2. Today Cargo can only calculate checksums for registry sources, but we may eventually want to support other sources like git (or just straight-up path sources). If future Cargo implements support for this sort of checksum, then it's the same problem as above where older Cargos will not know how to keep the checksum in sync.
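The check described above (compare a freshly obtained checksum against the one already in the lock file, abort on a difference) might look roughly like the following sketch. All names here (`ChecksumCheck`, `check_against_lock`, the string keys) are hypothetical, and the real `[metadata]` key format is not modelled; computing the sha256 itself is also left out, since that needs a hashing library.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of comparing a crate's checksum with the lock file.
#[derive(Debug, PartialEq)]
enum ChecksumCheck {
    /// The lock file records the same checksum.
    Match,
    /// The lock file has no checksum for this package (e.g. an older lock file).
    NotRecorded,
    /// The lock file records a different checksum; the build should stop.
    Mismatch { expected: String, actual: String },
}

/// `metadata` stands in for the checksum entries of the `[metadata]` section,
/// keyed by an illustrative package identifier.
fn check_against_lock(
    metadata: &HashMap<String, String>,
    package_id: &str,
    actual_sha256: &str,
) -> ChecksumCheck {
    match metadata.get(package_id) {
        None => ChecksumCheck::NotRecorded,
        Some(expected) if expected.as_str() == actual_sha256 => ChecksumCheck::Match,
        Some(expected) => ChecksumCheck::Mismatch {
            expected: expected.clone(),
            actual: actual_sha256.to_string(),
        },
    }
}
```

Keeping "not recorded" separate from "mismatch" mirrors the compatibility story in the commit message: a lock file without checksums is tolerated, while a disagreeing checksum is treated as an error.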
Add an abstraction over which the index can be updated and downloads can be made. This is currently implemented for "remote" registries (e.g. crates.io), but soon there will be one for "local" registries as well.
This flavor of registry is intended to behave very similarly to the standard remote registry, except everything is contained locally on the filesystem instead. There are a few components to this new flavor of registry:

1. The registry itself is rooted at a particular directory, owning all structure beneath it.
2. There is an `index` folder with the same structure as the crates.io index describing the local registry (e.g. contents, versions, checksums, etc).
3. Inside the root will also be a list of `.crate` files which correspond to those described in the index. All crates must be of the form `name-version.crate` and be the same `.crate` files from crates.io itself.

This support can currently be used via the previous implementation of source overrides with the new type:

```toml
[source.crates-io]
replace-with = 'my-awesome-registry'

[source.my-awesome-registry]
local-registry = 'path/to/registry'
```

I will soon follow up with a tool which can be used to manage these local registries externally.
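As a rough picture of the layout just described, the sketch below computes where a `.crate` file and an index entry would live under a registry root. The function names are hypothetical, and the index prefix scheme (`1/`, `2/`, `3/<first letter>/`, otherwise `<first two>/<next two>/`, with lowercased names) is my understanding of the crates.io index layout rather than something stated in this commit; it also assumes a non-empty ASCII crate name.

```rust
use std::path::{Path, PathBuf};

/// Path of a crate archive inside the registry root:
/// `<root>/<name>-<version>.crate`.
fn crate_file_path(root: &Path, name: &str, version: &str) -> PathBuf {
    root.join(format!("{}-{}.crate", name, version))
}

/// Path of a crate's index entry under `<root>/index`, assuming the
/// crates.io-style prefix directories. Panics on an empty name.
fn index_entry_path(root: &Path, name: &str) -> PathBuf {
    let name = name.to_lowercase();
    let index = root.join("index");
    match name.len() {
        1 => index.join("1").join(&name),
        2 => index.join("2").join(&name),
        // Three-letter names are grouped by their first letter.
        3 => index.join("3").join(&name[..1]).join(&name),
        // Longer names use the first two and the next two letters.
        _ => index.join(&name[..2]).join(&name[2..4]).join(&name),
    }
}
```

For a registry at `path/to/registry`, `foo` version `0.0.1` would be stored as `path/to/registry/foo-0.0.1.crate` in this model.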
Force-pushed from 79fc557 to 515aa46.
Currently, I maintain a local mirror of the index because my internet is slow, metered, and drops out. It's enabled at the top level for everything in my Rust code folder, but I have to then switch back to the "real" This sounds like it fits my case nicely; I can mirror the index locally and have Cargo pretend it's really crates.io. So I, at least, would totally use it, though I understand I'm not exactly your intended audience for this... |
☔ The latest upstream changes (presumably #2484) made this pull request unmergeable. Please resolve the merge conflicts. |
@alexcrichton In terms of use case, I work in a development environment which is behind a data diode which allows files from the Internet-connected network to be pushed one-way into the isolated network. We already have Windows updates, FreeBSD patches, and anti-virus signatures coming in using this mechanism. For our Rust projects, we have been taking the libraries we really need and cloning the Github repos, changing the library names to include an internal prefix, and tar-ing up the directories, and pushing it through the diode. On the high-side of the network the repos are unpacked to a common mount point and we modify our Cargo.toml dependencies to point to them. This hack doesn't scale well with lots of libraries/dependencies, and also means we are likely out of step with the official crates.io releases. Being able to simply push a mirror/clone of the needed subset of crates through the diode into the development network would be superb. |
Closing due to inactivity, I'll resubmit in the future with a rebase if we see some more activity on this front. |
Ok, ok. So registries can be used for different things, we do need to be clear about that in

Mirroring of course means that each mirror contains a subset of whatever it is mirroring. Mirror chains of any length should be allowed, because not doing so is just more work. As @acrichto points out, hashes in lockfiles are not enough to guarantee mirrors are actual mirrors when those mirrors cannot be reached. But they do allow post-hoc verification when connectivity is restored---one can check that the hashes gotten from mirrors do indeed match the original. This is crucial.

Overriding is a little more subtle. At first glance, one picks an existing registry and overrides it with another---there is no subset restriction and anything that isn't resolved by the override goes back to the original. Naturally, overrides can also be chained arbitrarily. A less obvious point is one isn't overriding repositories, but mirror chains: an override chain is a chain of mirror chains.

A simple generation of

In summary let me do a new

```toml
# the default config, lowest priority

# when no registry is specified in dep
default-registry = [ "crates-io" ]

[registry.crates-io]
# url, path, or git just like dependency
url = "https://crates.io/path/to/something/or/another/i/dont/know"
```

```toml
# in /.cargo/config

# don't use crates.io implicitly
default-registry = [ "ibm" ]

# override crates.io with company packages + crates.io forks
[registry.ibm]
url = "https://internal.ibm.com/something"
```

```toml
# in /office/.cargo/config

[registry.ibm-au]
url = "https://au.internal.ibm.com/something"
provides = "ibm" # mirror, must be subset of ibm

[registry.ibm-au-sidney]
url = "https://sidney.au.internal.ibm.com/something"
# need only be subset of ibm, not ibm-au too
# ibm-au is chosen for shape of mirror chain like cache levels.
provides = "ibm-au"
```

```toml
# in /office/project/.cargo/config

# override company registry with local forks
default-registry = [
    "project",
    "ibm",
]

# override company forks with local forks and also use crates.io
[registry.project]
local = "/secret/sauce/goes/here"
```

I bring this up because for my explicit stdlib deps RFC (rust-lang/rfcs#1133) we need packages without a specified registry to fall back on a "compiler-specific" registry for packages not on crates.io (more broadly, lowest priority in the default-registry chain). The default would become (in pseudo-toml) something like:

```toml
default-registry = [ "crates-io", "compiler" ]

[registry.crates-io]
# url, path, or git just like dependency
url = "https://crates.io/path/to/something/or/another/i/dont/know"

[registry.compiler]
path = "${$CARGO_RUSTC --print sysroot}/src/rust"
```
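The fallback behaviour proposed in this comment (try registries in `default-registry` order, and let anything an earlier registry cannot resolve fall through to the next) could be sketched as below. This only illustrates the commenter's proposal, which is not an implemented Cargo feature; the `Registries` type and `pick_registry` function are hypothetical, each registry is reduced to the set of package names it offers, and mirrors (`provides`) are not modelled.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model: registry name -> names of the packages it offers.
type Registries = HashMap<String, HashSet<String>>;

/// Returns the first registry in `default_registry` (highest priority first)
/// that offers `package`, or `None` if no registry in the chain has it.
fn pick_registry<'a>(
    registries: &Registries,
    default_registry: &'a [&'a str],
    package: &str,
) -> Option<&'a str> {
    default_registry.iter().copied().find(|name| {
        registries
            .get(*name)
            .map_or(false, |packages| packages.contains(package))
    })
}
```

With the order `["project", "ibm"]`, a package present in both registries resolves to `project`, and one present only in `ibm` falls through to `ibm`.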
Additionally, I find funny all the mentioning of adding registries to

I was talking to @wycats as to whether it should be allowed for a location dep and crates.io/registry dep to both resolve to the same version, and I think we both agreed "probably not". The idea is if cargo will "merge" semver-compatible version deps to force a single crate to satisfy them (e.g. by merging required features), it should also merge semver-compatible version deps into location deps. By the same reasoning it should complain if two different location deps refer to packages with the same name and version. With that change, we can actually account for all of |
It feels somewhat redundant to specify workspace members (by their locations), and use path dependencies in each crate---one ends up writing the same paths many times. If the workspace members were also turned into a registry (say with priority below location deps but above everything else, given the above plan), then crates in the workspace could refer to each other by version. The idea is after each release, one would bump the number to signify that they were working on the next version. This would make workspaces way more ergonomic not only because there is no path duplication, but also because each project Cargo.toml is already in the right form for uploading to crates.io---it has no path dependencies and refers to the others with their to-be-uploaded versions. |
@Ericson2314 sorry but I don't think I'm 100% following your comments. I think it's rooted in how we're looking at the intended purpose of each of these pieces differently. I think it may help to first briefly recap what each piece is and what we think it's going to be used for:
Ok, with that in mind, I'd like to respond to a few of your points:
I think I may have miscommunicated here by accident. A checksum in the lock file should be enough to guarantee a reproducible build of a crate. If you start from crates.io (the source of truth) and then record a checksum, whenever you get a crate from a mirror you can be sure that you have the exact same source by ensuring the checksums match. So in that sense hashes in lockfiles should be enough to guarantee mirrors are actual mirrors. The situation you might be referring to, however, is that if you first form a dependency edge and you pull the dependency in from a mirror. In that case, yes, you don't actually check in with the source of truth to ensure the checksum is the same. The discrepancy will arise quickly, however, when you ship that
I'm not sure I fully understand this tangent because
In many of these examples, this goes against the distinction between
I think that trying to fit registries into this hole may be the wrong abstraction, instead we can always just create another type of source which Cargo reads crates from. I don't really follow how you ended up at the conclusion along the lines of "everything should be a registry" (if I'm interpreting that correctly), but it seems like it may not be entirely related to this PR or this feature, so perhaps discussion about that can happen in separate threads? |
Also, for those interested, I've rebased this PR and implemented one extra feature, located at #2857 |
@alexcrichton The A better concern would be distros that need to patch some crates.io---it would be nice to have them not need to lie that they are just mirroring crates.io rather than modifying it. But I don't find this too pressing.
Exactly what I was referring to.
I forgot about the |
Some of this probably could be moved to a new thread, I'll take a look at the new version, and stop commenting here. |
This I currently consider a non-starter because it makes
Note that all the support here is intended to prevent exactly this situation. If a distro modifies a crate then it changes the checksum, so Cargo will refuse to replace a crates.io crate with a distro crate (using mirroring support) because the lock file will disagree. If instead the Cargo.toml explicitly indicates that it's pulling a crate from the distro rather than crates.io, however, then that's perfectly fine. |
Hmm, my thought for both of those was that the developer was leaving something unspecified as a parameter. But if they want to do that, users could provide the arguments with a workspace Cargo.toml that just contains the one package. Contrary to my initial concern with leveraging Cargo.toml, this works for both libraries and binaries alike. No more interest in abusing |
This series of commits culminates in first class support in Cargo for local mirrors of registries. This is implemented through a number of other more generic mechanisms, and extra support was added along the way. The highlights of this PR, however, are:
Source redirection
New `.cargo/config` keys have been added to enable replacing one source with another. This functionality is intended to be used for mirrors of the main registry or otherwise one to one source correspondences. The support looks like a `[source.crates-io]` table whose `replace-with` key names another configured source (e.g. `my-awesome-registry`).

This configuration means that instead of using `crates-io` (e.g. `https://github.com/rust-lang/crates.io-index`), Cargo will query the `my-awesome-registry` source instead (configured to a different index here). This alternate source must be the exact same as the crates.io index. Cargo assumes that replacement sources are exact 1:1 mirrors in this respect, and the following support is designed around that assumption.

When generating a lock file for a crate using a replacement registry, the original registry will be encoded into the lock file. For example in the configuration above, all lock files will still mention crates.io as the registry that packages originated from. This semantically represents how crates.io is the source of truth for all crates, and this is upheld because all replacements have a 1:1 correspondence.
Overall, this means that no matter what replacement source you're working with, you can ship your lock file to anyone else and you'll all still have verifiably reproducible builds!
Adding sha256 checksums to the lock file
With the above support for custom registries, it's now possible for a project to be downloading crates from any number of sources. One of Cargo's core goals is reproducible builds, and with all these new sources of information it may be easy for a few situations to arise:
In both of these cases, Cargo would today simply give non-reproducible builds. To help assuage these concerns, Cargo will now track the sha256 checksum of all crates from registries in the lock file. Whenever a `Cargo.lock` is generated from now on it will contain a `[metadata]` section which lists the sha256 checksum of all crates in the lock file (or `<none>` if the sha256 checksum isn't known).

Cargo already checks registry checksums against what's actually downloaded, and Cargo will now verify between iterations of the lock file that checksums remain the same as well. This means that if a local replacement registry is not in a 1:1 correspondence with crates.io, the lock file will prevent the build from progressing until the discrepancy is resolved.
Local Registries
In addition to the support above, there is now a new kind of source in Cargo, a "local registry", which is intended to be a subset of the crates.io ecosystem purposed for a local build for any particular project here or there. The way to enable this is to replace `crates-io` with a source whose `local-registry` key points at the registry directory (e.g. `local-registry = 'path/to/registry'`).

This local registry is expected to have two components:

1. An `index` folder which matches the same structure as the crates.io index. The `config.json` file is not required here.
2. `.crate` files (downloaded from crates.io). Each crate file has the name `<package>-<version>.crate`.

This local registry must currently be managed manually, but I plan on publishing and maintaining a Cargo subcommand to manage a local registry. It will have options to do things like:
Cargo.lock
What's all this for?
This is quite a bit of new features! What's all this meant to do? Some example scenarios that this is envisioned to solve are:
What's next?
Even with the new goodies here, there's some more vectors through which this can be expanded:
* ... `cargo install foo` available to have everything "Just Work".
* ... `Cargo.toml` file itself. For example: ... `Cargo.toml` (note that these replacements, unlike the ones above, would be encoded into `Cargo.lock`)
* ... `Cargo.toml` should be supported