Skip to content

Commit

Permalink
Auto merge of #9125 - ehuss:index-docs, r=alexcrichton
Browse files Browse the repository at this point in the history
Add some documentation for index and registry stuff.

This adds some internal docs for index and registry things.

Split out of #9111.
  • Loading branch information
bors committed Feb 3, 2021
2 parents 3875bbb + 838e538 commit 398514a
Show file tree
Hide file tree
Showing 5 changed files with 132 additions and 14 deletions.
18 changes: 18 additions & 0 deletions src/cargo/sources/registry/index.rs
Original file line number Diff line number Diff line change
Expand Up @@ -164,10 +164,28 @@ fn overflow_hyphen() {
)
}

/// Manager for handling the on-disk index.
///
/// Note that local and remote registries store the index differently. Local
/// is a simple on-disk tree of files of the raw index. Remote registries are
/// stored as a raw git repository. The different means of access are handled
/// via the [`RegistryData`] trait abstraction.
///
/// This transparently handles caching of the index in a more efficient format.
pub struct RegistryIndex<'cfg> {
source_id: SourceId,
/// Root directory of the index for the registry.
path: Filesystem,
/// Cache of summary data.
///
/// This is keyed off the package name. The [`Summaries`] value handles
/// loading the summary data. It keeps an optimized on-disk representation
/// of the JSON files, which is created in an as-needed fashion. If it
/// hasn't been cached already, it uses [`RegistryData::load`] to access
/// to JSON files from the index, and the creates the optimized on-disk
/// summary cache.
summaries_cache: HashMap<InternedString, Summaries>,
/// [`Config`] reference for convenience.
config: &'cfg Config,
}

Expand Down
3 changes: 3 additions & 0 deletions src/cargo/sources/registry/local.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@ use std::io::prelude::*;
use std::io::SeekFrom;
use std::path::Path;

/// A local registry is a registry that lives on the filesystem as a set of
/// `.crate` files with an `index` directory in the same format as a remote
/// registry.
pub struct LocalRegistry<'cfg> {
index_path: Filesystem,
root: Filesystem,
Expand Down
119 changes: 106 additions & 13 deletions src/cargo/sources/registry/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@
//! ```
//!
//! The root of the index contains a `config.json` file with a few entries
//! corresponding to the registry (see `RegistryConfig` below).
//! corresponding to the registry (see [`RegistryConfig`] below).
//!
//! Otherwise, there are three numbered directories (1, 2, 3) for crates with
//! names 1, 2, and 3 characters in length. The 1/2 directories simply have the
Expand Down Expand Up @@ -189,16 +189,42 @@ const VERSION_TEMPLATE: &str = "{version}";
const PREFIX_TEMPLATE: &str = "{prefix}";
const LOWER_PREFIX_TEMPLATE: &str = "{lowerprefix}";

/// A "source" for a [local](local::LocalRegistry) or
/// [remote](remote::RemoteRegistry) registry.
///
/// This contains common functionality that is shared between the two registry
/// kinds, with the registry-specific logic implemented as part of the
/// [`RegistryData`] trait referenced via the `ops` field.
pub struct RegistrySource<'cfg> {
source_id: SourceId,
/// The path where crate files are extracted (`$CARGO_HOME/registry/src/$REG-HASH`).
src_path: Filesystem,
/// Local reference to [`Config`] for convenience.
config: &'cfg Config,
/// Whether or not the index has been updated.
///
/// This is used as an optimization to avoid updating if not needed, such
/// as `Cargo.lock` already exists and the index already contains the
/// locked entries. Or, to avoid updating multiple times.
///
/// Only remote registries really need to update. Local registries only
/// check that the index exists.
updated: bool,
/// Abstraction for interfacing to the different registry kinds.
ops: Box<dyn RegistryData + 'cfg>,
/// Interface for managing the on-disk index.
index: index::RegistryIndex<'cfg>,
/// A set of packages that should be allowed to be used, even if they are
/// yanked.
///
/// This is populated from the entries in `Cargo.lock` to ensure that
/// `cargo update -p somepkg` won't unlock yanked entries in `Cargo.lock`.
/// Otherwise, the resolver would think that those entries no longer
/// exist, and it would trigger updates to unrelated packages.
yanked_whitelist: HashSet<PackageId>,
}

/// The `config.json` file stored in the index.
#[derive(Deserialize)]
pub struct RegistryConfig {
/// Download endpoint for all crates.
Expand Down Expand Up @@ -278,18 +304,7 @@ fn escaped_char_in_json() {
.unwrap();
}

#[derive(Deserialize)]
#[serde(field_identifier, rename_all = "lowercase")]
enum Field {
Name,
Vers,
Deps,
Features,
Cksum,
Yanked,
Links,
}

/// A dependency as encoded in the index JSON.
#[derive(Deserialize)]
struct RegistryDependency<'a> {
name: InternedString,
Expand Down Expand Up @@ -369,30 +384,108 @@ impl<'a> RegistryDependency<'a> {
}
}

/// An abstract interface to handle both a [local](local::LocalRegistry) and
/// [remote](remote::RemoteRegistry) registry.
///
/// This allows [`RegistrySource`] to abstractly handle both registry kinds.
pub trait RegistryData {
/// Performs initialization for the registry.
///
/// This should be safe to call multiple times, the implementation is
/// expected to not do any work if it is already prepared.
fn prepare(&self) -> CargoResult<()>;

/// Returns the path to the index.
///
/// Note that different registries store the index in different formats
/// (remote=git, local=files).
fn index_path(&self) -> &Filesystem;

/// Loads the JSON for a specific named package from the index.
///
/// * `root` is the root path to the index.
/// * `path` is the relative path to the package to load (like `ca/rg/cargo`).
/// * `data` is a callback that will receive the raw bytes of the index JSON file.
fn load(
&self,
root: &Path,
path: &Path,
data: &mut dyn FnMut(&[u8]) -> CargoResult<()>,
) -> CargoResult<()>;

/// Loads the `config.json` file and returns it.
///
/// Local registries don't have a config, and return `None`.
fn config(&mut self) -> CargoResult<Option<RegistryConfig>>;

/// Updates the index.
///
/// For a remote registry, this updates the index over the network. Local
/// registries only check that the index exists.
fn update_index(&mut self) -> CargoResult<()>;

/// Prepare to start downloading a `.crate` file.
///
/// Despite the name, this doesn't actually download anything. If the
/// `.crate` is already downloaded, then it returns [`MaybeLock::Ready`].
/// If it hasn't been downloaded, then it returns [`MaybeLock::Download`]
/// which contains the URL to download. The [`crate::core::package::Download`]
/// system handles the actual download process. After downloading, it
/// calls [`finish_download`] to save the downloaded file.
///
/// `checksum` is currently only used by local registries to verify the
/// file contents (because local registries never actually download
/// anything). Remote registries will validate the checksum in
/// `finish_download`. For already downloaded `.crate` files, it does not
/// validate the checksum, assuming the filesystem does not suffer from
/// corruption or manipulation.
fn download(&mut self, pkg: PackageId, checksum: &str) -> CargoResult<MaybeLock>;

/// Finish a download by saving a `.crate` file to disk.
///
/// After [`crate::core::package::Download`] has finished a download,
/// it will call this to save the `.crate` file. This is only relevant
/// for remote registries. This should validate the checksum and save
/// the given data to the on-disk cache.
///
/// Returns a [`File`] handle to the `.crate` file, positioned at the start.
fn finish_download(&mut self, pkg: PackageId, checksum: &str, data: &[u8])
-> CargoResult<File>;

/// Returns whether or not the `.crate` file is already downloaded.
fn is_crate_downloaded(&self, _pkg: PackageId) -> bool {
true
}

/// Validates that the global package cache lock is held.
///
/// Given the [`Filesystem`], this will make sure that the package cache
/// lock is held. If not, it will panic. See
/// [`Config::acquire_package_cache_lock`] for acquiring the global lock.
///
/// Returns the [`Path`] to the [`Filesystem`].
fn assert_index_locked<'a>(&self, path: &'a Filesystem) -> &'a Path;

/// Returns the current "version" of the index.
///
/// For local registries, this returns `None` because there is no
/// versioning. For remote registries, this returns the SHA hash of the
/// git index on disk (or None if the index hasn't been downloaded yet).
///
/// This is used by index caching to check if the cache is out of date.
fn current_version(&self) -> Option<InternedString>;
}

/// The status of [`RegistryData::download`] which indicates if a `.crate`
/// file has already been downloaded, or if not then the URL to download.
pub enum MaybeLock {
/// The `.crate` file is already downloaded. [`File`] is a handle to the
/// opened `.crate` file on the filesystem.
Ready(File),
/// The `.crate` file is not downloaded, here's the URL to download it from.
///
/// `descriptor` is just a text string to display to the user of what is
/// being downloaded.
Download { url: String, descriptor: String },
}

Expand Down
4 changes: 4 additions & 0 deletions src/cargo/sources/registry/remote.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,12 @@ fn make_dep_prefix(name: &str) -> String {
}
}

/// A remote registry is a registry that lives at a remote URL (such as
/// crates.io). The git index is cloned locally, and `.crate` files are
/// downloaded as needed and cached locally.
pub struct RemoteRegistry<'cfg> {
index_path: Filesystem,
/// Path to the cache of `.crate` files (`$CARGO_HOME/registry/path/$REG-HASH`).
cache_path: Filesystem,
source_id: SourceId,
index_git_ref: GitReference,
Expand Down
2 changes: 1 addition & 1 deletion src/cargo/util/toml/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -876,7 +876,7 @@ struct Context<'a, 'b> {
}

impl TomlManifest {
/// Prepares the manfiest for publishing.
/// Prepares the manifest for publishing.
// - Path and git components of dependency specifications are removed.
// - License path is updated to point within the package.
pub fn prepare_for_publish(
Expand Down

0 comments on commit 398514a

Please sign in to comment.