PackageRegistry is extremely slow on many-to-one source replacements #15751

@dtolnay

I have a workspace with a large number of git dependencies vendored by cargo vendor. Concretely: the vendor directory contains 3662 packages and the .cargo/config.toml looks like this:

[source.crates-io]
replace-with = "vendored-sources"

# 61 of these:
[source."git+https://github.com/...?rev=..."]
git = "https://github.com/..."
rev = "..."
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"

Within this workspace, every Cargo command that touches a resolver (for example cargo metadata) takes over 60 seconds, for a silly reason.

All of the time is spent inside of this call, which is called 62 times (once for crates-io plus once for each of the 61 git sources) and takes about 1 second per call. What Cargo has done is create a unique SourceId for each git repo, construct an independent Box<dyn Source> for each one (with Source::source_id() = the git repo, and Source::replaced_source_id() = the vendor directory), then serially call Source::block_until_ready() on each one. Each of these calls spends 1 second parsing all 3662 packages in the source. But of course they are the same packages each time, just with a different SourceId.

If I bypass most of PackageRegistry by preloading it in the following way, everything speeds up from 60 seconds to 1 second. This uses a single shared Box<dyn Source> for the vendor directory to back all of the replaced git sources, so packages only need to load once.

use std::cell::RefCell;
use std::collections::{hash_map, HashMap, HashSet};
use std::rc::Rc;
use std::task::Poll;

let shell = cargo::core::Shell::new();
let gctx = cargo::GlobalContext::new(shell, cwd, homedir);
let source_config = cargo::sources::SourceConfigMap::new(&gctx)?;
let workspace = cargo::core::Workspace::new(manifest_path, &gctx)?;
let previous_resolve = cargo::ops::load_pkg_lockfile(&workspace)?;

let mut source_map = HashMap::new();
if let Some(resolve) = &previous_resolve {
    let yanked_whitelist = HashSet::new();
    let mut shared_sources = HashMap::new();
    for pkg_id in resolve.iter() {
        let source_id = pkg_id.source_id();
        let hash_map::Entry::Vacant(entry) = source_map.entry(source_id) else {
            continue;
        };
        let source = source_config.load(source_id, &yanked_whitelist)?;
        assert_eq!(source.source_id(), source_id);
        let replaced_source_id = source.replaced_source_id();
        let delegate = match shared_sources.entry(replaced_source_id) {
            hash_map::Entry::Vacant(entry) => {
                let source = source_config.load(replaced_source_id, &yanked_whitelist)?;
                assert_eq!(source.source_id(), replaced_source_id);
                let rc = Rc::new(RefCell::new(source));
                Rc::clone(entry.insert(rc))
            }
            hash_map::Entry::Occupied(entry) => Rc::clone(entry.get()),
        };
        entry.insert(SharedSource { source_id, delegate });
    }
}

let mut registry = cargo::core::registry::PackageRegistry::new_with_source_config(
    &gctx,
    source_config.clone(),
)?;

for mut source in source_map.into_values() {
    cargo::sources::source::Source::block_until_ready(&mut source)?;
    registry.add_preloaded(Box::new(source));
}

let keep_previous = None;
let specs = [];
let register_patches = false;
let resolve = cargo::ops::resolve_with_previous(
    &mut registry,
    &workspace,
    &cargo::core::resolver::CliFeatures::new_all(true),
    cargo::core::resolver::HasDevUnits::Yes,
    previous_resolve.as_ref(),
    keep_previous,
    &specs,
    register_patches,
)?;

struct SharedSource<'gctx> {
    source_id: cargo::core::SourceId,
    delegate: Rc<RefCell<Box<dyn cargo::sources::source::Source + 'gctx>>>,
}

impl<'gctx> cargo::sources::source::Source for SharedSource<'gctx> {
    fn source_id(&self) -> cargo::core::SourceId {
        self.source_id
    }

    fn query(
        &mut self,
        dep: &cargo::core::Dependency,
        kind: cargo::sources::source::QueryKind,
        f: &mut dyn FnMut(cargo::sources::IndexSummary),
    ) -> Poll<cargo::CargoResult<()>> {
        let mut delegate = self.delegate.borrow_mut();
        let delegate_source_id = delegate.source_id();
        let dep = dep.clone().map_source(self.source_id, delegate_source_id);
        delegate.query(&dep, kind, &mut |summary| {
            f(summary
                .map_summary(|summary| summary.map_source(delegate_source_id, self.source_id)));
        })
    }

    fn download(
        &mut self,
        pkg_id: cargo::core::PackageId,
    ) -> cargo::CargoResult<cargo::sources::source::MaybePackage> {
        let mut delegate = self.delegate.borrow_mut();
        let delegate_source_id = delegate.source_id();
        let delegate_pkg_id = pkg_id.with_source_id(delegate_source_id);
        let package = match delegate.download(delegate_pkg_id)? {
            cargo::sources::source::MaybePackage::Ready(package) => package,
            download @ cargo::sources::source::MaybePackage::Download { .. } => {
                return Ok(download);
            }
        };
        Ok(cargo::sources::source::MaybePackage::Ready(
            package.map_source(delegate_source_id, pkg_id.source_id()),
        ))
    }

    fn add_to_yanked_whitelist(&mut self, pkgs: &[cargo::core::PackageId]) {
        let mut delegate = self.delegate.borrow_mut();
        let delegate_source_id = delegate.source_id();
        let delegate_pkgs = Vec::from_iter(
            pkgs.iter()
                .map(|pkg| pkg.with_source_id(delegate_source_id)),
        );
        delegate.add_to_yanked_whitelist(&delegate_pkgs);
    }

    fn block_until_ready(&mut self) -> cargo::CargoResult<()> {
        self.delegate.borrow_mut().block_until_ready()
    }

    ...
}
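
To make the cost difference concrete without depending on Cargo internals, here is a minimal toy model of the two strategies. Everything in it (VendorDir, ReplacedSource, and both helper functions) is a hypothetical stand-in invented for illustration, not Cargo's API; "parsing" is modelled as simply counting packages.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the vendor directory source.
/// "Loading" it means parsing every package once.
struct VendorDir {
    packages: Vec<String>,
    loaded: bool,
    /// Total number of package parses performed by this instance.
    parse_count: usize,
}

impl VendorDir {
    fn new(packages: Vec<String>) -> Self {
        VendorDir { packages, loaded: false, parse_count: 0 }
    }

    /// Parses all packages on the first call; later calls are no-ops.
    fn block_until_ready(&mut self) {
        if !self.loaded {
            self.parse_count += self.packages.len();
            self.loaded = true;
        }
    }
}

/// Hypothetical stand-in for a replaced source: it keeps its own identity
/// but forwards loading to a (possibly shared) vendor directory.
struct ReplacedSource {
    source_id: String,
    delegate: Rc<RefCell<VendorDir>>,
}

impl ReplacedSource {
    fn source_id(&self) -> &str {
        &self.source_id
    }

    fn block_until_ready(&mut self) {
        self.delegate.borrow_mut().block_until_ready();
    }
}

/// One independent VendorDir per replaced source: every source re-parses
/// the whole directory. Returns the total number of package parses.
fn parsed_with_independent_sources(source_ids: &[&str], packages: &[String]) -> usize {
    let mut total = 0;
    for id in source_ids {
        let mut source = ReplacedSource {
            source_id: id.to_string(),
            delegate: Rc::new(RefCell::new(VendorDir::new(packages.to_vec()))),
        };
        source.block_until_ready();
        total += source.delegate.borrow().parse_count;
    }
    total
}

/// A single VendorDir shared by all replaced sources: the directory is
/// parsed once no matter how many sources point at it.
fn parsed_with_shared_source(source_ids: &[&str], packages: &[String]) -> usize {
    let shared = Rc::new(RefCell::new(VendorDir::new(packages.to_vec())));
    let mut sources: Vec<ReplacedSource> = source_ids
        .iter()
        .map(|id| ReplacedSource {
            source_id: id.to_string(),
            delegate: Rc::clone(&shared),
        })
        .collect();
    for source in &mut sources {
        source.block_until_ready();
    }
    // Each wrapper still reports its own identity.
    debug_assert!(sources.iter().zip(source_ids).all(|(s, id)| s.source_id() == *id));
    let total = shared.borrow().parse_count;
    total
}
```

In this model the independent strategy performs (number of sources) × (number of packages) parses, while the shared strategy performs only (number of packages), which mirrors the 62-versus-1 ratio described above.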
