Improve resolver speed #14663

Merged 2 commits on Oct 10, 2024
11 changes: 9 additions & 2 deletions Cargo.lock
Member

@Eh2406 I moved the conversation here so it is easier to follow.

All of these hashes are used exclusively in memory, so concerns about consistency between builds, let alone consistency between architectures, are not relevant here.

That said, the lang team has recently decided to migrate away from weak hash algorithms, and I assume that type_id is also mostly an in-memory thing. Do we need to follow suit?

Member @weihanglo, Oct 10, 2024

How easy is it for an attacker to fake a PackageId, and how bad is it when it really happens?

Contributor

The problem with type_id is that it assumes that if the hashes are equal then the objects are equal. In its case, it even goes so far as to do an unsafe transmute based on that assumption.

In this PR's case we only use the hash for a HashMap. If two things spuriously have the same hash, they will be checked by eq. If an attacker managed to get a lot of things to hash to the same value, cargo would spend O(n) time finding the right one instead of O(1) amortized.

How easy is it for an attacker to fake a PackageId, and how bad is it when it really happens?

PackageIds are only generated for dependencies. If a dependency is controlled by the attacker, then serious hash collisions are the least of our problems.
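The collision fallback described above is easy to see directly with std's HashMap. This is a hypothetical sketch (BadKey is an illustration, not a cargo type): even with a hash function that collides on every key, lookups stay correct because the map falls back to Eq, at the cost of O(n) probing within the colliding bucket.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// A key type whose hash always collides. Lookups remain correct because
// HashMap disambiguates colliding entries with `Eq`; only performance
// degrades, from O(1) amortized to O(n) per probe in the shared bucket.
#[derive(PartialEq, Eq)]
struct BadKey(u32);

impl Hash for BadKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Every key hashes to the same value.
        0u64.hash(state);
    }
}

fn main() {
    let mut map = HashMap::new();
    for i in 0..1000 {
        map.insert(BadKey(i), i);
    }
    // Despite total hash collision, `Eq` keeps every entry distinct.
    assert_eq!(map.len(), 1000);
    assert_eq!(map[&BadKey(42)], 42);
    println!("ok");
}
```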

Member

In that linked issue, even SipHash, which is what Cargo currently uses from std, is not an acceptable choice from t-lang's perspective. As you said, we have eq, so it should be even harder to exploit Cargo.

If an attacker could manage to get a lot of things to hash to the same thing then cargo would spend O(n) time finding the right one instead of O(1) amortized.

This might be a problem for companies using either the cargo binary or the library as a service. I am not sure whether we need to take that into account. Setting that aside, I am 👍🏾 on this PR.
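A sketch of the determinism tradeoff under discussion, using a hand-rolled unseeded hasher as a dependency-free stand-in for FxHash (rustc_hash::FxBuildHasher is the real type): an unseeded hasher produces identical hashes in every process, which is what makes precomputed collision flooding conceivable against a long-running service, while std's default SipHash state is randomly seeded per map.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

// Stand-in for FxHash: any fixed-constant, unseeded hasher behaves the same
// way for the purpose of this argument.
#[derive(Default)]
struct UnseededHasher(u64);

impl Hasher for UnseededHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // FNV-1a style mixing with fixed constants: no per-process seed.
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }
}

fn hash_with<S: BuildHasher>(build: &S, value: u64) -> u64 {
    let mut h = build.build_hasher();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    // Unseeded hashers agree across instances (and across processes), so an
    // attacker could precompute colliding inputs offline.
    let a = BuildHasherDefault::<UnseededHasher>::default();
    let b = BuildHasherDefault::<UnseededHasher>::default();
    assert_eq!(hash_with(&a, 42), hash_with(&b, 42));

    // std's default hasher is randomly seeded per RandomState, so two
    // independent maps almost certainly disagree on the same key.
    // (Equal seeds are possible in theory, just astronomically unlikely.)
    let m1: HashMap<u64, ()> = HashMap::new();
    let m2: HashMap<u64, ()> = HashMap::new();
    assert_ne!(hash_with(m1.hasher(), 42), hash_with(m2.hasher(), 42));
    println!("ok");
}
```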


2 changes: 2 additions & 0 deletions Cargo.toml
@@ -80,6 +80,7 @@ pulldown-cmark = { version = "0.11.0", default-features = false, features = ["ht
rand = "0.8.5"
regex = "1.10.5"
rusqlite = { version = "0.32.0", features = ["bundled"] }
rustc-hash = "2.0.0"
rustfix = { version = "0.8.2", path = "crates/rustfix" }
same-file = "1.0.6"
security-framework = "2.11.1"
@@ -187,6 +188,7 @@ pathdiff.workspace = true
rand.workspace = true
regex.workspace = true
rusqlite.workspace = true
rustc-hash.workspace = true
rustfix.workspace = true
same-file.workspace = true
semver.workspace = true
2 changes: 1 addition & 1 deletion crates/resolver-tests/src/sat.rs
@@ -352,7 +352,7 @@ impl SatResolver {

let mut by_name: HashMap<InternedString, Vec<Summary>> = HashMap::new();
for pkg in registry {
by_name.entry(pkg.name()).or_default().push(pkg.clone())
by_name.entry(pkg.name()).or_default().push(pkg.clone());
}

let mut solver = varisat::Solver::new();
48 changes: 46 additions & 2 deletions crates/resolver-tests/tests/resolve.rs
@@ -4,8 +4,8 @@ use cargo::util::GlobalContext;

use resolver_tests::{
helpers::{
assert_contains, assert_same, dep, dep_kind, dep_loc, dep_req, loc_names, names, pkg_id,
pkg_loc, registry, ToPkgId,
assert_contains, assert_same, dep, dep_kind, dep_loc, dep_req, loc_names, names, pkg,
pkg_dep, pkg_dep_with, pkg_id, pkg_loc, registry, ToDep, ToPkgId,
},
pkg, resolve, resolve_with_global_context,
};
@@ -937,6 +937,50 @@ fn large_conflict_cache() {
let _ = resolve(root_deps, &reg);
}

#[test]
fn resolving_slow_case_missing_feature() {
let mut reg = Vec::new();

const LAST_CRATE_VERSION_COUNT: usize = 50;

// increase in resolve time is at least cubic over `INTERMEDIATE_CRATES_VERSION_COUNT`.
// it should be `>= LAST_CRATE_VERSION_COUNT` to reproduce slowdown.
const INTERMEDIATE_CRATES_VERSION_COUNT: usize = LAST_CRATE_VERSION_COUNT + 5;

// should be `>= 2` to reproduce slowdown
const TRANSITIVE_CRATES_COUNT: usize = 3;

reg.push(pkg_dep_with(("last", "1.0.0"), vec![], &[("f", &[])]));
for v in 1..LAST_CRATE_VERSION_COUNT {
reg.push(pkg(("last", format!("1.0.{v}"))));
}

reg.push(pkg_dep(
("dep", "1.0.0"),
vec![
dep("last"), // <-- needed to reproduce slowdown
dep_req("intermediate-1", "1.0.0"),
],
));

for n in 0..INTERMEDIATE_CRATES_VERSION_COUNT {
let version = format!("1.0.{n}");
for c in 1..TRANSITIVE_CRATES_COUNT {
reg.push(pkg_dep(
(format!("intermediate-{c}"), &version),
vec![dep_req(&format!("intermediate-{}", c + 1), &version)],
));
}
reg.push(pkg_dep(
(format!("intermediate-{TRANSITIVE_CRATES_COUNT}"), &version),
vec![dep_req("last", "1.0.0").with(&["f"])],
));
}

let deps = vec![dep("dep")];
let _ = resolve(deps, &reg);
}

#[test]
fn cyclic_good_error_message() {
let input = vec![
18 changes: 8 additions & 10 deletions src/cargo/core/resolver/conflict_cache.rs
@@ -1,5 +1,6 @@
use std::collections::{BTreeMap, HashMap, HashSet};
use std::collections::{BTreeMap, HashMap};

use rustc_hash::{FxHashMap, FxHashSet};
use tracing::trace;

use super::types::ConflictMap;
@@ -140,17 +141,17 @@ pub(super) struct ConflictCache {
// as a global cache which we never delete from. Any entry in this map is
// unconditionally true regardless of our resolution history of how we got
// here.
con_from_dep: HashMap<Dependency, ConflictStoreTrie>,
con_from_dep: FxHashMap<Dependency, ConflictStoreTrie>,
// `dep_from_pid` is an inverse-index of `con_from_dep`.
// For every `PackageId` this lists the `Dependency`s that mention it in `dep_from_pid`.
dep_from_pid: HashMap<PackageId, HashSet<Dependency>>,
dep_from_pid: FxHashMap<PackageId, FxHashSet<Dependency>>,
}

impl ConflictCache {
pub fn new() -> ConflictCache {
ConflictCache {
con_from_dep: HashMap::new(),
dep_from_pid: HashMap::new(),
con_from_dep: HashMap::default(),
dep_from_pid: HashMap::default(),
}
}
pub fn find(
@@ -207,14 +208,11 @@ impl ConflictCache {
);

for c in con.keys() {
self.dep_from_pid
.entry(*c)
.or_insert_with(HashSet::new)
.insert(dep.clone());
self.dep_from_pid.entry(*c).or_default().insert(dep.clone());
}
}

pub fn dependencies_conflicting_with(&self, pid: PackageId) -> Option<&HashSet<Dependency>> {
pub fn dependencies_conflicting_with(&self, pid: PackageId) -> Option<&FxHashSet<Dependency>> {
self.dep_from_pid.get(&pid)
}
}
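Two idioms from the hunks above, sketched with std types only (rustc_hash's FxHashMap is just HashMap with a different BuildHasher, so the API is identical): HashMap::new() is only defined for the default RandomState hasher, so maps with a custom hasher are built with HashMap::default(); and entry(..).or_default() replaces the noisier or_insert_with(HashSet::new). The type alias below is a stand-in to keep the sketch dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Stand-in alias: rustc_hash::FxHashMap<K, V> is likewise just a
// HashMap<K, V, S> with a non-default build-hasher S.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn main() {
    // `HashMap::new()` exists only when S = RandomState, so custom-hasher
    // maps are constructed with `HashMap::default()` instead.
    let mut map: FastMap<u32, Vec<&str>> = HashMap::default();

    // `.entry(..).or_default()` works for any value type implementing
    // `Default`; same behavior as `.or_insert_with(Vec::new)`, less noise.
    map.entry(1).or_default().push("dep-a");
    map.entry(1).or_default().push("dep-b");

    assert_eq!(map[&1], ["dep-a", "dep-b"]);
    println!("ok");
}
```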
16 changes: 9 additions & 7 deletions src/cargo/core/resolver/context.rs
@@ -19,13 +19,13 @@ pub struct ResolverContext {
pub age: ContextAge,
pub activations: Activations,
/// list the features that are activated for each package
pub resolve_features: im_rc::HashMap<PackageId, FeaturesSet>,
pub resolve_features: im_rc::HashMap<PackageId, FeaturesSet, rustc_hash::FxBuildHasher>,
/// get the package that will be linking to a native library by its links attribute
pub links: im_rc::HashMap<InternedString, PackageId>,
pub links: im_rc::HashMap<InternedString, PackageId, rustc_hash::FxBuildHasher>,

/// a way to look up for a package in activations what packages required it
/// and all of the exact deps that it fulfilled.
pub parents: Graph<PackageId, im_rc::HashSet<Dependency>>,
pub parents: Graph<PackageId, im_rc::HashSet<Dependency, rustc_hash::FxBuildHasher>>,
}

/// When backtracking it can be useful to know how far back to go.
@@ -40,7 +40,9 @@ pub type ContextAge = usize;
/// semver compatible version of each crate.
/// This all so stores the `ContextAge`.
pub type ActivationsKey = (InternedString, SourceId, SemverCompatibility);
pub type Activations = im_rc::HashMap<ActivationsKey, (Summary, ContextAge)>;

pub type Activations =
im_rc::HashMap<ActivationsKey, (Summary, ContextAge), rustc_hash::FxBuildHasher>;

/// A type that represents when cargo treats two Versions as compatible.
/// Versions `a` and `b` are compatible if their left-most nonzero digit is the
@@ -74,10 +76,10 @@ impl ResolverContext {
pub fn new() -> ResolverContext {
ResolverContext {
age: 0,
resolve_features: im_rc::HashMap::new(),
links: im_rc::HashMap::new(),
resolve_features: im_rc::HashMap::default(),
links: im_rc::HashMap::default(),
parents: Graph::new(),
activations: im_rc::HashMap::new(),
activations: im_rc::HashMap::default(),
}
}

2 changes: 1 addition & 1 deletion src/cargo/core/resolver/mod.rs
@@ -679,7 +679,7 @@ fn activate(
Rc::make_mut(
cx.resolve_features
.entry(candidate.package_id())
.or_insert_with(Rc::default),
.or_default(),
)
.extend(used_features);
}
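The hunk above combines entry(..).or_default() with Rc::make_mut. A minimal sketch of the pattern, with types simplified for illustration (in cargo the map value is a shared, reference-counted feature set): or_default() works because Rc<T> is Default when T is, and Rc::make_mut clones the inner set only if it is currently shared before handing back a mutable reference.

```rust
use std::collections::{BTreeSet, HashMap};
use std::rc::Rc;

fn main() {
    // Simplified stand-in for `resolve_features`: package name -> shared
    // set of activated feature names.
    let mut resolve_features: HashMap<&str, Rc<BTreeSet<&str>>> = HashMap::new();

    let used_features = ["serde", "derive"];

    // `or_default()` inserts Rc::default() (an empty shared set) if absent;
    // `Rc::make_mut` gives us `&mut BTreeSet`, copying on write only if the
    // Rc is shared with another package's entry.
    Rc::make_mut(resolve_features.entry("pkg-a").or_default()).extend(used_features);

    assert_eq!(resolve_features["pkg-a"].len(), 2);
    assert!(resolve_features["pkg-a"].contains("serde"));
    println!("ok");
}
```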
2 changes: 1 addition & 1 deletion src/cargo/util/graph.rs
@@ -22,7 +22,7 @@ impl<N: Eq + Ord + Clone, E: Default + Clone> Graph<N, E> {
.entry(node)
.or_insert_with(im_rc::OrdMap::new)
.entry(child)
.or_insert_with(Default::default)
.or_default()
}

/// Returns the graph obtained by reversing all edges.