fix(conda): dedup repodata by archive identifier instead of URL#9831

Merged: jdx merged 4 commits into main from claude/adoring-chatterjee-575f6a (May 13, 2026)

@jdx jdx commented May 13, 2026


Summary

Fixes #9829: mise use -g imagemagick (and other tools that pull in adwaita-icon-theme transitively) fails with:

conda solve failed: encountered duplicate records for adwaita-icon-theme-40.1.1-ha770c72_1.tar.bz2

PR #8337 previously addressed this class of error by deduplicating records by URL in flatten_repodata, but that wasn't sufficient. rattler-solve's resolvo solver detects duplicates by DistArchiveIdentifier (name-version-build + archive type), not by URL — see rattler_solve-6.0.2/src/resolvo/mod.rs:414. When conda-forge serves the same archive under multiple URLs (distinct CDN paths / aliasing), URL-based dedup keeps both, so the solver still rejects them.

This PR switches the dedup key to r.identifier, the exact key rattler-solve uses, so colliding records can no longer reach the solver. .conda vs .tar.bz2 variants of the same package remain distinct (their archive_type differs), preserving the solver's existing archive-type preference logic.
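
To illustrate why the choice of key matters, here is a minimal sketch built on simplified stand-in types rather than the real rattler structs. `ArchiveType`, `ArchiveId`, `Record`, the `record` and `sample_records` helpers, both dedup functions, and the `.invalid` mirror URLs are all hypothetical; only the idea (key on name-version-build plus archive type instead of URL) comes from the description above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the archive type (not the rattler enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ArchiveType {
    TarBz2,
    Conda,
}

/// Hypothetical stand-in for an archive identifier:
/// name-version-build plus archive type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ArchiveId {
    name_version_build: String,
    archive_type: ArchiveType,
}

/// Hypothetical stand-in for a repodata record.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Record {
    url: String,
    identifier: ArchiveId,
}

fn record(url: &str, name_version_build: &str, archive_type: ArchiveType) -> Record {
    Record {
        url: url.to_string(),
        identifier: ArchiveId {
            name_version_build: name_version_build.to_string(),
            archive_type,
        },
    }
}

/// URL-keyed dedup: two URLs for the same archive both survive.
fn dedup_by_url(records: &[Record]) -> Vec<Record> {
    let mut seen = HashSet::new();
    records
        .iter()
        .filter(|r| seen.insert(r.url.as_str()))
        .cloned()
        .collect()
}

/// Identifier-keyed dedup: the first record per identifier wins.
fn dedup_by_identifier(records: &[Record]) -> Vec<Record> {
    let mut seen = HashSet::new();
    records
        .iter()
        .filter(|r| seen.insert(&r.identifier))
        .cloned()
        .collect()
}

/// One archive served from two made-up mirrors, plus its .conda variant.
fn sample_records() -> Vec<Record> {
    let nvb = "pkg-1.0-h0_1";
    vec![
        record("https://mirror-a.invalid/linux-64/pkg-1.0-h0_1.tar.bz2", nvb, ArchiveType::TarBz2),
        record("https://mirror-b.invalid/linux-64/pkg-1.0-h0_1.tar.bz2", nvb, ArchiveType::TarBz2),
        record("https://mirror-a.invalid/linux-64/pkg-1.0-h0_1.conda", nvb, ArchiveType::Conda),
    ]
}
```

On the sample input, the URL-keyed version keeps all three records (so a solver that checks identifiers would still see a collision), while the identifier-keyed version keeps the first `.tar.bz2` record and the `.conda` record.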

Test plan

  • cargo check -p mise compiles
  • User-reported reproducer: mise use -g imagemagick on Linux64

Note

Medium Risk
Changes conda solver input deduplication to drop records by DistArchiveIdentifier, which can affect which package variants reach the solver and thus impact dependency resolution. Scope is small and covered by new regression tests, but it touches install/solve behavior.

Overview
Fixes conda solve failures caused by duplicate package records being returned under different URLs by deduplicating solver inputs on RepoDataRecord.identifier (archive identifier) instead of URL.

Refactors repodata flattening to reuse a new dedup_records_by_identifier helper, and adds unit tests to ensure identical identifiers across different URLs collapse while distinct .conda vs .tar.bz2 variants are preserved.

Reviewed by Cursor Bugbot for commit 01fc485.

The previous URL-based dedup in flatten_repodata wasn't sufficient: rattler-solve's
resolvo solver detects duplicates by DistArchiveIdentifier (name-version-build +
archive type), so when conda-forge serves the same archive under multiple URLs
(e.g. distinct CDN paths), URL dedup keeps both and the solver rejects them with
"encountered duplicate records for <filename>". Dedup by r.identifier instead —
the exact key the solver uses — so collisions can no longer slip through. .conda
vs .tar.bz2 variants stay distinct because their archive_type differs.

Fixes the imagemagick install failure reported in #9829.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps Bot commented May 13, 2026


Greptile Summary

Fixes a conda solve failure where rattler-solve's resolvo solver rejects duplicate RepoDataRecords that share the same DistArchiveIdentifier (name-version-build + archive type) but were served under different CDN URLs. The previous URL-based dedup in flatten_repodata was insufficient for this case.

  • Extracts a new dedup_records_by_identifier helper that filters on r.identifier (the exact key the solver uses), keeping the first-encountered record when two URLs resolve to the same archive.
  • flatten_repodata now delegates to this helper, preserving .conda vs .tar.bz2 variants as distinct (their CondaArchiveType differs), so the solver's archive-type preference logic is unaffected.
  • Two focused unit tests cover the regression case (same identifier, different URLs → collapse to 1) and the preservation case (.conda vs .tar.bz2 → keep both).

Confidence Score: 5/5

Safe to merge — the change is a single-file, narrowly scoped fix to conda repodata deduplication with no observable behavior change for non-duplicate inputs.

The fix correctly targets the exact key the solver uses to detect duplicates (r.identifier), the refactor into dedup_records_by_identifier preserves all existing behavior for the common case, and the two new unit tests cover both the regression scenario and the archive-type preservation invariant. No other code paths are affected.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| src/backend/conda.rs | Replaces URL-based dedup with identifier-based dedup in flatten_repodata; extracts the logic into dedup_records_by_identifier; adds two regression unit tests. Change is minimal, targeted, and correct. |

Reviews (4): Last reviewed commit: "style(conda): apply rustfmt to test addi..."

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the deduplication logic in flatten_repodata to use archive identifiers instead of URLs, ensuring consistency with how rattler-solve detects duplicates. Feedback suggests optimizing the implementation by filtering references before cloning to improve performance and reduce unnecessary memory allocations.

Comment thread src/backend/conda.rs Outdated
jdx and others added 2 commits May 13, 2026 08:29
Filter on references and clone only the records that survive dedup, instead
of cloning every record up front. Per gemini-code-assist on #9831.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
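
The first commit's change (filter on references, then clone only the survivors) can be sketched as follows. This is an illustration under assumptions, not the code from src/backend/conda.rs: `Record`, `rec`, `dedup_clone_first`, `dedup_filter_first`, and the clone counter are hypothetical and exist only to make the difference in clone counts observable.

```rust
use std::cell::Cell;
use std::collections::HashSet;

thread_local! {
    /// Counts how many times a Record is cloned (illustration only).
    static CLONES: Cell<usize> = Cell::new(0);
}

/// Hypothetical simplified record; the identifier is a plain string here.
#[derive(Debug, PartialEq, Eq)]
struct Record {
    identifier: String,
    url: String,
}

impl Clone for Record {
    fn clone(&self) -> Self {
        CLONES.with(|c| c.set(c.get() + 1));
        Record {
            identifier: self.identifier.clone(),
            url: self.url.clone(),
        }
    }
}

fn rec(identifier: &str, url: &str) -> Record {
    Record {
        identifier: identifier.to_string(),
        url: url.to_string(),
    }
}

/// Clones every record up front, then discards the duplicates.
fn dedup_clone_first(records: &[Record]) -> Vec<Record> {
    let mut seen = HashSet::new();
    records
        .iter()
        .cloned()
        .filter(|r| seen.insert(r.identifier.clone()))
        .collect()
}

/// Filters on references and clones only the records that survive.
fn dedup_filter_first(records: &[Record]) -> Vec<Record> {
    let mut seen = HashSet::new();
    records
        .iter()
        .filter(|r| seen.insert(r.identifier.as_str()))
        .cloned()
        .collect()
}

/// Runs `f` and reports how many Record clones it performed.
fn count_clones(f: impl FnOnce() -> Vec<Record>) -> (Vec<Record>, usize) {
    CLONES.with(|c| c.set(0));
    let out = f();
    (out, CLONES.with(|c| c.get()))
}
```

Both functions return the same records; the filter-first version clones once per surviving record, whereas the clone-first version clones once per input record.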
Extract the dedup loop into a private helper so it's testable without
constructing a gateway RepoData (whose records field is pub(crate)).
Adds two regression tests:

- Same name-version-build served under different URLs collapses to one record
  (the #9829 repro).
- .conda and .tar.bz2 variants of the same package are preserved so the
  solver's archive-type preference still applies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
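
A rough sketch of the extraction described in the second commit, assuming a helper that accepts any iterator of record references so tests can call it on a plain array. The function names `dedup_records_by_identifier` and `flatten_repodata` come from this PR, but the signatures, the simplified `Record` and `RepoData` types, the string-valued identifier, and the `.invalid` URLs are assumptions made for illustration (in the real gateway type the records field is not constructible from tests).

```rust
use std::collections::HashSet;

/// Hypothetical simplified record; the identifier string already encodes
/// the archive type, so .conda and .tar.bz2 variants differ.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Record {
    identifier: String,
    url: String,
}

/// Hypothetical stand-in for the gateway container.
struct RepoData {
    records: Vec<Record>,
}

/// Keeps the first record seen for each identifier.
fn dedup_records_by_identifier<'a>(
    records: impl IntoIterator<Item = &'a Record>,
) -> Vec<Record> {
    let mut seen = HashSet::new();
    records
        .into_iter()
        .filter(|r| seen.insert(r.identifier.as_str()))
        .cloned()
        .collect()
}

/// Flattens several containers and delegates to the helper.
fn flatten_repodata(repodata: &[RepoData]) -> Vec<Record> {
    dedup_records_by_identifier(repodata.iter().flat_map(|rd| rd.records.iter()))
}

fn rec(identifier: &str, url: &str) -> Record {
    Record {
        identifier: identifier.to_string(),
        url: url.to_string(),
    }
}

/// The two scenarios from the commit message, as plain assertions.
fn check_regression_cases() {
    // Same archive under two URLs collapses to the first record.
    let dup = [
        rec("pkg-1.0-h0_1.tar.bz2", "https://mirror-a.invalid/pkg-1.0-h0_1.tar.bz2"),
        rec("pkg-1.0-h0_1.tar.bz2", "https://mirror-b.invalid/pkg-1.0-h0_1.tar.bz2"),
    ];
    let kept = dedup_records_by_identifier(&dup);
    assert_eq!(kept.len(), 1);
    assert_eq!(kept[0].url, "https://mirror-a.invalid/pkg-1.0-h0_1.tar.bz2");

    // .conda and .tar.bz2 variants are both preserved.
    let variants = [
        rec("pkg-1.0-h0_1.tar.bz2", "https://mirror-a.invalid/pkg-1.0-h0_1.tar.bz2"),
        rec("pkg-1.0-h0_1.conda", "https://mirror-a.invalid/pkg-1.0-h0_1.conda"),
    ];
    assert_eq!(dedup_records_by_identifier(&variants).len(), 2);
}
```

Taking an iterator of references keeps the helper independent of the container type, which is what lets the checks above run without building a `RepoData`.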

github-actions Bot commented May 13, 2026

Hyperfine Performance

mise x -- echo

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.5.6 x -- echo | 58.1 ± 11.2 | 31.1 | 81.3 | 1.00 |
| mise x -- echo | 59.9 ± 10.5 | 31.0 | 81.3 | 1.03 ± 0.27 |

mise env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.5.6 env | 56.3 ± 10.3 | 29.9 | 72.0 | 1.00 |
| mise env | 57.0 ± 10.3 | 26.6 | 76.7 | 1.01 ± 0.26 |

mise hook-env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.5.6 hook-env | 60.2 ± 11.8 | 28.3 | 79.1 | 1.00 |
| mise hook-env | 61.0 ± 10.5 | 32.3 | 84.5 | 1.01 ± 0.26 |

mise ls

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2026.5.6 ls | 49.6 ± 8.8 | 26.0 | 68.2 | 1.00 |
| mise ls | 49.9 ± 8.6 | 21.2 | 66.8 | 1.01 ± 0.25 |

xtasks/test/perf

| Command | mise-2026.5.6 | mise | Variance |
|---|---|---|---|
| install (cached) | 290ms | 271ms | +7% |
| ls (cached) | 190ms | 189ms | +0% |
| bin-paths (cached) | 206ms | 207ms | +0% |
| task-ls (cached) | 706ms | 682ms | +3% |

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jdx jdx merged commit d2eb9f4 into main May 13, 2026
33 checks passed
@jdx jdx deleted the claude/adoring-chatterjee-575f6a branch May 13, 2026 14:21
mise-en-dev added a commit that referenced this pull request May 13, 2026
### 🐛 Bug Fixes

- **(backend)** use runtime paths for backend bin dirs by @risu729 in
[#9606](#9606)
- **(ci)** preserve vendor/aqua-registry/ in PPA publish workflow by
@jdx in [#9782](#9782)
- **(ci)** set UTF-8 locale in e2e Docker image by @jdx in
[#9820](#9820)
- **(ci)** pass UTF-8 locale through to e2e tests by @jdx in
[#9823](#9823)
- **(conda)** dedup repodata by archive identifier instead of URL by
@jdx in [#9831](#9831)
- **(github)** use default shell for credential command by @risu729 in
[#9664](#9664)
- **(settings)** distinguish unset known settings from unknown ones by
@jdx in [#9818](#9818)
- **(upgrade)** remove completed progress jobs to prevent duplicate
output by @jdx in [#9779](#9779)
- **(vfox)** resolve GitHub token lazily inside Lua plugins by @jdx in
[#9816](#9816)

### 🚜 Refactor

- **(config)** separate core and backend tool options by @risu729 in
[#9753](#9753)
- **(schema)** reuse env directive property schemas by @risu729 in
[#9651](#9651)

### 📚 Documentation

- **(aliases)** fix Aliased Versions example and drop stale asdf callout
by @jdx in [#9830](#9830)

### ⚡ Performance

- **(aqua)** use phf for baked registry lookups by @risu729 in
[#9763](#9763)
- **(task)** cache per-file content hashes for
source_freshness_hash_contents by @jdx in
[#9819](#9819)

### 🧪 Testing

- **(e2e)** pin aube to known-good version in npm package_manager test
by @jdx in [#9794](#9794)

### 📦 Registry

- replace unsupported exe options by @risu729 in
[#9587](#9587)
- update pi by @garysassano in
[#9792](#9792)

### Chore

- **(ci)** use non-large runners for release builds by @jdx in
[#9786](#9786)
- **(ci)** compare registry PRs from fork point by @risu729 in
[#9643](#9643)
- **(ci)** make build-copr.sh the single source of truth for COPR
chroots by @jdx in [#9788](#9788)
- **(ci)** use crates.io trusted publishing in release-plz by @jdx in
[#9793](#9793)
- **(ci)** remove autofix.ci workflow by @jdx in
[#9801](#9801)
- **(ci)** restore -large runner for Linux release builds by @jdx in
[#9815](#9815)
- **(ci)** add zizmor workflow for github actions security analysis by
@jdx in [#9804](#9804)
- **(ci)** assert mise run render produces no diff by @jdx in
[#9803](#9803)
- **(copr)** publish EL9 builds via centos-stream+epel-next-9 chroot by
@jdx in [#9787](#9787)

### Ci

- remove pull_request_target workflow by @jdx in
[#9799](#9799)
- remove caching from publishing workflows by @jdx in
[#9800](#9800)

### Security

- reject shell metacharacters in version strings and CI inputs by @jdx
in [#9814](#9814)

## 📦 Aqua Registry Updates

### New Packages (11)

- [`Code-Hex/Neo-cowsay`](https://github.com/Code-Hex/Neo-cowsay)
- [`SonarSource/sonarqube-cli`](https://github.com/SonarSource/sonarqube-cli)
- [`earendil-works/pi`](https://github.com/earendil-works/pi)
- [`hylo-lang/hylo-new`](https://github.com/hylo-lang/hylo-new)
- [`jfernandez/bpftop`](https://github.com/jfernandez/bpftop)
- [`modem-dev/hunk`](https://github.com/modem-dev/hunk)
- [`npm/cli`](https://github.com/npm/cli)
- [`racket/racket/minimal`](https://github.com/racket/racket)
- [`slackapi/slack-cli`](https://github.com/slackapi/slack-cli)
- [`vectordotdev/vector`](https://github.com/vectordotdev/vector)
- [`wasilibs/go-yamllint`](https://github.com/wasilibs/go-yamllint)

### Updated Packages (10)

- [`DataDog/pup`](https://github.com/DataDog/pup)
- [`aquasecurity/trivy`](https://github.com/aquasecurity/trivy)
- [`astral-sh/uv`](https://github.com/astral-sh/uv)
- [`caarlos0/svu`](https://github.com/caarlos0/svu)
- [`cargo-bins/cargo-binstall`](https://github.com/cargo-bins/cargo-binstall)
- [`foundry-rs/foundry`](https://github.com/foundry-rs/foundry)
- [`gastownhall/beads`](https://github.com/gastownhall/beads)
- [`gruntwork-io/terragrunt`](https://github.com/gruntwork-io/terragrunt)
- [`pnpm/pnpm`](https://github.com/pnpm/pnpm)
- [`santosr2/TerraTidy`](https://github.com/santosr2/TerraTidy)
@thernstig


Works great.

3PeatVR pushed a commit to 3PeatVR/mise that referenced this pull request May 14, 2026
…9831)

## Summary

Fixes [jdx#9829](jdx#9829) — `mise use
-g imagemagick` (and other tools pulling `adwaita-icon-theme`
transitively) fails with:

```
conda solve failed: encountered duplicate records for adwaita-icon-theme-40.1.1-ha770c72_1.tar.bz2
```

[PR jdx#8337](jdx#8337) previously addressed
this class of error by deduplicating records by URL in
`flatten_repodata`, but that wasn't sufficient. rattler-solve's resolvo
solver detects duplicates by `DistArchiveIdentifier`
(`name-version-build` + archive type), not by URL — see
`rattler_solve-6.0.2/src/resolvo/mod.rs:414`. When conda-forge serves
the same archive under multiple URLs (distinct CDN paths / aliasing),
URL-based dedup keeps both, so the solver still rejects them.

This PR switches the dedup key to `r.identifier`, the exact key
rattler-solve uses, so colliding records can no longer reach the solver.
`.conda` vs `.tar.bz2` variants of the same package remain distinct
(their `archive_type` differs), preserving the solver's existing
archive-type preference logic.

## Test plan

- [x] `cargo check -p mise` compiles
- [ ] User-reported reproducer: `mise use -g imagemagick` on Linux64

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes conda solver input deduplication to drop records by
`DistArchiveIdentifier`, which can affect which package variants reach
the solver and thus impact dependency resolution. Scope is small and
covered by new regression tests, but it touches install/solve behavior.
> 
> **Overview**
> Fixes conda solve failures caused by duplicate package records being
returned under different URLs by deduplicating solver inputs on
`RepoDataRecord.identifier` (archive identifier) instead of URL.
> 
> Refactors repodata flattening to reuse a new
`dedup_records_by_identifier` helper, and adds unit tests to ensure
identical identifiers across different URLs collapse while distinct
`.conda` vs `.tar.bz2` variants are preserved.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
01fc485. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3PeatVR pushed a commit to 3PeatVR/mise that referenced this pull request May 14, 2026