Skip to content

Conversation

@gridbugs
Copy link
Collaborator

@gridbugs gridbugs commented Oct 3, 2025

This allows the dependencies of the project to be shared with dev tools, preventing unnecessarily rebuilding dependencies while building a tool if an identical version of the package has already been built or vice versa.

This requires storing build artifacts of dependencies of dev tools and the project (when built in the "default" context) in a single directory, since their relative path within the _build directory is part of the key used to cache build artifacts. Previously the build artifacts of a package were placed in a directory named after that package, however since dev tools and the project may depend on different versions of the same package, directories needed to be renamed to avoid collisions. Even if the project and dev tools rely on the same version of some package, it may still need to be rebuilt on occasion due to the presence/absence of depopts, or due to manual modification to lockfiles. For this reason, the directories containing package build artifacts contain in their name, a hash of the contents of the lockfile of the metadata in the package's transitive dependency closure.

This change introduces the concept of a "package slug" which is a unique directory name for a package comprising its name, version, and dependency closure hash. The DB type used to store metadata while building packages has been updated to be keyed based on slugs, and to store package metadata for dev tools and the project's default build context in the same table to allow dependencies to be shared.

Fixes #12491

@rgrinberg
Copy link
Member

Do you mind rebasing? I'll do a more complete review then.

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch 2 times, most recently from 0838ad3 to f5ceafc Compare October 7, 2025 02:29
@gridbugs
Copy link
Collaborator Author

gridbugs commented Oct 7, 2025

I'm still in the process of updating tests, but I'm interested in feedback of my general approach.

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch 2 times, most recently from 95ef51a to 09f1449 Compare October 7, 2025 07:48
Copy link
Collaborator

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of making a package be represented by some kind of hashed representation seems like a quite good solution.

One could probably even go a step further and make the hash the only representation of the package in the .pkg folder.

There's a bit of nitpicking about names (I don't think introducing new exotic terms is a good idea), but the main concern from the code is that the way the summaries are computed seems to be in the wrong place as that leads to cycles which need to be broken and rather fragile-seeming code that does some side-effects to make it work.

[Digest_feed.compute_digest_with_hasher] because the hasher is a
singleton. *)
iter_all_versions_of_non_dune_dependencies (fun dep ->
let _dep_slug : Pkg_slug.t = Pkg.slug dep in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is odd and the comment does not make it clearer. Why is it calculating the summary only to discard it? If this is necessary I'd say that the hasher API needs to be revised to avoid having code that seemingly does nothing to trigger a side effect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it calculating the summary only to discard it?

This is moot now anyway since this code has been removed, but here's why it was necessary if you're curious:

It was so that the lazy cell would be guaranteed to be populated when the information is used later on while computing a a Dune_digest.t. Computing a slug requires computing a Dune_digest.t, and while computing a Dune_digest.t we reuse a singleton "hasher" (for performance reasons) but this means that you can't nest the computation of a digest. This meant we had to be careful about the order in which these lazy computations were forced.

; ( "exported_env"
, Dyn.list (Action.Env_update.to_dyn String_with_vars.to_dyn) exported_env )
; "enabled_on_platforms", Solver_env_disjunction.to_dyn enabled_on_platforms
; "slug", Dyn.option Dyn.opaque slug
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it opaque here? Pkg_slug.to_dyn seems to exist and not have issues with the conversion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was because the slug field contained a lazy computation that computed the slug, and I didn't want to force the computation just to compute the Dyn.t.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This code is removed now.)

~human_readable_description:(fun () ->
Pp.text
"A map associating package slugs with package metadata with entries for all \
dev tools with lockdirs in the current project.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dev tools with lockdirs in the current project.")
dev tools with lock directories in the current project.")

then
(* Dev tools are built in the default context, so allow their
dependencies to be shared with the project's if it too is being
built in the default context. *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder, if the build is not in the default context, does it work when both the project and dependency depend on a package with the same slug?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each context has a different directory under _build/_private/ where package source and compiled artifacts are placed, and dune doesn't share dependencies across contexts. If the context is not the default context the dune won't share built artifacts with dev tools, even if the dependencies have the same slug. This is something we might consider changing in the future since package digests should make it possible to share dependencies across contexts. It would require restructuring the build directory further to remove context names from paths.

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch 2 times, most recently from 7dbe4dd to ab33cba Compare October 7, 2025 10:51
@gridbugs
Copy link
Collaborator Author

gridbugs commented Oct 7, 2025

Thanks for the review @Leonidas-from-XIV! I'll address your comments tomorrow, but regarding:

One could probably even go a step further and make the hash the only representation of the package in the .pkg folder.

That would certainly be possible however I think there's still value in keeping the package names and versions in the directory names for easier debugging.

Copy link
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of a "slug" is sound, but I think you need to change where and how it is computed. My suggestion is to compute in the rules after you've evaluated the lock file for the package. The slug should be computed from the evaluated representations of dependencies rather than their AST form. This will also you to avoid the various awkwardness around handling multi platform lock directories.

@gridbugs
Copy link
Collaborator Author

gridbugs commented Oct 8, 2025

I agree that it's awkward to have the package slug be a field in Lock_dir.Pkg.t, and it would certainly be easier to compute it if it were a field of the resolved package type. However after experimenting with computing slugs at the same time packages are resolved I think it's necessary that slugs be computed before resolving packages, since the slug tells dune which package needs to be resolved.

For example given the target _build/_private/default/.pkg/base.v0.17.3-76c96787e3996384a88ec21268e56f64/target/cookie, dune needs to resolve base.v0.17.3 from a (possibly manually edited) lockfile which hashes (along with its dependencies) to the hash the slug. In the current state of the PR the mapping from slug to the pertinent information from the corresponding lockfile is done with the Slug_table.t type which is added to DB.t and populated before resolving any packages.

I'm going to remove the slug field from Lock_dir.Pkg.t and instead compute slugs in Pkg_rules while building the slug table.

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch from ab33cba to e1c34a7 Compare October 8, 2025 06:23
@gridbugs
Copy link
Collaborator Author

gridbugs commented Oct 8, 2025

Ok I've moved the computation of slugs to pkg_rules and renamed them to Pkg_digest.t. Dune computes a table mapping a pkg digest to some package metadata as part of building the DB.t for pkg_rules prior to resolving a package. Since a slug is no longer part of a Lock_dir.Pkg.t all the awkwardness around deferring its computation is removed.

The only feedback I've applied thus far is moving the slug computation into pkg_rules and the consequences of this change. I'd appreciate another look from @rgrinberg and @Leonidas-from-XIV that the general structure is ok before I address the more fine-grained feedback and continue fixing the tests.

I'll also note that I've added a new command dune pkg print-slug (which I'm yet to rename) which prints the pkg digest given a package name. Such a mechanism is now needed to write tests that involve building packages since the target one must build to build a package now contains a package digest. I still need to update this command to work after moving the computation of digests into pkg_rules.

|> Option.value ~default:[]
|> List.filter_map
~f:(fun { Dune_pkg.Lock_dir.Dependency.name; loc = dep_loc } ->
if Package.Name.equal name Dune_pkg.Dune_dep.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you be removing all "provided" deps here?

Copy link
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from some minor issues

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch 2 times, most recently from f8fea2d to 689ce38 Compare October 9, 2025 01:08
{ pkg; deps; pkg_digest })
in
Package.Name.Map.values pkgs_by_name
|> List.map ~f:(compute_entry ~seen_set:Package.Name.Set.empty ~seen_list:[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should should use Name.Map.map here and just return a map. The caller can transform the map into the digest map they need. That would also allow you to get rid of the linear search elsewhere.

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch 4 times, most recently from 7247eff to acbb8db Compare October 10, 2025 05:19
@gridbugs
Copy link
Collaborator Author

The remaining test failures are because some tests create lockfiles containing a path inside the test sandbox, which means their package digests will be influenced by the sandbox path and thus different on different machines. Many test currently print the path to a package's contents which includes the digest. I'm investigating how best to solve this and I'm open to suggestions.

@Alizter
Copy link
Collaborator

Alizter commented Oct 10, 2025

@gridbugs I would sanitise the hashes so they aren't shown. We do this in multiple places especially for things that look like digests.

@gridbugs
Copy link
Collaborator Author

Good idea @Alizter. I'll try that.

@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch from acbb8db to 95050e8 Compare October 10, 2025 07:17
This allows the dependencies of the project to be shared with dev tools,
preventing unnecessarily rebuilding dependencies while building a tool
if an identical version of the package has already been built or vice
versa.

This requires storing build artifacts of dependencies of dev tools and
the project (when built in the "default" context) in a single directory,
since their relative path within the _build directory is part of the key
used to cache build artifacts. Previously the build artifacts of a
package were placed in a directory named after that package, however
since dev tools and the project may depend on different versions of the
same package, directories needed to be renamed to avoid collisions. Even
if the project and dev tools rely on the same version of some package,
it may still need to be rebuilt on occasion due to the presence/absence
of depopts, or due to manual modification to lockfiles. For this reason,
the directories containing package build artifacts contain in their
name, a hash of the contents of the lockfile of the metadata in the
package's transitive dependency closure.

This change introduces the concept of a "package digest" which is a
unique directory name for a package comprising its name, version, and
dependency closure hash. The DB type used to store metadata while
building packages has been updated to be keyed based on package digests,
and to store package metadata for dev tools and the project's default
build context in the same table to allow dependencies to be shared.

Signed-off-by: Stephen Sherratt <[email protected]>
@gridbugs gridbugs force-pushed the share-build-artifacts-between-projec-and-dev-tools branch from 95050e8 to 7047ce0 Compare October 10, 2025 08:01
@gridbugs
Copy link
Collaborator Author

The ocaml 5.4.0 test is failing on other PRs so I doubt it's affected by this change. Merging now.

@gridbugs gridbugs merged commit d889d07 into ocaml:main Oct 10, 2025
25 of 26 checks passed
davesnx added a commit to davesnx/dune that referenced this pull request Oct 14, 2025
* 'main' of github.com:/ocaml/dune: (147 commits)
  cram test: test only parameter flags in merlin generation
  fix(oxcaml): import eta-expansion changes from opam-repo (ocaml#12563)
  address review comments
  Mask the path to the stdlib
  fix(oxcaml): generate merlin config for library parameters
  fix(melange + include_qualified): track correct `.cmj` dependencies in emit (ocaml#12531)
  refactor: remove some unused code in [Path] (ocaml#12558)
  dep_rules: don't run (transitive) `ocamldep` on single module buildables (ocaml#12555)
  fix(pkg): ignore project settings for building packages
  test(pkg): reproduce ocaml#12131
  melange: add a test for module cycle checks (ocaml#12554)
  chore: lint check for new changes entries (ocaml#12553)
  feature(cram): allow for conflict detection (ocaml#12538)
  ci: update for ocaml 5.4 release (ocaml#12552)
  chore(script): generate changelog from structure (ocaml#12516)
  Reuse dependencies between project and tools (ocaml#12526)
  Introduce Io.overwrite_file
  test: fix dune install requiring a mandir
  Enable package management for more tests
  Add a `dune tools env` command to add dev tools to PATH (ocaml#12521)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Installing the ocamllsp dev tool rebuilds the compiler unnecessarily

4 participants