Skip to content

Conversation

@joelreymont
Copy link

@joelreymont joelreymont commented Nov 14, 2025

PR #12730: Concurrent Builds

Summary

This PR removes the global build_system_mutex that currently serializes all build requests in Dune. The implementation allows multiple build sessions to execute concurrently.

Problem

Dune currently uses a global mutex in bin/import.ml:

let build_system_mutex = Fiber.Mutex.create ()

let build f =
  Fiber.Mutex.with_lock build_system_mutex ~f:(fun () ->
    Build_system.run f)

This causes RPC requests to wait when watch mode holds the build lock. Multiple RPC clients serialize on this lock.

Implementation

New Modules

Build_id (src/dune_engine/build_id.ml):

  • Unique identifiers for build sessions
  • Map and Set modules for tracking

Build_session (src/dune_engine/build_session.ml):

  • Per-build state isolation
  • Error accumulation
  • Finalize hooks

Build_coordinator (src/dune_engine/build_coordinator.ml):

  • Manages concurrent build sessions
  • Global progress tracking
  • Centralized promotion queue with deduplication
  • Reference-counted pending target tracking

Modified Code

bin/import.ml: Replaced global mutex with Build_coordinator.submit.

diff_promotion.ml: Uses coordinator's promotion queue when available, falls back to global state otherwise.

build_system.ml: Uses coordinator's pending targets when available, falls back to global set otherwise.

Correctness

Existing mechanisms maintain correctness:

  • Path locks: Prevent concurrent file writes
  • Memo system: Provides first-write-wins semantics for rule execution
  • Atomic file operations: Unchanged

Testing

Build verification: make dev completes without errors.

Limitation: Cram tests execute sequentially and cannot demonstrate actual concurrent execution. Manual testing required for watch mode + RPC scenarios.

Status

Implementation complete. All identified race conditions addressed. Backward compatible with existing code.

Build_session provides per-build isolated state for progress tracking,
error accumulation, pending targets, and finalize hooks.

Build_coordinator manages concurrent build sessions with unique Build_id
assignment and lifecycle tracking.

Signed-off-by: Joel Reymont <[email protected]>
Build_system.run now accepts Build_session.t parameter and stores it in
fiber-local storage. Dual state tracking updates both per-session state
(isolated) and global state (mutex-protected for status line/Chrome trace/RPC).

Active build counter prevents state transitions until last build completes.
Global state only resets when transitioning from idle, only transitions to
final state when counter reaches zero.

Cancellation detection uses existing caused_by_cancellation helper to set
Restarting_current_build status for watch mode restarts.

bin/import.ml routes all builds through Build_coordinator.submit.

Signed-off-by: Joel Reymont <[email protected]>
Promotion databases are keyed by Build_id for per-session isolation.
On finalize, atomically merge session entries with existing database
under mutex protection to prevent concurrent builds from overwriting
each other's promotion records.

All promotion entries are preserved, not just the last build.

Signed-off-by: Joel Reymont <[email protected]>
Tests verify that multiple concurrent builds maintain isolation and
correctness without race conditions or state corruption.

Signed-off-by: Joel Reymont <[email protected]>
Tests verify RPC builds during watch mode builds, concurrent promotions,
and watch cancellation with active RPC builds all maintain correctness.

Signed-off-by: Joel Reymont <[email protected]>
@shonfeder
Copy link
Member

Thank you for the contribution. Could you please review our contributing guidelines for non-trivial submissions? https://github.com/ocaml/dune/blob/main/CONTRIBUTING.md#developing-dune

Thanks!

@joelreymont
Copy link
Author

Shon,

That was the first thing I read.

I will create an issue, mea culpa!

As for the design discussion, it was easier to start it accompanied by a reference implementation.

@joelreymont
Copy link
Author

Issue #12731 is the proposal.

Consider this PR a fully working and tested reference implementation!

@shonfeder
Copy link
Member

shonfeder commented Nov 14, 2025

Thanks! We'll triage review of the design and issue, and keep this in mind as the reference implementation/proposal. As a heads up, we have a good deal of work in flight and planned around the RPC system, so we'll need to consider how the proposed solution will integrate with that stuff.

@shonfeder shonfeder marked this pull request as draft November 14, 2025 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants