
fix(task): prevent hang when nested tasks fail #6430

Merged: jdx merged 1 commit into main from fix/task-hang-on-failure on Sep 26, 2025

Conversation

@jdx (Owner) commented Sep 26, 2025

Summary

Fixes task execution hanging when nested or dependent tasks fail.

Problem

When a task failed within a sequence of tasks (especially nested or dependent tasks), mise would hang indefinitely instead of properly exiting with an error. This was reported in #6391.

Root Cause

The inject_and_wait function was waiting for a completion signal from the dependency graph that would never come. When a task failed:

  1. The scheduler would stop spawning new dependent tasks
  2. The dependency graph would never become empty (dependent tasks remained in the graph)
  3. The done signal was never sent because the graph wasn't empty
  4. The function would hang forever waiting for the signal

Solution

Modified the inject_and_wait function to:

  1. Periodically check if task execution should stop due to failure (every 100ms)
  2. When a failure is detected and continue_on_error is false, clean up the dependency graph by removing all remaining tasks
  3. This triggers the completion signal and allows the function to exit properly with an error

Test plan

  • Added e2e test test_task_failure_hang that reproduces the issue
  • Test verifies that mise exits with error code 1 instead of hanging when a dependency fails
  • All existing task tests pass

Fixes: #6391

🤖 Generated with Claude Code

When a task failed within a sequence of tasks (especially nested or dependent tasks),
mise would hang indefinitely instead of properly exiting with an error. This was
caused by the dependency graph never becoming empty when dependent tasks were not
executed due to an earlier failure.

The fix modifies inject_and_wait() to periodically check for failures and clean up
the dependency graph when stopping early, ensuring proper termination.

Fixes: #6391

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings September 26, 2025 10:22
Copilot AI (Contributor) left a comment


Pull Request Overview

Fixes a hanging issue where mise would wait indefinitely when nested or dependent tasks fail, instead of properly exiting with an error. The fix implements periodic checking for task failures and proper cleanup of the dependency graph.

  • Replaced blocking wait with a polling loop that checks for early failure conditions every 100ms
  • Added dependency graph cleanup when failures occur and continue_on_error is false
  • Added e2e test to verify the fix prevents hanging behavior

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/cli/run.rs | Modified inject_and_wait function to use polling with timeout instead of blocking wait; added failure detection and dependency graph cleanup |
| e2e/tasks/test_task_failure_hang | Added new e2e test that verifies mise exits properly when dependent tasks fail instead of hanging |


Comment on lines +897 to +901
    let tasks_to_remove: Vec<Task> = deps.all().cloned().collect();
    for task in tasks_to_remove {
        deps.remove(&task);
    }
    drop(deps);

Copilot AI Sep 26, 2025


This cleanup code holds the lock while collecting all tasks and then iterating through them. Consider clearing the dependency graph with a single operation like deps.clear() if available, or releasing the lock between collection and removal to reduce lock contention.

Suggested change
- let tasks_to_remove: Vec<Task> = deps.all().cloned().collect();
- for task in tasks_to_remove {
-     deps.remove(&task);
- }
- drop(deps);
+ deps.clear();
+ // If deps.clear() is not available, fall back to the original method,
+ // releasing the lock between collection and removal:
+ // let tasks_to_remove: Vec<Task> = deps.all().cloned().collect();
+ // drop(deps);
+ // let mut deps = sub_deps.lock().await;
+ // for task in tasks_to_remove {
+ //     deps.remove(&task);
+ // }
+ // drop(deps);

    }
    drop(deps);
    // Give a short time for the spawned task to finish cleanly
    let _ = tokio::time::timeout(Duration::from_millis(100), done_rx).await;

Copilot AI Sep 26, 2025


The timeout duration (100ms) is hardcoded and appears twice in this function. Consider extracting this as a named constant to improve maintainability and consistency.

@jdx jdx enabled auto-merge (squash) September 26, 2025 10:23
@jdx jdx disabled auto-merge September 26, 2025 10:23
@jdx jdx enabled auto-merge (squash) September 26, 2025 10:28
@jdx jdx merged commit 5024a12 into main Sep 26, 2025
19 checks passed
@jdx jdx deleted the fix/task-hang-on-failure branch September 26, 2025 10:33
@github-actions

Hyperfine Performance

mise x -- echo

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2025.9.19 x -- echo | 20.6 ± 0.3 | 19.7 | 22.9 | 1.00 |
| mise x -- echo | 20.9 ± 0.3 | 20.1 | 22.1 | 1.02 ± 0.02 |

mise env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2025.9.19 env | 19.8 ± 0.5 | 19.2 | 26.6 | 1.00 |
| mise env | 20.3 ± 0.6 | 19.4 | 26.9 | 1.02 ± 0.04 |

mise hook-env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2025.9.19 hook-env | 19.5 ± 0.2 | 18.9 | 20.2 | 1.00 |
| mise hook-env | 20.0 ± 0.3 | 19.1 | 21.2 | 1.02 ± 0.02 |

mise ls

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| mise-2025.9.19 ls | 17.4 ± 0.3 | 16.8 | 18.4 | 1.00 |
| mise ls | 17.8 ± 0.3 | 17.1 | 18.9 | 1.02 ± 0.02 |

xtasks/test/perf

| Command | mise-2025.9.19 | mise | Variance |
|---|---|---|---|
| install (cached) | 171ms | 105ms ✅ | +62% |
| ls (cached) | 65ms | 65ms | +0% |
| bin-paths (cached) | 71ms | 71ms | +0% |
| task-ls (cached) | 504ms | 500ms | +0% |

✅ Performance improvement: cached install is 62% faster

@jdx jdx mentioned this pull request Sep 26, 2025
jdx added a commit to stempler/mise that referenced this pull request Sep 30, 2025
Extends the fix from jdx#6430 to handle the main scheduler loop. When a
task failure occurs and continue_on_error is false, the scheduler now
properly cleans up the main dependency graph before exiting.

## Problem
The previous fix in jdx#6430 addressed hangs in nested task sequences
(inject_and_wait), but didn't handle the main scheduler loop. When
multiple tasks depended on a common failing dependency, the scheduler
would stop accepting new work but wouldn't clean up the dependency
graph, causing it to wait indefinitely for a completion signal that
would never arrive.

## Root Cause
In the main scheduler loop (lines 413-492):
1. When a dependency fails, is_stopping() becomes true
2. The scheduler breaks out of the inner drain loop
3. However, remaining tasks stay in main_deps
4. The spawned async task waits for deps to be empty before signaling
5. This creates a deadlock - scheduler stopped but deps not cleared

## Solution
Added cleanup logic similar to inject_and_wait: when is_stopping() is
detected, explicitly remove all remaining tasks from main_deps. This
triggers the completion signal and allows proper shutdown.

Fixes the hang demonstrated in test_task_failure_hang_depends

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
jdx added a commit that referenced this pull request Sep 30, 2025
…6481)

## Summary
Fixes task execution hanging when a task depends on multiple tasks that
share a common failing dependency. Extends the fix from #6430 to handle
the main scheduler loop.

## Problem
When multiple tasks depend on a common dependency that fails (e.g.,
`lint`, `test`, and `build:docker` all depend on `install`, which
fails), and another task depends on all of them (e.g., `check` depends
on all three), the scheduler would hang indefinitely instead of properly
exiting with an error.

This scenario wasn't covered by the previous fix in #6430, which only
handled nested task sequences within `inject_and_wait`.

## Root Cause
In the main scheduler loop:
1. When a dependency fails, `is_stopping()` becomes true
2. The scheduler stops accepting new tasks from the queue
3. However, remaining tasks stay in the `main_deps` dependency graph
4. The spawned async task waits for `main_deps` to become empty before
sending the `main_done` signal
5. This creates a deadlock - the scheduler has stopped but the deps
graph is never cleared

## Solution
Added cleanup logic similar to `inject_and_wait`: when `is_stopping()`
is detected in the main scheduler loop, explicitly remove all remaining
tasks from `main_deps`. This triggers the completion signal and allows
proper shutdown.

## Test plan
- Added e2e test `test_task_failure_hang_depends` that reproduces the
complex dependency scenario
- Test verifies that mise exits with error code 1 instead of hanging
when dependencies fail
- Existing test `test_task_failure_hang` continues to pass
- All task-related e2e tests pass

Fixes: #6391

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: jdx <216188+jdx@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>