fix(task): prevent hang when nested tasks fail #6430
Conversation
When a task failed within a sequence of tasks (especially nested or dependent tasks), mise would hang indefinitely instead of properly exiting with an error. This was caused by the dependency graph never becoming empty when dependent tasks were not executed due to an earlier failure. The fix modifies `inject_and_wait()` to periodically check for failures and clean up the dependency graph when stopping early, ensuring proper termination.

Fixes: #6391

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Pull Request Overview
Fixes a hanging issue where mise would wait indefinitely when nested or dependent tasks fail, instead of properly exiting with an error. The fix implements periodic checking for task failures and proper cleanup of the dependency graph.
- Replaced blocking wait with a polling loop that checks for early failure conditions every 100ms
- Added dependency graph cleanup when failures occur and `continue_on_error` is false
- Added e2e test to verify the fix prevents hanging behavior
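The polling approach described above can be sketched with standard-library primitives. This is a minimal illustration, not mise's actual code: the function name `wait_with_failure_check` and the `HashSet` standing in for the dependency graph are my inventions, and the real implementation uses tokio tasks rather than OS threads.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

// Instead of blocking forever on the `done` signal, poll with a 100ms
// timeout and check a shared failure flag on every tick.
fn wait_with_failure_check(
    done_rx: mpsc::Receiver<()>,
    failed: Arc<AtomicBool>,
    deps: &mut HashSet<String>,
) -> Result<(), String> {
    loop {
        match done_rx.recv_timeout(Duration::from_millis(100)) {
            // All tasks completed normally and the signal arrived.
            Ok(()) => return Ok(()),
            Err(mpsc::RecvTimeoutError::Timeout) => {
                if failed.load(Ordering::SeqCst) {
                    // A task failed: drain the graph so nothing waits
                    // on dependents that will never run.
                    deps.clear();
                    return Err("task failed, stopping early".to_string());
                }
                // No failure yet: keep waiting.
            }
            Err(mpsc::RecvTimeoutError::Disconnected) => {
                return Err("worker disconnected".to_string());
            }
        }
    }
}

fn main() {
    let failed = Arc::new(AtomicBool::new(false));
    let (done_tx, done_rx) = mpsc::channel::<()>();
    let mut deps: HashSet<String> = ["lint", "test", "build"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    // Simulate a dependency failing shortly after startup; the `done`
    // signal is never sent, which previously caused the hang.
    let flag = Arc::clone(&failed);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        flag.store(true, Ordering::SeqCst);
    });

    let result = wait_with_failure_check(done_rx, failed, &mut deps);
    drop(done_tx); // would carry the signal on normal completion

    assert!(result.is_err());
    assert!(deps.is_empty());
    println!("exited with error instead of hanging");
}
```

Without the failure check, the `recv` on the `done` channel would block forever, which is exactly the hang this PR fixes.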
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/cli/run.rs | Modified inject_and_wait function to use polling with timeout instead of blocking wait, added failure detection and dependency graph cleanup |
| e2e/tasks/test_task_failure_hang | Added new e2e test that verifies mise exits properly when dependent tasks fail instead of hanging |
```rust
let tasks_to_remove: Vec<Task> = deps.all().cloned().collect();
for task in tasks_to_remove {
    deps.remove(&task);
}
drop(deps);
```
This cleanup code holds the lock while collecting all tasks and then iterates through them. Consider clearing the dependency graph more efficiently with a single operation like deps.clear() if available, or release the lock between collection and removal to reduce lock contention.
Suggested change, replacing the block above with:

```rust
deps.clear();
// If deps.clear() is not available, fall back to the original method:
// let tasks_to_remove: Vec<Task> = deps.all().cloned().collect();
// drop(deps);
// let mut deps = sub_deps.lock().await;
// for task in tasks_to_remove {
//     deps.remove(&task);
// }
// drop(deps);
```
```rust
}
drop(deps);
// Give a short time for the spawned task to finish cleanly
let _ = tokio::time::timeout(Duration::from_millis(100), done_rx).await;
```
The timeout duration (100ms) is hardcoded and appears twice in this function. Consider extracting this as a named constant to improve maintainability and consistency.
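A minimal sketch of the suggested refactor. The constant name is my invention; the review only asks that the repeated 100ms literal be given a single definition.

```rust
use std::time::Duration;

// Hypothetical name: one definition replaces the two hardcoded
// `Duration::from_millis(100)` literals in the function.
const FAILURE_POLL_INTERVAL: Duration = Duration::from_millis(100);

fn main() {
    // Both the polling loop and the final drain would reference the
    // same constant instead of separate literals.
    assert_eq!(FAILURE_POLL_INTERVAL.as_millis(), 100);
    println!("poll interval: {:?}", FAILURE_POLL_INTERVAL);
}
```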
Hyperfine Performance

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `mise-2025.9.19 x -- echo` | 20.6 ± 0.3 | 19.7 | 22.9 | 1.00 |
| `mise x -- echo` | 20.9 ± 0.3 | 20.1 | 22.1 | 1.02 ± 0.02 |

mise env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `mise-2025.9.19 env` | 19.8 ± 0.5 | 19.2 | 26.6 | 1.00 |
| `mise env` | 20.3 ± 0.6 | 19.4 | 26.9 | 1.02 ± 0.04 |

mise hook-env

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `mise-2025.9.19 hook-env` | 19.5 ± 0.2 | 18.9 | 20.2 | 1.00 |
| `mise hook-env` | 20.0 ± 0.3 | 19.1 | 21.2 | 1.02 ± 0.02 |

mise ls

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `mise-2025.9.19 ls` | 17.4 ± 0.3 | 16.8 | 18.4 | 1.00 |
| `mise ls` | 17.8 ± 0.3 | 17.1 | 18.9 | 1.02 ± 0.02 |

xtasks/test/perf

| Command | mise-2025.9.19 | mise | Variance |
|---|---|---|---|
| install (cached) | 171ms | ✅ 105ms | +62% |
| ls (cached) | 65ms | 65ms | +0% |
| bin-paths (cached) | 71ms | 71ms | +0% |
| task-ls (cached) | 504ms | 500ms | +0% |

✅ Performance improvement: install (cached) improved by 62%
Extends the fix from jdx#6430 to handle the main scheduler loop. When a task failure occurs and `continue_on_error` is false, the scheduler now properly cleans up the main dependency graph before exiting.

## Problem

The previous fix in jdx#6430 addressed hangs in nested task sequences (`inject_and_wait`), but didn't handle the main scheduler loop. When multiple tasks depended on a common failing dependency, the scheduler would stop accepting new work but wouldn't clean up the dependency graph, causing it to wait indefinitely for a completion signal that would never arrive.

## Root Cause

In the main scheduler loop (lines 413-492):

1. When a dependency fails, `is_stopping()` becomes true
2. The scheduler breaks out of the inner drain loop
3. However, remaining tasks stay in `main_deps`
4. The spawned async task waits for deps to be empty before signaling
5. This creates a deadlock - scheduler stopped but deps not cleared

## Solution

Added cleanup logic similar to `inject_and_wait`: when `is_stopping()` is detected, explicitly remove all remaining tasks from `main_deps`. This triggers the completion signal and allows proper shutdown.

Fixes the hang demonstrated in `test_task_failure_hang_depends`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…6481)

## Summary

Fixes task execution hanging when a task depends on multiple tasks that share a common failing dependency. Extends the fix from #6430 to handle the main scheduler loop.

## Problem

When multiple tasks depend on a common dependency that fails (e.g., `lint`, `test`, and `build:docker` all depend on `install`, which fails), and another task depends on all of them (e.g., `check` depends on all three), the scheduler would hang indefinitely instead of properly exiting with an error. This scenario wasn't covered by the previous fix in #6430, which only handled nested task sequences within `inject_and_wait`.

## Root Cause

In the main scheduler loop:

1. When a dependency fails, `is_stopping()` becomes true
2. The scheduler stops accepting new tasks from the queue
3. However, remaining tasks stay in the `main_deps` dependency graph
4. The spawned async task waits for `main_deps` to become empty before sending the `main_done` signal
5. This creates a deadlock - the scheduler has stopped but the deps graph is never cleared

## Solution

Added cleanup logic similar to `inject_and_wait`: when `is_stopping()` is detected in the main scheduler loop, explicitly remove all remaining tasks from `main_deps`. This triggers the completion signal and allows proper shutdown.

## Test plan

- Added e2e test `test_task_failure_hang_depends` that reproduces the complex dependency scenario
- Test verifies that mise exits with error code 1 instead of hanging when dependencies fail
- Existing test `test_task_failure_hang` continues to pass
- All task-related e2e tests pass

Fixes: #6391

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: jdx <216188+jdx@users.noreply.github.com>
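The scheduler-side cleanup described in the Solution section can be sketched as follows. Names here are illustrative stand-ins: `drain_if_stopping` is my invention and a `HashSet` stands in for mise's actual `main_deps` structure.

```rust
use std::collections::HashSet;

// When the scheduler detects is_stopping(), drain the remaining tasks
// so the "deps empty" check in the spawned completion task can fire and
// send the `main_done` signal instead of waiting forever.
fn drain_if_stopping(is_stopping: bool, main_deps: &mut HashSet<String>) -> bool {
    if is_stopping && !main_deps.is_empty() {
        main_deps.clear();
        return true; // remaining tasks removed; shutdown can proceed
    }
    false
}

fn main() {
    // `check` depends on `lint`, `test`, and `build:docker`, which all
    // depend on a shared `install` task that has just failed.
    let mut main_deps: HashSet<String> = ["lint", "test", "build:docker", "check"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    // The failure set is_stopping() to true; drain the graph.
    let drained = drain_if_stopping(true, &mut main_deps);

    assert!(drained);
    assert!(main_deps.is_empty()); // completion signal can now be sent
    println!("scheduler shut down instead of deadlocking");
}
```

Without this drain, the deps graph stays non-empty after the scheduler stops pulling work, which is the deadlock the numbered list above walks through.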
Summary
Fixes task execution hanging when nested or dependent tasks fail.
Problem
When a task failed within a sequence of tasks (especially nested or dependent tasks), mise would hang indefinitely instead of properly exiting with an error. This was reported in #6391.
Root Cause

The `inject_and_wait` function was waiting for a completion signal from the dependency graph that would never come. When a task failed, dependent tasks were not executed, so the graph never became empty and the `done` signal was never sent.

Solution

Modified the `inject_and_wait` function to periodically check for failures and, when `continue_on_error` is false, clean up the dependency graph by removing all remaining tasks.

Test plan

- Added e2e test `test_task_failure_hang` that reproduces the issue

Fixes: #6391
🤖 Generated with Claude Code