Skip to content

fix: resolve console output glitches with progress display#2193

Merged
thomhurst merged 6 commits intomainfrom
fix/console-output-glitches-2162
Jan 18, 2026
Merged

fix: resolve console output glitches with progress display#2193
thomhurst merged 6 commits intomainfrom
fix/console-output-glitches-2162

Conversation

@thomhurst
Copy link
Owner

Summary

Fixes #2162 - Console output glitches where progress rendering interferes with module output.

Issues Fixed:

  • Progress ctx.Refresh() could interleave with module output causing text corruption
  • Spectre.Console markup appeared raw ([bold cyan]?[/]) instead of rendered
  • Module/Activity info showed as (Module: Unknown, Activity: (null))
  • Timeout in progress loop could leave render lock unreleased causing deadlock

Changes:

  • Add _renderLock SemaphoreSlim for mutual exclusion between progress rendering and output writes
  • Use Interlocked.Exchange for atomic lock ownership tracking (_renderLockHeldByPause)
  • Add _isFlushingOutput flag to bypass buffering when writing module output directly
  • Fix ResumeAsync() to always release render lock even after progress loop timeout
  • Simplify file operation log messages by removing Module/Activity info

Test plan

  • Run pipeline with verbose output and verify no text corruption
  • Verify Spectre.Console markup renders properly (colors, bold, etc.)
  • Verify CI group commands (::group::) appear correctly
  • Test with multiple modules completing simultaneously
  • Verify no deadlocks when output takes longer than 60 seconds

- Add render lock synchronization to prevent progress refresh during output writes
- Use atomic flag tracking to safely coordinate lock ownership between pause/resume
- Add flushing output flag to bypass buffering when writing module output
- Fix ResumeAsync to always release render lock even after timeout
- Simplify file operation log messages by removing Module/Activity info

Fixes race conditions where:
- Progress ctx.Refresh() could interleave with module output
- Spectre.Console markup appeared raw ([bold cyan]?[/]) instead of rendered
- Timeout in progress loop could leave render lock unreleased
@thomhurst
Copy link
Owner Author

Summary

Fixes console output glitches by adding render lock synchronization between progress updates and module output, plus a bypass flag to prevent buffering during output flushes.

Critical Issues

1. Timeout path does not release render lock or signal pause completion

In ProgressSession.cs:130-148, when the 60-second timeout occurs, the code clears _isPaused and _resumeSignal, but does NOT:

  • Clear _pauseComplete - causing any waiting PauseAsync() callers to hang indefinitely on line 241
  • Release _renderLock that was acquired by PauseAsync() at line 248 - causing a permanent deadlock

The timeout handling code must be updated to match the cleanup logic in ResumeAsync():

if (completedTask == timeoutTask)
{
    // Timeout - force resume to prevent stuck state
    _logger.LogWarning(/* ... */);

    await _pauseLock.WaitAsync().ConfigureAwait(false);
    try
    {
        _isPaused = false;
        _resumeSignal?.TrySetResult();
        _resumeSignal = null;
        _pauseComplete?.TrySetResult();  // ADD THIS
    }
    finally
    {
        _pauseLock.Release();
    }
    
    // ADD THIS - Release render lock if we held it
    if (Interlocked.Exchange(ref _renderLockHeldByPause, 0) == 1)
    {
        _renderLock.Release();
    }
}

2. Race condition in DisposeAsync

In ProgressSession.cs:458-465, DisposeAsync() releases the render lock if held, then disposes the semaphores. However, there's a race: if ResumeAsync() executes concurrently after the Interlocked.Exchange check but before disposal, it could call Release() on a disposed semaphore.

The semaphore disposal should happen inside a lock or after ensuring no concurrent operations are possible:

public async ValueTask DisposeAsync()
{
    // First ensure we're not paused
    await _pauseLock.WaitAsync().ConfigureAwait(false);
    try
    {
        // Atomically check and clear - only release if we held the lock
        if (Interlocked.Exchange(ref _renderLockHeldByPause, 0) == 1)
        {
            _renderLock.Release();
        }
    }
    finally
    {
        _pauseLock.Release();
    }

    // ... rest of disposal logic ...
    
    // NOW safe to dispose semaphores (no concurrent access possible)
    _pauseLock.Dispose();
    _renderLock.Dispose();
}

Suggestions

None - the overall synchronization design is sound. The _isFlushingOutput bypass flag is an elegant solution.

Verdict

⚠️ REQUEST CHANGES - Critical concurrency bugs in timeout handling and disposal paths

@thomhurst thomhurst enabled auto-merge January 18, 2026 12:38
Replace SemaphoreSlim-based synchronization with a cleaner signal-based
design that cannot deadlock:

- Remove _renderLock and _pauseLock semaphores
- Use simple object lock (_pauseStateLock) held only briefly for state changes
- Track _inRefresh flag to know if progress loop is in ctx.Refresh()
- PauseAsync waits for _refreshCompleted signal (not a lock)
- Progress loop signals _refreshCompleted after ctx.Refresh() completes

This design ensures:
- Lock is never held during slow operations (ctx.Refresh())
- No blocking on locks that might be held by slow terminal I/O
- Clear signal-based handoff between pause request and completion
@thomhurst
Copy link
Owner Author

Summary

Fixes console output glitches by adding coordinated synchronization between progress rendering and module output, using signal-based coordination to prevent text corruption and markup rendering issues.

Critical Issues

None found ✅

Suggestions

1. Consider edge case in DisposeAsync with concurrent PauseAsync

In ProgressSession.cs:353-357, the DisposeAsync method signals resume under lock, but there's a potential race if PauseAsync is called concurrently between the lock release and the actual disposal:

lock (_pauseStateLock)
{
    _isPaused = false;
    _resumeSignal?.TrySetResult();
}
// <-- PauseAsync could create new _resumeSignal here
// Then awaiting would hang indefinitely on the new signal

While unlikely in practice (DisposeAsync should be final), you could make it more defensive by also clearing the signal:

lock (_pauseStateLock)
{
    _isPaused = false;
    _resumeSignal?.TrySetResult();
    _resumeSignal = null;  // Add this
    _refreshCompleted = null;  // Add this
}

2. Minor: Timeout handling doesn't clear _inRefresh flag

In the timeout path (ProgressSession.cs:234-244), when forcing resume after 60 seconds, the code clears pause state but doesn't reset _inRefresh. While this is likely harmless since the timeout indicates the progress loop may be stuck, you could add _inRefresh = false; for consistency at line 243.

Previous Review Status

The previous review identified critical issues with render lock handling and semaphore cleanup. This has been completely refactored to use a lock-based signal coordination approach, which eliminates those issues. The new implementation properly:

  • Uses TaskCompletionSource signals instead of holding locks during slow operations
  • Avoids potential deadlocks from timeout scenarios
  • Properly coordinates pause/resume/refresh states

Verdict

APPROVE - The refactored synchronization design is sound. The suggestions are minor defensive improvements, not blocking issues.

The root cause of the CI hang was a blocking sync-over-async call in
ModuleLogger.Dispose(). When the DI container disposed scoped services,
it called Dispose() synchronously which used .GetAwaiter().GetResult()
to block on the async EnqueueAndFlushAsync() method.

Changes:
- Make ModuleLogger implement IAsyncDisposable
- Move async flush logic to DisposeAsync() which is called by the
  DI container when using AsyncServiceScope
- Make sync Dispose() non-blocking (just marks as disposed)
- Remove finalizer since Dispose() no longer blocks

The Microsoft DI container prefers DisposeAsync() when a service
implements IAsyncDisposable, so the normal disposal path will now
be fully async with no blocking calls.
@thomhurst
Copy link
Owner Author

Summary

This PR fixes console output glitches by introducing signal-based coordination between progress rendering and module output, replacing the previous render lock approach with a more robust state machine using TaskCompletionSource signals.

Critical Issues

None found ✅

Suggestions

1. Minor edge case in DisposeAsync with concurrent operations

In ProgressSession.cs:353-357, while unlikely, there's a theoretical race if PauseAsync is called concurrently during disposal between releasing the lock and actual task completion:

lock (_pauseStateLock)
{
    _isPaused = false;
    _resumeSignal?.TrySetResult();
    // Consider adding:
    // _resumeSignal = null;
    // _refreshCompleted = null;
}

This would make the cleanup more defensive, though disposal should be the final operation.

2. Timeout handling could reset _inRefresh for consistency

In the timeout path (around line 234-244), when forcing resume after 60 seconds, consider adding _inRefresh = false; for state consistency, though this is likely harmless since timeout indicates the progress loop may be stuck.

Previous Review Status

The PR has already been reviewed and approved by thomhurst. The previous review identified critical concurrency issues with the initial render lock implementation, which were addressed by refactoring to the current signal-based coordination approach. The new design properly:

  • Uses TaskCompletionSource signals instead of holding locks during slow operations
  • Avoids potential deadlocks from timeout scenarios
  • Properly coordinates pause/resume/refresh states

Verdict

APPROVE - The refactored synchronization design is sound and addresses the console glitching issues effectively. The suggestions above are minor defensive improvements, not blocking issues.

- Add Microsoft.Testing.Extensions.HangDump to all test projects
- Configure 20-minute hang dump timeout in test execution
- Upload hang dump artifacts in GitHub Actions workflow
@thomhurst
Copy link
Owner Author

Summary

This PR fixes console output glitches (#2162) by improving synchronization between progress rendering and module output through better pause/resume coordination and proper async disposal.

Critical Issues

1. Potential deadlock in ProgressSession.PauseAsync() - Race condition with double-wait

In src/ModularPipelines/Console/ProgressSession.cs:305-345, there's a subtle race condition:

lock (_pauseStateLock)
{
    if (_isPaused)
    {
        // Already paused - if there's an in-flight refresh, wait for it
        waitForRefresh = _refreshCompleted;  // ⚠️ Could be from previous pause
    }
    else
    {
        _isPaused = true;
        _resumeSignal = new TaskCompletionSource(...);
        
        if (_inRefresh)
        {
            _refreshCompleted = new TaskCompletionSource(...);
            waitForRefresh = _refreshCompleted;
        }
    }
}

if (waitForRefresh != null)
{
    await waitForRefresh.Task.ConfigureAwait(false);  // ⚠️ Could wait on stale TCS
}

Problem: If PauseAsync() is called while already paused, it captures _refreshCompleted from the previous pause operation. If that TCS was never signaled (e.g., timeout occurred), the second caller will wait forever on a stale TaskCompletionSource.

Fix: Only wait on _refreshCompleted if we just created it in this call:

lock (_pauseStateLock)
{
    if (_isPaused)
    {
        return;  // Already paused, nothing to do
    }
    
    _isPaused = true;
    _resumeSignal = new TaskCompletionSource(...);
    
    if (_inRefresh)
    {
        _refreshCompleted = new TaskCompletionSource(...);
        waitForRefresh = _refreshCompleted;
    }
}

2. Missing volatile field or memory barrier for _isFlushingOutput

In src/ModularPipelines/Console/ConsoleCoordinator.cs:40, _isFlushingOutput is used for cross-thread coordination but isn't marked volatile:

private volatile bool _isFlushingOutput;  // ⚠️ Missing 'volatile'

It's read in CoordinatedTextWriter (different thread) via the lambda at line 53/60 and written in OutputCoordinator.ProcessQueueAsync(). Without volatile or proper barriers, writes from one thread may not be visible to readers on other threads due to CPU cache coherency.

Fix: Mark the field as volatile:

private volatile bool _isFlushingOutput;

3. Missing null check for _consoleCoordinator injection

In src/ModularPipelines/Console/OutputCoordinator.cs:162-177, the new _consoleCoordinator field is injected via constructor but never null-checked. If DI fails to provide it, you'll get NullReferenceException at lines 185/192.

Fix: Add null check or ArgumentNullException in constructor:

_consoleCoordinator = consoleCoordinator ?? throw new ArgumentNullException(nameof(consoleCoordinator));

Suggestions

1. Consider using ConfigureAwait(false) consistently in disposal path

In src/ModularPipelines/Logging/ModuleLogger.cs:933-1005, the new DisposeAsync() uses ConfigureAwait(false) at line 985, but this is in a disposal path where you're already detached from any user context. Consider documenting why this is necessary or if it's just defensive.

2. Timeout behavior discrepancy

The old code (line 161-178 removed) used Task.Delay + Task.WhenAny for timeout, while new code (line 984) uses CancellationTokenSource with timeout. This is cleaner, but the timeout exception handling differs:

  • Old: Logs warning and continues
  • New: Catches OperationCanceledException and logs warning

This is fine, but be aware the exception type changed from timeout to cancellation.

Previous Review Status

Unable to fetch previous comments (GitHub API permissions).

Verdict

⚠️ REQUEST CHANGES - Critical issues found:

  1. Race condition in PauseAsync with stale TaskCompletionSource
  2. Missing volatile on _isFlushingOutput
  3. Missing null check for injected _consoleCoordinator

- Fix race condition in PauseAsync: return early if already paused to
  avoid waiting on stale TaskCompletionSource from previous pause
- Add ArgumentNullException checks in OutputCoordinator constructor
- Reset _inRefresh in timeout path for state consistency
- Clear signals in DisposeAsync to prevent concurrent PauseAsync hanging
@thomhurst
Copy link
Owner Author

Summary

This PR fixes console output glitches (#2162) by coordinating synchronization between progress rendering and module output through the _isFlushingOutput flag, refactoring pause/resume logic to use signal-based coordination, implementing proper async disposal, and adding hang dump support for debugging.

Critical Issues

1. Missing volatile modifier on _isFlushingOutput field

In src/ModularPipelines/Console/ConsoleCoordinator.cs:40, the new _isFlushingOutput field is used for cross-thread communication but lacks the volatile modifier:

private volatile bool _isFlushingOutput;  // Line 40 - missing 'volatile'

This field is:

  • Written in OutputCoordinator.ProcessQueueAsync() (lines 188, 195 of the diff)
  • Read in CoordinatedTextWriter via the lambda passed at lines 53/60

Without volatile or memory barriers, writes from the OutputCoordinator thread may not be visible to reader threads due to CPU cache coherency, causing the bypass mechanism to fail intermittently.

Fix: Mark the field as volatile:

private volatile bool _isFlushingOutput;

2. Missing null guard for _consoleCoordinator in OutputCoordinator

In src/ModularPipelines/Console/OutputCoordinator.cs:177-180, the constructor adds null checks for other parameters but the diff shows _consoleCoordinator is added without protection:

_consoleCoordinator = consoleCoordinator ?? throw new ArgumentNullException(nameof(consoleCoordinator));

If this line is missing from the actual code, you'll get NullReferenceException at lines 188/195 when calling SetFlushingOutput.

Verification needed: Confirm the constructor includes this null check as shown in the diff.

Suggestions

1. Consider timeout handling consistency

The ModuleLogger.DisposeAsync() refactoring (lines 933-1005) changes from Task.WhenAny + Task.Delay to CancellationTokenSource with timeout. While cleaner, the exception type changes from timeout to OperationCanceledException. This is fine but worth documenting for anyone debugging timeout scenarios.

2. File operation log simplification

Removing Module/Activity context from file operation logs (File.cs and Folder.cs changes) significantly cleans up the messages. However, if diagnosing which module performed a file operation becomes necessary, this information is now lost. Consider whether Activity context should be preserved in structured logging properties rather than the message template.

Verdict

⚠️ REQUEST CHANGES - Critical issue #1 (missing volatile) must be fixed to ensure correct cross-thread visibility. Verify issue #2 is already addressed in the actual code.

… OutputCoordinator

Move IsFlushing state ownership from ConsoleCoordinator to OutputCoordinator,
eliminating the circular dependency between these components. OutputCoordinator
now owns the flushing state since it controls when flushing occurs.

- Add IsFlushing property to IOutputCoordinator interface
- OutputCoordinator manages _isFlushingOutput internally
- Remove SetFlushingOutput from IConsoleCoordinator
- ConsoleCoordinator reads IsFlushing from IOutputCoordinator
@thomhurst
Copy link
Owner Author

Summary

Fixes console output glitches by improving synchronization between progress rendering and module output flushing, adds hang dump collection, and removes Activity/Module context from file operation logs.

Critical Issues

1. Finalizer removed but not from base class - CRITICAL BUG

Location: src/ModularPipelines/Logging/ModuleLogger.cs:894-910

The PR removes the finalizer from ModuleLogger but the base class ModuleLogger still has a finalizer at line 94-97 that calls Dispose(disposing: false). However, the derived class no longer has the Dispose(bool disposing) method - it was replaced with separate Dispose() and DisposeAsync() methods.

Problem: The finalizer in the base class will still be called, but there is no Dispose(bool) method to handle it. This could lead to resource leaks or unexpected behavior.

Fix: Remove the finalizer from the base class as well, or ensure proper finalization support with the new async disposal design.

2. PR description does not match the actual changes

The PR description mentions adding _renderLock SemaphoreSlim and using Interlocked.Exchange for _renderLockHeldByPause, but I don't see these in the diff. Instead, the changes use _pauseStateLock, _inRefresh, and _refreshCompleted.

Question: Did you push the wrong branch or update the PR description for a different version?

Suggestions

  1. Hang dump timeout of 20 minutes seems excessive - consider 5-10 minutes
  2. Verify ConfigureAwait(false) is used consistently in all async calls

Verdict

REQUEST CHANGES - Critical issue #1 must be fixed. Issue #2 needs clarification.

@thomhurst thomhurst merged commit 42b4da4 into main Jan 18, 2026
12 checks passed
@thomhurst thomhurst deleted the fix/console-output-glitches-2162 branch January 18, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Glitches in the logger output

1 participant