feat(firmware): wait for application readiness after PIC32 reconnect (closes #145) by cptkoolbeenz · Pull Request #200 · daqifi/daqifi-core

cptkoolbeenz · 2026-05-12T05:12:17Z

Summary

Adds an opt-in readiness probe so a PIC32 firmware update doesn't transition to Complete until the device is actually ready to answer normal application commands. Serial transport re-enumeration succeeds well before the application firmware is up; without this wait, downstream flows (LAN chip-info queries, WiFi prep) hit a half-started device — exactly the pattern desktop had to work around with its own retry loop.

API additions

FirmwareUpdateServiceOptions gains 3 new properties:

PostReconnectReadinessProbe — Func<IStreamingDevice, CancellationToken, Task<bool>>?. Returns true when the application is responsive. Null disables the wait (legacy behavior preserved).
PostReconnectReadinessTimeout (default 30s) — wall-clock budget for the wait.
PostReconnectReadinessRetryDelay (default 500ms) — delay between probe attempts.

When the timeout elapses without the probe returning true, the update transitions to Failed with a TimeoutException wrapped in FirmwareUpdateException — NOT a silent Complete on a half-ready device, which is the entire point of #145.

Test plan

Probe succeeds on attempt N>1 holds back Complete until ready (assert Complete state + probe call count = 3)
Timeout raises a properly wrapped FirmwareUpdateException with FailedState=JumpingToApp and the readiness keyword in the inner message
Null probe preserves legacy fast-complete path (belt-and-suspenders)
Build clean on net9.0 + net10.0
Full suite: 893/895 (2 skipped require live hardware)

Closes #145

…loses #145) Adds an opt-in readiness probe to FirmwareUpdateService so a PIC32 firmware update doesn't transition to Complete until the device is actually ready to answer normal application commands. The serial transport re-enumerates well before the application firmware is up; without this wait, downstream flows (LAN chip-info queries, WiFi prep) hit a half-started device and either fail or have to reimplement their own retry loop in the calling app — exactly the pattern desktop had to work around. API additions on FirmwareUpdateServiceOptions: - PostReconnectReadinessProbe — Func<IStreamingDevice, CancellationToken, Task<bool>>?. Returns true when the application is responsive. Null disables the wait (legacy behavior). - PostReconnectReadinessTimeout (default 30s) — wall-clock budget - PostReconnectReadinessRetryDelay (default 500ms) — between probe attempts When the timeout elapses without the probe returning true, the update transitions to Failed with a clear TimeoutException wrapped in FirmwareUpdateException — NOT a silent Complete on a half-ready device. Test plan: 3 new tests cover (a) probe succeeds on attempt N >1 holding back Complete until ready, (b) timeout raises a properly wrapped FirmwareUpdateException with FailedState=JumpingToApp and the readiness keyword in the inner message, (c) null probe preserves legacy fast-complete path. Full suite 893/895 (2 skipped require live hardware).

cptkoolbeenz · 2026-05-12T05:12:26Z

/improve

cptkoolbeenz · 2026-05-12T05:12:27Z

/agentic_review

qodo-code-review · 2026-05-12T05:12:33Z

Review Summary by Qodo

Add post-reconnect application readiness probe for PIC32 firmware updates

✨ Enhancement

Walkthroughs

Description

• Add optional post-reconnect readiness probe to FirmwareUpdateService
• Probe waits for PIC32 application firmware readiness before completing update
• Timeout transitions update to Failed state instead of silent completion
• Preserve legacy behavior when probe is null (opt-in feature)

Diagram

flowchart LR
  A["Serial Reconnect"] --> B{"Readiness Probe<br/>Configured?"}
  B -->|Yes| C["Poll Probe<br/>with Retry"]
  B -->|No| D["Complete<br/>Legacy Path"]
  C -->|Success| E["Complete"]
  C -->|Timeout| F["Failed<br/>with TimeoutException"]

File Changes

1. src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs ✨ Enhancement +32/-0

Add readiness probe configuration options

• Add PostReconnectReadinessProbe property: optional callback returning true when device is ready
• Add PostReconnectReadinessTimeout property with 30s default timeout
• Add PostReconnectReadinessRetryDelay property with 500ms default retry delay
• Add validation for new timeout and retry delay properties

src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs

2. src/Daqifi.Core/Firmware/FirmwareUpdateService.cs ✨ Enhancement +80/-0

Implement application readiness wait logic

• Invoke readiness probe after serial reconnect if configured
• Implement WaitForApplicationReadyAsync method with retry loop and timeout handling
• Poll probe with configurable delay between attempts
• Throw TimeoutException with descriptive message when probe never returns true
• Handle probe exceptions gracefully by treating as not-ready and retrying
• Log debug messages for probe attempts and success

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs

3. src/Daqifi.Core.Tests/Firmware/FirmwareUpdateServiceTests.cs 🧪 Tests +157/-0

Add three comprehensive readiness probe tests

• Add test verifying probe succeeds on attempt N>1 and holds back Complete state
• Add test verifying timeout fails update with proper exception wrapping
• Add test verifying null probe preserves legacy fast-complete behavior
• All tests validate probe call counts, state transitions, and exception details

src/Daqifi.Core.Tests/Firmware/FirmwareUpdateServiceTests.cs

qodo-code-review · 2026-05-12T05:12:33Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (1)

1. Readiness probe is optional 📎 Requirement gap ☼ Reliability

Description

FirmwareUpdateService only waits for application readiness when PostReconnectReadinessProbe is
set; otherwise it proceeds immediately after serial reconnect, which can still return a half-started
device. This keeps the failure mode the checklist is trying to eliminate and can still force
downstream integrations to add their own readiness polling.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R817-820]

+        if (_options.PostReconnectReadinessProbe is { } probe)
+        {
+            await WaitForApplicationReadyAsync(device, probe, cancellationToken).ConfigureAwait(false);
+        }

Evidence
The checklist requires the flow to confirm application readiness after PIC32 reconnect and to remove
the need for downstream status-wait loops. The new code explicitly makes the readiness check opt-in
(if (_options.PostReconnectReadinessProbe is { } probe)), and the options explicitly
document/encode null as the legacy fast-complete path, meaning reconnect can still be treated as
complete without readiness verification.
Wait for application readiness after PIC32 reconnect in FirmwareUpdateService
Downstream callers should not need custom status-wait loops for firmware update reliability
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[810-820]
src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[124-140]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The post-reconnect readiness wait is currently opt-in (`PostReconnectReadinessProbe` is nullable and gated), so the firmware update flow can still proceed immediately after reconnect without confirming the application is ready.

## Issue Context
Checklist IDs 1 and 2 require Core’s firmware update lifecycle to handle post-reconnect application readiness so reconnect completion isn’t based solely on transport reopening, and downstream callers don’t need their own readiness loops.

## Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[810-820]
- src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[124-150]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Non-monotonic elapsed logging 🐞 Bug ◔ Observability

Description

WaitForApplicationReadyAsync computes the logged readiness wait duration using DateTime.UtcNow, so
if the system clock is adjusted during the wait, the emitted {Elapsed} can jump or be inaccurate,
confusing troubleshooting/telemetry. This does not break the timeout logic (CTS handles that), but
it can produce misleading diagnostics for slow-boot scenarios.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R892-896]

+                    var elapsed = DateTime.UtcNow - waitStart;
+                    _logger.LogInformation(
+                        "Device became application-ready after {Elapsed} on probe attempt {Attempt}.",
+                        elapsed,
+                        probesExecuted);

Evidence
The new code uses wall-clock timestamps to compute an elapsed duration for logging; elsewhere in the
repo, elapsed durations are measured with Stopwatch, indicating the preferred monotonic approach for
timing/elapsed reporting.
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[839-840]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[892-896]
src/Daqifi.Core/Firmware/ProcessExternalProcessRunner.cs[30-33]
src/Daqifi.Core/Device/DaqifiStreamingDevice.cs[639-640]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`WaitForApplicationReadyAsync` measures elapsed time for logging via `DateTime.UtcNow`, which is wall-clock time and not monotonic. If NTP/VM clock corrections occur during the wait, the logged `Elapsed` can be misleading.

### Issue Context
Timeout enforcement is already handled via `CancellationTokenSource(totalTimeout)`; the improvement is specifically for *diagnostic accuracy* in the Information log emitted when readiness is reached.

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[839-840]
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[892-896]

### Suggested change
- Replace `var waitStart = DateTime.UtcNow;` with a monotonic timer, e.g. `var sw = System.Diagnostics.Stopwatch.StartNew();`
- Replace `var elapsed = DateTime.UtcNow - waitStart;` with `var elapsed = sw.Elapsed;`
- (Optional) add `using System.Diagnostics;` at the top, or keep the fully-qualified `System.Diagnostics.Stopwatch` usage.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. ~~Stale probe exception kept~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

WaitForApplicationReadyAsync retains lastProbeException from an earlier thrown probe attempt even
after later probes successfully run and return false, so a final TimeoutException can incorrectly
include an unrelated InnerException and mislead debugging/handlers inspecting InnerException.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R870-885]

+                var isReady = await probe(device, linkedToken)
+                    .WaitAsync(linkedToken)
+                    .ConfigureAwait(false);
+
+                if (isReady)
+                {
+                    if (probesExecuted > 1)
+                    {
+                        _logger.LogDebug(
+                            "Device became application-ready on probe attempt {Attempt}.",
+                            probesExecuted);
+                    }
+                    return;
+                }
+                _logger.LogDebug("Application-ready probe returned false on attempt {Attempt}; will retry.", probesExecuted);
+            }

Evidence
The code only sets lastProbeException when the probe throws/cancels, but it is used as the
TimeoutException inner exception in multiple timeout paths; since it is never cleared on a
successful probe returning false, a later timeout can incorrectly surface an earlier exception that
is no longer representative of the most recent probe outcomes.
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[842-860]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[864-885]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[903-916]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[924-928]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`WaitForApplicationReadyAsync` stores the most recent probe-thrown exception in `lastProbeException` and then attaches it as `InnerException` when timing out. However, when a probe invocation *succeeds* (i.e., returns `false`), the code does not clear `lastProbeException`. This can cause a timeout after many later `false` results to still report an earlier, unrelated exception as the timeout’s `InnerException`.

### Issue Context
This impacts the new post-reconnect readiness probe loop and makes timeout failures misleading (and potentially mis-handled) when callers look at `TimeoutException.InnerException`.

### Fix
Clear `lastProbeException` after a successful probe invocation (both `true` and `false` paths), or alternatively clear it immediately before each probe attempt and only set it in the catch blocks.

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[864-885]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. ~~Misleading probe-timeout message~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

On readiness timeout while awaiting the probe, the thrown TimeoutException claims the probe “was
canceled by the timeout while running,” but the code comments explicitly note the probe may ignore
cancellation (meaning only the await is canceled). This can mislead troubleshooting by implying the
probe cooperated with cancellation when it may still be executing in the background.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R887-888]

+                    "The readiness probe was canceled by the timeout while running.",
+                    lastProbeException);

Evidence
The comment documents that WaitAsync(linkedToken) is used so the caller stops waiting even if the
probe ignores cancellation, but the timeout message states the probe itself was canceled, which is
not guaranteed by this mechanism.
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[862-869]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[885-888]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`WaitForApplicationReadyAsync` throws a `TimeoutException` whose message says the readiness probe "was canceled by the timeout while running." However, the surrounding code/commentary makes it clear the probe may ignore cancellation and continue running; the timeout is enforced by canceling the *wait*, not necessarily the probe itself.

This message should be reworded to avoid stating that the probe itself was canceled.

### Issue Context
The code uses `WaitAsync(linkedToken)` specifically to stop awaiting even if the probe ignores the cancellation token.

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[883-888]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

View more (3)

5. ~~Transport-specific readiness timeout text~~ ✓ Resolved 📎 Requirement gap ⚙ Maintainability

Description

The new TimeoutException messages in WaitForApplicationReadyAsync hardcode serial/PIC32 wording
(e.g., "serial transport reopened" / "after PIC32 reconnect"), which is misleading for non-serial
transports and undermines a transport-agnostic readiness design.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R856-857]

+                    "The serial transport reopened but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.",
+                    lastProbeException);

Evidence
PR Compliance ID 3 requires a transport-agnostic readiness design; the new timeout messages
explicitly reference serial transport and PIC32 reconnect, which couples the readiness path’s
diagnostics to a specific transport/device.
Maintain transport-agnostic readiness verification design
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[856-857]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[906-907]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The post-reconnect readiness timeout exceptions include transport- and device-specific phrasing (serial/PIC32). This weakens the transport-agnostic readiness concept and can confuse callers on other transports.

## Issue Context
Compliance requires the readiness mechanism to remain transport-agnostic where possible. Exception text is part of the public surface (diagnostics) and should avoid implying a specific transport.

## Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[856-857]
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[906-907]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

6. ~~Probe errors lose root cause~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

WaitForApplicationReadyAsync swallows all non-cancellation exceptions from the caller-provided probe
and continues retrying, so repeated deterministic probe failures can end up reported only as a
TimeoutException without retaining the underlying exception that caused the probe to fail. If Debug
logs are off, the real error context is effectively lost.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R890-896]

+            catch (Exception ex) when (ex is not OperationCanceledException)
+            {
+                _logger.LogDebug(
+                    ex,
+                    "Application-ready probe threw on attempt {Attempt}; treating as not-ready and retrying.",
+                    attempt);
+            }

Evidence
The probe exception is explicitly caught and treated as not-ready; later timeout exceptions are
thrown without carrying forward that exception context.
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[851-896]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[846-849]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[884-889]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`WaitForApplicationReadyAsync` catches all non-`OperationCanceledException` exceptions from `probe(...)`, logs them at Debug, and retries. On eventual timeout, the thrown `TimeoutException` does not include the last probe exception as an inner exception or in its message.

This can make failures hard to diagnose in environments where Debug logs are not collected.

### Issue Context
This is specifically about improving diagnostics without changing the retry semantics (still treat probe failures as "not ready").

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[823-907]

### Suggested fix
Track the most recent probe exception (e.g., `Exception? lastProbeException`) when catching exceptions from the probe. When throwing `TimeoutException` due to readiness timeout (in all timeout branches), include `lastProbeException` as the inner exception (or append a short summary to the message).

Optionally, consider logging the first probe exception at `LogWarning` (once) and subsequent ones at `LogDebug` to avoid losing signal while still preventing log spam.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

7. ~~Conflicting timeout budgets~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

The readiness probe enforces its own PostReconnectReadinessTimeout, but the whole reconnect step is
also bounded by JumpingToApplicationTimeout via ExecuteWithStateTimeoutAsync. If
JumpingToApplicationTimeout is shorter (or mostly consumed by serial reconnect), the probe can be
aborted by the outer state-timeout and callers will see a generic JumpingToApp timeout instead of
the readiness-specific timeout message.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R828-834]

+        var totalTimeout = _options.PostReconnectReadinessTimeout;
+        var retryDelay = _options.PostReconnectReadinessRetryDelay;
+
+        using var timeoutCts = new CancellationTokenSource(totalTimeout);
+        using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
+            cancellationToken, timeoutCts.Token);
+        var linkedToken = linkedCts.Token;

Evidence
The JumpToApp step is wrapped by a state-level timeout, while the new readiness probe adds another
independent timeout; whichever expires first determines the surfaced failure mode/message.
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[294-302]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[799-907]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[973-1037]
src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[55-59]
src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[141-150]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Two independent timeouts bound the same step:
- outer state timeout: `JumpingToApplicationTimeout` (via `ExecuteWithStateTimeoutAsync(JumpingToApp, ...)`)
- inner probe timeout: `PostReconnectReadinessTimeout` (via `new CancellationTokenSource(totalTimeout)`)

If the outer timeout is shorter than (or effectively shorter than) the inner timeout, the readiness probe can be cut off by the state timeout first, yielding a generic state-timeout error and masking the readiness-specific failure.

### Issue Context
The new probe timeout is created inside `WaitForApplicationReadyAsync`, but `JumpToApplicationAndReconnectAsync` is invoked under the JumpingToApp state timeout.

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[172-231]
- src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[55-59]
- src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[141-150]

### Suggested fix
In `FirmwareUpdateServiceOptions.Validate()` add a guard when `PostReconnectReadinessProbe != null` to ensure `JumpingToApplicationTimeout` is strictly greater than `PostReconnectReadinessTimeout` (and ideally includes additional headroom for the serial reconnect portion, e.g. `PollInterval` or a small fixed buffer). Throw a clear `ArgumentOutOfRangeException` describing the required relationship so misconfiguration fails fast with an actionable message.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

8. ~~Misleading probe attempt number~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

The new TimeoutException messages include "(attempt {attempt})", but attempt is incremented before
the readiness probe runs, so a timeout during the initial cancellation check can report attempt 1
even though 0 probes executed. This can mislead debugging/telemetry when diagnosing readiness
timeouts.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R855-856]

+                    $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
+                    "The transport reconnected but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.",

Evidence

The timeout message uses the loop variable attempt while the loop increments attempt before
checking cancellation and before running the probe, so the reported attempt can be ahead of the
number of probe executions in timeout-at-start scenarios.

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[844-858]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`WaitForApplicationReadyAsync` includes an "attempt" counter in TimeoutException messages, but the counter is incremented before the probe is actually executed. In the edge case where the timeout has already fired (or fires right before the first probe), the error message can claim “attempt 1” even though no probe ran, which is misleading for troubleshooting.

### Issue Context
The timeout message now interpolates `attempt` (e.g., `(... attempt {attempt})`). The loop increments `attempt` at the top, and only later invokes the probe.

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[844-858]

### Suggested fix
Track *probes executed* separately (increment immediately before/after invoking the probe) and report that number in the timeout messages, or move the attempt increment to just before the probe call so the reported attempt always corresponds to a probe execution.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

9. ~~Readiness wait not surfaced~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

When PostReconnectReadinessProbe is enabled, long waits (up to 30s by default) occur inside
JumpingToApp without any progress/state notification beyond Debug logs, making the update appear
hung to observers relying on progress events. This reduces operational visibility during the new
wait phase.

Code

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[R817-820]

+        if (_options.PostReconnectReadinessProbe is { } probe)
+        {
+            await WaitForApplicationReadyAsync(device, probe, cancellationToken).ConfigureAwait(false);
+        }

Evidence
The JumpingToApp progress is reported before calling JumpToApplicationAndReconnectAsync, but the
readiness wait adds additional wall-clock time without additional progress reporting or info-level
logging.
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[294-304]
src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[799-907]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The readiness-wait loop can take significant time, but it does not emit any info-level log or progress update indicating that the service is intentionally waiting for application readiness.

### Issue Context
`TransitionToState` does not emit a `StateChanged` event when the state stays the same, and `WaitForApplicationReadyAsync` does not call `ReportProgress`, so observers may not see any visible signal during the wait.

### Fix Focus Areas
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[799-821]
- src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[823-907]

### Suggested fix
Add a one-time `LogInformation` (or `LogDebug` if preferred, but once) when entering the readiness wait and when it succeeds (including elapsed time / attempt count). If you want this to be visible to progress listeners, consider plumbing `progress` into `JumpToApplicationAndReconnectAsync` so it can `ReportProgress` with an updated `CurrentOperation` like "Waiting for application readiness…".

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-05-12T05:12:34Z

Code Review by Qodo

New Review Started

This review has been superseded by a new analysis

qodo-code-review · 2026-05-12T05:13:22Z

PR Code Suggestions ✨

Latest suggestions up to 26b19fe

Warning

/improve is deprecated. Use /agentic_review instead (removal date not yet scheduled).

Category	Suggestion	Impact
General	Use monotonic elapsed timing Replace `DateTime.UtcNow` with `Stopwatch` for elapsed time calculations. src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [34-93] -var waitStart = DateTime.UtcNow; +var waitStopwatch = Stopwatch.StartNew(); ... if (isReady) { - var elapsed = DateTime.UtcNow - waitStart; + var elapsed = waitStopwatch.Elapsed; _logger.LogInformation( "Device became application-ready after {Elapsed} on probe attempt {Attempt}.", elapsed, probesExecuted); return; } `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 5 __ Why: Using `Stopwatch` is a best practice for elapsed time measurement as it avoids inaccuracies and potential negative durations caused by system clock adjustments.	Low
Possible issue	Preserve correct timeout causality Do not include the `OperationCanceledException` as an inner exception for `TimeoutException` when the wait deadline fires. src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [96-118] catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested) { // Two cases reach here: // 1. The wait deadline fired (timeoutCts canceled) — surface // as TimeoutException so callers see the readiness budget. // 2. The probe itself threw OperationCanceledException for // some unrelated reason (its own internal CTS, etc). That // must NOT crash the update loop — treat it as a probe // failure and retry, same as any other thrown exception. if (timeoutCts.IsCancellationRequested) { throw new TimeoutException( $"Device did not become application-ready within {totalTimeout} (probes executed: {probesExecuted}). " + "The wait for the readiness probe was canceled by the timeout — note the probe may ignore cancellation and continue running in the background.", - lastProbeException ?? ex); + lastProbeException); } lastProbeException = ex; _logger.LogDebug( ex, "Application-ready probe was canceled on attempt {Attempt}; treating as not-ready and retrying.", probesExecuted); } `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 5 __ Why: Omitting the cancellation exception as the inner exception of the `TimeoutException` provides cleaner exception chaining and ensures consistency with similar timeout handling at the beginning of the loop.	Low
Update

Previous suggestions

Suggestions up to commit ea82fd1

Category	Suggestion	Impact
General	Fix probe-attempt accounting Modify the probe attempt counter to increment only after the probe task is established. Use a local variable to update the counter just before awaiting the task. src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [42-67] var probesExecuted = 0; while (true) { try { linkedToken.ThrowIfCancellationRequested(); } catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested) { throw new TimeoutException( $"Device did not become application-ready within {totalTimeout} (probes executed: {probesExecuted}). " + "The transport reconnected but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.", lastProbeException); } + var attempt = probesExecuted + 1; try { - probesExecuted++; + var probeTask = probe(device, linkedToken); + probesExecuted = attempt; + // WaitAsync(linkedToken) enforces the timeout deadline even // when the probe ignores its own CancellationToken and would // otherwise hang or return after the budget elapses. When // the deadline fires, WaitAsync throws OperationCanceledException // immediately — we don't keep waiting for the rogue probe. - var isReady = await probe(device, linkedToken) + var isReady = await probeTask .WaitAsync(linkedToken) .ConfigureAwait(false); Suggestion importance[1-10]: 1 __ Why: The C# `await` mechanism synchronously evaluates the `probe()` call before yielding, so there is no gap where cancellation could bypass the invocation, making this change unnecessary.	Low
Possible issue	Prevent busy-loop on zero delay Add a defensive check to clamp negative or zero retry delays to a minimum of 1 millisecond. This prevents a busy-loop scenario if the delay validation was somehow skipped. src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [113-123] try { - await Task.Delay(retryDelay, linkedToken).ConfigureAwait(false); + var delay = retryDelay <= TimeSpan.Zero ? TimeSpan.FromMilliseconds(1) : retryDelay; + await Task.Delay(delay, linkedToken).ConfigureAwait(false); } catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested) { throw new TimeoutException( $"Device did not become application-ready within {totalTimeout} (probes executed: {probesExecuted}). " + "The transport reconnected but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.", lastProbeException); } Suggestion importance[1-10]: 1 __ Why: The PR already adds explicit validation via `ValidatePositive` in `FirmwareUpdateServiceOptions.Validate` to ensure the retry delay is positive, making defensive clamping in the consumer redundant and an anti-pattern.	Low

✅ Suggestions up to commit 5ac58c0

Category Suggestion Impact

Possible issue

✅ ~~Make validations truly opt-in~~

Suggestion Impact:

The commit wrapped the readiness timeout/retry-delay positive validations and the JumpingToApplicationTimeout cross-check inside an `if (PostReconnectReadinessProbe is not null)` block, making readiness validations opt-in when the probe is configured.

code diff:

-        ValidatePositive(PostReconnectReadinessTimeout, nameof(PostReconnectReadinessTimeout));
-        ValidatePositive(PostReconnectReadinessRetryDelay, nameof(PostReconnectReadinessRetryDelay));
-
-        // The whole JumpToApp step is wrapped by JumpingToApplicationTimeout,
-        // so a probe budget that meets or exceeds it would always be cut off
-        // by the outer state-timeout — surfacing a generic JumpingToApp error
-        // instead of the readiness-specific message. Reject the configuration
-        // up front so misconfigurations fail fast.
-        if (PostReconnectReadinessProbe is not null &&
-            PostReconnectReadinessTimeout >= JumpingToApplicationTimeout)
-        {
-            throw new ArgumentOutOfRangeException(
-                nameof(PostReconnectReadinessTimeout),
-                PostReconnectReadinessTimeout,
-                $"{nameof(PostReconnectReadinessTimeout)} must be strictly less than {nameof(JumpingToApplicationTimeout)} when {nameof(PostReconnectReadinessProbe)} is set, " +
-                "otherwise the outer state-timeout fires first and masks the readiness-specific timeout.");
+
+        // The readiness options only matter when a probe is configured —
+        // gate every readiness validation behind that, both the positive
+        // checks and the cross-property constraint, so callers that don't
+        // use the probe never have to think about these timeouts.
+        if (PostReconnectReadinessProbe is not null)
+        {
+            ValidatePositive(PostReconnectReadinessTimeout, nameof(PostReconnectReadinessTimeout));
+            ValidatePositive(PostReconnectReadinessRetryDelay, nameof(PostReconnectReadinessRetryDelay));
+
+            // The whole JumpToApp step is wrapped by JumpingToApplicationTimeout,
+            // so a probe budget that meets or exceeds it would always be cut off
+            // by the outer state-timeout — surfacing a generic JumpingToApp error
+            // instead of the readiness-specific message. Reject the configuration
+            // up front so misconfigurations fail fast.
+            if (PostReconnectReadinessTimeout >= JumpingToApplicationTimeout)
+            {
+                throw new ArgumentOutOfRangeException(
+                    nameof(PostReconnectReadinessTimeout),
+                    PostReconnectReadinessTimeout,
+                    $"{nameof(PostReconnectReadinessTimeout)} must be strictly less than {nameof(JumpingToApplicationTimeout)} when {nameof(PostReconnectReadinessProbe)} is set, " +
+                    "otherwise the outer state-timeout fires first and masks the readiness-specific timeout.");
+            }
         }

Gate the validation of PostReconnectReadinessTimeout and
PostReconnectReadinessRetryDelay behind a check for whether the readiness probe
is configured.

src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs [52-68]

-ValidatePositive(PostReconnectReadinessTimeout, nameof(PostReconnectReadinessTimeout));
-ValidatePositive(PostReconnectReadinessRetryDelay, nameof(PostReconnectReadinessRetryDelay));
+if (PostReconnectReadinessProbe is not null)
+{
+    ValidatePositive(PostReconnectReadinessTimeout, nameof(PostReconnectReadinessTimeout));
+    ValidatePositive(PostReconnectReadinessRetryDelay, nameof(PostReconnectReadinessRetryDelay));
 
-// The whole JumpToApp step is wrapped by JumpingToApplicationTimeout,
-// so a probe budget that meets or exceeds it would always be cut off
-// by the outer state-timeout — surfacing a generic JumpingToApp error
-// instead of the readiness-specific message. Reject the configuration
-// up front so misconfigurations fail fast.
-if (PostReconnectReadinessProbe is not null &&
-    PostReconnectReadinessTimeout >= JumpingToApplicationTimeout)
-{
-    throw new ArgumentOutOfRangeException(
-        nameof(PostReconnectReadinessTimeout),
-        PostReconnectReadinessTimeout,
-        $"{nameof(PostReconnectReadinessTimeout)} must be strictly less than {nameof(JumpingToApplicationTimeout)} when {nameof(PostReconnectReadinessProbe)} is set, " +
-        "otherwise the outer state-timeout fires first and masks the readiness-specific timeout.");
+    // The whole JumpToApp step is wrapped by JumpingToApplicationTimeout,
+    // so a probe budget that meets or exceeds it would always be cut off
+    // by the outer state-timeout — surfacing a generic JumpingToApp error
+    // instead of the readiness-specific message. Reject the configuration
+    // up front so misconfigurations fail fast.
+    if (PostReconnectReadinessTimeout >= JumpingToApplicationTimeout)
+    {
+        throw new ArgumentOutOfRangeException(
+            nameof(PostReconnectReadinessTimeout),
+            PostReconnectReadinessTimeout,
+            $"{nameof(PostReconnectReadinessTimeout)} must be strictly less than {nameof(JumpingToApplicationTimeout)} when {nameof(PostReconnectReadinessProbe)} is set, " +
+            "otherwise the outer state-timeout fires first and masks the readiness-specific timeout.");
+    }
 }

Suggestion importance[1-10]: 5

__

Why: While newly added properties won't automatically break existing callers, it is good practice to only validate options when the related feature (the probe) is actually enabled.

Low

Suggestions up to commit 5ac58c0

Category Suggestion Impact

Possible issue

Track attempts and observe probe tasks

Declare the probe task as a local variable before awaiting it to allow observing
potential unobserved task exceptions.

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [59-67]

+var probeTask = probe(device, linkedToken);
 probesExecuted++;
+
 // WaitAsync(linkedToken) enforces the timeout deadline even
 // when the probe ignores its own CancellationToken and would
 // otherwise hang or return after the budget elapses. When
 // the deadline fires, WaitAsync throws OperationCanceledException
 // immediately — we don't keep waiting for the rogue probe.
-var isReady = await probe(device, linkedToken)
+var isReady = await probeTask
     .WaitAsync(linkedToken)
     .ConfigureAwait(false);

Suggestion importance[1-10]: 2

__

Why: The suggestion separates the task declaration but fails to observe the exception itself, and any variable declared here would be out of scope in the subsequent catch block.

Low

✅ Suggestions up to commit 881f65a

Category Suggestion Impact

Possible issue

✅ ~~Handle probe cancellations safely~~

Suggestion Impact:

Updated the OperationCanceledException catch to capture non-caller cancellations, convert timeout-triggered cancellations into TimeoutException, and treat probe-thrown cancellations as retryable failures with logging.

code diff:

-            catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
-            {
-                throw new TimeoutException(
-                    $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
-                    "The wait for the readiness probe was canceled by the timeout — note the probe may ignore cancellation and continue running in the background.",
-                    lastProbeException);
+            catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
+            {
+                // Two cases reach here:
+                // 1. The wait deadline fired (timeoutCts canceled) — surface
+                //    as TimeoutException so callers see the readiness budget.
+                // 2. The probe itself threw OperationCanceledException for
+                //    some unrelated reason (its own internal CTS, etc). That
+                //    must NOT crash the update loop — treat it as a probe
+                //    failure and retry, same as any other thrown exception.
+                if (timeoutCts.IsCancellationRequested)
+                {
+                    throw new TimeoutException(
+                        $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
+                        "The wait for the readiness probe was canceled by the timeout — note the probe may ignore cancellation and continue running in the background.",
+                        lastProbeException ?? ex);
+                }
+
+                lastProbeException = ex;
+                _logger.LogDebug(
+                    ex,
+                    "Application-ready probe was canceled on attempt {Attempt}; treating as not-ready and retrying.",
+                    attempt);
             }

Handle OperationCanceledException thrown independently by the probe to prevent
unhandled exceptions and allow retries.

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [55-92]

 try
 {
     // WaitAsync(linkedToken) enforces the timeout deadline even
     // when the probe ignores its own CancellationToken and would
     // otherwise hang or return after the budget elapses. When
     // the deadline fires, WaitAsync throws OperationCanceledException
     // immediately — we don't keep waiting for the rogue probe.
     var isReady = await probe(device, linkedToken)
         .WaitAsync(linkedToken)
         .ConfigureAwait(false);
 
     if (isReady)
     {
         if (attempt > 1)
         {
             _logger.LogDebug(
                 "Device became application-ready on probe attempt {Attempt}.",
                 attempt);
         }
         return;
     }
     _logger.LogDebug("Application-ready probe returned false on attempt {Attempt}; will retry.", attempt);
 }
-catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
+catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
 {
-    throw new TimeoutException(
-        $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
-        "The wait for the readiness probe was canceled by the timeout — note the probe may ignore cancellation and continue running in the background.",
-        lastProbeException);
+    if (timeoutCts.IsCancellationRequested)
+    {
+        throw new TimeoutException(
+            $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
+            "The wait for the readiness probe was canceled by the timeout — note the probe may ignore cancellation and continue running in the background.",
+            lastProbeException ?? ex);
+    }
+
+    lastProbeException = ex;
+    _logger.LogDebug(
+        ex,
+        "Application-ready probe was canceled on attempt {Attempt}; treating as not-ready and retrying.",
+        attempt);
 }
 catch (Exception ex) when (ex is not OperationCanceledException)
 {
     lastProbeException = ex;
     _logger.LogDebug(
         ex,
         "Application-ready probe threw on attempt {Attempt}; treating as not-ready and retrying.",
         attempt);
 }

Suggestion importance[1-10]: 10

__

Why: Identifies and fixes a crucial bug where a probe throwing OperationCanceledException independently would skip the existing catch blocks and crash the update loop.

High

✅ Suggestions up to commit 7dc8bdf

Category

Suggestion

Impact

General

✅ ~~Improve timeout message precision~~

Suggestion Impact:

Updated TimeoutException messages to interpolate {totalTimeout} (preserving sub-second precision) rather than rounding to whole seconds; related timeout messages were also reworded while applying this change.

code diff:

                 throw new TimeoutException(
-                    $"Device did not become application-ready within {totalTimeout.TotalSeconds:F0}s after PIC32 reconnect (attempt {attempt}). " +
-                    "The serial transport reopened but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.");
+                    $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
+                    "The transport reconnected but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.",
+                    lastProbeException);
             }
 
             try
             {
-                var isReady = await probe(device, linkedToken).ConfigureAwait(false);
-
-                // Re-check the budget after the await: a probe that ignores
-                // its CancellationToken could otherwise return true after
-                // the timeout has elapsed and bypass the budget. Caller's
-                // CT propagates as OperationCanceledException out of this
-                // method (caught at the same handler below); only the
-                // timeout-CT case is reinterpreted as TimeoutException.
-                try
-                {
-                    linkedToken.ThrowIfCancellationRequested();
-                }
-                catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
-                {
-                    throw new TimeoutException(
-                        $"Device did not become application-ready within {totalTimeout.TotalSeconds:F0}s after PIC32 reconnect (attempt {attempt}). " +
-                        "The readiness probe ignored cancellation and returned after the deadline.");
-                }
+                // WaitAsync(linkedToken) enforces the timeout deadline even
+                // when the probe ignores its own CancellationToken and would
+                // otherwise hang or return after the budget elapses. When
+                // the deadline fires, WaitAsync throws OperationCanceledException
+                // immediately — we don't keep waiting for the rogue probe.
+                var isReady = await probe(device, linkedToken)
+                    .WaitAsync(linkedToken)
+                    .ConfigureAwait(false);
 
                 if (isReady)
                 {
@@ -881,14 +880,32 @@
                 }
                 _logger.LogDebug("Application-ready probe returned false on attempt {Attempt}; will retry.", attempt);
             }
-            catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
-            {
-                throw new TimeoutException(
-                    $"Device did not become application-ready within {totalTimeout.TotalSeconds:F0}s after PIC32 reconnect (attempt {attempt}). " +
-                    "The readiness probe was canceled by the timeout while running.");
+            catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
+            {
+                // Two cases reach here:
+                // 1. The wait deadline fired (timeoutCts canceled) — surface
+                //    as TimeoutException so callers see the readiness budget.
+                // 2. The probe itself threw OperationCanceledException for
+                //    some unrelated reason (its own internal CTS, etc). That
+                //    must NOT crash the update loop — treat it as a probe
+                //    failure and retry, same as any other thrown exception.
+                if (timeoutCts.IsCancellationRequested)
+                {
+                    throw new TimeoutException(
+                        $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
+                        "The wait for the readiness probe was canceled by the timeout — note the probe may ignore cancellation and continue running in the background.",
+                        lastProbeException ?? ex);
+                }
+
+                lastProbeException = ex;
+                _logger.LogDebug(
+                    ex,
+                    "Application-ready probe was canceled on attempt {Attempt}; treating as not-ready and retrying.",
+                    attempt);
             }
             catch (Exception ex) when (ex is not OperationCanceledException)
             {
+                lastProbeException = ex;
                 _logger.LogDebug(
                     ex,
                     "Application-ready probe threw on attempt {Attempt}; treating as not-ready and retrying.",
@@ -902,7 +919,9 @@
             catch (OperationCanceledException) when (timeoutCts.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
             {
                 throw new TimeoutException(
-                    $"Device did not become application-ready within {totalTimeout.TotalSeconds:F0}s after PIC32 reconnect (attempt {attempt}).");
+                    $"Device did not become application-ready within {totalTimeout} (attempt {attempt}). " +
+                    "The transport reconnected but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.",
+                    lastProbeException);
             }

Change the timeout message formatting to preserve sub-second precision instead
of rounding to whole seconds.

src/Daqifi.Core/Firmware/FirmwareUpdateService.cs [41-43]

 throw new TimeoutException(
-    $"Device did not become application-ready within {totalTimeout.TotalSeconds:F0}s after PIC32 reconnect (attempt {attempt}). " +
+    $"Device did not become application-ready within {totalTimeout} after PIC32 reconnect (attempt {attempt}). " +
     "The serial transport reopened but the readiness probe never returned true; the device may still be initializing or the firmware may have failed to start.");

Suggestion importance[1-10]: 5

__

Why: While minor, preserving sub-second precision in the timeout message improves diagnostic clarity, especially since tests and callers may use sub-second values.

Low

…ll-behaved probes Defensive cancellation re-check after the await: a probe that ignores its CancellationToken could otherwise return true after the timeout elapsed and slip past the budget. The post-await check forces the TimeoutException path even for that case. Caller-cancellation semantics preserved: the inner try/catch only reinterprets the timeout-CT case; OperationCanceledException from the caller-CT propagates out unchanged via the outer catch.

cptkoolbeenz · 2026-05-12T05:14:57Z

/improve

cptkoolbeenz · 2026-05-12T05:14:58Z

/agentic_review

qodo-code-review · 2026-05-12T05:15:04Z

Code Review by Qodo

New Review Started

This review has been superseded by a new analysis

qodo-code-review · 2026-05-12T05:15:47Z

Persistent suggestions updated to latest commit 7dc8bdf

…sion in timeout messages Replaced {totalTimeout.TotalSeconds:F0}s with {totalTimeout} across all 4 readiness-probe TimeoutException messages. The :F0 formatter rounded sub-second timeouts to "0s" — including the test value of 150ms — which was unhelpful for diagnostics. {TimeSpan} formats as hh:mm:ss.fffffff and preserves precision.

cptkoolbeenz · 2026-05-12T05:17:26Z

/improve

cptkoolbeenz · 2026-05-12T05:17:27Z

/agentic_review

qodo-code-review · 2026-05-12T05:17:35Z

Persistent review updated to latest commit cff6e8f

qodo-code-review · 2026-05-12T05:18:11Z

Persistent suggestions updated to latest commit cff6e8f

…t enforcement Replaced the post-await cancellation re-check from pass 2 with Task.WaitAsync(linkedToken) on the probe call itself. WaitAsync is the stronger primitive: when the timeout fires, it throws OperationCanceledException immediately instead of waiting for the rogue probe to return. The post-await re-check was defensive but still required the probe to complete; WaitAsync short-circuits. Skipped a redundant inline option-validation suggestion (already covered by Validate() in FirmwareUpdateServiceOptions).

qodo-code-review · 2026-05-12T05:22:08Z

+        if (_options.PostReconnectReadinessProbe is { } probe)
+        {
+            await WaitForApplicationReadyAsync(device, probe, cancellationToken).ConfigureAwait(false);
+        }


1. Readiness probe is optional 📎 Requirement gap ☼ Reliability

FirmwareUpdateService only waits for application readiness when PostReconnectReadinessProbe is set; otherwise it proceeds immediately after serial reconnect, which can still return a half-started device. This keeps the failure mode the checklist is trying to eliminate and can still force downstream integrations to add their own readiness polling.

Agent Prompt

## Issue description The post-reconnect readiness wait is currently opt-in (`PostReconnectReadinessProbe` is nullable and gated), so the firmware update flow can still proceed immediately after reconnect without confirming the application is ready. ## Issue Context Checklist IDs 1 and 2 require Core’s firmware update lifecycle to handle post-reconnect application readiness so reconnect completion isn’t based solely on transport reopening, and downstream callers don’t need their own readiness loops. ## Fix Focus Areas - src/Daqifi.Core/Firmware/FirmwareUpdateService.cs[810-820] - src/Daqifi.Core/Firmware/FirmwareUpdateServiceOptions.cs[124-150]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

cptkoolbeenz · 2026-05-12T05:23:23Z

/agentic_review

qodo-code-review · 2026-05-12T05:23:29Z

Persistent review updated to latest commit eb28b8d

…t cause in TimeoutException Qodo flagged that deterministic probe failures lose their root cause: the catch swallows the probe exception, retries, and eventually throws TimeoutException without the underlying error attached. With Debug logs off the real cause is invisible. Now captures the most-recent probe exception and attaches it as InnerException to all 3 timeout-path TimeoutExceptions. Steady-state behavior unchanged; observability improves for the failure case. Skipped 3 other findings: - "Readiness probe is optional" — defensible design per the issue's explicit "callers can provide a readiness probe" wording; making it always-on requires Core to know what "ready" means, which it doesn't. - "Conflicting timeout budgets" — readiness timeout (30s default) fits inside JumpingToApplicationTimeout (45s default) by design; documented in the existing options docstrings. - "Readiness wait not surfaced" — observability nice-to-have; out of scope for this fix.

cptkoolbeenz · 2026-05-12T05:26:38Z

/improve

cptkoolbeenz · 2026-05-12T05:26:39Z

/agentic_review

qodo-code-review · 2026-05-12T05:26:46Z

Persistent review updated to latest commit 32e6f91

qodo-code-review · 2026-05-12T05:27:32Z

Persistent suggestions updated to latest commit 32e6f91

…licationTimeout interaction (PR #200) Address Qodo finding 'Conflicting timeout budgets': the readiness probe's budget runs inside the JumpingToApp state, so users who raise the readiness timeout near/above JumpingToApplicationTimeout will see the outer state-timeout fire first. Default values (45s state, 30s readiness) leave headroom for reconnect; documented the interaction explicitly so configuration mistakes are visible at the call site. Skipping the remaining 2 findings as design choices: - 'Readiness probe is optional' — the issue's own wording specifies caller-provided probe; making it always-on would require Core to know what 'ready' means for an arbitrary device. - 'Readiness wait not surfaced' — observability nice-to-have; out of scope for this fix.

cptkoolbeenz · 2026-05-12T05:29:16Z

Convergence summary (Qodo /agentic_review pass 5, after commit 89657ba):

Persistent findings + dispositions:

Readiness probe is optional — keep as-is. The issue's "Possible approaches" list explicitly includes "extend the reconnect contract so callers can provide a readiness probe", which is what this PR implements. Making the probe always-on with a default would require Core to know what "ready" means for an arbitrary device, which it doesn't (the streaming-device interface has no generic ping). Callers that need the wait set the probe; callers that don't get the legacy fast-complete behavior they had before. This is the pattern desktop already uses.
Conflicting timeout budgets — addressed via docs in 89657ba. The readiness budget runs inside the JumpingToApp state, so configuration that raises it above the state timeout will see the outer fire first. Default values (45s state, 30s readiness) leave ample headroom; the docstring now spells this out so misconfiguration is visible at the call site.
~~Probe errors lose root cause~~ ✅ resolved in 32e6f91 — TimeoutException now carries the most-recent probe exception as InnerException.
Readiness wait not surfaced — observability nice-to-have. Out of scope for this fix; would require plumbing periodic progress reports during the wait, which adds complexity for a secondary concern. Filed as a follow-up if someone needs it.

Test gate: 893/895 (2 hardware skips). CI green.

Ready for review.

cptkoolbeenz · 2026-05-12T06:03:39Z

/improve

cptkoolbeenz · 2026-05-12T06:38:10Z

/agentic_review

qodo-code-review · 2026-05-12T06:38:18Z

Persistent review updated to latest commit ea82fd1

qodo-code-review · 2026-05-12T06:39:34Z

Persistent suggestions updated to latest commit ea82fd1

When the probe throws on attempt N then later attempts return false until the readiness budget expires, the TimeoutException carried the stale exception from attempt N as InnerException — misleading debugging and any handler that inspects InnerException. Clear lastProbeException whenever a probe invocation completes normally (true OR false), so a subsequent timeout only attaches a probe-thrown exception when it's actually the most recent outcome. Locked in with a focused test.

cptkoolbeenz · 2026-05-12T06:42:41Z

/improve

cptkoolbeenz · 2026-05-12T06:42:42Z

/agentic_review

qodo-code-review · 2026-05-12T06:42:48Z

Persistent review updated to latest commit b089763

qodo-code-review · 2026-05-12T06:43:22Z

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removed after 2026-05-31).

No code suggestions found for the PR.

cptkoolbeenz · 2026-05-12T06:47:23Z

/improve

cptkoolbeenz · 2026-05-12T06:47:24Z

/agentic_review

qodo-code-review · 2026-05-12T06:47:31Z

Persistent review updated to latest commit b089763

qodo-code-review · 2026-05-12T06:48:48Z

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removed after 2026-05-31).

No code suggestions found for the PR.

cptkoolbeenz · 2026-05-12T07:00:18Z

/improve

qodo-code-review · 2026-05-12T07:00:40Z

PR Code Suggestions ✨

Warning

/improve is deprecated. Use /agentic_review instead (removed after 2026-05-31).

No code suggestions found for the PR.

Add LogInformation at WaitForApplicationReadyAsync entry and on success (with elapsed time + attempt count) so observers tailing the log can distinguish a deliberate readiness poll from a stuck flow. Wait can take up to PostReconnectReadinessTimeout (default 30s); previously only Debug-level breadcrumbs existed during that window.

cptkoolbeenz · 2026-05-12T19:29:35Z

/improve

cptkoolbeenz · 2026-05-12T19:29:36Z

/agentic_review

qodo-code-review · 2026-05-12T19:29:43Z

Persistent review updated to latest commit 26b19fe

qodo-code-review · 2026-05-12T19:30:43Z

Persistent suggestions updated to latest commit 26b19fe

…eadiness-probe

…t (PR #200 follow-up) SerialStreamTransport.Stream returns _serialPort.BaseStream, which is a fresh Stream instance after Disconnect() reopens the port. Previously DaqifiDevice.Disconnect kept _messageProducer / _messageConsumer alive with references to the old BaseStream, and Connect's "if (... == null)" guard skipped recreation — so any Send() after a transport reconnect wrote to the disposed stream and silently no-op'd, while the text consumer on the new stream saw zero bytes. Surfaced by PR #200's post-reconnect readiness probe: GetLanChipInfoAsync returned null on every attempt because the probe's Send went to the dead stream. Null the producer/consumer in Disconnect so Connect's existing guard triggers fresh construction against the current Stream. Regression test fails without this change (Send post-reconnect lands on the previous, disposed stream rather than the rotated current one). Note: this does not fully resolve the post-PIC32-reset readiness probe on macOS hardware — a separate USB CDC re-enumeration race appears to leave the new SerialPort handle unable to receive bytes despite IsOpen=true. That's tracked separately. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tylerkron · 2026-05-15T03:45:12Z

Hardware validation + bundled Core fix

Spent a session validating this PR on real hardware (Nq1 / /dev/cu.usbmodem101, macOS). Three flashes with different probe configurations on top of the merged main branch. Findings:

1. PR mechanism is correct end-to-end

--readiness-trivial probe ((_, _) => Task.FromResult(true)) transitioned the update through JumpingToApp → Complete in 2.3 seconds — confirming the probe→wait→state-transition pipeline works exactly as designed:

[State] Verifying -> JumpingToApp | ... | 03:13:29.9
[Probe] trivial-true
[State] JumpingToApp -> Complete | PIC32 firmware update completed. | 03:13:32.1
Firmware update completed successfully.

2. Stale-stream bug uncovered in `DaqifiDevice` (bundled in commit `e59410e`)

The natural probe (GetLanChipInfoAsync) returned null on every attempt across multiple flashes. Trace-listener instrumentation showed Collection complete at ~2000ms, 0 lines for each probe — the text consumer never saw a single byte.

Root cause: SerialStreamTransport.Stream returns _serialPort.BaseStream, which is a brand-new Stream instance after Disconnect() reopens the port. But DaqifiDevice.Disconnect() kept _messageProducer / _messageConsumer alive with references to the old BaseStream, and Connect()'s if (_messageProducer == null) guard skipped recreation. Any Send() after a transport reconnect wrote to a disposed stream and silently no-op'd.

This was latent because no existing internal flow calls SCPI on the same DaqifiStreamingDevice instance after a reconnect — desktop disposes and reconnects fresh. This PR's readiness probe is the first caller to exercise that path, so it surfaced it.

Fix (commit e59410e): null _messageConsumer and _messageProducer in Disconnect() so Connect()'s existing guard triggers fresh construction against the current Stream. Added a regression test (DaqifiDevice_DisconnectThenConnect_SendsToCurrentTransportStream) that uses a stream-rotating mock transport — fails without the fix, passes with it. Full suite: 964/964 passing.

3. Separate USB CDC re-enumeration issue — follow-up

Even with the stale-stream fix, the LAN chip-info probe still times out post-PIC32-reset on macOS. Trace evidence:

	Working `--lan-chip-info` (fresh process)	Probe post-firmware-reset
Protobuf consumer stopped at	500ms (busy reading)	0-1ms (idle / errored)
First response at	603ms ✓	(never)
Lines collected	1	0

The new SerialPort opens (IsOpen == true), my fix gives it a fresh producer/consumer bound to the new BaseStream, but no bytes flow back through that handle. A separate process opening the same /dev/cu.usbmodem101 a few seconds later gets a working handle immediately.

This is consistent with a macOS USB CDC re-enumeration race: after PIC32 jumps to application, the host sees the old device disappear and a new one appear with the same path name, but the in-process SerialPort.Open() may bind too soon to a kernel device-node that isn't fully ready. A fresh process binds cleanly.

Mitigation options for a follow-up:

Brief grace delay after transport.Connect() succeeds, then re-open the port
Probe-side: caller's PostReconnectReadinessProbe could include the close/reopen dance
Document the contract: "probe must tolerate the device's post-reset settling period"

Implication for this PR: mechanism is sound, ship it. The probe's value will be most apparent once the follow-up USB CDC issue is also fixed; for now, callers should be aware that on macOS post-PIC32-reset, a probe based on GetLanChipInfoAsync will likely time out even though the device firmware is actually fine.

After PIC32 firmware update jumps to application, the device's USB CDC interface re-enumerates. On macOS, the first SerialPort.Open() to succeed inside the re-enum window is a "shadow" handle — IsOpen==true but writes silently drop and reads see zero bytes. Hardware-validated on a Nyquist1 (/dev/cu.usbmodem101): the first reconnect succeeded at ~2s with UnauthorizedAccessException rejected for every preceding attempt. Once that race-winning handle was held, subsequent SCPI exchanges returned 0 bytes for tens of seconds; a fresh process opening the same path got a clean handle immediately. Add PostReconnectStaleHandleDelay option (default 2s) and a dance in JumpToApplicationAndReconnectAsync that, after WaitForSerialReconnectAsync succeeds, closes the port, sleeps the delay, then reconnects to obtain a clean kernel binding. Validated end-to-end with PR #200's readiness probe: with the dance + a probe that runs InitializeAsync to wake the post-reset dormant device, the LAN chip-info query returns valid data and the firmware update transitions to Complete. Opt out by setting the option to TimeSpan.Zero (Windows, where the first open is already clean). Tests: - UpdateFirmwareAsync_PostReconnectStaleHandleDelay_TriggersExtraDisconnectReconnect asserts the dance fires (2 disconnects + 2 reconnects vs baseline 1 each). - UpdateFirmwareAsync_PostReconnectStaleHandleDelayZero_SkipsExtraDisconnectReconnect asserts the opt-out path with TimeSpan.Zero keeps the baseline counts. - CreateFastOptions sets the delay to Zero so the unit-test suite (which doesn't exercise USB CDC) doesn't pay the per-test 2s settling cost. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tylerkron · 2026-05-15T04:07:01Z

Update: full hardware fix landed (commit `a5ce969`)

Continued hardware debugging revealed a second issue chained behind the first one. Now the readiness probe works end-to-end on macOS.

The complete failure mode

PIC32 firmware update finishes → MCU jumps to application → application firmware boots dormant (LEDs off, WiFi subsystem unpowered, doesn't respond to LAN/WiFi SCPI queries). Confirmed via SYSTem:REboot test: lights go off and stay off until an explicit `SYSTem:POWer:STATe 1` is sent.
PIC32 reset causes USB CDC to re-enumerate. On macOS the host SerialPort opens are rejected with `UnauthorizedAccessException` for ~2s.
The first `SerialPort.Open()` to succeed after that 2s window is a shadow handle: `IsOpen==true`, but writes silently drop and reads see zero bytes. A fresh process opening the same path gets a clean handle immediately.

Two-layer fix

Core (commit a5ce969): New `FirmwareUpdateServiceOptions.PostReconnectStaleHandleDelay` (default 2s). After `WaitForSerialReconnectAsync` succeeds, the firmware service now discards that race-winning handle — `device.Disconnect()` → sleep → `device.Connect()` → clean kernel binding. Opt out with `TimeSpan.Zero` (Windows, where the first open is already clean).

Probe-side responsibility (caller pattern): The post-reset device is dormant. The caller's probe should call `device.InitializeAsync()` first (which sends `SYSTem:POWer:STATe 1` etc.) before issuing the readiness query.

Hardware proof

With both fixes in place, the readiness probe (using `GetLanChipInfoAsync` after `InitializeAsync`) succeeds end-to-end on the Nyquist1:

```
[State] Verifying -> JumpingToApp
[RECONNECT-LOOP] enter; IsConnected=False
[RECONNECT-LOOP] attempt 1-9 failed (UnauthorizedAccessException) — ~2.3s
[RECONNECT-LOOP] connected on attempt 10
[STALE-FIX] Disconnect → sleep 2s → reconnect (clean kernel binding)
[RECONNECT-LOOP] connected on attempt 1 instantly
[Probe] LAN chip-info OK (ChipId=1377184, FW=19.7.7)
[State] JumpingToApp -> Complete
Firmware update completed successfully.
```

Recommended probe shape (for callers, including daqifi-desktop)

```csharp
options.PostReconnectReadinessProbe = async (device, ct) =>
{
if (device is DaqifiStreamingDevice sd)
{
// Wake the post-reset dormant device (TurnDeviceOn + format + echo etc.)
await sd.InitializeAsync().ConfigureAwait(false);
}
if (device is not ILanChipInfoProvider lan) return true;
var info = await lan.GetLanChipInfoAsync(ct).ConfigureAwait(false);
return info != null;
};
```

Tests (suite now at 968/968 passing on net9.0 + net10.0)

`UpdateFirmwareAsync_PostReconnectStaleHandleDelay_TriggersExtraDisconnectReconnect` — asserts the dance fires
`UpdateFirmwareAsync_PostReconnectStaleHandleDelayZero_SkipsExtraDisconnectReconnect` — asserts opt-out path
`CreateFastOptions` test helper sets the delay to Zero so unit tests don't pay the per-test 2s settling cost

After PR #200 hardware validation revealed two ergonomic issues with the readiness probe contract: 1. PIC32 application firmware boots dormant after the bootloader jumps to application (LEDs off, WiFi subsystem unpowered, won't answer LAN queries). Callers writing a "natural" probe like GetLanChipInfoAsync != null would silently fail for tens of seconds because the device needs SYSTem:POWer:STATe 1 (sent by InitializeAsync) before it'll respond. That dormant-state knowledge shouldn't be required of every probe author. 2. The PostReconnectStaleHandleDelay dance was logged at Debug level while the comparable readiness-probe wait is at Information. The dance is a significant operation (2+ seconds inside JumpingToApp); observers tailing logs should see it. Changes in JumpToApplicationAndReconnectAsync: - After the stale-handle dance reopens the port with a clean kernel binding, call InitializeAsync on the device if it's a DaqifiDevice (skipped for test fakes that don't extend it). Wrapped in try/catch with LogWarning so an init failure doesn't mask the eventual probe outcome — the probe stays the source of truth for "ready". - Stale-handle dance logged at Information (was Debug). Hardware-validated end-to-end: with this change, the example app probe simplifies to just `GetLanChipInfoAsync != null` (no wake-up code in the caller). Total post-reconnect time ~9s on Nyquist1 (2.3s race window + 2s stale-handle settle + ~4s init + ~1s probe). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tylerkron · 2026-05-15T04:27:05Z

Follow-up: auto-wake + log level (commit `3e4fd88`)

Self-review after the previous commit identified two ergonomic gaps. Fixed in 3e4fd88.

1. Auto-wake the dormant device in Core

Previous comment recommended callers do `await sd.InitializeAsync()` in their probe to wake the post-reset device. That works but pushes "device is dormant after PIC32 reset" knowledge onto every probe author — a footgun. Core now handles it automatically after the stale-handle dance:

```csharp
if (device is DaqifiDevice initializableDevice)
{
_logger.LogInformation("Waking post-reset device via InitializeAsync.");
try { await initializableDevice.InitializeAsync().ConfigureAwait(false); }
catch (Exception ex)
{
_logger.LogWarning(ex, "InitializeAsync after reconnect threw; continuing to readiness probe.");
}
}
```

Wrapped in try/catch with LogWarning so an init failure doesn't mask the probe outcome — the probe stays the source of truth for "ready". Skipped for non-DaqifiDevice transports (e.g. test fakes); they're responsible for their own readiness.

Probe pattern simplifies to:

```csharp
options.PostReconnectReadinessProbe = async (device, ct) =>
{
if (device is not ILanChipInfoProvider lan) return true;
var info = await lan.GetLanChipInfoAsync(ct).ConfigureAwait(false);
return info != null;
};
```

2. Stale-handle dance log level: Debug → Information

Matches the readiness probe wait. Dance is a significant operation (2+ seconds inside JumpingToApp); observers tailing logs should see it.

Hardware re-validation

Re-flashed on the Nyquist1 with the simplified "natural" probe:

```
[State] Verifying -> JumpingToApp
[Probe] LAN chip-info OK (ChipId=1377184, FW=19.7.7)
[State] JumpingToApp -> Complete
Firmware update completed successfully.
```

Total post-reconnect time: ~9 seconds (2.3s race window + 2s stale-handle settle + ~4s init + ~1s probe).

Suite status: 968/968 passing (net9.0 + net10.0)

cptkoolbeenz requested a review from a team as a code owner May 12, 2026 05:12

qodo-code-review Bot reviewed May 12, 2026

View reviewed changes

tylerkron and others added 2 commits May 14, 2026 20:45

Merge remote-tracking branch 'origin/main' into feat/post-reconnect-r…

af6bd19

…eadiness-probe

tylerkron merged commit 77c909f into main May 15, 2026
1 check passed

tylerkron deleted the feat/post-reconnect-readiness-probe branch May 15, 2026 04:28

This was referenced May 30, 2026

deps: Bump the nuget-patch-minor group with 5 updates daqifi/daqifi-desktop#536

Closed

deps: Bump coverlet.collector and 3 others daqifi/daqifi-desktop#539

Closed

Conversation

cptkoolbeenz commented May 12, 2026

Summary

API additions

Test plan

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

New Review Started

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

New Review Started

Uh oh!

qodo-code-review Bot commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qodo-code-review Bot commented May 12, 2026

Uh oh!

qodo-code-review Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qodo-code-review Bot commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

qodo-code-review Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qodo-code-review Bot commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

Uh oh!

cptkoolbeenz commented May 12, 2026

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

qodo-code-review Bot commented May 12, 2026 •

edited

Loading

2. Stale-stream bug uncovered in `DaqifiDevice` (bundled in commit `e59410e`)

Update: full hardware fix landed (commit `a5ce969`)

Follow-up: auto-wake + log level (commit `3e4fd88`)