Skip to content

fix: addToRecentSpools mutex guard, drain write queue before restart (#106, #107)#118

Merged
sjordan0228 merged 1 commit into
devfrom
fix/thread-safety
Apr 6, 2026
Merged

fix: addToRecentSpools mutex guard, drain write queue before restart (#106, #107)#118
sjordan0228 merged 1 commit into
devfrom
fix/thread-safety

Conversation

@sjordan0228
Copy link
Copy Markdown
Contributor

@sjordan0228 sjordan0228 commented Apr 6, 2026

Summary

Test plan

  • esp32dev builds clean
  • esp32s3devkitc builds clean
  • Trigger OTA update while tag write is queued — verify write completes before reboot

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced NFC operation reliability during system reboots and firmware updates. The system now properly waits for pending NFC write operations to complete before restarting, improving data integrity.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 6, 2026

Warning

Rate limit exceeded

@sjordan0228 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 28 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 15 minutes and 28 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f7ba2f66-755d-4d68-b87c-463e02b5e16f

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf02c3 and 85fd991.

📒 Files selected for processing (3)
  • src/NFCManager.cpp
  • src/NFCManager.h
  • src/WebServerManager.cpp
📝 Walkthrough

Walkthrough

The changes add thread-safe queue management to NFCManager by introducing mutex-protected snapshot logic in addToRecentSpools() and a new isWriteQueueEmpty() query method. WebServerManager's reboot flows are updated to drain the NFC write queue with bounded polling instead of fixed delays before system restart.

Changes

Cohort / File(s) Summary
NFCManager Interface
src/NFCManager.h, src/NFCManager.cpp
Added isWriteQueueEmpty() const public method to check FreeRTOS write queue status. Enhanced addToRecentSpools() with early exit if tagMutex is null, mutex acquisition with 50ms timeout, and explicit mutex release before modifying recentSpools array.
WebServerManager Reboot Handler
src/WebServerManager.cpp
Replaced fixed 500ms delays with bounded queue-draining loops (up to 20 iterations of 100ms polling) awaiting isWriteQueueEmpty() in both handleApiPostConfig() and handleApiUploadFirmwareComplete() before ESP restart, with additional 200ms delay preserved.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers summary, changes, and test plan, but is missing the 'How to Test' section and 'Checklist' from the required template. Add a 'How to Test' section with detailed manual testing steps and complete the 'Checklist' section with items for testing OTA while writes are queued.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main changes: mutex guard on addToRecentSpools and draining write queue before restart, with issue references.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/thread-safety

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the size/S Small change (10-50 lines) label Apr 6, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/NFCManager.cpp (1)

2257-2320: ⚠️ Potential issue | 🔴 Critical

Race condition: recentSpools array modified after mutex release.

The mutex is released at line 2300 before modifying the recentSpools array (lines 2302-2319). However, getRecentSpools() and updateRecentSpoolSyncStatus() both access recentSpools under tagMutex protection. This creates a data race where concurrent calls could read the array while addToRecentSpools() is shuffling entries.

The mutex should be held for the entire operation, including the array modifications.

🔒 Proposed fix: hold mutex for entire operation
 void NFCManager::addToRecentSpools() {
     if (tagMutex == nullptr) return;
     if (xSemaphoreTake(tagMutex, pdMS_TO_TICKS(50)) != pdTRUE) return;

     if (!currentSpool.tag_data_valid) {
         xSemaphoreGive(tagMutex);
         return;
     }

     // Check if this spool already exists in recent list
     int existingIndex = -1;
     for (size_t i = 0; i < recentSpoolsCount; i++) {
         if (strcmp(recentSpools[i].spool_id, currentSpool.spool_id) == 0) {
             existingIndex = i;
             break;
         }
     }

-    // Snapshot data under mutex
     RecentSpoolEntry newEntry;
     memset(&newEntry, 0, sizeof(newEntry));
     strncpy(newEntry.spool_id, currentSpool.spool_id, sizeof(newEntry.spool_id) - 1);

     opt_get_material_type(&currentSpool.tag_data, &newEntry.material_type);
     opt_get_primary_color(&currentSpool.tag_data, newEntry.color);

     if (opt_get_brand_name(&currentSpool.tag_data, newEntry.manufacturer, sizeof(newEntry.manufacturer)) != OPT_OK) {
         newEntry.manufacturer[0] = '\0';
     }

     float full_weight = 0.0f, consumed = 0.0f;
     opt_get_actual_full_weight(&currentSpool.tag_data, &full_weight);
     opt_get_consumed_weight(&currentSpool.tag_data, &consumed);
     newEntry.grams_remaining = (int)(full_weight - consumed);

     newEntry.last_seen = time(nullptr);
     newEntry.valid = true;
     newEntry.synced_to_spoolman = false;

     int32_t smId = -1;
     opt_get_gp_spoolman_id(&currentSpool.tag_data, &smId);
     newEntry.spoolman_id = smId;

-    xSemaphoreGive(tagMutex);
-
     if (existingIndex >= 0) {
         // Spool exists - shift entries to remove it from current position
         for (size_t i = existingIndex; i > 0; i--) {
             recentSpools[i] = recentSpools[i - 1];
         }
     } else {
         // New spool - shift all entries down to make room
         size_t shiftCount = (recentSpoolsCount < MAX_RECENT_SPOOLS) ? recentSpoolsCount : (MAX_RECENT_SPOOLS - 1);
         for (size_t i = shiftCount; i > 0; i--) {
             recentSpools[i] = recentSpools[i - 1];
         }
         if (recentSpoolsCount < MAX_RECENT_SPOOLS) {
             recentSpoolsCount++;
         }
     }

     // Place new entry at front
     recentSpools[0] = newEntry;
+
+    xSemaphoreGive(tagMutex);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/NFCManager.cpp` around lines 2257 - 2320, addToRecentSpools currently
releases tagMutex before shuffling/modifying the shared recentSpools array,
creating a data race with getRecentSpools() and updateRecentSpoolSyncStatus();
fix it by holding tagMutex for the entire operation: move the
xSemaphoreGive(tagMutex) call from before the array-shift logic to after
recentSpools/recentSpoolsCount updates (i.e., only release the mutex once
recentSpools[0] = newEntry and any count changes are complete), and ensure all
early-return paths still release the mutex where appropriate (preserve the
existing releases when currentSpool.tag_data_valid is false and when
xSemaphoreTake fails).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/WebServerManager.cpp`:
- Around line 784-790: The code pauses the scan task via
NFCManager::getInstance().pauseScanTask() before waiting for the write queue to
drain, but processWriteQueue() runs inside scanLoop(), so pausing prevents the
queue from ever emptying; fix by reversing the sequence: wait (with a bounded
loop and small delays) for NFCManager::getInstance().isWriteQueueEmpty() to
become true (or time out) first, then call
NFCManager::getInstance().pauseScanTask(), and only after confirming the queue
is empty proceed to delay and call ESP.restart(); keep the existing timeout
behavior to avoid blocking indefinitely and reference NFCManager::getInstance(),
isWriteQueueEmpty(), pauseScanTask(), and scanLoop() when making the change.
- Around line 850-855: The write-queue drain is happening after the scan task
was paused, so isWriteQueueEmpty() may always report empty; move the queue-drain
logic to the start of the upload flow so the queue is drained before pausing the
scan task: in handleApiUploadFirmwareChunk() (when transitioning to
UPLOAD_FILE_START) call the same write-queue drain loop that checks
NFCManager::getInstance().isWriteQueueEmpty() and delays until drained, then
pause the scan task; keep the existing drain in
handleApiUploadFirmwareComplete() only as a safety check.

---

Outside diff comments:
In `@src/NFCManager.cpp`:
- Around line 2257-2320: addToRecentSpools currently releases tagMutex before
shuffling/modifying the shared recentSpools array, creating a data race with
getRecentSpools() and updateRecentSpoolSyncStatus(); fix it by holding tagMutex
for the entire operation: move the xSemaphoreGive(tagMutex) call from before the
array-shift logic to after recentSpools/recentSpoolsCount updates (i.e., only
release the mutex once recentSpools[0] = newEntry and any count changes are
complete), and ensure all early-return paths still release the mutex where
appropriate (preserve the existing releases when currentSpool.tag_data_valid is
false and when xSemaphoreTake fails).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9d4a999e-d828-4ea0-add0-1858ae93cd3f

📥 Commits

Reviewing files that changed from the base of the PR and between e92097a and 5cf02c3.

📒 Files selected for processing (3)
  • src/NFCManager.cpp
  • src/NFCManager.h
  • src/WebServerManager.cpp

Comment thread src/WebServerManager.cpp Outdated
Comment thread src/WebServerManager.cpp Outdated
Comment on lines 850 to 855
// Wait for write queue to drain (#107)
for (int i = 0; i < 20 && !NFCManager::getInstance().isWriteQueueEmpty(); i++) {
delay(100);
}
delay(200);
ESP.restart();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same issue: scan task already paused during OTA upload.

The scan task was paused at UPLOAD_FILE_START in handleApiUploadFirmwareChunk() (line 821), so by the time handleApiUploadFirmwareComplete() runs, the scan task is already suspended and the write queue cannot drain.

For OTA, this is less critical since writes during firmware upload are uncommon, but for consistency and to fully address issue #107, the drain should happen before pausing.

🔧 Proposed fix: drain queue at start of upload before pausing

In handleApiUploadFirmwareChunk(), drain the queue before pausing:

     if (upload.status == UPLOAD_FILE_START) {
         Serial.printf("OTA: Upload start: %s\n", upload.filename.c_str());
+        // Wait for pending NFC writes to complete before pausing (`#107`)
+        for (int i = 0; i < 20 && !NFCManager::getInstance().isWriteQueueEmpty(); i++) {
+            delay(100);
+        }
         // Pause NFC scan task during upload to prevent interference
         NFCManager::getInstance().pauseScanTask();

Then the drain loop in handleApiUploadFirmwareComplete() becomes a safety check for any writes somehow queued during upload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/WebServerManager.cpp` around lines 850 - 855, The write-queue drain is
happening after the scan task was paused, so isWriteQueueEmpty() may always
report empty; move the queue-drain logic to the start of the upload flow so the
queue is drained before pausing the scan task: in handleApiUploadFirmwareChunk()
(when transitioning to UPLOAD_FILE_START) call the same write-queue drain loop
that checks NFCManager::getInstance().isWriteQueueEmpty() and delays until
drained, then pause the scan task; keep the existing drain in
handleApiUploadFirmwareComplete() only as a safety check.

@sjordan0228 sjordan0228 merged commit 2423af5 into dev Apr 6, 2026
2 checks passed
@sjordan0228 sjordan0228 deleted the fix/thread-safety branch April 6, 2026 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small change (10-50 lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant