Skip to content

Comments

Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake (IDFGH-8766)#10199

Merged
espressif-bot merged 1 commit intoespressif:masterfrom
AerofoilsGmbH:queue-send-fix
Nov 29, 2022
Merged

Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake (IDFGH-8766)#10199
espressif-bot merged 1 commit intoespressif:masterfrom
AerofoilsGmbH:queue-send-fix

Conversation

@timoxd7
Copy link
Contributor

@timoxd7 timoxd7 commented Nov 17, 2022

FreeRTOS References:

Commit: FreeRTOS/FreeRTOS-Kernel@4e2bf2c
Pull-Request: FreeRTOS/FreeRTOS-Kernel#576
Problem Description: https://forums.freertos.org/t/15967

This pull-request is purposed to copy the bugfix from the FreeRTOS Repo into esp-idf.
All credit goes to @Erlkoenig90.

Original FreeRTOS Pull Request Description

Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake as the reasoning behind it is wrong; it can trigger on wrongly on highly-contested semaphores on multicore systems. See the forum for details.

Fixes rare deadlock on heavy loaded multicore-systems.
@CLAassistant
Copy link

CLAassistant commented Nov 17, 2022

CLA assistant check
All committers have signed the CLA.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Nov 17, 2022
@github-actions github-actions bot changed the title Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake Removed the 'configASSERT( xInheritanceOccurred == pdFALSE )' assertion from xQueueSemaphoreTake (IDFGH-8766) Nov 17, 2022
@timoxd7 timoxd7 marked this pull request as ready for review November 17, 2022 19:42
@Alvin1Zhang
Copy link
Collaborator

Thanks for your contribution.

Copy link
Contributor

@Dazza0 Dazza0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timoxd7 Changes LGTM.

I think the issue described in the post is a manifestation of more general "spinlock acquire race condition" that can occur on SMP systems under the current FreeRTOS queue/semaphore/mutex implementation.

Basically, because there is a gap in the critical section between each loop iteration (see below), this can lead to some race conditions when there are multiple cores running concurrently.

BaseType_t xQueueSend/Receive/SemaphoreTake(...)
{
    // Main fucntion loop
    for(;;) {
        taskENTER_CRITICAL();
        // Check for empty/full queue/semaphore/mutex
        taskEXIT_CRITICAL();

        // Handle timeout stuff
        taskENTER_CRITICAL();
        if (timed_out) {
            portYIELD_WITHIN_API();
        }
        taskEXIT_CRITICAL();
        /*
        - If yielding, it occurs after we exit the critical section.
        - Once we schedule back here, we need to enter the critical section again
          which can lead to race conditions if the other CPU gets enters it
          faster than we do.
        */
    }
}

We've run into a similar issue with xQueueReceive() before where a lower priority task can "steal" a message intended for a higher priority task that was blocked on the queue. The flow would look like the following:

  • Task A (CPU 0, Priority 5) calls xQueueReceive() on a empty queue, thus it blocks
  • Task B (CPU 0, Priority 4) calls xQueueSend() to send a message to the queue, thus unblocking Task A.
  • Task A (on CPU 0) has how scheduled back, and needs to enter the critical section again to read the message.
  • At the same time a Task C (CPU 1, Priority 3) also calls xQueueReceive() at the same time, and is quicker than Task A to enter the critical section.
  • Task C ends up reading the message while Task A spins waiting to enter the critical section. Thus Task C as "stolen" Task A's message, even though Task C had a lower priority.

I'm not sure if there's clean way we can work around this SMP race condition, unless we add some information in the queue/mutexes regarding the last task that was unblocked on the queue.

Regardless, the PR changes LGTM. Thanks for the contribution.

@Dazza0
Copy link
Contributor

Dazza0 commented Nov 25, 2022

sha=355abfdff6f1039f3b98b598c484f6103ade9749

@Dazza0 Dazza0 added the PR-Sync-Merge Pull request sync as merge commit label Nov 25, 2022
@espressif-bot espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new Resolution: NA Issue resolution is unavailable labels Nov 28, 2022
@espressif-bot espressif-bot merged commit 2976703 into espressif:master Nov 29, 2022
@timoxd7 timoxd7 deleted the queue-send-fix branch November 29, 2022 20:24
@AxelLin
Copy link
Contributor

AxelLin commented Dec 8, 2022

@Dazza0 Stable branches also need this fix?

@Erlkoenig90
Copy link
Contributor

Thanks for merging this into IDF! I would greatly appreciate it if it would be merged into 4.4.3 . I have manually applied this change to 4.4.2 and it works fine for me.

@Dazza0
Copy link
Contributor

Dazza0 commented Dec 8, 2022

@AxelLin @Erlkoenig90 Backports have already been internally raised and pending merge.

@AxelLin
Copy link
Contributor

AxelLin commented Dec 25, 2022

@Dazza0 The fix is in v4.4+, but v4.3 also needs fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR-Sync-Merge Pull request sync as merge commit Resolution: Done Issue is done internally Status: Done Issue is done internally

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants