[fiber] Stackful cooperative scheduling via Fibers #743

Merged
merged 7 commits into from
Oct 29, 2021

Conversation

salkinium
Member

@salkinium salkinium commented Oct 4, 2021

This continues the work in #439. The changes/additions so far:

Main function

The main function is no longer a fiber; the developer has to start the scheduler manually. modm::fiber::yield() returns immediately if the scheduler is not running, so code that yields while waiting simply loops in the main function. This will most likely work fine for interrupt/flag-driven peripheral access like I2C, SPI and UART, but it won't work for any class that requires an update() function to be polled (which didn't work with RF_CALL_BLOCKING either).

The scheduler::run() function will return to the main function if all fibers stop running (or if all functions have yielded). This can be used for sleeping and idle thread functionality.
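To make the two rules above concrete (yield() returns immediately without a running scheduler, run() returns once no fiber is left), here is a toy model built on the POSIX ucontext calls getcontext/makecontext/swapcontext. Everything in the toy namespace is a hypothetical stand-in, not the modm implementation: in particular this sketch keeps the scheduling loop in the main context instead of switching directly from fiber to fiber, and run() only returns when all fibers have finished.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>
#include <vector>

namespace toy {  // hypothetical names, not the modm::fiber API

struct Fiber {
    ucontext_t context{};
    void (*entry)() = nullptr;
    bool done = false;
    alignas(16) char stack[64 * 1024];  // generous stack for a hosted demo
};

ucontext_t scheduler_context;  // context of whoever called run()
std::vector<Fiber*> fibers;
Fiber* current = nullptr;      // nullptr means: scheduler is not running a fiber
std::string trace;

// Runs the fiber body, marks it finished, then returns into uc_link.
void trampoline() { current->entry(); current->done = true; }

void yield() {
    if (current == nullptr) return;  // no scheduler running: return immediately
    swapcontext(&current->context, &scheduler_context);
}

void add(Fiber& f, void (*entry)()) {
    f.entry = entry;
    f.done = false;
    int rc = getcontext(&f.context);
    assert(rc == 0);
    (void)rc;
    f.context.uc_stack.ss_sp = f.stack;
    f.context.uc_stack.ss_size = sizeof(f.stack);
    f.context.uc_link = &scheduler_context;  // resumed when trampoline returns
    makecontext(&f.context, trampoline, 0);
    fibers.push_back(&f);
}

// Round-robin over all fibers; returns to the caller once every fiber is done.
void run() {
    bool any_alive = true;
    while (any_alive) {
        any_alive = false;
        for (Fiber* f : fibers) {
            if (f->done) continue;
            current = f;
            swapcontext(&scheduler_context, &f->context);
            current = nullptr;
            any_alive = any_alive || !f->done;
        }
    }
}

void ping() { for (int i = 0; i < 2; ++i) { trace += 'a'; yield(); } }
void pong() { for (int i = 0; i < 2; ++i) { trace += 'b'; yield(); } }

std::string demo() {
    static Fiber f1, f2;
    fibers.clear();
    trace.clear();
    yield();         // scheduler not started yet: this is a no-op
    add(f1, ping);
    add(f2, pong);
    run();           // comes back here after both fibers have returned
    return trace;
}

}  // namespace toy
```

Calling toy::demo() interleaves the two fibers and then falls out of run(), which is the point where a real main function could go to sleep or do idle work.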

This also treats the main function consistently across all platforms and carries no performance overhead if not using fibers.

I've also moved all classes into the modm::fiber namespace, since I'm worried about compatibility with FreeRTOS and I don't want people believing that modm::yield() makes their code magically reentrant or something stupid like that. In general, users should not have to yield manually, but rather use some higher-level primitive or modm API anyway.

Cortex-M Context Switch

I've optimized the context switch assembly by using the push LR, pop PC pair correctly, which significantly speeds up the context switch. For FPU-enabled devices, we only need to store the upper 16 floating point registers (s16-s31), since the lower ones are not preserved across function calls according to the EABI. This does double the switching time; perhaps in the future we can investigate optimizations via the FPU flags register.
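A rough way to see why the FPU case costs so much more is to count the words that move per switch. The sketch below assumes the switch saves the callee-saved core registers r4-r11 plus LR, and additionally s16-s31 when an FPU is present; that register set follows the Arm procedure call standard, but whether it matches the assembly in this PR exactly is an assumption, and all names are made up for illustration.

```cpp
#include <cstddef>

// Assumption: r4-r11 (8 words) plus LR are saved on every switch.
constexpr std::size_t kWordBytes = 4;
constexpr std::size_t kCoreWords = 8 + 1;
// Assumption: s16-s31 (16 single-precision registers) are added with an FPU.
constexpr std::size_t kFpuWords = 16;

// Bytes pushed onto the fiber stack for one context save.
constexpr std::size_t context_bytes(bool has_fpu) {
    return (kCoreWords + (has_fpu ? kFpuWords : 0)) * kWordBytes;
}

static_assert(context_bytes(false) == 36, "9 words without FPU");
static_assert(context_bytes(true) == 100, "25 words with FPU");
```

Under these assumptions the FPU variant moves 25 words instead of 9, which is at least consistent with the switching time roughly doubling.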

Fibers run on the PSP, while the main function and the interrupts use the MSP whose size is defined by the existing lbuild option modm:platform:cortex-m:main_stack_size.

Guard Option

There is an invisible guard option modm:__fibers which prevents the fibers from being shown in the modm.io modules and the doxygen docs until they are ready. This is particularly important for adding new peripheral drivers that may use a completely different API without resumable functions and shouldn't be exposed to the users right now.

Stack Placement

On STM32F4 with CCM, the main stack has been moved into RAM, since the CCM is not DMA-able. On all other STM32 devices the main stack and modm_faststack are placed in the fastest DMA-able memory. We will have to see how well the Cortex-M7 DTCM is really accessible for DMA; we'll deal with that later.

TODO:

  • Integrate puncover to get max stack size analysis (not trivial, will investigate later)
  • AVR support
  • Cortex-M0 support
  • Cortex-M3 support
  • Cortex-M4/7 support
  • x86 support for Linux/macOS
  • x86 support for Windows
  • ARM64 support (cannot test it, delegating to future)
  • Guard option to hide fibers from users right now
  • Move all stacks (main + fibers) into the fastest DMA-able memory
  • Passing a lambda function and copying the lambda capture onto the stack
    • Cortex-M
    • AVR
    • x86

@rleh
Member

rleh commented Oct 4, 2021

> The main stack will be reused for the IRQ stack, and the scheduler::start() function will return to the main function if all fibers stop running (or if all functions have yielded). This can be used for sleeping and idle thread functionality.

Shouldn't the function rather be named scheduler::run() (or scheduler::update())?

@salkinium
Member Author

Yes, but I'm still working to port fibers to AVR, and haven't gotten to the API yet.

@salkinium
Member Author

I've added a context switch based on avr-fibers and the GCC calling convention and register layout, however, it doesn't work entirely and as I cannot debug my AVR hardware, I've only been poking around in darkness as to why. In my mind it should just work, but I cannot even jump into the beginning of a fiber call. I'm going to use simavr to try and debug it.

@salkinium salkinium force-pushed the feature/fiber branch 3 times, most recently from f5d372f to 55c7da2 Compare October 10, 2021 23:00
@salkinium
Member Author

salkinium commented Oct 10, 2021

I fixed the AVR fibers, turns out the ATmega2560 is using a 3 byte PC. 🙄
Validated the example and unit tests in hardware on ATmega2560 (3B PC) and ATmega328 (2B PC).
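For readers wondering why the PC width matters: a fresh fiber is typically entered by placing its entry address on the new stack and executing ret, so the number of bytes written has to match what ret pops. The sketch below is a hypothetical illustration of that setup, not the code from this PR. It relies on the avr-gcc built-in macro __AVR_3_BYTE_PC__ and on AVR's post-decrementing push (low byte of the return address at the highest address); the function name and the host fallback are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__AVR_3_BYTE_PC__)
constexpr std::size_t kPcBytes = 3;  // e.g. ATmega2560
#else
constexpr std::size_t kPcBytes = 2;  // __AVR_2_BYTE_PC__ devices; also the host fallback
#endif

// Writes `address` the way a call instruction would: low byte first, each
// store followed by a decrement of the stack pointer. Returns the new stack
// pointer, which points at the next free byte below the pushed address.
inline std::uint8_t* push_return_address(std::uint8_t* sp, std::uint32_t address,
                                         std::size_t pc_bytes = kPcBytes) {
    assert(pc_bytes == 2 || pc_bytes == 3);
    for (std::size_t i = 0; i < pc_bytes; ++i) {
        *sp-- = static_cast<std::uint8_t>(address & 0xFF);
        address >>= 8;
    }
    return sp;
}
```

If only two bytes are prepared on a 3-byte-PC device, ret pops one byte too many and jumps somewhere unrelated, which would plausibly explain not even reaching the start of the fiber function.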

@salkinium salkinium force-pushed the feature/fiber branch 8 times, most recently from 9fb5836 to 5b3b722 Compare October 13, 2021 23:53
@salkinium salkinium marked this pull request as ready for review October 13, 2021 23:54
@salkinium
Member Author

I'm not sure the x86_64 context switch is working correctly on Windows, I didn't understand what the previous implementation was doing with %rax. The examples and unittests work locally on macOS and in the Linux CI, but Windows seems to not like it? x86 seems to be a bit of a train wreck tho.

@chris-durand
Member

I compared the x86_64 context switch to the implementation in boost context and found big differences for the Microsoft ABI. They also save XMM6-XMM15 floating point registers and other things. It looks like we are also missing some x87 floating point state for Unix x86_64 platforms.

There is some info in the boost context docs.

@salkinium
Member Author

salkinium commented Oct 14, 2021

Oh boy… I'll check out if we can use setjmp/longjmp instead on Hosted. That would also give us ARM64 Linux support.

@chris-durand
Member

I'll review tomorrow.

@salkinium
Member Author

salkinium commented Oct 19, 2021

I've moved all stacks into the fastest DMA-able memory and added a guard option modm:__fibers so that we can build a parallel, very much not backwards compatible API of peripheral drivers without confusing everyone. This also prevents the docs scripts from including the unfinished fibers docs.

We'll have to see how annoying it is to have two incompatible APIs next to each other…

I will also restructure the one commit into a few different ones; they are all squashed together until I get things working.

@salkinium salkinium added this to the 2021q4 milestone Oct 20, 2021
src/modm/processing/fiber/channel.hpp (outdated, resolved)
src/modm/processing/fiber/channel.hpp (outdated, resolved)
src/modm/processing/fiber/context.h (resolved)
src/modm/processing/fiber/mutex.hpp (outdated, resolved)
src/modm/processing/fiber/waitable.hpp (outdated, resolved)
@chris-durand
Member

chris-durand commented Oct 20, 2021

There is still something wrong with the Waitable implementation. Having two fibers block on an acquired mutex leads to weird behaviour and a segfault. I tested this code:

#include <modm/debug.hpp>
#include <modm/processing.hpp>

modm::fiber::Mutex m;

template<int index>
void test()
{
        for(int ii=0; ii<10; ii++)
        {
                m.acquire();
                MODM_LOG_INFO << "test" << index << "\n";
                modm::fiber::yield();
                m.release();
        }
}

modm::fiber::Stack<1024> stack[3];
modm::Fiber fiber1(stack[0], test<1>);
modm::Fiber fiber2(stack[1], test<2>);
modm::Fiber fiber3(stack[2], test<3>);

int
main(void)
{
        MODM_LOG_INFO << "Start" << modm::endl;
        modm::fiber::Scheduler::run();
        MODM_LOG_INFO << "End" << modm::endl;

        return 0;
}

It also crashes when I correctly initialize Waitable::lastWaiter as nullptr.

@rleh rleh mentioned this pull request Oct 22, 2021
Member

@rleh rleh left a comment

Nice!

@salkinium salkinium force-pushed the feature/fiber branch 2 times, most recently from f0719bd to a8e5c18 Compare October 28, 2021 07:53
@salkinium
Member Author

I think the waitable implementation is too clever for me, it tries to reorder the list of fibers for low latency and I think it does it wrongly?

I would prefer to have a dumber polling based implementation first to explore the usefulness of the API. The current implementation also doesn't work from an ISR context, which is what is needed to implement interrupt driven peripheral drivers.

I've removed the Waitables for now: we don't have a Protothread/Resumables version of them either, so they aren't needed to replace those.
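As an illustration of what a "dumber" polling-based primitive could look like, here is a minimal mutex sketch that just yields until the lock is free. It is hypothetical and not the modm::fiber::Mutex from this PR; the yield function is passed in as a template parameter only so the sketch compiles without modm. It relies on cooperative scheduling on a single core: nothing can run between the check and the assignment in acquire(), so no atomics are used, and it is not meant to be called from an ISR.

```cpp
#include <cassert>

// Hypothetical polling mutex for cooperative fibers (single core, no ISR use).
template <typename YieldFn>
class PollingMutex {
public:
    explicit PollingMutex(YieldFn yield) : yield_(yield) {}

    // Gives other fibers a chance to run until the lock is free.
    void acquire() {
        while (locked_) yield_();
        locked_ = true;
    }

    void release() {
        assert(locked_);
        locked_ = false;
    }

    bool locked() const { return locked_; }

private:
    YieldFn yield_;
    bool locked_ = false;
};
```

There is no waiter list to keep consistent, at the cost of every blocked fiber being rescheduled just to re-check the flag.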

@salkinium salkinium force-pushed the feature/fiber branch 2 times, most recently from baa8246 to dfdddc9 Compare October 28, 2021 09:09
Member

@chris-durand chris-durand left a comment

Let's merge it without the waitables. They can be added later if required.

@salkinium salkinium added the ci:hal Triggers the exhaustive HAL compile CI jobs label Oct 29, 2021
@salkinium salkinium merged commit 3936a28 into modm-io:develop Oct 29, 2021
@salkinium salkinium deleted the feature/fiber branch October 29, 2021 03:29
@ghost

ghost commented Oct 29, 2021

Just an idea. If the fiber knows the stack size it only costs a few instructions extra to check for overflow during stack swap. It would be a nice option to have.
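The check suggested here could be as small as a bounds comparison on the saved stack pointer. The sketch below is hypothetical: FiberStack and stack_pointer_in_bounds are made-up names, and it assumes a descending stack where the stack pointer equals top when the stack is empty.

```cpp
#include <cstdint>

// Hypothetical description of one fiber stack occupying [bottom, top).
struct FiberStack {
    std::uintptr_t bottom;  // lowest valid address
    std::uintptr_t top;     // one past the highest address; initial stack pointer
};

// True if the stack pointer saved during a switch still lies inside the stack.
constexpr bool stack_pointer_in_bounds(const FiberStack& stack, std::uintptr_t sp) {
    return sp >= stack.bottom && sp <= stack.top;
}
```

A check like this only catches an overflow that is still present at the moment of the switch; a stack that dipped below bottom and recovered in between would need something like a watermark pattern at the bottom of the stack.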

@salkinium salkinium mentioned this pull request Jan 19, 2022
2 tasks
@hshose hshose mentioned this pull request Mar 13, 2024
13 tasks
Labels
advanced 🤯 ci:hal Triggers the exhaustive HAL compile CI jobs feature 🚧
3 participants