singleton!()'s use of Option can have nasty optimization results #364

Rahix · 2021-11-18T21:36:47Z

I was trying to store some large buffers inside of a cortex_m::singleton!() when I noticed that suddenly my flash usage increased dramatically. As some (contrived) examples, the following works (variable ends up in .bss):

cortex_m::singleton!(: [u8; 2048] = [0xff; 2048]);

but this doesn't (variable ends up in .data):

cortex_m::singleton!(: [bool; 2048] = [true; 2048]);

The reason is the Option inside the singleton!() macro:

cortex-m/src/macros.rs

Line 55 in 6b01313

static mut VAR: Option<$ty> = None;

Normally, the None value which is stored here has a bitpattern of all zeros, leading to it being stored in .bss. But when the given type has a niche (like bool above does), the None variant is "optimized" into it. This means it then has non-zero bits and cannot be stored in .bss anymore, leading to it getting moved to .data...

A quick solution is to "reimplement" the Option:

	static mut VAR: (MaybeUninit<$ty>, bool) = (MaybeUninit::uninit(), false);

This will always end up in .bss because no niche optimization exists for such types (yet). Before opening a PR, I wanted to discuss whether there are any objections to changing to such an implementation?

Side Note: I think it would be nice to allow users to overwrite the name of the static. When looking at nm or bloaty output, only seeing a bunch of symbols named __cortex_m_rt_main::{{closure}}::VAR is not great...

The text was updated successfully, but these errors were encountered:

TDHolmes · 2021-12-30T21:40:46Z

This change seems reasonable to me if you can show that it fixes the performance issue you found and has similar performance as the existing implementation. MaybeUninit makes a lot of sense for this macro, so seems reasonable to me...

Also a variant of this macro where we pass in a name seems nice

TDHolmes · 2021-12-30T22:33:05Z

here's what I came up with. It unfortunately involves adding a dependency paste to get the optional name, but it's a lightweight dependency (no transitive dependencies). It does require adding an atomic bool to indicate if we're initialized or not though, but that isn't too much overhead imo. Option would often have a similar overhead.

macro_rules! singleton {
    ($name:ident : $ty:ty = $expr:expr) => {
        $crate::interrupt::free(|_| {
            use ::core::{
                mem::MaybeUninit,
                sync::atomic::{AtomicBool, Ordering},
            };

            paste::paste! {
                static mut [< $name >]: MaybeUninit<$ty> = MaybeUninit::uninit();
                static [<$name _INITIALIZED>]: AtomicBool =
                    AtomicBool::new(false);

                if [<$name _INITIALIZED>].load(Ordering::Acquire) {
                    None
                } else {
                    let expr = $expr;

                    // initializes our T and returns an &mut to it
                    #[allow(unsafe_code)]
                    let t = unsafe { [< $name >].write(expr) };
                    [<$name _INITIALIZED>].store(true, Ordering::Release);

                    #[allow(unsafe_code)]
                    unsafe { Some(t) }
                }
            }
        })
    };
    (: $ty:ty = $expr:expr) => {
        // The old way - use the name VAR
        singleton!(VAR: $ty = $expr)
    };
}

Rahix · 2022-01-01T00:36:53Z

fixes the performance issue you found

This isn't about performance, but about flash usage. If the static ends up in .data, it potentially wastes a huge amount of flash. It should really be in .bss where it only occupies space in RAM at runtime.

has similar performance as the existing implementation

I did not check but I would be surprised if a boolean flag vs an Option ends up generating much different code.

It unfortunately involves adding a dependency paste to get the optional name, but it's a lightweight dependency (no transitive dependencies).

You can avoid this by just putting both variables into one static as I showed above:

static mut VAR: (MaybeUninit<$ty>, bool) = (MaybeUninit::uninit(), false);

It does require adding an atomic bool to indicate if we're initialized or not though

This is unnecessary. Interrupts are disabled already and the previous code also does not do any kind of synchronization on the Option so no atomic is needed in a new version either.

For reference, in the project where this came up, I am using the following macro as a replacement:

#[macro_export]
macro_rules! singleton {
    ($name:ident: $ty:ty = $expr:expr) => {
        ::cortex_m::interrupt::free(|_| {
            static mut $name: (::core::mem::MaybeUninit<$ty>, bool) =
                (::core::mem::MaybeUninit::uninit(), false);

            #[allow(unsafe_code)]
            let used = unsafe { $name.1 };
            if used {
                None
            } else {
                let expr = $expr;

                #[allow(unsafe_code)]
                unsafe {
                    $name.1 = true;
                    Some($name.0.write(expr))
                }
            }
        })
    };
}

Rahix · 2022-01-01T00:42:59Z

Unrelated side-note about your code: If you are going for atomics instead of a critical section, the code is racy: You must access the atomic in a single operation like AtomicBool::compare_exchange() instead of your

if some_atomic.load(Ordering::Acquire) {
    ...
} else {
    ...
    some_atomic.store(true, Ordering::Release);
    ...
}

Otherwise a racing task could run between the load and store.

But again, this is not relevant to this issue because we're in a critical section and the atomic is not necessary here anyway.

TDHolmes · 2022-01-01T01:25:26Z

By performance, I mainly meant WRT flash usage, but I did kind of mix and match what I meant by the word.

Ah, didn't think to use a tuple to get around having a separate variable tracking initialization. Nice!

Yeah I wasn't thinking of switching out the critical section for an atomic, just didn't fully think that one through. I don't think CAS is available on thumbv6m so not possible anyways.

Now if you add the existing match arm with VAR as the name I think this could be a backwards compatible change

Rahix · 2022-01-03T10:28:11Z

Sounds good, will send a PR.

As detailed in rust-embedded#364, niche optimization of the `Option` used in the `singleton!()` macro can lead to the initial value of the static to contain non-zero bits. This in turn leads to the whole static being moved from `.bss` to `.data` which means it eats up flash space for no reason. Especially if the singleton stores a particularly large type, this can be quite problematic. Prevent this by using an explicit boolean flag instead of the `Option` type. This is not quite as nice but at least there is no chance for the `singleton!()` to end up in `.data`...

380: Improve singleton!() macro r=adamgreig a=Rahix This PR addresses two shortcomings of the `cortex_m::singleton!()` macro, which I raised in #364. For review, I think it is best to look at the two commits implementing these changes individually. I think this changeset should also be backported to `0.7.x` where it applies cleanly and which is also the place where I tested it. The static is always initialized to a "zero" value with `Option::None` which means it should end up in `.bss`. However, if the enclosed type has a niche, `Option::None` can become a non-zero bitpattern which moves the whole singleton from `.bss` to `.data`. This is especially problematic when storing large buffers in the `singleton!()` as this starts eating lots of flash space unnecessarily. To prevent this, I switched to using an explicit boolean flag instead. This is not quite as nice but at least there is no chance for the `singleton!()` to end up in `.data`... For reference and as an example, the faulty behavior can be triggered with ```rust cortex_m::singleton!(: Option<u32> = None) ``` (the inner option has a non-zero niche which the outer option will occupy) Due to the static always being named `VAR` right now, all `singleton!()` instances end up having non-descriptive symbol names like `__cortex_m_rt_main::{{closure}}::VAR` which makes them hard to tell apart in a debugger or when looking at an objdump. I added the ability to set an explicit name which end up becoming part of the symbol name. This does not affect Rust code at all - the new name is not visible anywhere. It works like this: ```rust let value = singleton!(FOO_BUFFER: [u8; 1024] = [0u8; 1024]); ``` Of course the old syntax also still works and keeps the old behavior of calling the static `VAR`. Fixes #364. Co-authored-by: Rahix <[email protected]>

Rahix changed the title ~~singleton!()'s use of Option can have nasty optimization resullts~~ singleton!()'s use of Option can have nasty optimization results Nov 18, 2021

tmplt mentioned this issue Jan 1, 2022

DWT: differentiate the first comparator #376

Open

Rahix mentioned this issue Jan 3, 2022

Improve singleton!() macro #380

Merged

bors bot closed this as completed in e0b93a0 Jan 4, 2022

bors bot closed this as completed in #380 Jan 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

singleton!()'s use of Option can have nasty optimization results #364

singleton!()'s use of Option can have nasty optimization results #364

Rahix commented Nov 18, 2021 •

edited

Loading

TDHolmes commented Dec 30, 2021

TDHolmes commented Dec 30, 2021 •

edited

Loading

Rahix commented Jan 1, 2022

Rahix commented Jan 1, 2022

TDHolmes commented Jan 1, 2022

Rahix commented Jan 3, 2022

singleton!()'s use of Option can have nasty optimization results #364

singleton!()'s use of Option can have nasty optimization results #364

Comments

Rahix commented Nov 18, 2021 • edited Loading

TDHolmes commented Dec 30, 2021

TDHolmes commented Dec 30, 2021 • edited Loading

Rahix commented Jan 1, 2022

Rahix commented Jan 1, 2022

TDHolmes commented Jan 1, 2022

Rahix commented Jan 3, 2022

Rahix commented Nov 18, 2021 •

edited

Loading

TDHolmes commented Dec 30, 2021 •

edited

Loading