Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

singleton!()'s use of Option can have nasty optimization results #364

Closed
Rahix opened this issue Nov 18, 2021 · 6 comments · Fixed by #380
Closed

singleton!()'s use of Option can have nasty optimization results #364

Rahix opened this issue Nov 18, 2021 · 6 comments · Fixed by #380

Comments

@Rahix
Copy link
Contributor

Rahix commented Nov 18, 2021

I was trying to store some large buffers inside of a cortex_m::singleton!() when I noticed that suddenly my flash usage increased dramatically. As some (contrived) examples, the following works (variable ends up in .bss):

cortex_m::singleton!(: [u8; 2048] = [0xff; 2048]);

but this doesn't (variable ends up in .data):

cortex_m::singleton!(: [bool; 2048] = [true; 2048]);

The reason is the Option inside the singleton!() macro:

static mut VAR: Option<$ty> = None;

Normally, the None value which is stored here has a bitpattern of all zeros, leading to it being stored in .bss. But when the given type has a niche (like bool above does), the None variant is "optimized" into it. This means it then has non-zero bits and cannot be stored in .bss anymore, leading to it getting moved to .data...

A quick solution is to "reimplement" the Option:

	static mut VAR: (MaybeUninit<$ty>, bool) = (MaybeUninit::uninit(), false);

This will always end up in .bss because no niche optimization exists for such types (yet). Before opening a PR, I wanted to discuss whether there are any objections to changing to such an implementation?

Side Note: I think it would be nice to allow users to overwrite the name of the static. When looking at nm or bloaty output, only seeing a bunch of symbols named __cortex_m_rt_main::{{closure}}::VAR is not great...

@Rahix Rahix changed the title singleton!()'s use of Option can have nasty optimization resullts singleton!()'s use of Option can have nasty optimization results Nov 18, 2021
@TDHolmes
Copy link
Contributor

This change seems reasonable to me if you can show that it fixes the performance issue you found and has similar performance as the existing implementation. MaybeUninit makes a lot of sense for this macro, so seems reasonable to me...

Also a variant of this macro where we pass in a name seems nice

@TDHolmes
Copy link
Contributor

TDHolmes commented Dec 30, 2021

here's what I came up with. It unfortunately involves adding a dependency paste to get the optional name, but it's a lightweight dependency (no transitive dependencies). It does require adding an atomic bool to indicate if we're initialized or not though, but that isn't too much overhead imo. Option would often have a similar overhead.

macro_rules! singleton {
    ($name:ident : $ty:ty = $expr:expr) => {
        $crate::interrupt::free(|_| {
            use ::core::{
                mem::MaybeUninit,
                sync::atomic::{AtomicBool, Ordering},
            };

            paste::paste! {
                static mut [< $name >]: MaybeUninit<$ty> = MaybeUninit::uninit();
                static [<$name _INITIALIZED>]: AtomicBool =
                    AtomicBool::new(false);

                if [<$name _INITIALIZED>].load(Ordering::Acquire) {
                    None
                } else {
                    let expr = $expr;

                    // initializes our T and returns an &mut to it
                    #[allow(unsafe_code)]
                    let t = unsafe { [< $name >].write(expr) };
                    [<$name _INITIALIZED>].store(true, Ordering::Release);

                    #[allow(unsafe_code)]
                    unsafe { Some(t) }
                }
            }
        })
    };
    (: $ty:ty = $expr:expr) => {
        // The old way - use the name VAR
        singleton!(VAR: $ty = $expr)
    };
}

@Rahix
Copy link
Contributor Author

Rahix commented Jan 1, 2022

fixes the performance issue you found

This isn't about performance, but about flash usage. If the static ends up in .data, it potentially wastes a huge amount of flash. It should really be in .bss where it only occupies space in RAM at runtime.

has similar performance as the existing implementation

I did not check but I would be surprised if a boolean flag vs an Option ends up generating much different code.

It unfortunately involves adding a dependency paste to get the optional name, but it's a lightweight dependency (no transitive dependencies).

You can avoid this by just putting both variables into one static as I showed above:

static mut VAR: (MaybeUninit<$ty>, bool) = (MaybeUninit::uninit(), false);

It does require adding an atomic bool to indicate if we're initialized or not though

This is unnecessary. Interrupts are disabled already and the previous code also does not do any kind of synchronization on the Option so no atomic is needed in a new version either.


For reference, in the project where this came up, I am using the following macro as a replacement:

#[macro_export]
macro_rules! singleton {
    ($name:ident: $ty:ty = $expr:expr) => {
        ::cortex_m::interrupt::free(|_| {
            static mut $name: (::core::mem::MaybeUninit<$ty>, bool) =
                (::core::mem::MaybeUninit::uninit(), false);

            #[allow(unsafe_code)]
            let used = unsafe { $name.1 };
            if used {
                None
            } else {
                let expr = $expr;

                #[allow(unsafe_code)]
                unsafe {
                    $name.1 = true;
                    Some($name.0.write(expr))
                }
            }
        })
    };
}

@Rahix
Copy link
Contributor Author

Rahix commented Jan 1, 2022

Unrelated side-note about your code: If you are going for atomics instead of a critical section, the code is racy: You must access the atomic in a single operation like AtomicBool::compare_exchange() instead of your

if some_atomic.load(Ordering::Acquire) {
    ...
} else {
    ...
    some_atomic.store(true, Ordering::Release);
    ...
}

Otherwise a racing task could run between the load and store.


But again, this is not relevant to this issue because we're in a critical section and the atomic is not necessary here anyway.

@TDHolmes
Copy link
Contributor

TDHolmes commented Jan 1, 2022

By performance, I mainly meant WRT flash usage, but I did kind of mix and match what I meant by the word.

Ah, didn't think to use a tuple to get around having a separate variable tracking initialization. Nice!

Yeah I wasn't thinking of switching out the critical section for an atomic, just didn't fully think that one through. I don't think CAS is available on thumbv6m so not possible anyways.

Now if you add the existing match arm with VAR as the name I think this could be a backwards compatible change

@Rahix
Copy link
Contributor Author

Rahix commented Jan 3, 2022

Sounds good, will send a PR.

Rahix added a commit to Rahix/cortex-m that referenced this issue Jan 3, 2022
As detailed in rust-embedded#364, niche optimization of the
`Option` used in the `singleton!()` macro can lead to the initial value
of the static to contain non-zero bits.  This in turn leads to the whole
static being moved from `.bss` to `.data` which means it eats up flash
space for no reason.  Especially if the singleton stores a particularly
large type, this can be quite problematic.

Prevent this by using an explicit boolean flag instead of the `Option`
type.  This is not quite as nice but at least there is no chance for the
`singleton!()` to end up in `.data`...
Rahix added a commit to Rahix/cortex-m that referenced this issue Jan 3, 2022
As detailed in rust-embedded#364, niche optimization of the
`Option` used in the `singleton!()` macro can lead to the initial value
of the static to contain non-zero bits.  This in turn leads to the whole
static being moved from `.bss` to `.data` which means it eats up flash
space for no reason.  Especially if the singleton stores a particularly
large type, this can be quite problematic.

Prevent this by using an explicit boolean flag instead of the `Option`
type.  This is not quite as nice but at least there is no chance for the
`singleton!()` to end up in `.data`...
Rahix added a commit to Rahix/cortex-m that referenced this issue Jan 4, 2022
As detailed in rust-embedded#364, niche optimization of the
`Option` used in the `singleton!()` macro can lead to the initial value
of the static to contain non-zero bits.  This in turn leads to the whole
static being moved from `.bss` to `.data` which means it eats up flash
space for no reason.  Especially if the singleton stores a particularly
large type, this can be quite problematic.

Prevent this by using an explicit boolean flag instead of the `Option`
type.  This is not quite as nice but at least there is no chance for the
`singleton!()` to end up in `.data`...
@bors bors bot closed this as completed in e0b93a0 Jan 4, 2022
@bors bors bot closed this as completed in #380 Jan 4, 2022
adamgreig pushed a commit that referenced this issue May 15, 2022
380: Improve singleton!() macro r=adamgreig a=Rahix

This PR addresses two shortcomings of the `cortex_m::singleton!()` macro, which I raised in #364.  For review, I think it is best to look at the two commits implementing these changes individually.

I think this changeset should also be backported to `0.7.x` where it applies cleanly and which is also the place where I tested it.

The static is always initialized to a "zero" value with `Option::None` which means it should end up in `.bss`.  However, if the enclosed type has a niche, `Option::None` can become a non-zero bitpattern which moves the whole singleton from `.bss` to `.data`.  This is especially problematic when storing large buffers in the `singleton!()` as this starts eating lots of flash space unnecessarily.

To prevent this, I switched to using an explicit boolean flag instead.  This is not quite as nice but at least there is no chance for the `singleton!()` to end up in `.data`...

For reference and as an example, the faulty behavior can be triggered with

```rust
cortex_m::singleton!(: Option<u32> = None)
```

(the inner option has a non-zero niche which the outer option will occupy)

Due to the static always being named `VAR` right now, all `singleton!()` instances end up having non-descriptive symbol names like `__cortex_m_rt_main::{{closure}}::VAR` which makes them hard to tell apart in a debugger or when looking at an objdump.

I added the ability to set an explicit name which end up becoming part of the symbol name.  This does not affect Rust code at all - the new name is not visible anywhere.  It works like this:

```rust
let value = singleton!(FOO_BUFFER: [u8; 1024] = [0u8; 1024]);
```

Of course the old syntax also still works and keeps the old behavior of calling the static `VAR`.

Fixes #364.

Co-authored-by: Rahix <[email protected]>
adamgreig pushed a commit that referenced this issue May 15, 2022
380: Improve singleton!() macro r=adamgreig a=Rahix

This PR addresses two shortcomings of the `cortex_m::singleton!()` macro, which I raised in #364.  For review, I think it is best to look at the two commits implementing these changes individually.

I think this changeset should also be backported to `0.7.x` where it applies cleanly and which is also the place where I tested it.

The static is always initialized to a "zero" value with `Option::None` which means it should end up in `.bss`.  However, if the enclosed type has a niche, `Option::None` can become a non-zero bitpattern which moves the whole singleton from `.bss` to `.data`.  This is especially problematic when storing large buffers in the `singleton!()` as this starts eating lots of flash space unnecessarily.

To prevent this, I switched to using an explicit boolean flag instead.  This is not quite as nice but at least there is no chance for the `singleton!()` to end up in `.data`...

For reference and as an example, the faulty behavior can be triggered with

```rust
cortex_m::singleton!(: Option<u32> = None)
```

(the inner option has a non-zero niche which the outer option will occupy)

Due to the static always being named `VAR` right now, all `singleton!()` instances end up having non-descriptive symbol names like `__cortex_m_rt_main::{{closure}}::VAR` which makes them hard to tell apart in a debugger or when looking at an objdump.

I added the ability to set an explicit name which end up becoming part of the symbol name.  This does not affect Rust code at all - the new name is not visible anywhere.  It works like this:

```rust
let value = singleton!(FOO_BUFFER: [u8; 1024] = [0u8; 1024]);
```

Of course the old syntax also still works and keeps the old behavior of calling the static `VAR`.

Fixes #364.

Co-authored-by: Rahix <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants