Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a feature that disables panic messages #3129

Closed
wants to merge 5 commits into from

Conversation

dflemstr
Copy link
Contributor

Overview

Add a new feature that reduces binary size significantly for certain types of binaries. For example, for the stm32 bootloader example, about 39% of the binary is just strings used for panicking, and as shown below, the binary size can easily be reduced from 10.1kiB to 6.16kiB.

This is usually not a problem for programs that use defmt as the existing defmt infra already removes most strings from the binary, but using defmt is not always desirable, e.g. for bootloaders.

For my own application, I need to fit a bootloader into 3 kiB , and this change allowed me to go from 3.33 kiB to 2.77 kiB.

Details

Without this feature enabled, here's the binary size for the stm32 bootloader example:

$ cargo build --release --features embassy-stm32/stm32f303re
$ bloaty --domain=vm /home/dflemstr/.cache/cargo/target/thumbv7em-none-eabi/release/stm32-bootloader-example
     VM SIZE    
 -------------- 
  83.0%  8.39Ki    .text
  11.7%  1.18Ki    .rodata
   3.9%     404    .vector_table
   1.4%     148    .bss
 100.0%  10.1Ki    TOTAL

$ strings -d /home/dflemstr/.cache/cargo/target/thumbv7em-none-eabi/release/stm32-bootloader-example
[...]
assertion failed: dfu.capacity() as u32 - active.capacity() as u32 >= page_sizeassertion failed: 2 + 2 * (active.capacity() as u32 / page_size) <=
    state.capacity() as u32 / STATE::WRITE_SIZE as u32Boot prepare error
FlashBadMagicNotAlignedOutOfBoundsOther
)called `Option::unwrap()` on a `None` value
    ,
assertion failed: n < 8usizecalled `Result::unwrap()` on an `Err` valueU
TryFromSliceError
assertion failed: self.lsinot yet implemented
assertion failed: max::HSE_OSC.contains(&hse.freq)
assertion failed: max::HSE_BYP.contains(&hse.freq)
assertion failed: max::HCLK.contains(&hclk)
assertion failed: max::PCLK1.contains(&pclk1)assertion failed: max::PCLK2.contains(&pclk2)assertion failed: !(adcpres == AdcHclkPrescaler::Div1 && config.ahb_pre != AHBPrescaler::DIV1)internal error: entered unreachable code@B
assertion failed: max::PLL_IN.contains(&in_freq)
assertion failed: max::PLL_OUT.contains(&out_freq)

With this feature enabled:

$ cargo build --release --features embassy-stm32/stm32f303re --features embassy-stm32/no-panic-msgs --features embassy-boot-stm32/no-panic-msgs
$ bloaty --domain=vm /home/dflemstr/.cache/cargo/target/thumbv7em-none-eabi/release/stm32-bootloader-example                                   
     VM SIZE    
 -------------- 
  86.9%  5.35Ki    .text
   6.4%     404    .vector_table
   4.4%     276    .rodata
   2.3%     148    .bss
 100.0%  6.16Ki    TOTAL

$ strings -d /home/dflemstr/.cache/cargo/target/thumbv7em-none-eabi/release/stm32-bootloader-example
[...]
called `Option::unwrap()` on a `None` valueassertion failed: n < 8usize

There are still some .unwrap()s and asserts left to be found, but this change removes most of them.

@dflemstr
Copy link
Contributor Author

Wasn't sure if I should add this to all crates... I see that the fmt.rs file is copy-pasted into a bunch of places, so I could add this feature to all places where that file exists?

@dflemstr
Copy link
Contributor Author

With the additional fixes in the last few commits in this PR, I have now eliminated enough call sites that at least for my application, there are no longer any dyn Debug code paths, and some expensive stuff like core::fmt::pad is removed.

This means that I am able to fit my app + bootloader + DFU partitions onto a stm32f0 device:

MEMORY
{
    BOOTLOADER       : ORIGIN = 0x08000000, LENGTH =   3K  /* BANK_1: bootloader */
    BOOTLOADER_STATE : ORIGIN = 0x08000c00, LENGTH =   1K  /* BANK_1: bootloader state */
    FLASH            : ORIGIN = 0x08001000, LENGTH =   29K /* BANK_1: active application */
    DFU              : ORIGIN = 0x08008400, LENGTH =   30K /* BANK_1: next app version */
    APPCONFIG        : ORIGIN = 0x0800fc00, LENGTH =   1K  /* BANK_1: app configuration variables */
    RAM              : ORIGIN = 0x20000000, LENGTH =   20K /* SRAM */
}

@Dirbaio
Copy link
Member

Dirbaio commented Jun 28, 2024

There's ways to remove the panic fmt bloat without adding cfgs to every single crate. Have you tried these?

  • Don't use panic-probe, write your own panic handler that does asm!("udf") instead. This removes 95% of the fmt stuff. Works on stable.
  • use build-std-features=panic_immediate_abort. Removes 100% of fmt. It makes all panics compile to a single udf instruction, just 2 bytes. Requires nightly, though.

Works best when combined with opt-level s or z, and lto=fat. See this for more details.

Given these exist, I don't think we should add no-panic-msgs features to every single crate, unfortunately. The maintenance cost is a bit high, it's another feature combination that has to be tested in CI etc. Also it'll only apply to the embassy crates where we've added this, while the above solutions apply to all crates automatically.

However, I see you've also changed many instances of .unwrap() to unwrap!(). That's something we should do anyway, because it reduces code size when using defmt, because it prints the error with defmt instead of through std::fmt and the standard panic handler. Could you send a PR with just these changes?

@dflemstr
Copy link
Contributor Author

Makes sense, and I have looked into the different approaches you list. I will have to resort to using such an approach if we can't find a way to offer this functionality in the upstream library (or just maintain my own fork of embassy with this flag -- it's for a very specific use-case after all).

I want to stress that a major difference with this approach compared to the methods you listed is that you still get full panic info when something goes wrong, ie file/line/column number, debugger integration with backtraces, etc. This also works with all the existing routing logic that applies to panic_handler, ie the panic info will still be reported via defmt/rtt/macrocell/whatever when something goes wrong instead of just halting the MCU.

I agree that we can definitely split out the changes regarding unwrap!() and such, they are very much unrelated to the main point of this PR.

Given these exist, I don't think we should add no-panic-msgs features to every single crate, unfortunately.

Regarding the trade-offs between adding a feature flag for this vs. not supporting the use-case at all, I guess it would be possible to find a middle road? Would it be a reasonable compromise to extract the fmt module into its own crate, where this behavior could be controlled centrally? I personally found it a bit tedious already with having the fmt.rs file copy-pasted in a dozen places, so maybe that's a win in itself, regardless of the new feature flag?

I think the reason why we have to thread feature flags like this -- where I would say the no-panic-msgs, log, and defmt features are in the same boat -- is because the fmt module exists in so many copies.

@Dirbaio
Copy link
Member

Dirbaio commented Jun 28, 2024

I want to stress that a major difference with this approach compared to the methods you listed is that you still get full panic info when something goes wrong, ie file/line/column number,

oh, that's true. have you tried printing only the location of the PanicInfo, and not the message? I suspect that'd still optimize out most of the fmt bloat without requiring patching every crate.

debugger integration with backtraces

you still get a backtrace with udf with e.g. probe-rs.

works with all the existing routing logic that applies to panic_handler, ie the panic info will still be reported via defmt/rtt/macrocell/whatever when something goes wrong instead of just halting the MCU.

udf triggers a HardFault, you can write your own custom code in the HardFault handler that still reports it over rtt/whatever. You probably should, since some kinds of crashes can cause a HardFault directly without going through the panic handler. You can't get the PanicInfo from the HardFault handler, but you could still dump the PC value and convert it with addr2line.

extract the fmt module into its own crate

this has been tried many times and it doesn't work, please don't try more 😅 . defmt macros break when reexported, also we're relying on the in-crate macro name resolution to override panic! etc for the whole crate, which is not possible if it's a separate crate. See #3065 for the last attempt and the problems it caused.

@Dirbaio
Copy link
Member

Dirbaio commented Jun 28, 2024

To clarify: it is a goal of Embassy to allow getting code size as small as possible. My position is not "code size is not a priority so no-panic-msgs isn't worth it", it is "there's already ways of achieving small code size without no-panic-msgs".

I think that is the case, at least. If you do find something that's achievable with no-panic-msgs but not with custom panic handlers or with panic_immediate_abort please let me know!

@dflemstr
Copy link
Contributor Author

OK, I will start by breaking out the unwrap!() type changes as mentioned as those can be considered a completely unrelated change, and it sounds like we're aligned on that.

I guess I'm chasing this from a few different angles at the same time, and my main goal is to meet my size constraints for now, so I won't have the time/energy to explore too many other alternatives, at least for now. The approach I'm taking right now is to load built binaries into Ghidra and other reverse-engineering tools to find code paths that lead to code bloat (along with bloaty etc as shown above). That's the only truly reliable way I have found to reason about this, because there is some really surprising stuff I wasn't able to find just reasoning about the source code (where it wasn't enough to remove an .unwrap(), or switch to a different #[panic_handler], etc). Some examples:

  • Some code paths generate <dyn Debug>::fmt type calls, which I think sometimes leads to all impls of Debug that are referenced in any other code path to be kept around; maybe because the compiler can't prove that they aren't used.
  • Especially impl Debug for () (ie literally the unit type) is tricky, because it seems to be reasonably common to have a Result<A, ()> or something like that be unwrapped. This impl in turn pulls in expensive stuff like core::fmt::pad

I do agree that adding too much noise for features like this isn't worth it, on the other hand I also really value enabling people to just cargo run or "right-click main() in the IDE and press Run" type workflows. Any extra stuff that needs to be sprinkled on top, like custom build commands, profiles, linkers etc detracts from that. More of a philosophical discussion I guess, and maybe out of scope for this PR 🙂

@dflemstr dflemstr mentioned this pull request Jun 28, 2024
@dflemstr
Copy link
Contributor Author

I thought I should provide some numbers to show what I'm working with:

  • My app with vanilla panic_probe and upstream embassy has a VM size of 33.6kiB
  • Removing panic_probe and adding a custom #[panic_handler] that just prints location info: 33.0kiB
  • Having #[panic_handler] only call udf() immediately: 27.1kiB
  • Applying my unwrap!()-style fixes (still using udf() panic handler): 24.2kiB

So there's about 3kiB that disappeared through doing my handful of unwrap!()-style fixes that don't disappear with just having a udf() style panic handler. This was significant for the bootloader since I brought it from ~6kiB to ~3kiB. For my non-bootloader application it didn't matter as much, since the combination of my different changes means that I have brought the VM size down enough now that extra workarounds shouldn't be necessary, but I think it gives some data on roughly the opportunity here:

You can reduce code size by roughly 20% by disabling panics/having a trivial panic handler, but by roughly 30% (my app) or 40% (example stm32 bootloader) or 50% (my bootloader) if doing something invasive to the code, like changing the panic macros.

@Dirbaio
Copy link
Member

Dirbaio commented Jun 28, 2024

and panic_immediate_abort?

@dflemstr
Copy link
Contributor Author

dflemstr commented Jun 28, 2024

and panic_immediate_abort?

I wasn't able to test this at the moment because I haven't been able to find a nightly version that works for my build (although I have to admit that I haven't tried too hard to bisect nightly versions). EDIT: but I figure this will yield the lowest VM size for sure since it essentially removes all implicit panics, in principle, with a significant loss of ergonomics of course.

@Dirbaio
Copy link
Member

Dirbaio commented Sep 10, 2024

I'm going to close this because I don't believe it's worth the effort given the reasons above, assuming panic_immediate_abort allows the same level of debloat.

Thanks anyway for the PR, tho :)

If you find bloat that doesn't go away with panic_immediate_abort but does with the no-panic-msgs please do let me know, so we can look at this again.

@Dirbaio Dirbaio closed this Sep 10, 2024
@dflemstr
Copy link
Contributor Author

@Dirbaio I have been going deep on this, see this thread for more details: https://rust-lang.zulipchat.com/#narrow/stream/327149-t-libs-api.2Fapi-changes/topic/Flexible.20fmt.20.20API.20that.20allows.20for.20deferred.20formatting

I think since panic_immediate_abort is not appropriate for my use-case, my next course of action would be to engage with/wait for these improvements, most notably externally implementable items for panics: rust-lang/rust#116005

@Dirbaio
Copy link
Member

Dirbaio commented Sep 11, 2024

I think since panic_immediate_abort is not appropriate for my use-case

Can you expand on why? Is it because you want to still be able to get the location of the panic in logs, it is doable:

  • Enable panic=abort and panic_immediate_abort.
  • Set a panic_handler that does asm!("udf")
  • In HardFault handler, log (for example to defmt) the CPU registers of the fault, and even the top X kb of stack if you want.

When you get a fault, you can find the line that caused it by addr2lineing the PC value. It sometimes doesn't give the exact location due to optimizations/inlining, but in that case with some looking at ASM you can figure it out.

If you additionally log the top X kb of stack, it's even possible to get a backtrace. See https://github.com/tweedegolf/stackdump for example.

We do this at my company's firmware and it works great. Combine it with a utility that saves the last N kb of defmt logs on crash and uploads it to the server after reboot we even get remote crashdumps with logs+backtrace.

All with zero strings in the firmware, and zero core::fmt bloat.

@dflemstr
Copy link
Contributor Author

Yeah, maybe you're right and I just haven't been able to spend enough time/thinking on about how to make it work.

I guess in summary it sounds like you're saying "this is definitely possible to solve by combining these existing tricks" while I'm saying "I choose to hold off on trying to make this work until the solution requires fewer moving parts/is easier to reason about"

I can share some more details just as a FYI/as context if you want to understand my situation, in case it helps when making future embassy design considerations -- ie consider this a user testimony. I also totally understand if this is "tl;dr" and if so feel free to ignore it 😅

Some additional constraints are, for example, that the device only communicates over CAN-bus as the primary interface -- there's no serial interface available in the "production" context -- and it is in that context that I am trying to enable things like OTA updates. The firmware would need to receive OTA firmware over a CAN transport, and also do boot testing/communicate the outcome over CAN as well -- including panics that would indicate an update failure. I have also found that it might be tricky to reach the size constraints even with panic info removed, so I might need to find an asymmetrical solution where as part of a firmware update, the app would first flash a smaller firmware update application into a smaller partition, restart, then do the "main" firmware update into a larger partition. I have considered whether it would be possible to have a shared "CAN transport" library in a third partition that gets re-used by both the small bootloader and the main app. I have also considered things like storing a compressed version of the firmware and enabling the bootloader to do decompression-on-the-fly.

All of this has added a bunch of complexity and I have found it difficult to track how things interact. For example, if I want to track whether a CAN peripheral is still available when a panic occurs, so that I might be able to communicate the failure over CAN bus -- this might not be possible/safe in general, but it would be nice if I could make it work. Another option would be to store failure info into flash so that after a panic it is possible to reboot and let the bootloader handle the firmware revert/reporting the status over CAN-bus.

In general I would assume that any interaction with peripherals will be tricky when a hard fault has happened -- but maybe this is recoverable somehow and it'd be possible to make it work.

All in all I want to hold off on trying to patch together something that works until I can rule out some dimensions of complexity, and not having to pay special attention to how panics are routed through the NVIC stack would be one such dimension. Maybe you could make it work with the existing tools as you describe. For my use-case I am looking for something that I will be able to trust and reason about, including not having to worry about something breaking in future refactors etc., taking into consideration the human factor of possibly having someone else write that code without the context I have in my head, ie the usual software development stuff 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants