Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interrupt: Optimize restore on AVR and MSP430 #40

Merged
merged 2 commits into from
Dec 9, 2022
Merged

interrupt: Optimize restore on AVR and MSP430 #40

merged 2 commits into from
Dec 9, 2022

Conversation

taiki-e
Copy link
Owner

@taiki-e taiki-e commented Sep 17, 2022

As we have already been doing for pre-v6 ARM, avoid unneeded branch and mask.

/// Restores the previous interrupt state.
#[inline]
#[instruction_set(arm::a32)]
pub(super) unsafe fn restore(State(prev): State) {
// SAFETY: the caller must guarantee that the state was retrieved by the previous `disable`,
unsafe {
// Do not use `nomem` and `readonly` because prevent preceding memory accesses from being reordered after interrupts are enabled.
asm!("msr cpsr_c, {0}", in(reg) prev, options(nostack));
}
}

@taiki-e taiki-e added O-avr Target: AVR O-msp430 Target: MSP430 labels Sep 17, 2022
@taiki-e
Copy link
Owner Author

taiki-e commented Sep 17, 2022

The followings are examples of the changes in code generation by this patch.
(Somehow I used 128-bit atomic store, so there are lots of mov (msp430) and std (avr)...)

before(msp430-none-elf):

00000000 <lib::atomic_u128_store::seq_cst::hb020bdfd361679f5>:
       0: 0d 42         mov     r2, r13
       2: 32 c2         dint
       4: 03 43         nop
       6: 9c 41 10 00 0e 00     mov     16(r1), 14(r12)
       c: 9c 41 0e 00 0c 00     mov     14(r1), 12(r12)
      12: 9c 41 0c 00 0a 00     mov     12(r1), 10(r12)
      18: 9c 41 0a 00 08 00     mov     10(r1), 8(r12)
      1e: 9c 41 08 00 06 00     mov     8(r1), 6(r12)
      24: 9c 41 06 00 04 00     mov     6(r1), 4(r12)
      2a: 9c 41 04 00 02 00     mov     4(r1), 2(r12)
      30: 9c 41 02 00 00 00     mov     2(r1), 0(r12)
      36: 3d b2         bit     #8, r13
      38: 03 24         jeq     $+8
      3a: 03 43         nop
      3c: 32 d2         eint
      3e: 03 43         nop
      40: 30 41         ret

after(msp430-none-elf):

00000000 <lib::atomic_u128_store::seq_cst::hb020bdfd361679f5>:
       0: 0d 42         mov     r2, r13
       2: 32 c2         dint
       4: 03 43         nop
       6: 9c 41 10 00 0e 00     mov     16(r1), 14(r12)
       c: 9c 41 0e 00 0c 00     mov     14(r1), 12(r12)
      12: 9c 41 0c 00 0a 00     mov     12(r1), 10(r12)
      18: 9c 41 0a 00 08 00     mov     10(r1), 8(r12)
      1e: 9c 41 08 00 06 00     mov     8(r1), 6(r12)
      24: 9c 41 06 00 04 00     mov     6(r1), 4(r12)
      2a: 9c 41 04 00 02 00     mov     4(r1), 2(r12)
      30: 9c 41 02 00 00 00     mov     2(r1), 0(r12)
      36: 02 4d         mov     r13, r2
      38: 30 41         ret

before(avr-unknown-gnu-atmega328):

00000000 <lib::atomic_u128_store::seq_cst::hb020bdfd361679f5>:
       0: af b7         in      r26, 63
       2: f8 94         cli
       4: fc 01 66 87   <unknown>
       8: 77 87         std     Z+7, r23
       a: 44 87         std     Z+4, r20
       c: 55 87         std     Z+5, r21
       e: 22 87         std     Z+2, r18
      10: 33 87         std     Z+3, r19
      12: 00 87         std     Z+0, r16
      14: 11 87         std     Z+1, r17
      16: e6 82         std     Z+6, r14
      18: f7 82         std     Z+7, r15
      1a: c4 82         std     Z+4, r12
      1c: d5 82         std     Z+5, r13
      1e: a2 82         std     Z+2, r10
      20: b3 82         std     Z+3, r11
      22: 80 82         std     Z+0, r8
      24: 91 82         std     Z+1, r9
      26: aa 23         tst     r26
      28: 02 f0         brmi    <unknown>
      2a: 08 95         ret
      2c: 78 94         sei
      2e: 08 95         ret

after(avr-unknown-gnu-atmega328):

00000000 <lib::atomic_u128_store::seq_cst::hb020bdfd361679f5>:
       0: af b7         in      r26, 63
       2: f8 94         cli
       4: fc 01 66 87   <unknown>
       8: 77 87         std     Z+7, r23
       a: 44 87         std     Z+4, r20
       c: 55 87         std     Z+5, r21
       e: 22 87         std     Z+2, r18
      10: 33 87         std     Z+3, r19
      12: 00 87         std     Z+0, r16
      14: 11 87         std     Z+1, r17
      16: e6 82         std     Z+6, r14
      18: f7 82         std     Z+7, r15
      1a: c4 82         std     Z+4, r12
      1c: d5 82         std     Z+5, r13
      1e: a2 82         std     Z+2, r10
      20: b3 82         std     Z+3, r11
      22: 80 82         std     Z+0, r8
      24: 91 82         std     Z+1, r9
      26: af bf         out     63, r26
      28: 08 95         ret

@taiki-e
Copy link
Owner Author

taiki-e commented Sep 17, 2022

cc @cr1901: I believe this solves the code size issue that you mentioned in #39 (comment).

@cr1901
Copy link

cr1901 commented Sep 17, 2022

Your approach with no branches was originally how I wanted to do it. You will notice my implementation of critical-section has a branch. Tragically, the entire sr is writeable, and I have no idea whether the state of sr will be clobbered instead of just GIE being set to enable if I don't set only the GIE bit on restore.

image

So I had to keep the branch. IMO, it's still worth testing whether having if r & 0x8 != 0 in restore is still a net win for size tho. But see below.

I believe this solves the code size issue that you mentioned in #39 (comment).

I probably should not have brought up the code size thing- I don't think your crate will have the size problem, because disable and restore aren't implementing functions declared as extern in another crate (like msp430 is doing for critical-section).1 Additionally, critical-section only allows bool, u8, u16 to be passed as state between acquire and release. With these restrictions, from my own testing transmuting sr to/from u16 generates better code instead of using a bool or wrapper.

Footnotes

  1. The size difference between bool and u16 is related to is a bug I have to file against rustc itself (but haven't yet). AFAICT from my own testing, your crate should not be affected by using bool vs u16.

@taiki-e
Copy link
Owner Author

taiki-e commented Oct 17, 2022

Sorry for the late response.

Tragically, the entire sr is writeable, and I have no idea whether the state of sr will be clobbered instead of just GIE being set to enable if I don't set only the GIE bit on restore.

Yeah, IIUC this approach may cause problems in code where we don't know what the user will do after the interrupt is disabled, depending on how LLVM handles ~{sr} clobber constraint that used when options(preserves_flags) is not passed.

Clobbering sr is not push/pop, but for bits that can be modified by arithmetic operations (C, Z, N, V), I believe it is something like "prevent subsequent operations from depending on the contents of sr before the asm call" (otherwise it would not be possible to write arithmetic operations using inline assembly1). I'm not sure how bits other than bits that can be modified by arithmetic operations are handled, but at least for the I bit, LLVM will not automatically restore them (otherwise it would not be possible to write code to disable interrupts).

And if the above is correct:

  • In the msp430's case, restore may incorrectly restore bits set or cleared by the user if LLVM does not restore other bits.
  • In the portable-atomic's case, closures perform only memory access (all), comparisons (compare_exchange, fetch_max, etc.), integer arithmetic operations (fetch_add, fetch_sub), and bitwise operations (fetch_and, fetch_or, etc.) in the critical section.

So, I think this approach is fine for portable-atomic's case. Thoughts?

I probably should not have brought up the code size thing- I don't think your crate will have the size problem, because disable and restore aren't implementing functions declared as extern in another crate (like msp430 is doing for critical-section)

Thanks for the clarification. I probably should have asked first about whether or not it is a portable-atomic's issue.

AFAICT from my own testing, your crate should not be affected by using bool vs u16.

Thanks for the testing. I can also confirm that which one is used as a state does not affect the generated code.

Below is the code generated for the current main branch that uses bool as a state, I can see that the r & 0x8 (bit #8, r13) is deferred until just before the branch (jeq $+8). Using u16 as a state generates the same code.

00000000 <lib::atomic_u128_store::seq_cst::hb020bdfd361679f5>:
       0: 0d 42         mov     r2, r13
       2: 32 c2         dint
       4: 03 43         nop
       6: 9c 41 10 00 0e 00     mov     16(r1), 14(r12)
       c: 9c 41 0e 00 0c 00     mov     14(r1), 12(r12)
      12: 9c 41 0c 00 0a 00     mov     12(r1), 10(r12)
      18: 9c 41 0a 00 08 00     mov     10(r1), 8(r12)
      1e: 9c 41 08 00 06 00     mov     8(r1), 6(r12)
      24: 9c 41 06 00 04 00     mov     6(r1), 4(r12)
      2a: 9c 41 04 00 02 00     mov     4(r1), 2(r12)
      30: 9c 41 02 00 00 00     mov     2(r1), 0(r12)
      36: 3d b2         bit     #8, r13                  <-----
      38: 03 24         jeq     $+8
      3a: 03 43         nop
      3c: 32 d2         eint
      3e: 03 43         nop
      40: 30 41         ret

Footnotes

  1. Well, it is possible to restore all bits using mov r2, {}/mov {},r2, but I don't think that is necessary when options(preserves_flags) is not passed.

@cr1901
Copy link

cr1901 commented Oct 17, 2022

I'm feeling under the weather today, so I don't have much bandwidth to give thoughts, but here's a summary:

I'm not sure how bits other than bits that can be modified by arithmetic operations are handled, but at least for the I bit, LLVM will not automatically restore them

None of the other bits are considered arithmetic flags. LLVM shouldn't generate code that reads/writes to those other bits, either directly (mov, bis) or as a side effect of other insns (anything arithmetic will change arithmetic flags :P). So you're on your own if you need to modify them.

So, I think this approach is fine for portable-atomic's case. Thoughts?

Based on the two bullet points above contrasting msp430 and portable-atomic, I think what you're saying is reasonable.

Minddump for me follows:

I think there should be an explicit comment about how portable-atomic promises to never modify the status register bits that LLVM doesn't know about. Due to the lack of preserves_flags, overwriting the status register wholesale is safe for restoring pre-interrupt state, because LLVM can tolerate not knowing those bits values.

This assumption can't be made for a generic critical section like the one the msp430 crate provides. Overwriting those arithmetic bits are still fine, but other flags can potentially be changed by explicit user intervention in the critical section (i.e.: more inline asm), and those changes1 should persist after the critical section is done. So it's up to the restore code inline asm block to preserve those bits, and a wholesale status register overwrite can't do that, while conditionally setting the GIE flag will.

That being said, I think the nop { mov {}, r2 { nop dance is still required for architectural reasons:

image

  1. Minus enabling GIE within the critical section, in which case you're on your own.

@taiki-e
Copy link
Owner Author

taiki-e commented Oct 19, 2022

I think there should be an explicit comment about how portable-atomic promises to never modify the status register bits that LLVM doesn't know about.

Agreed. I'll add a comment about this.

That being said, I think the nop { mov {}, r2 { nop dance is still required for architectural reasons:

Good catch, thanks! I've pushed the fix.


As for AVR, bits in the status register (SREG) are modified as a side effect of arithmetic operations or others, with the exception of the I (Global Interrupt Enable) bit. So, this optimization should be fine. Actually, it seems avr-device crate recently adopted the same optimization: Rahix/avr-device@280d685

bors bot added a commit that referenced this pull request Oct 19, 2022
44: interrupt: Various improvements r=taiki-e a=taiki-e

- Support RISC-V supervisor mode under `portable_atomic_s_mode` cfg

  > On RISC-V without A-extension, this generates code for machine-mode (M-mode) by default. If you pass the `--cfg portable_atomic_s_mode` together, this generates code for supervisor-mode (S-mode). In particular, `qemu-system-riscv*` uses [OpenSBI](https://github.com/riscv-software-src/opensbi) as the default firmware.

- Support disabling FIQs on pre-v6 ARM under `portable_atomic_disable_fiq` cfg

  > On pre-v6 ARM, this disables only IRQs by default. For many systems (e.g., GBA) this is enough. If the system need to disable both IRQs and FIQs, you need to pass the `--cfg portable_atomic_disable_fiq` together.

- Interrupt-related documentation improvements

  - [README.md](https://github.com/taiki-e/portable-atomic/blob/18961fab6716c3fe45e89cbf2abf46c6a10fad21/README.md#optional-cfg)
  - [src/imp/interrupt/README.md](https://github.com/taiki-e/portable-atomic/blob/18961fab6716c3fe45e89cbf2abf46c6a10fad21/src/imp/interrupt/README.md)

- Defer mask until just before branch

  > This does not change the code generation, but in the actual generated code the mask is deferred until just before the branch, like this. Since there has been some misleading discussion about this in the past, we will use code that more closely matches the generated code.

  For MSP430 and AVR, it will be done in #40.

Co-authored-by: Taiki Endo <[email protected]>
@taiki-e
Copy link
Owner Author

taiki-e commented Dec 9, 2022

Updated comment on restore.

// This clobbers the entire status register, but we never explicitly modify
// flags within a critical session, and the only flags that may be changed
// within a critical session are the arithmetic flags that are changed as
// a side effect of arithmetic operations, etc., which LLVM recognizes,
// so it is safe to clobber them here.
// See also the discussion at https://github.com/taiki-e/portable-atomic/pull/40.

bors r+

@bors
Copy link
Contributor

bors bot commented Dec 9, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-avr Target: AVR O-msp430 Target: MSP430
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants