Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible miscompilation in LLVM #573

Closed
tronje opened this issue Aug 6, 2024 · 19 comments
Closed

Possible miscompilation in LLVM #573

tronje opened this issue Aug 6, 2024 · 19 comments
Labels
compiler-bug Not a bug in avr-hal, but a bug in the rust compiler/LLVM

Comments

@tronje
Copy link

tronje commented Aug 6, 2024

Hi, it's me again, the guy with the weird problems. This time, I believe I have a decent grasp on what's going wrong. It's documented with a minimal example and a README here: https://github.com/tronje/llvm-avr-compiler-bug

The short version is that under certain conditions, the Y register is used to load struct members into registers, as an optimization. This causes stack corruption (maybe?) and causes incorrect program behavior. I do believe I did a decent job explaining it in the README of the project I linked. The Rust code should be fairly self-explanatory as well.
It does include some weird things, like extra booleans to get the struct to be a certain size, without which the bug does not occur.

Reasons I suspect the Y register to be the culprit:

In order to access stack locations, avr-gcc will set up a 16-bit frame pointer in R29:R28 (Y) because the stack pointer (SP) cannot be used to access stack slots.

source

https://github.com/rust-lang/llvm-project/blob/rustc-1.80.0/llvm/lib/Target/AVR/AVRRegisterInfo.cpp#L80-L91

And whenever the example program is changed in a way that does not cause the bug to occur, the offending use of Y is also not included in the assembly.

I hope it's okay to report this here; I realize this is just avr-hal, not LLVM 🙂 But I was hoping to get an opinion and perhaps some advice on how to proceed here. Thanks!

@tronje
Copy link
Author

tronje commented Aug 7, 2024

Friendly page for you, @Patryk27. I saw you had some great insights here as well.

@Patryk27
Copy link
Contributor

Patryk27 commented Aug 7, 2024

Thanks for reporting! I've already got two other AVR miscompilations at hand, so I won't be able to analyze it right now, but I'll get back to you 🙂

@tronje
Copy link
Author

tronje commented Aug 7, 2024

Little update with an extra finding: turning off LTO (no lto field in Cargo.toml) fixes the bug. I do still see two ldd instructions that reference Y in the assembly, though:

ldd	r14, Y+1	; 0x01
ldd	r15, Y+2	; 0x02

Not sure if these are harmful or not...

edit: in our larger project, from which we derived the minimal example, the instruction is generated even without LTO:

ld	r19, Y

@Patryk27
Copy link
Contributor

Patryk27 commented Aug 7, 2024

Those are most likely stack spills, i.e. when a function uses more data that can be kept within registers, the extra gets spilled onto stack using std and ldd - you'll probably see std Y+1, something etc. somewhere before in the flow.

@Rahix Rahix added the compiler-bug Not a bug in avr-hal, but a bug in the rust compiler/LLVM label Aug 11, 2024
@tronje
Copy link
Author

tronje commented Aug 19, 2024

Friendly ping. Any chance of an update? 🙏

Should I perhaps report this elsewhere, too? To Rust or LLVM directly?

@Patryk27
Copy link
Contributor

Hi, no progress yet, still stuck on llvm/llvm-project#102936.

If you have some spare time to investigate this issue more, you could try narrowing it down to a single *.ll file - that would be helpful for LLVM-people.

@tronje
Copy link
Author

tronje commented Aug 19, 2024

Hi, no progress yet, still stuck on llvm/llvm-project#102936.

Thanks for the update!

If you have some spare time to investigate this issue more, you could try narrowing it down to a single *.ll file - that would be helpful for LLVM-people.

We'll look into it 🙂

@Patryk27
Copy link
Contributor

Patryk27 commented Aug 20, 2024

Status: on newest LLVM, compiling this code crashes with:

rustc: /x/llvm-project/llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp:780: bool {anonymous}::AVRExpandPseudo::expand(Block&, BlockIt) [with unsigned int OP = 334; Block = llvm::MachineBasicBlock; BlockIt = llvm::MachineInstrBundleIterator<llvm::MachineInstr>]: Assertion `Imm <= 62 && "Offset is out of range"' failed.

That's because when loading data from stack, LLVM uses the LDDWRdPtrQ instruction, which is limited to "look behind" of 62 bytes (i.e. it can't load data that's further than 62 bytes away from the current stack pointer, simply because the underlying AVR instructions don't support more).

I'll see if there's something we can do here - maybe instead of generating:

ldd reg, Y+offset

... for larger offsets the codegen could do:

addi Y, offset
ldd reg, Y+0
subi Y, offset

tl;dr in order to load self.d LLVM generates something like loadFromStack(64), but the appropriate AVR instruction (ldd) doesn't support such a large offset

@tronje
Copy link
Author

tronje commented Aug 20, 2024

Ah, interesting. Thanks very much for looking into this! Would you say this is something I could/should look into? I've never worked on either LLVM specifically or compiler code in general, but I do like to think that I have a decent grasp on C++.

And I suppose until this is fixed in LLVM, there's not much we can do to work around this, is there?

@Patryk27
Copy link
Contributor

Patryk27 commented Aug 20, 2024

Would you say this is something I could/should look into?

I haven't worked on LLVM until once day I randomly got around to fixing AVR bugs, so this is a nice opportunity, if you're not prone to banging your head against the wall 😄

The instructions boil down to:

  1. Compile LLVM locally (https://llvm.org/docs/CMake.html, using -DLLVM_TARGETS_TO_BUILD='AVR'),
  2. Prepare a minimal *.ll file (either through cargo rustc -- --emit=llvm-ir or by writing one manually, e.g. this boi looks like a starting point),
  3. Now, if:
    3a. ... you're investigating a codegen problem, use llc -mtriple=avr to compile *.ll into an *.obj file and then either use simavr or avr-objdump to see the assembly; if it looks wrong / doesn't work, investigate LLVM's source code.
    3b. ... you're investigating a crash, just launch llc or whatever that causes the crash to happen and investigate LLVM's source code, grepping by the assertion's error message etc.
  4. Having a fix ready, use llvm-lit ../test/CodeGen/AVR to run all AVR-related tests and see if you didn't break anything else by accident.

Having that said, yesterday I managed to fix that other bug and now I'm looking into this one, so don't feel forced to play around with LLVM on your own, it's always a mix of fun and frustration.

And I suppose until this is fixed in LLVM, there's not much we can do to work around this, is there?

It looks like the underlying codegen bug is related to the object/stack size going above 62 bytes, so the only solution for now seems to be "avoid huge objects".

Fortunately, once a fix lands in LLVM, it can be quickly cherry-picked to rustc's fork of LLVM, so once the fix is ready, the timeline to getting it into rustc is hours/days, not half a year.

@Patryk27
Copy link
Contributor

Got progress! The minimal offending *.ll is surprisingly small:

define void @main(ptr %self) {
start:
  %1 = getelementptr i8, ptr %self, i16 61
  %2 = load i32, ptr %1, align 1
  store i32 %2, ptr null, align 1
  ret void
}

(got it via cargo rustc -- --emit=llvm-ir and then using llvm-reduce)

Now, onto discovering what's going on here 🔍

@Patryk27
Copy link
Contributor

Patryk27 commented Aug 20, 2024

Alright, it looks like the problem is within those two checks:

Those functions are related to the "if offset is too high, use the alternative route" logic and they both are off by one - the maximum allowed offset should be 62, not 63 (which is what isUInt<6>() allows).

Adjusting the checks to <= 62 and recompiling rustc makes it generate correct code 🎉

So now lemme just prepare a proper LLVM test, pull request and let's keep the fingers crossed.

@tronje
Copy link
Author

tronje commented Aug 20, 2024

Wow, thank you so much for the great work!

@Patryk27
Copy link
Contributor

Patryk27 commented Sep 12, 2024

Status: waiting for llvm/llvm-project#106993.

@Patryk27
Copy link
Contributor

Status: fixed in the newest toolchain 🙂

@tronje
Copy link
Author

tronje commented Oct 11, 2024

Status: fixed in the newest toolchain 🙂

You absolute hero, thank you very much!! 🚀

@tronje
Copy link
Author

tronje commented Oct 14, 2024

I haven't had a chance to test on hardware yet, but using the latest nightly toolchain does still generate ldd instructions with offsets greater than 62 (specifically, 63):

     670:	9f ad       	ldd	r25, Y+63	; 0x3f
     680:	3f ac       	ldd	r3, Y+63	; 0x3f
     68c:	5f ac       	ldd	r5, Y+63	; 0x3f
     722:	3f ac       	ldd	r3, Y+63	; 0x3f
     72e:	7f ac       	ldd	r7, Y+63	; 0x3f
     73a:	ff ac       	ldd	r15, Y+63	; 0x3f
    30fa:	9f ad       	ldd	r25, Y+63	; 0x3f
    310e:	9f ad       	ldd	r25, Y+63	; 0x3f

Reading the comment here, I suppose those would be 8-bit loads, and therefore okay?

I'll hopefully get a chance to test for real later this week.

Just in case:

$ rustup run nightly rustc --version
rustc 1.84.0-nightly (27861c429 2024-10-13)

@Patryk27
Copy link
Contributor

Patryk27 commented Oct 14, 2024

I suppose those would be 8-bit loads, and therefore okay?

Yes, those are alright - that particular commit you linked is about limiting 16-bit loads, since those are expanded into a pair of:

ldd reg1, Y+offset
ldd reg2, Y+offset+1

... and so a 16-bit load can have a maximum offset of 62 (as that will make the second ldd have offset of 63, which is the maximum allowed).

@tronje
Copy link
Author

tronje commented Oct 14, 2024

Okay, thanks for clarifying!

I remembered my little test case here: https://github.com/tronje/llvm-avr-compiler-bug

And this is indeed now fixed with the new toolchain. Thank you again! I'll close this now then.

@tronje tronje closed this as completed Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-bug Not a bug in avr-hal, but a bug in the rust compiler/LLVM
Projects
None yet
Development

No branches or pull requests

3 participants