Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVR: miscompilation, or stack overflow, when running cryptographic code #109000

Open
xphoniex opened this issue Mar 10, 2023 · 34 comments
Open

AVR: miscompilation, or stack overflow, when running cryptographic code #109000

xphoniex opened this issue Mar 10, 2023 · 34 comments
Labels
A-codegen Area: Code generation A-lto Area: Link Time Optimization C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-AVR Target: AVR processors (ATtiny, ATmega, etc.) P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@xphoniex
Copy link
Contributor

Using rust to compile for Arduino target, I'm see a lot of weird and random behaviors from compiler when making changes to profile and compiler setting. (I'm using avr-hal for uno, here's a sample)

As an example, if I use hmac-sha256 crate:

    let h = hmac_sha256::HMAC::mac(b"hello", b"key"); // hmac for input "hello" with key "key"
    print_hex_arr(" mac", &mut serial, &h);
    let h = hmac_sha256::Hash::hash(b"hello"); // sha256 of "hello"
    print_hex_arr("hash", &mut serial, &h);

I get different outputs depending on opt-level and lto, correct one is:

 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

which I'm getting from opt-level=2 with both lto=true and unspecified, however:

// opt-level = "s", lto = true
 mac = d238e536e20f0b1b210644248134891454c20ad29c10f75756218b6ab8f5c17d
hash = 5f23619d4ed28dbf06d25969fa262384869011f4a44469c1eb38c50b15b01c4b

// opt-level = "z", lto = true
 mac = 03942da43b5034c3f87f9652c4d569392f444af18c66a7c587db8065d4c79faf
hash = 5f23619d4ed28dbf06d25969fa262384869011f4a44469c1eb38c50b15b01c4b

// opt-level = "s | z", lto unspecified
 mac = d238e536e20f0b1b210644248134891454c20ad29c10f75756218b6ab8f5c17d
hash = 5f23619d4ed28dbf06d25969fa262384869011f4a44469c1eb38c50b15b01c4b

not only that, even if I run fns from another crate, even with the working opt-level = 2, I get incorrect output:

  other_crate::do_something();
  let h = hmac_sha256::Hash::hash(b"hello");
  print_hex_arr("hash", &mut serial, &h);

has a wrong output and instead this works:

  let h = hmac_sha256::Hash::hash(b"hello");
  print_hex_arr("hash", &mut serial, &h);
  other_crate::do_something();

before I compile a bug report at gcc, I want to make sure the fault is not at rust side. here's the final command that rustc is running to link the final elf:

$ avr-gcc -mmcu=atmega328p -Wl,--as-needed,--print-memory-usage,--detailed-mem-usage -fpack-struct -fshort-enums -Wstack-usage=20 -Wall -Wextra -fstack-usage /tmp/rustcsspEI5/symbols.o /project/target/avr-atmega328p/release/deps/arduino_lib-1e0b86c11d560d13.arduino_lib.9b917980-cgu.0.rcgu.o -Wl,--as-needed -L /project/target/avr-atmega328p/release/deps -L /project/target/release/deps -L . -L /home/usr/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/avr-atmega328p/lib -Wl,-Bstatic /project/target/avr-atmega328p/release/deps/libcompiler_builtins-4dcc5d36d44c3317.rlib -Wl,-Bdynamic -lgcc -Wl,-znoexecstack -L /home/usr/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/avr-atmega328p/lib -o /project/target/avr-atmega328p/release/deps/arduino_lib-1e0b86c11d560d13.elf -Wl,--gc-sections -Wl,-O1 -Wl,--strip-all

as you can see it's using -O1 which I haven't specified anywhere. (See avr-atmega328p.json)

I have tried overriding in my avr-atmega328p.json file inside pre-link-args but that doesn't affect anything. Can someone shed some light on what is going wrong here? If it's the -O1 flag, how do I override that?

@saethlin
Copy link
Member

We have some history with AVR miscompilations, so just glancing at this issue my priors are that it is more likely than not that this is a bug somewhere in rustc/LLVM.

You've included a fair bit of information in here, but there's no reproducer that I can run to observe the bug. Can you provide a program (snippet or link to a repo/example) that demonstrates the bug, and precise instructions on how to compile it and observe the bug? It would be a great help if someone tackling this doesn't need to have any particular hardware on hand in order to observe the bug.

@workingjubilee workingjubilee added O-AVR Target: AVR processors (ATtiny, ATmega, etc.) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Mar 11, 2023
@workingjubilee
Copy link
Member

To add to saethlin's request for a reproducer:

An entire repo is often fine as a reproducer for AVR issues, even though it is usually not for other targets. Often the problem is either so vague that further minimization is futile (so it can't be minimized) or so glaring that further minimization is pointless (making it apparent at a glance at the Cargo.toml or something like that). In particular, a full repo can be desired because some common ecosystem dependencies have used unsound code recently and you may have included one of them in your dependency graph, so please check in your Cargo.lock when pushing the repo so I can see what exact versions of crates you resolved.

Also, for that repo, please include the exact rustc/cargo build commands you are using to produce this issue, ideally. A file in the repo listing all of them would not be too much detail.

@workingjubilee workingjubilee added A-codegen Area: Code generation A-lto Area: Link Time Optimization C-bug Category: This is a bug. and removed A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Mar 11, 2023
@xphoniex
Copy link
Contributor Author

xphoniex commented Mar 11, 2023

zipped file: avr-bug.zip

$ tree .
.
├── avr-atmega328p.json
├── .cargo
│   └── config.toml
├── Cargo.lock
├── Cargo.toml
├── rust-toolchain.toml
└── src
    └── main.rs

avr-atmega328p.json

{
  "arch": "avr",
  "atomic-cas": false,
  "cpu": "atmega328p",
  "data-layout": "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8",
  "eh-frame-header": false,
  "exe-suffix": ".elf",
  "executables": true,
  "late-link-args": {
    "gcc": [
      "-lgcc"
    ]
  },
  "linker": "avr-gcc",
  "linker-is-gnu": true,
  "llvm-target": "avr-unknown-unknown",
  "max-atomic-width": 8,
  "no-default-libraries": false,
  "pre-link-args": {
    "gcc": [
      "-mmcu=atmega328p",
      "-Wl,--as-needed,--print-memory-usage,--detailed-mem-usage",
      "-fpack-struct",
	  "-fshort-enums",
	  "-Wstack-usage=20",
	  "-Wall","-Wextra","-fstack-usage"
    ]
  },
  "target-c-int-width": "16",
  "target-pointer-width": "16"
}

.cargo/config.toml

rustflags = ["-Z emit-stack-sizes"]

[target.avr-atmega328p]
runner = "qemu-system-avr -M uno -nographic -serial tcp::5678,server=on -bios"

[build]
target = "avr-atmega328p.json"

[unstable]
build-std = ["core"]

Cargo.toml

[package]
name = "avr-bug"
version = "0.1.0"
edition = "2021"

[dependencies]
panic-halt = "0.2.0"
ufmt = "0.1.0"
noble-secp256k1 = { git = "https://github.com/xphoniex/noble-secp256k1-rs", default-features = false, features = ["8-bit"] }
hmac-sha256 = { version = "1.1.6", default-features = false, features = ["opt_size"] }

[dependencies.arduino-hal]
git = "https://github.com/rahix/avr-hal"
rev = "4c9c44c314eb061ee20556ef10d45dea36e75ee4"
features = ["arduino-uno"]

[dependencies.avr-device]
version = "0.5.0"

[profile.dev]
panic = "abort"
debug = true
lto = true
opt-level = "s"

[profile.release]
panic = "abort"
codegen-units = 1
debug = true
lto = true
#opt-level = "s"
#opt-level = "z"
opt-level = 2
strip = true

[profile.dev.package.compiler_builtins]
overflow-checks = false

rust-toolchain.toml

[toolchain]
channel = "nightly"
components = [ "rust-src" ]
profile = "minimal"

src/main.rs

#![no_std]
#![no_main]

use panic_halt as _;

use arduino_hal::prelude::*;

use core::fmt::Debug;
use ufmt::uWrite;

fn print_hex_arr<S>(tag: &str, serial: &mut S, arr: &[u8])
where
    S: uWrite,
    <S as uWrite>::Error: Debug,
{
    ufmt::uwrite!(serial, "{} = ", tag).unwrap();
    for e in arr.iter() {
        ufmt::uwrite!(serial, "{:02x}", *e).unwrap();
    }
    ufmt::uwrite!(serial, "\r\n").unwrap();
}

fn print_hex_arr_rev<S>(tag: &str, serial: &mut S, arr: &[u8])
where
    S: uWrite,
    <S as uWrite>::Error: Debug,
{
    ufmt::uwrite!(serial, "{} = ", tag).unwrap();
    for e in arr.iter().rev() {
        ufmt::uwrite!(serial, "{:02x}", *e).unwrap();
    }
    ufmt::uwrite!(serial, "\r\n").unwrap();
}

#[arduino_hal::entry]
fn main() -> ! {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    /*
    // Uncommenting this part changes the output of hmacs
    use noble_secp256k1::awint::{cc, inlawi, inlawi_ty, Bits, InlAwi};
    use noble_secp256k1::{BigNum, Curve};

    let mut private_key = inlawi!(0x02_u512);
    let curve = Curve::secp256k1();
    let public_key = curve.multiply_simple(&mut private_key);

    let mut buf = [0; 32];
    public_key.x.to_u8_slice(&mut buf);
    print_hex_arr_rev("x", &mut serial, &buf);
    public_key.y.to_u8_slice(&mut buf);
    print_hex_arr_rev("y", &mut serial, &buf);
    */

    let h = hmac_sha256::HMAC::mac(b"hello", b"key");
    print_hex_arr(" mac", &mut serial, &h);
    let h = hmac_sha256::Hash::hash(b"hello");
    print_hex_arr("hash", &mut serial, &h);

    loop {}
}
$ rustc -vV
rustc 1.69.0-nightly (ef934d9b6 2023-02-08)
binary: rustc
commit-hash: ef934d9b632b8ac00276558824664c104b92b5f0
commit-date: 2023-02-08
host: x86_64-unknown-linux-gnu
release: 1.69.0-nightly
LLVM version: 15.0.7

@xphoniex
Copy link
Contributor Author

xphoniex commented Mar 11, 2023

In order to run this you need both avr toolchain such as avr-gcc and qemu-system-avr. In a terminal run cargo run -r and in another:

$ telnet localhost 5678
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

then try changing opt-level, lto, and uncomment the other crate in the main.rs, and the move code around. you should be able to re-produce the bugs.

@workingjubilee
Copy link
Member

For future reference, creating and pushing a repo to GitHub or another host that I can clone with a single command would have been preferred over something that requires me to manually reconstruct the files inside a repository.

@xphoniex
Copy link
Contributor Author

For future reference, creating and pushing a repo to GitHub or another host that I can clone with a single command would have been preferred over something that requires me to manually reconstruct the files inside a repository.

You're right, added the zip file.

@workingjubilee
Copy link
Member

Hmm. I don't know, in that case! Fair enough?!

Thank you for the clear instructions, I will try to reproduce this soon.

Also you will want to update to a version of rustc of about February 19 or later, as an ABI fix for AVR targets was merged about then. It doesn't affect your visible source but it may affect your dependencies.

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@xphoniex
Copy link
Contributor Author

polite bump. any updates on this?

@jyn514 jyn514 added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Jun 1, 2023
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jun 1, 2023
@saethlin
Copy link
Member

saethlin commented Jun 1, 2023

The reproducer doesn't reproduce the bug on the latest nightly. At least it doesn't when I try it, I always get the correct output regardless of opt level.

If I turn off LTO though, I get

$ cargo +nightly build --release -j1
   Compiling compiler_builtins v0.1.92
   Compiling core v0.0.0 (/home/ben/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core)
LLVM ERROR: Expected a constant shift amount!
error: could not compile `core` (lib)

So perhaps we still have a bug but I think the one you meant to report here is fixed? Can you confirm?

@xphoniex
Copy link
Contributor Author

xphoniex commented Jun 1, 2023

Thanks for the reply. Only half is fixed:

After updating to nightly 1.72, this consistently produces the correct output regardless of compiler settings:

let h = hmac_sha256::HMAC::mac(b"hello", b"key"); // hmac for input "hello" with key "key"
print_hex_arr(" mac", &mut serial, &h);
let h = hmac_sha256::Hash::hash(b"hello"); // sha256 of "hello"
print_hex_arr("hash", &mut serial, &h);

(sidenote: how can I point my cargo to a locally compiled rustc? I can then git bisect and find out what fixed this from 1.62)

However if you go back to main.rs, you'll see a code block which is commented out:

    /*
    // Uncommenting this part changes the output of hmacs
    use noble_secp256k1::awint::{cc, inlawi, inlawi_ty, Bits, InlAwi};
    use noble_secp256k1::{BigNum, Curve};

    let mut private_key = inlawi!(0x02_u512);
    let curve = Curve::secp256k1();
    let public_key = curve.multiply_simple(&mut private_key);

    let mut buf = [0; 32];
    public_key.x.to_u8_slice(&mut buf);
    print_hex_arr_rev("x", &mut serial, &buf);
    public_key.y.to_u8_slice(&mut buf);
    print_hex_arr_rev("y", &mut serial, &buf);
    */

    let h = hmac_sha256::HMAC::mac(b"hello", b"key");
    print_hex_arr(" mac", &mut serial, &h);
    let h = hmac_sha256::Hash::hash(b"hello");
    print_hex_arr("hash", &mut serial, &h);

Uncomment it and let's run the code, cargo run -r in one terminal and telnet localhost 5678 in another. Remember, we expect:

 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

But we get:

opt-level = 2:

x = c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777
**O = dcfd06ba19e34eb9402ff79e0ca8507a8cc0f5a39a8c6b51b5e533570b2799de // ??
�= df04b29ead3cfe390f33ebe4749c7f518d6c1ebc450eb6e2f41d5c9e61d04376 // ??

opt-level = "s":

x = c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777
 mac = 350cc5a62c6000f16a0ea550bb2aa4805fcb76530167a9d2f7ea75c5cea7b366 // ??
hash = 8cfa54f76e17be6575e0894ebbac4460cc02b6c8f818c96593dfa77ce158a6a4 // ??

opt-level = "z":

x = 0c8333020c4688a754bf3ad462f1e9f1fac80649a463ae4e4c1b04ead21e5da0 // ??
y = b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe763b8371104fdb818 // ??
 mac = cd3a72deda62ab9e5ce61720ced04b523d6227b9d36419d400f2088318413030 // ??
hash = 19217799055b90ffb23be75faefbd3df8bfeb0fc19778d82c2d3510c32978017 // ??

So, my initial guess what that some weird stack overflow is happening (from the code block we just uncommented) and decided to move the hmac part on top of the code so like:

    let h = hmac_sha256::HMAC::mac(b"hello", b"key");
    print_hex_arr(" mac", &mut serial, &h);
    let h = hmac_sha256::Hash::hash(b"hello");
    print_hex_arr("hash", &mut serial, &h);
    
    // Uncommenting this part changes the output of hmacs
    ...

Here are the outputs I'm getting now:

opt-level = 2:

 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b // ✓
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 // ✓
// GETS STUCK HERE 

opt-level = "s":

 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b // ✓
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 // ✓
x = 2351285e80f3f9e23a1fbceecedc1804c922223a2292951d54872339f728099e // ??
y = 29c0e92da06bc8723a6045500f56c85a5a484f4c4f3e33405a3015be140a3a6f // ??

opt-level = "z":

 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b // ✓
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 // ✓
x = 79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798 // ??
y = 016f2077de62f7d85e99bad51fc94eb5f70363ae9ba724d3ff7b53573a192b48 // ??

as you can see, plenty of different variations, depending on just the compiler setting. this might have something to do with overflow, but shouldn't compiler catch it?

$ avr-size target/avr-atmega328p/release/avr-bug.elf
   text	   data	    bss	    dec	    hex	filename
  16752	    486	      1	  17239	   4357	target/avr-atmega328p/release/avr-bug.elf

@apiraino
Copy link
Contributor

apiraino commented Jun 1, 2023

WG-prioritization assigning priority (Zulip discussion).

@xphoniex a note about support of compile target avr-unknown-gnu-atmega328 that I don't see mentioned in this issue. This is a tier 3 compile target (policy) so this in practice means that support is volunteer-based and contributions are welcome.

@rustbot label -I-prioritize +P-low

@rustbot rustbot added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jun 1, 2023
@saethlin
Copy link
Member

saethlin commented Jun 1, 2023

(sidenote: how can I point my cargo to a locally compiled rustc? I can then git bisect and find out what fixed this from 1.62)

https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html?highlight=toolchain%20link#creating-a-rustup-toolchain

You probably want to cargo +stage1.

@Patryk27
Copy link
Contributor

Patryk27 commented Jun 2, 2023

this might have something to do with overflow, but shouldn't compiler catch it?

This could be another case of compiler_builtins providing its own intrinsics in place of libgcc ones - could you try adding this to .cargo/config:

build-std-features = ["compiler-builtins-mangled-names"]

... and trying again?

@xphoniex
Copy link
Contributor Author

xphoniex commented Jun 2, 2023

this might have something to do with overflow, but shouldn't compiler catch it?

This could be another case of compiler_builtins providing its own intrinsics in place of libgcc ones - could you try adding this to .cargo/config:

build-std-features = ["compiler-builtins-mangled-names"]

... and trying again?

.cargo/config.toml ?:

rustflags = ["-Z emit-stack-sizes"]
build-std-features = ["compiler-builtins-mangled-names"]

also tried

[unstable]
build-std = ["core"]
build-std-features = ["compiler-builtins-mangled-names"]

same output as before.

@xphoniex
Copy link
Contributor Author

xphoniex commented Jun 2, 2023

(sidenote: how can I point my cargo to a locally compiled rustc? I can then git bisect and find out what fixed this from 1.62)

https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html?highlight=toolchain%20link#creating-a-rustup-toolchain

You probably want to cargo +stage1.

This was such a bad idea, started with

$ git checkout master
$ git bisect start
$ git bisect good
$ git bisect bad 1.66.0

but it won't compile with 1.66 because:

unknown print request `split-debuginfo`

which I suppose is asked for by cargo?

decided to start from 1.67 and:

error: failed to load manifest for workspace member `/github.com/rust-lang/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/sysroot`

Caused by:
  failed to read `/github.com/rust-lang/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/sysroot/Cargo.toml`

Caused by:
  No such file or directory (os error 2)

tried stage2, no luck.

and git bisect is now stuck:

$ git bisect bad
The merge base f058493307813a5298851f79ad6187f4ad2c7e15 is bad.
This means the bug has been fixed between f058493307813a5298851f79ad6187f4ad2c7e15 and [33c3d101280c8eb3cd8af421bfb56a8afcc3881d].
$ git bisect bad
The merge base f058493307813a5298851f79ad6187f4ad2c7e15 is bad.
This means the bug has been fixed between f058493307813a5298851f79ad6187f4ad2c7e15 and [33c3d101280c8eb3cd8af421bfb56a8afcc3881d].
...

also tried 1.68 separately, and that doesn't work either, same error as 1.67.

@Patryk27
Copy link
Contributor

Patryk27 commented Jun 3, 2023

Okie, I think I got it - your code prints invalid results¹ because you've got a stack overflow 😇

Since AVR doesn't provide memory hardware watches (and rustc/llvm don't insert stack canaries), stack overflow in here doesn't cause the program to halt but rather simply trashes some other stuff present in RAM - in this case, this stuff happens to be apparently some values used to calculate the hashes.

We can prove it with simavr - first, let's start the simulator:

simavr -m atmega328p ./target/avr-atmega328p/release/avr-bug.elf -g

... then, in another tab, let's run avr-gcc:

$ avr-gdb
(gdb) file ./target/target/avr-atmega328p/release/avr-bug.elf
(gdb) layout asm
(gdb) layout regs
(gdb) target remote :1234

Now, before we're off to detecting stack trashing, we must talk about memory layout - on this AVR, since there's only two kilobytes of RAM, it's pretty straightforward:

0 .. X    - "constant" stuff (strings, static arrays, static variables etc.)
X .. 2 KB - stack

When an AVR program starts, it copies "static stuff" from flash memory into RAM² - that stuff ends up at the beginning of the memory; after that, we've got the stack which starts at the end of RAM and grows down (i.e. towards the lower addresses).

That is to say, if a program uses 32 bytes of static data, it leaves 2048 - 32 = 2016 bytes for stack frames; any stack frame larger than that will, sooner or later, unavoidably overwrite some random stuff present in RAM.

Okie, okie, let's get back to business - let's setup a memory watch to find out what messes up the RAM!

After running a single instruction:

(gdb) si

... the list of registers at the top gets filled with values - in particular, we're interested in the SP (stack pointer) register:

SP             0x8ff               0x8008ff 

... and, in-particular in-particular, we're interested in the second value there - 0x8008ff; since we haven't run any function yet, this register determines the end of our RAM (and, conversely, the beginning of our stack).

(this register has two values because 0x8ff is how the AVR sees it, while the second value is how gdb sees it - tldr)

Before we're able to setup the watch, we have to find one more value: size of our "constant stuff" section; to do that we can simply move forward a few instructions, up to an intrinsic called __do_copy_data:

Remote debugging using :1234
0x00000000 in __vectors ()
(gdb) si
0x00000068 in __trampolines_start ()
(gdb) si
0x0000006a in __trampolines_start ()
(gdb) si
0x0000006c in __trampolines_start ()
(gdb) si
0x0000006e in __trampolines_start ()
(gdb) si
0x00000070 in __trampolines_start ()
(gdb) si
0x00000072 in __trampolines_start ()
(gdb) si
__do_copy_data () at ../../../../gcc-12.2.0/libgcc/config/avr/lib1funcs.S:2405
(gdb)  

__do_copy_data is an internal function (linked automatically by avr-gcc and run before Rust's main()) that copies stuff from flash into RAM - in my case I've gotten:

   >0x74 <__do_copy_data>           ldi     r17, 0x02       ; 2                                                                                                                                         │
│   0x76 <__do_copy_data+2>         ldi     r26, 0x00       ; 0                                                                                                                                         │
│   0x78 <__do_copy_data+4>         ldi     r27, 0x01       ; 1                                                                                                                                         │
│   0x7a <__do_copy_data+6>         ldi     r30, 0x94       ; 148                                                                                                                                       │
│   0x7c <__do_copy_data+8>         ldi     r31, 0x3D       ; 61                                                                                                                                        │
│   0x7e <__do_copy_data+10>        rjmp    .+4             ;  0x84 <__do_copy_data+16>                                                                                                                 │
│   0x80 <__do_copy_data+12>        lpm     r0, Z+                                                                                                                                                      │
│   0x82 <__do_copy_data+14>        st      X+, r0                                                                                                                                                      │
│   0x84 <__do_copy_data+16>        cpi     r26, 0xD4       ; 212                                                                                                                                       │
│   0x86 <__do_copy_data+18>        cpc     r27, r17                                                                                                                                                    │
│   0x88 <__do_copy_data+20>        brne    .-10            ;  0x80 <__do_copy_data+12> 

... where cpi + cpc near the bottom determine the size of our "constant stuff" - and while it's not immediately obvious, if you step through the logic, it comes down to:

2 * 255 + 212 = 722
    r17

This means that our stack effectively ends at:

0x8008ff - 2048 + 722 = 0x8003d1

... so let's create a watch precisely for that:

(gdb) watch *0x8003d1

... let the program continue:

(gdb) continue

... and voilà, our RAM gets corrupted:

Continuing.

Hardware watchpoint 2: *0x8003d1

Old value = 0
New value = 56
0x00002480 in awint_core::logic::mul::<impl awint_core::data::bits::Bits>::mul_assign ()

... anything after this instruction is effectively undefined behavior (from Rust's point of view, at least).

We can even print stacktrace, for fun:

(gdb) bt
#0  0x00002480 in awint_core::logic::mul::<impl awint_core::data::bits::Bits>::mul_assign ()
#1  0x0000395e in noble_secp256k1::point::Point::double_assign_mod ()
#2  0x0000105a in avr_bug::__avr_device_rt_main ()
#3  0x0000101c in main ()

Soooo.... tl;dr he data too big for he god damn RAM.

Not all hope is lost, though! -- if you really need to compute those hashes on AVR, I think you could relatively easily use the avr-progmem crate to store those big bois:

https://github.com/jedisct1/rust-hmac-sha256/blob/e787a2e40ce00c7e27b336207cdec730226ff6f1/src/lib.rs#L127

... into flash so that they are not copied into RAM (effectively trading-off a bit of performance here).

That's just a hunch, but doing that would be my first step - and you could even upstream it later by hiding this avr-progmem behind some feature-flag or #[cfg(arch = "avr")].

(due diligence: I've been testing the program with https://reviews.llvm.org/D152059 and https://reviews.llvm.org/D152063 patches applied; it's possible that the current nightly compiler miscompiles something, but everything seems to be alright with those two patches.)

¹ or sometimes it hangs because (depending on the compiler flags and Moon's alignment) the code panics which causes it to enter the panic_halt's panic handler which is basically loop {}; you can detect that by pressing Ctrl+C in gdb and looking at the stack trace

² that's kinda-sorta required because most AVR instructions don't support reading stuff directly from the flash memory - and so if the program did not do that, each read into the flash memory would require an extra register and an extra instruction which could become wasteful in the long term

@fvilante
Copy link

fvilante commented Jul 1, 2023

Thanks @Patryk27 for the entire time you dedicated to present this solution.

@xphoniex problems of stack overflow in AVR unfurtunately are nasty because it produces undefined behaviour but no notification about it is generated, until we get it in the run-time.

Maybe eventually in future, someone can produce a script to use SIMAVR and GDB to test this case condition automagically.

Are we able to close this issue?

@xphoniex
Copy link
Contributor Author

xphoniex commented Jul 1, 2023

Thanks @Patryk27 for the write up, needed some time to verify this. I can't re-produce this entirely on my side:

I don't have simavr so I tried to see if qemu works:

$ qemu-system-avr -S -s -M uno -nographic -bios target/avr-atmega328p/release/avr-bug.elf

in another terminal:

$ avr-gdb -ex 'target remote :1234' target/avr-atmega328p/release/avr-bug.elf

got me connected. however, I can't see $SP being set after first si, unlike in your case:

Edit: can see $SP after one si in simavr.

0x00000000 in __vectors ()
(gdb) si
0x00000068 in __trampolines_start ()
(gdb) i r
r0             0x0                 0
r1             0x0                 0
r2             0x0                 0
r3             0x0                 0
r4             0x0                 0
r5             0x0                 0
r6             0x0                 0
r7             0x0                 0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x0                 0
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
r16            0x0                 0
r17            0x0                 0
r18            0x0                 0
r19            0x0                 0
r20            0x0                 0
r21            0x0                 0
r22            0x0                 0
r23            0x0                 0
r24            0x0                 0
r25            0x0                 0
r26            0x0                 0
r27            0x0                 0
r28            0x0                 0
r29            0x0                 0
r30            0x0                 0
r31            0x0                 0
SREG           0x80                128
SP             0x0                 0x0 <__vectors>
PC2            0x68                104
pc             0x34                0x68 <__trampolines_start>

let's continue anyway:

(gdb) display $SP
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x0000006a in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x0000006c in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x0000006e in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x00000070 in __trampolines_start ()
1: $SP = (void *) 0x0 <__vectors>
(gdb) si
0x00000072 in __trampolines_start ()
1: $SP = (void *) 0x800800
(gdb) si
0x00000074 in __do_copy_data ()
1: $SP = (void *) 0x8008ff

so my $SP is finally set to 0x8008ff right on __do_copy_data.

there seems to be an issue with the way you've calculated the height of static data:

(gdb) x/12i $pc
=> 0x74 <__do_copy_data>:	ldi	r17, 0x02	; 2
   0x76 <__do_copy_data+2>:	ldi	r26, 0x00	; 0
   0x78 <__do_copy_data+4>:	ldi	r27, 0x01	; 1
   0x7a <__do_copy_data+6>:	ldi	r30, 0x02	; 2
   0x7c <__do_copy_data+8>:	ldi	r31, 0x44	; 68
   0x7e <__do_copy_data+10>:	rjmp	.+4      	;  0x84 <__do_copy_data+16>
   0x80 <__do_copy_data+12>:	lpm	r0, Z+
   0x82 <__do_copy_data+14>:	st	X+, r0
   0x84 <__do_copy_data+16>:	cpi	r26, 0xE4	; 228
   0x86 <__do_copy_data+18>:	cpc	r27, r17
   0x88 <__do_copy_data+20>:	brne	.-10     	;  0x80 <__do_copy_data+12>
   0x8a <__do_clear_bss>:	ldi	r18, 0x02	; 2

IIUC, this loop keeps increasing X (which is actually r26:r27) at line 0x82 until it becomes $r17:0xE4 which is 0x02e4.

So the height is 0x0100 - 0x02e4 = 0x01e4 // 484, you can verify this using a break:

(gdb) break *0x8a
Breakpoint 1 at 0x8a
(gdb) continue
Continuing.

Breakpoint 1, 0x0000008a in __do_clear_bss ()
(gdb) i r $r26
r26            0xe4                228
(gdb) i r $r27
r27            0x2                 2

So, the end of our stack is either

0x8008ff - 2048 + 484 = 0x8002e3
// or
0x8008ff - 2048 + 484 + 255 = 0x8003e3

but when I create a watch for that:

(gdb) watch *0x8002e3
Hardware watchpoint 2: *0x8002e3
(gdb) watch *0x8003e3
Hardware watchpoint 3: *0x8003e3
(gdb) continue
Continuing.

program outputs:

 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
x = 29f76222af71953c82f73506a881cc859bc20e5eb0edf6755466ce740c5b18a3
y = 01ce9b739b46c5a1353f47830a39c591552f4c874a29307f551f13010f013562

but I'm not getting this:

Continuing.

Hardware watchpoint 2: *0x8003d1

Old value = 0
New value = 56
0x00002480 in awint_core::logic::mul::<impl awint_core::data::bits::Bits>::mul_assign ()

manually examining using x/x shows the the value to have changed:

0x00000000 in __vectors ()
(gdb) x/x 0x8002e3
0x8002e3 <anon.4077e3db088e300c71796ada4487f52c.8+42>:	0x00000000
(gdb) watch *0x8002e3
Hardware watchpoint 1: *0x8002e3
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
avr_bug::__avr_device_rt_main () at src/main.rs:62
62	    loop {}
(gdb) x/x 0x8002e3
0x8002e3 <anon.4077e3db088e300c71796ada4487f52c.8+42>:	0x5ffe6f4d

--

Edit: watches are hit when using simavr, but unexpected fn:

Old value = 101
New value = 357
avr_device::interrupt::free<avr_device::devices::atmega328p::{impl#68}::take::{closure_env#0}, core::option::Option<avr_device::devices::atmega328p::Peripherals>> (f=...)
    at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:228
228	            unsafe { restore(irq_flag); }

@Patryk27
Copy link
Contributor

Patryk27 commented Jul 1, 2023

Oh, neat - thanks for answering, I'll re-check my logic today / tomorrow 🙂

@xphoniex
Copy link
Contributor Author

xphoniex commented Jul 1, 2023

Oh, neat - thanks for answering, I'll re-check my logic today / tomorrow 🙂

Thank you for the help. Updated my post.

simavr seems to be working slightly differently than qemu-system-avr, latter keeps looping and pushes a single core utilization to 100%, and never seems to be yielding once it enters the loop.

Former is lighter on CPU, and releases, hence why gdb can show the Old value / New value, I guess?

So this is no longer rust's fault, but still three questions remaining:

  • How we can get the output using qemu-system-avr, if anyone knows.
  • correct stack size, 2048 - (0xff + 0xe4) or 2048 - (0xff + 0xff + 0xe4)
  • and finally, how can I get the current stack usage, after tweaking etc?

avr-size seems to be useless.

$ avr-size target/avr-atmega328p/release/avr-bug.elf
   text	   data	    bss	    dec	    hex	filename
  17410	    484	      1	  17895	   45e7	target/avr-atmega328p/release/avr-bug.elf

Edit: just realized the 484 shown here matches what I calculated in the above post, 0x0100 - 0x02e4 = 0x01e4. I was expecting it to show me the max stack usage, but it's actually static stack.

I can manually set watch points and run the program, but that has to be scripted and run multiple times to pinpoint it. And backtrace doesn't seem to have this info. Any ideas re this? This is the most important thing, as I can estimate how much stack I have left, etc.

Another idea would be to re-compile qemu or simavr with variable stack size.

There's japaric/cargo-call-stack, however it doesn't support avr.

@xphoniex
Copy link
Contributor Author

xphoniex commented Jul 2, 2023

Did some further digging, are you sure about this:

When an AVR program starts, it copies "static stuff" from flash memory into RAM² - that stuff ends up at the beginning of the memory; after that, we've got the stack which starts at the end of RAM and grows down (i.e. towards the lower addresses).

If I understood correctly, static part of memory gets filled on start up (in __do_copy_data) from 0x800100 and grows up. Once all the static stuff is copied, the program starts and stack grows downward from 0x8008ff.

For example, if I slightly change the code by moving this part inside its own block:

{
    let h = hmac_sha256::HMAC::mac(b"hello", b"key");
    print_hex_arr(" mac", &mut serial, &h);
    let h = hmac_sha256::Hash::hash(b"hello");
    print_hex_arr("hash", &mut serial, &h);
}

and break on the first print_hex_arr, I can get:

# program output
 mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b..
 
# inside gdb
(gdb) x/24x 0x8008ff - 96
0x80089f:	0x00000000	0x00000000	0xb9b30793	0x17b5ef15
0x8008af:	0x8c4df11f	0xc7bc5fb5	0xefc0c698	0x6dd65614
0x8008bf:	0xa76a1aed	0x7b8ba523	0x00000000	0x00000000
0x8008cf:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008df:	0x00000000	0x00000000	0x08000000	0x000002ff
0x8008ef:	0x00000000	0x00000000	0x00000000	0x00780200

which is showing h on stack, at the end of memory, but growing upwards, we were expecting downwards. (this also doesn't work if code is not inside its own block, and if break is on the second print)

however, when I place a watch right after the end of the static area:

0x00000000 in __vectors ()
(gdb) watch *(0x800100 + 484)
Hardware watchpoint 1: *(0x800100 + 484)
(gdb) cont
Continuing.

Hardware watchpoint 1: *(0x800100 + 484)

Old value = 0
New value = 1
avr_device::interrupt::free<avr_device::devices::atmega328p::{impl#68}::take::{closure_env#0}, core::option::Option<avr_device::devices::atmega328p::Peripherals>> (f=...)
    at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:228
228	            unsafe { restore(irq_flag); }
(gdb) c
Continuing.

Hardware watchpoint 1: *(0x800100 + 484)

Old value = 1
New value = 1281
0x000029dc in hmac_sha256::State::blocks (self=0x800509, input=...) at src/lib.rs:196
196	        while inlen >= 64 {

(gdb) x/180x (0x800100 + 1448)
0x8006a8:	0x5c5c5c5c	0x5c5c5c5c	0x5c5c5c5c	0x5c5c5c5c
0x8006b8:	0x5c5c5c5c	0x5c5c5c5c	0x5c5c5c5c	0x5c5c5c5c
0x8006c8:	0x5c5c5c5c	0x5c5c5c5c	0x5c5c5c5c	0x3e5c5c5c
0x8006d8:	0xec3dec8b	0x4d7eabbb	0xc97fe6dd	0xbe1727b9
0x8006e8:	0x5b16cd5c	0xdcce6cd0	0x62590d78	0x00a9c445
0x8006f8:	0x00004000	0x00000000	0x00000000	0x00000000
0x800708:	0x00000000	0x00000000	0x00000000	0x00000000
0x800718:	0x00000000	0x00000000	0x00000000	0x00000000
0x800728:	0x00000000	0x00000000	0x00000000	0x00000000
0x800738:	0x00000000	0x00000000	0x00000000	0x00000000
0x800748:	0x00000000	0x00000000	0x00000000	0x00000000
0x800758:	0x00000000	0x00000000	0x00000000	0x00000000
0x800768:	0x00000000	0x00000000	0x00000000	0x00000000
0x800778:	0x00000000	0x00000000	0x00000000	0x00000000
0x800788:	0x00000000	0x00000000	0x00000000	0x00000000
0x800798:	0x00000000	0x00000000	0x00000000	0x00000000
0x8007a8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8007b8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8007c8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8007d8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8007e8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8007f8:	0x00000000	0x00000000	0x00000000	0x00000000
0x800808:	0x00000000	0x00000000	0x00000000	0x00000000
0x800818:	0x00000000	0x00000000	0x00000000	0x00000000
0x800828:	0x00000000	0x00000000	0x00000000	0x00000000
0x800838:	0x00000000	0x00000000	0x00000000	0x00000000
0x800848:	0x00000000	0x00000000	0x00000000	0x00000000
0x800858:	0x00000000	0x00000000	0x00000000	0x00000000
0x800868:	0x00000000	0x00000000	0x00000000	0x00000000
0x800878:	0x00000000	0x00000000	0x00000000	0x00000000
0x800888:	0x00000000	0x00000000	0x00000000	0x00000000
0x800898:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008a8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008b8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008c8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008d8:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008e8:	0xff080000	0x00000002	0x00000000	0x00000000
0x8008f8:	0x00000000	0x4f007802	Cannot access memory at address 0x800900

this has grown the static area upwards, whereas I was expecting the static area to remain the same size, but the program stack to grow downwards and overwrite it.

how is it possible to still create static stuff after __do_copy_data? I thought static part is taken care of by compiler, and copied on start, but not in runtime?

avr-size also shows static size as 484:

$ avr-size target/avr-atmega328p/release/avr-bug.elf
   text	   data	    bss	    dec	    hex	filename
  17374	    484	      1	  17859	   45c3	target/avr-atmega328p/release/avr-bug.elf

so either:

  • the whole stack grows from 0x800100, and only upwards, regardless of static/runtime
  • or those addresses are reserved as static, but initialized in runtime (why is avr-size not counting them then?)
  • or we can create static stuff, during runtime? (this shouldn't be possible)

fvilante added a commit to fvilante/avr_bug that referenced this issue Aug 1, 2023
@fvilante
Copy link

fvilante commented Aug 1, 2023

TLDR

I created an issue reproducer repository at https://github.com/fvilante/avr_bug

Detail

According to previous requested by @workingjubilee here, I created a git repo with the reproducer based on @xphoniex (issue author) comment.

The motivation is because I was not able to compile the code as originally given.

@xphoniex I made some small adjustments to the reproducer you have sent which are visible in the repository log history. Please clone this repository and check if you can validate it as an official reproducer for the issue.

It's very important that we have a deterministic reproducer that works the same way independent of time or machine it is run.

IMPORTANT If you consider the case, update the original issue with a link to this repository.

fvilante added a commit to fvilante/julia_rust_embedded that referenced this issue Aug 1, 2023
@fvilante
Copy link

fvilante commented Aug 1, 2023

Testing in real hardware

TLDR: It has been proved that the problem is not in the rust compiler side.

I followed an different approach than @Patryk27 and decided to experiment to take the core effect of the reproducer code (which is to be able to generate two hash values, in different optimizations levels, on an AVR platform) and run it in real hardware (avr328p based) using a different rust project that I've been configured by my own.

I assumed that the core effect desired by the reproducer boils down to:

 let h1 = hmac_sha256::HMAC::mac(b"hello", b"key");
 let h2 = hmac_sha256::Hash::hash(b"hello");

Then I inserted this two lines into another project that I've being using since a year, that targets the same hardware of the reproducer. It runs on a board with an atmega328p and an lcd display. I defined that the values of variables h1 and h2 should be printed on the display, with an interval of 1 sec between each other, so I can check visually its actual values.

Results

TLDR: The result was as expected; The same (and correct) output value no matter what kind of optimization level I use.

This video shows the result in real-time, which is as expected as below and independent of the optimization level tested.

mac = 9307b3b915efb5171ff14d8cb55fbcc798c6c0ef1456d66ded1a6aa723a58b7b
hash = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

I tested for following optimizations levels:

opt-level = "s", lto = true
opt-level = "z", lto = true

NOTE: My project enviroment does not allow to compile lto = false or opt-level other than "s" or "z". This is intentional and not related with this @xphoniex issue.. There is no point in using this compiler parameters in AVR targets given its flash size constraints.

Discussion

Although this present test does not answer the question what's wrong with the reproducer code? it shows that it is possible, with rust language, to obtain the desired reproducer effect (to produce the hashes) independent of the compiler's optimization level set.

In my code I did not use ufmt dependency. And I do not send the output to the serial, instead I send it direct to the lcd display. But I have settled up in my project a working serial port driver (which is tested in field). I did not use it because I had not a cable in the test moment.

But we can conclude that if the issue is about a possible mis-behaviour of the rust compiler, then the test probes that it is not the case.

The rust compiler is able to compile a program for AVR target that correct calculates hash functions values and displays its values into a lcd display (and hopefully into a serial port).

If there is no problem with the Rust compiler then the reproducer problem should be with some dependency it is using (for example ufmt or serial) and/or with the compiler options it's using (check this compiler correction I've made on reproducer code).

It's also important to note that I've found that stackoverflow error (same as primarily suggested by @Patryk27 in his comment) seems to be a sneaky kind of error sometime present in AVR development. And as far as I know, there is no known method to prevent it at compile time.

But if this is the case, then the best forum to discuss how to detect a SO errors in AVR? should not be in the rust language issue list but in the particular community specific for this target (for example: gitter avr lobby), once AVR is a tier 3 policy target which means that it's supported mainly by its community.

I consider that in the community we have a lack of documentation on how to detect stack overflow errors. Even I do not know a consistent method to safely detect this kind of error yet. I was reading @Patryk27 and your (@xphoniex) comment's exchange about that subject, and found that you also had problem to detect the exact point where stack reach its limits.

Eventually you may help the community to write some blog post about it.

What's next

I think this present hardware test make this issue eligible to be closed, once it has been proved the rust compiler capability to produce the desired effect. Showing therefore that the issue has other cause.

But @xphoniex please take your time to check by your own the data presented here and let us know your considerations.

@xphoniex
Copy link
Contributor Author

xphoniex commented Aug 2, 2023

Thank you Flavio, for putting in the effort.

The motivation is because I was not able to compile the code as originally given.

Can you give me the compiler error you get with the zip file? Remember you have to cargo build | run -r.

It's very important that we have a deterministic reproducer that works the same way independent of time or machine it is run.

So, I went back and tried different toolchains, and to my surprise, I was getting the correct hashes. After downloading the zip file again, and comparing all files one by one, I realized this line had fixed the issue:

build-std-features = ["compiler-builtins-mangled-names"]

which had been suggested by Patryk here. Funnily enough, when I tried this remedy before, here, it didn't solve the issue for me back then. Right now I can't re-produce it.

Anyway, you should be able to reproduce with these settings:

# Cargo.toml
opt-level = "s | z"

# rust-toolchain.toml
channel = "nightly-2022-07-10 | nightly-2022-08-11 | nightly-2022-07-10"
 mac = d238e536e20f0b1b210644248134891454c20ad29c10f75756218b6ab8f5c17d
hash = 5f23619d4ed28dbf06d25969fa262384869011f4a44469c1eb38c50b15b01c4b
  • I'm no longer able to re-produce mac = 03942d... specified in the first post.
  • As mentioned, I can't reproduce why compiler-builtins-mangled-names didn't help before, but setting it now fixes all the issues, except for opt-level of 1, which might be due to SO

So, the original issue with mac/hmac is resolved somewhere in 1.70 - 1.72 but to get it working with prior versions you can use compiler-builtins-mangled-names feature.

This is in fact a compiler issue, and if someone can point down the commit which has resolved it, I'd be grateful, as I wasn't able to using git bisect.


Why I still kept this issue open?

Most of what Patryk wrote is correct expect the way he cacluated the copy size of the static data in __do_copy_data. That routine copies the same bytes as mentioned in data section output of avr-size.

Problem now is, I stepped through the code line by line inside gdb, and I can see the stack is growing from both sides:

  1. normal stack vars are placed at the end of the memory as expected
  2. stuff for hmac lib are placed at the end of the just-copied static area

I don't know why the 2nd is happening, and I've tried talking to a few people with knowledge of AVR but no one's engaged me so far.

  • If this is how const/static area grows, why is avr-size not detecting the area which should be reserved for hmac?
  • Possibly avr-size bug?
  • Is this because of crates and somehow a rust issue, where hmac - being an external crate - won't end up showing?
  • Is this an LLVM issue?

I'm not 100% sure this is not a Rust/LLVM issue given the aforementioned points.

Let me know what you think.

@workingjubilee
Copy link
Member

workingjubilee commented Aug 3, 2023

  • If this is how const/static area grows, why is avr-size not detecting the area which should be reserved for hmac?
  • Possibly avr-size bug?
  • Is this because of crates and somehow a rust issue, where hmac - being an external crate - won't end up showing?

First, before anything: Rust will naturally prefer to statically link all crates and do considerable inlining, or recommend such to LLVM, which can make it hard to track things down at the best of times. And the crate you named uses inline(always) on many functions.

However, it is also very common for tools like avr-size (or indeed gdb!) to expect certain behaviors from tools that emit or manipulate binaries that may be, for instance, overfitted on the specific behaviors of GCC. It is often hard for them to do otherwise, sometimes, as they may be trying to extract data that is slightly more subtle than mandated by the object format's annotations. While avr-gcc may be the linker here, obviously Rust and LLVM have had more than a little input. Even worse, there are multiple different ways of evaluating the "size" of a binary's sections that can give different results. And it's not that one is right and one is wrong, necessarily.

So, armed with that knowledge, finding ourselves at the bottom of a very deep and dark rabbit hole, that might be filled with monsters, actually, I must ask: Won't end up showing... where?

@xphoniex
Copy link
Contributor Author

xphoniex commented Aug 3, 2023

First, before anything: Rust will naturally prefer to statically link all crates and do considerable inlining, or recommend such to LLVM, which can make it hard to track things down at the best of times. And the crate you named uses inline(always) on many functions.

My understanding was that inline isn't respected across crate, incorrect?

So, armed with that knowledge, finding ourselves at the bottom of a very deep and dark rabbit hole, that might be filled with monsters, actually, I must ask: Won't end up showing... where?

The data required for hmac crate, is not included in data section of the elf file. We now how two steps, one to copy (static) data for the lib, and then (static) data for hmac crate. See next post too please.

@xphoniex
Copy link
Contributor Author

xphoniex commented Aug 3, 2023

preface: tried posting this to simavr google group but couldn't, and the gitter room suggested is not very active, it's our own Flavio, and Patryk.

How does stack memory grow on AVR?

I've coded a small program for atmega328p, in rust/llvm, and it's not giving me the expected output. I don't know if the issue is on the compiler side or memory, so I'm trying to rule out the latter first (stack overflow).

  • according to the chip manual, atmega328p's SRAM is from 0x0100 - 0x08ff
  • in gdb, the aforementioned range translates to 0x800100 - 0x8008ff
  • when program starts, in __do_copy_data, static stuff are copied to 0x800100, growing upwards
  • after __do_copy_data, stack grows downward from 0x8008ff

Is everything I said above correct? cause in practice, something else is happening. Initially, __do_copy_data copies the same bytes as shown in avr-size data section to the beginning of SRAM. Then after sing enough steps, stack data show up, at the end of SRAM, but eventually at some point, some stuff gets copied to the the area right after static area as well. Here's proof:

In one terminal:

$ simavr -m atmega328p avr-bug.elf -g

In another:

$ avr-gdb -ex 'target remote :1234' avr-bug.elf
0x00000000 in __vectors ()
(gdb) define nxt
Type commands for definition of "nxt".
End with a line saying just "end".
>s
>x/60x (0x800100 + 182)
>x/60x (0x8008ff - 240)
>end

then keep nxt'ing, first stack stuff should show up after 6 nxt:

avr_device::interrupt::disable_save () at /home/usr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:107
107	                asm!(
0x8001b6 <DEVICE_PERIPHERALS>:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001c6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001d6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001e6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001f6:	0x00000000	0x00000000	0x00000000	0x00000000
0x800206:	0x00000000	0x00000000	0x00000000	0x00000000
0x800216:	0x00000000	0x00000000	0x00000000	0x00000000
0x800226:	0x00000000	0x00000000	0x00000000	0x00000000
0x800236:	0x00000000	0x00000000	0x00000000	0x00000000
0x800246:	0x00000000	0x00000000	0x00000000	0x00000000
0x800256:	0x00000000	0x00000000	0x00000000	0x00000000
0x800266:	0x00000000	0x00000000	0x00000000	0x00000000
0x800276:	0x00000000	0x00000000	0x00000000	0x00000000
0x800286:	0x00000000	0x00000000	0x00000000	0x00000000
0x800296:	0x00000000	0x00000000	0x00000000	0x00000000
0x80080f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80081f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80082f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80083f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80084f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80085f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80086f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80087f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80088f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80089f:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008af:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008bf:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008cf:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008df:	0x00000000	0x00000000	0x08000000	0x000001ff
0x8008ef:	0x00000000	0x00000000	0x00000000	0x004a0100

and after another 7 nxt, you'll see something at the beginning of free SRAM area

avr_device::interrupt::free<avr_device::devices::atmega328p::{impl#68}::take::{closure_env#0}, core::option::Option<avr_device::devices::atmega328p::Peripherals>> (f=...)
    at /home/usr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/avr-device-0.5.0/src/interrupt.rs:228
228	            unsafe { restore(irq_flag); }
0x8001b6 <DEVICE_PERIPHERALS>:	0x00000001	0x00000000	0x00000000	0x00000000
0x8001c6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001d6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001e6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001f6:	0x00000000	0x00000000	0x00000000	0x00000000
0x800206:	0x00000000	0x00000000	0x00000000	0x00000000
0x800216:	0x00000000	0x00000000	0x00000000	0x00000000
0x800226:	0x00000000	0x00000000	0x00000000	0x00000000
0x800236:	0x00000000	0x00000000	0x00000000	0x00000000
0x800246:	0x00000000	0x00000000	0x00000000	0x00000000
0x800256:	0x00000000	0x00000000	0x00000000	0x00000000
0x800266:	0x00000000	0x00000000	0x00000000	0x00000000
0x800276:	0x00000000	0x00000000	0x00000000	0x00000000
0x800286:	0x00000000	0x00000000	0x00000000	0x00000000
0x800296:	0x00000000	0x00000000	0x00000000	0x00000000
0x80080f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80081f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80082f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80083f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80084f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80085f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80086f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80087f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80088f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80089f:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008af:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008bf:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008cf:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008df:	0x00000000	0x00000000	0x08000000	0x000001ff
0x8008ef:	0x00000000	0x00000000	0x00000000	0x004a0100

I don't know what that irq_flag is, but we'll ignore it for now, this area remains zeros until we nxt 448 more times:

awint_core::data::inlawi::InlAwi<512, 33>::unstable_from_u8_slice<512, 33> (buf=...)
    at /home/usr/.cargo/registry/src/index.crates.io-6f17d22bba15001f/awint_core-0.7.0/src/data/inlawi.rs:164
164	        let mut raw = [0; LEN];
0x8001b6 <DEVICE_PERIPHERALS>:	0x056608c4	0x0666084e	0x00d005d8	0x00661000
0x8001c6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001d6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001e6:	0x00000000	0x00000000	0x00000000	0x00000000
0x8001f6:	0x00000000	0x00000000	0x00000000	0x00000000
0x800206:	0x00000000	0x00000000	0x00000000	0x00000000
0x800216:	0x00000000	0x00000000	0x00000000	0x00000000
0x800226:	0x00000000	0x00000000	0x00000000	0x00000000
0x800236:	0x00000000	0x00000000	0x00000000	0x00000000
0x800246:	0x00000000	0x00000000	0x00000000	0x00000000
0x800256:	0x00000000	0x00000000	0x00000000	0x00000000
0x800266:	0x00000000	0x00000000	0x00000000	0x00000000
0x800276:	0x00000000	0x00000000	0x00000000	0x00000000
0x800286:	0x00000000	0x00000000	0x00000000	0x00000000
0x800296:	0x00000000	0x00000000	0x00000000	0x00000000
0x80080f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80081f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80082f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80083f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80084f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80085f:	0x00000000	0x98000000	0x5b16f817	0xd959f281
0x80086f:	0xdb2dce28	0x07029bfc	0x95ce870b	0xac55a062
0x80087f:	0x7ef9dcbb	0x0079be66	0x00000000	0x00000000
0x80088f:	0x00000000	0x00000000	0x00000000	0x00000000
0x80089f:	0x00000000	0x00000000	0x10d4b802	0x47d08ffb
0x8008af:	0x8554199c	0x17b448a6	0x1108a8fd	0xa4fbfc0e
0x8008bf:	0xa3c4655d	0x3ada7726	0x00000048	0x00000000
0x8008cf:	0x00000000	0x00000000	0x00000000	0x00000000
0x8008df:	0x00000000	0x00000000	0x08020000	0x000001ff
0x8008ef:	0x00000000	0x00000000	0x00000000	0x004a0100

the code being executed (or about to execute?) is let mut raw = [0; LEN]; which should place raw on stack, thus at the end of SRAM.
previous line was also let mut pad = inlawi!(0u512); which is not doing anything special either, again should be placed on stack, end of SRAM.

what are those 0x056608c4 0x0666084e 0x00d005d8 0x00661000 values doing there?

did I get the growing direction wrong or is it a rust/llvm bug?

avr-bug.elf.zip

@fvilante
Copy link

fvilante commented Aug 3, 2023

@xphoniex based on your suggestion here I was able to reproduce the error using an older version of rust compiler (nightly-2022-07-10).

Which means that the issue is related to particular versions of Rust compiler, and that the same problem does not occurs in recent versions.

This has been previously anticipated by @saethlin here when he says:

We have some history with AVR miscompilations

Certainly one of the causes of this Rust AVR's compiling quality variation, is the fact that there is not any kind of regression tests being performed by the compiler team's development workflow for this target. Which means that in the proccess of development new broken avr compilers may be produced in future, with no alert (and by different reasons other than related to #109000).

As @hahix, the author of some dependencies you use in your reference code stated: even if regression tests on AVR would have no effect in terms to interrupt the compiler's development workflow. It would be a great start to have a flag indicating that something has been broken.

In this sense, why not close this issue in favor of #107612 ?

See also: This @Patryk27's comment here

Note: Altough I would suggest @workingjubilee to reframe the title of issue #107612 to also consider 8-bits targets since AVR microcontrollers are typically 8-bits. Something like this would be fine Consider simulation testing for math on 16 and 8-bit targets.

@xphoniex
Copy link
Contributor Author

xphoniex commented Aug 3, 2023

In this sense, why not close this issue in favor of #107612 ?

There are two problems in this issue, one is a math problem as you put it, and the other is likely a stack overflow.

First one is solved and no longer an issue, but second one is still persisting.

In the re-producer zip, hmac-sha256 crate is used which has some const values, and I thought that was the reason why the const area of SRAM grows post __do_copy_data even thought I still consider it a bug that avr-size does not take that size into account.

But in my previous post, I have demonstrated that even if hmac-sha256 crate is removed, one of these lines:

let mut raw = [0; LEN];
let mut pad = inlawi!(0u512);

is writing values 0x056608c4 0x0666084e 0x00d005d8 0x00661000 to the static area of SRAM, whereas both of these are non-static and should be on regular stack.

If you have an explanation for this we can close this issue?

@fvilante
Copy link

fvilante commented Aug 3, 2023

TLDR It seems your new question departs from the topic raised in the original post. I recomend close this and open a new issue.


If you have an explanation for this we can close this issue?

It seems to me that you probably unintentionally made a topic drift.

In other words, for all ways I look at your original post at this moment, there is not a single one where I can realize that your initial questions were not answered.

If you agree with above statement and you still want to open a new issue, remember to include complete information:

  • Exact rustc/cargo build commands you are using to produce the issue.
  • Reproducer that we can run to observe the bug (avoiding using of unnecessary hardware).
  • A link to github repo with the reproducer.

NOTE: github repo enables everyone working on the reproducer to easily reference, link and justify their delta variation until the solution is reached.

@workingjubilee
Copy link
Member

My understanding was that inline isn't respected across crate, incorrect?

I mean, I didn't deeply analyze the code structure of that crate, but in the general case it is often very respected cross-crate! As I understand it, it causes the Rust compiler to delay code generation in many cases until the final crate, thus making it much more like "true inlining", by forcing codegen to happen "on-the-spot". This also can cause compilation performance regressions, by forcing repeated needless codegen.

I agree that we seem to be on to a new topic and the current primary issue seems to be resolved, so I will close this. Even if we are not, it is usually better to initially prefer to split up a nebulous issue rather than assuming two problems are intrinsically connected, so if there are any remaining subissues of this that seem unresolved, please open a new issue for them.

@xphoniex
Copy link
Contributor Author

xphoniex commented Aug 4, 2023

The issue still exists

Here's video evidence to prove it:

output

I was able to produce this after a mere 10 second of playing with the code. This is a telltale that there are deeper problems when it comes to memory, caused by the compiler.

I politely ask that you @workingjubilee re-open this issue, and not close it again until it's been fully resolved. All the relevant context, info, and re-producer exists on this issue, so there's no need to re-open another issue and copy/paste the same info there. There is however, a need to not to clutter this space, so that someone with knowledge of llvm/rust can follow along, and maybe eventually help, and resolve this matter.


@saethlin I read your post on Zulip, and I understand inactiveness of this issue is your concern. I've provided evidence in prior posts in gdb, and you can see the gif I've attached as evidence that this issue still exists in compiler. Please let me know if you need anything else.

It's not inactive because it's solved, it's inactive because no one has responded to the issues I raised. I kindly ask you to make your instance clear and ask @workingjubilee to re-open this issue.


This is not a new question, it doesn't depart from original post, and I have clearly stated it in my original post - which is unedited - I'll highlight the relevant parts for you:

Screenshot from 2023-08-04 07-20-23

not only that, even if I run fns from another crate, even with the working opt-level = 2, I get incorrect output:

"not only that" means I have't even told you the most important part yet. This was the centerpiece of my issue, as calling other fns, even small ones like a debug / print, was corrupting the output.

I have posted an example of what I meant by other_crate::do_something() in my re-producer:

Screenshot from 2023-08-04 07-27-41

and I have posted the compiler version needed to re-produce this issue, but you seem to have glossed over this info.

if my issue was that I was getting incorrect hmac, I would just set opt to "2" and call it a day, use some common sense.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 4, 2023

All the relevant context, info, and re-producer exists on this issue, so there's no need to re-open another issue and copy/paste the same info there. There is however, a need to not to clutter this space, so that someone with knowledge of llvm/rust can follow along, and maybe eventually help, and resolve this matter.

I read over this issue from front to back in one sitting and I cannot tell whether the issue you're having is still a stack overflow, a miscompilation, an unsound dependency, or a combination of these. You can always link to this issue from a new more targetted one.

you seem to have glossed over this info.

I have not found this information, it may have been there, but it wasn't obvious to me either. A new less cluttered issue may be easier to follow.

use some common sense.

that's uncalled for and not making people more likely to want to participate in this issue. Consider avoiding such comments in the future.

@workingjubilee
Copy link
Member

@xphoniex There are very few people who understand Rust and LLVM well enough to even try to tackle this issue. There are even fewer who have have made commits to rust-lang/rust specifically to fix issues for AVR and MSP430. I am one of them who has, and I do it because I care a lot about Rust's multi-platform viability. If I am asking you to reorganize the data in a way I prefer, I am not bothered if you say you don't understand why I have a given preference, but it is not some plot to dismiss the issue at hand. It is, rather, part of how I sort and think about issues and how to resolve them. Nonetheless, I have nothing against reopening this issue I suppose, but please trust that my request was not some idle whim.

@workingjubilee workingjubilee reopened this Aug 4, 2023
@RalfJung RalfJung changed the title Getting different output for the same program depending on compiler setting AVR: miscompilation, or stack overflow, when running cryptographic code May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-lto Area: Link Time Optimization C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-AVR Target: AVR processors (ATtiny, ATmega, etc.) P-low Low priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants