gear-lazy-pages: Undefined behavior on Windows #4341

Open
ark0f opened this issue Nov 15, 2024 · 1 comment
Labels
C0-bug Something isn't working D2-node Gear Node D4-test Autotests, and examples P0-dropeverything Highest priority


ark0f (Member) commented Nov 15, 2024

Problem

It is confirmed that x86_64-pc-windows-msvc exhibits undefined behavior in code that transitively uses lazy-pages (e.g. the vectored exception handler) since #4289.

Steps

  1. Comment out these lines in the root Cargo.toml:
     [profile.dev.package.corosensei]
     opt-level = 3
  2. Set the environment variable (PowerShell):
     $ENV:RUST_LOG="debug"
  3. Run: cargo test -p pallet-gear wait_lock

Possible Solution

The current candidate fix is to always compile corosensei with release optimizations (the profile override above). It appears to be ineffective: we have reproduced the issue under Wine even with the fix applied: https://github.com/gear-tech/gear/actions/runs/11786273088/job/32829243330?pr=4334#step:16:749.

Notes

Reproduction is only possible in debug mode.
Adding any function call to the exception handler makes the test pass.

Code changes that were tried (each introduces a function call):

  1. Increasing stack usage:
let _buf = [0; 4096]; // compiler inserts a `__chkstk` function call
  2. An explicit function call:
log::debug!("any message");
  3. Changes in user_signal_handler_internal:
- process::process_lazy_pages(rt_ctx, exec_ctx, handler, page)
+ let res = process::process_lazy_pages(rt_ctx, exec_ctx, handler, page);
// compiler inserts a `memcpy` function call for `res`
+ res

Changes that had no effect:

  1. Any code that doesn't involve a function call (e.g. branches, integer operations and so on)
  2. SetThreadStackGuarantee
  3. Increasing the stack size up to 16 MB
  4. An mfence instruction at the start and end of the exception handler

Relevant Log Output

Because the behavior is undefined, we get different errors with different tools and on different machines, without any code changes.

Dedicated Windows CI machine:

cargo test - memory allocation of X bytes failed

$ENV:RUST_LOG="debug"
cargo test -p pallet-gear wait_lock -- --nocapture

... debug logs ...

memory allocation of 72060563285967349 bytes failed
error: test failed, to rerun pass `-p pallet-gear --lib`

Caused by:
  process didn't exit successfully: `C:\gear\target\debug\deps\pallet_gear-06c5fbaa7f64edbf.exe wait_lock --nocapture` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

Note: the debugger shows that the code tried to panic!() and format the region::Error::ProcfsInput variant, which is UNIX-only and therefore impossible on Windows.

cargo nextest - BorrowMutError

https://github.com/gear-tech/gear/actions/runs/11635541579/job/32461288334?pr=4289#step:16:445

Dedicated Linux CI machine with Wine:

cargo nextest - BorrowMutError

https://github.com/gear-tech/gear/actions/runs/11786273088/job/32829243330?pr=4334#step:16:749

VMware Fusion, Windows 11 ARM with x86 emulation:

cargo test

$ENV:RUST_LOG="debug"
cargo test -p pallet-gear wait_lock

... debug logs ...

error: test failed, to rerun pass `-p pallet-gear --lib`

Caused by:
  process didn't exit successfully: `C:\gear\target\debug\deps\pallet_gear-06c5fbaa7f64edbf.exe wait_lock` (exit code: 0xe06d7363)
note: test exited abnormally; to see the full output pass --nocapture to the harness.

cargo test -- --nocapture

$ENV:RUST_LOG="debug"
cargo test -p pallet-gear wait_lock -- --nocapture

... debug logs ...

thread 'tests::default_wait_lock_timeout' panicked at /rustc/612a33f20b9b2c27380edbc4b26a01433ed114bc\library\std\src\io\mod.rs:1693:36:
range start index 196 out of range for slice of length 35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: test failed, to rerun pass `-p pallet-gear --lib`

Caused by:
  process didn't exit successfully: `C:\gear\target\debug\deps\pallet_gear-06c5fbaa7f64edbf.exe wait_lock --nocapture` (exit code: 0xe06d7363)

cargo nextest run --no-capture

$ENV:RUST_LOG="debug"
cargo nextest run -p pallet-gear wait_lock --no-capture

... debug logs ...

range start index 196 out of range for slice of length 35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       ABORT [   2.202s] pallet-gear tests::default_wait_lock_timeout
     Message [         ] code 0xe06d7363: OS Error -529697949 (FormatMessageW() returned error 317) (os error -529697949)
   Canceling due to test failure
------------
     Summary [   2.224s] 1 test run: 0 passed, 1 failed, 233 skipped
       ABORT [   2.202s] pallet-gear tests::default_wait_lock_timeout
     Message [         ] code 0xe06d7363: OS Error -529697949 (FormatMessageW() returned error 317) (os error -529697949)
error: test run failed

Note: the test passes without the --no-capture argument.
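The exit code shown by cargo test (0xe06d7363) and the "OS Error -529697949" printed by nextest are the same value: -529697949 is 0xe06d7363 reinterpreted as a signed 32-bit integer. 0xe06d7363 is the exception code the MSVC C++ runtime uses for C++ exceptions; its low three bytes spell "msc" in ASCII. The small sketch below checks both interpretations; the helper names are illustrative only and are not part of the gear codebase.

```rust
/// Reinterpret an unsigned Windows exit/exception code as the signed
/// 32-bit value some tools print (e.g. the nextest "OS Error" line above).
fn as_signed(code: u32) -> i32 {
    code as i32
}

/// Return the three ASCII characters stored in the low bytes of a code.
/// For the MSVC C++ exception code 0xe06d7363 this yields "msc".
fn low_ascii_tag(code: u32) -> String {
    code.to_be_bytes()[1..].iter().map(|&b| b as char).collect()
}

fn main() {
    let cpp_exception: u32 = 0xe06d7363;
    // Same value as "OS Error -529697949" in the nextest output.
    println!("{:#x} as i32 = {}", cpp_exception, as_signed(cpp_exception));
    // The reverse direction recovers the hex code reported by cargo test.
    println!("{:#x}", -529697949i32 as u32);
    println!("tag = {}", low_ascii_tag(cpp_exception));
}
```

This only decodes the reported code; it does not say which C++ exception was thrown or why.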

@ByteNacked's native Windows machine:

cargo test -- --nocapture - Externalities not allowed to fail within runtime: "A"

$ENV:RUST_LOG="debug"
cargo test -p pallet-gear wait_lock -- --nocapture

... debug logs ...

thread 'tests::default_wait_lock_timeout' panicked at C:\.cargo\git\checkouts\polkadot-sdk-94df07da2390bde9\2686925\substrate\primitives\state-machine\src
Externalities not allowed to fail within runtime: "A"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: test failed, to rerun pass `-p pallet-gear --lib`

Caused by:
    process didn't exit successfully: `C:\gear\target\debug\deps\pallet_gear-06c5fbaa7f64edbf.exe wait_lock --nocapture` (exit code: 0xe06d7363)

@ark0f ark0f added D2-node Gear Node P0-dropeverything Highest priority C0-bug Something isn't working D4-test Autotests, and examples labels Nov 15, 2024
@ark0f ark0f changed the title Undefined behavior on Windows with lazy-pages gear-lazy-pages: Undefined behavior on Windows with lazy-pages Nov 15, 2024
@ark0f ark0f changed the title gear-lazy-pages: Undefined behavior on Windows with lazy-pages gear-lazy-pages: Undefined behavior on Windows Nov 15, 2024
ark0f (Member, Author) commented Nov 20, 2024

We found that AddressSanitizer aborts execution on an unreachable instruction in wasmer:

wasmerio/wasmer#5260
