Skip to content
This repository was archived by the owner on Jun 26, 2020. It is now read-only.

Conversation

@abrown
Copy link
Member

@abrown abrown commented Nov 13, 2019

  • This has been not been discussed in an issue.
  • A short description of what this does, why it is needed: when trying to root cause verifier errors in a file with many functions, the printed error message (e.g. inst2: arg 0 (v4) has type i16x8, expected i8x16) is usually not enough to troubleshoot the location of the failure. This change adds logging like the following:
ERROR cranelift_codegen::verifier        > Found verifier errors in function:
function u0:3(i64 vmctx, i8x16) -> i8x16 system_v {
                                ebb0(v0: i64, v1: i8x16):
@0072                               v3 = raw_bitcast.i16x8 v1
@0072                               v4 = ineg v3
@0074                               jump ebb1(v4)

                                ebb1(v2: i8x16):
@0074                               return v2
}
 - at inst2: arg 0 (v4) has type i16x8, expected i8x16

The logging level can be controlled with the RUST_LOG environment variable but due to env_logger having a default log level of ERROR these messages will always be printed. If we don't want that, we can downgrade these logs to WARNING so they can optionally be turned on by something like RUST_LOG=cranelift_codegen=warning.

  • This PR contains test cases: n/a.
  • A reviewer from the core maintainer team has been assigned for this PR.

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 13, 2019

I think https://docs.rs/cranelift-codegen/0.50.0/cranelift_codegen/print_errors/fn.pretty_verifier_error.html should be used instead. Also it would be nice to make the verifier error unwraps in for example cranelift_module give a nice panic message, instead of adding this error logging. This error logging makes verifier errors show up twice when using for example clif-util compile, as it already shows the function annnotated with verifier errors.

@abrown
Copy link
Member Author

abrown commented Nov 13, 2019

@bjorn3, I tried to use pretty_verifier_error before but ran into issues; where do you think pretty_verifier_error could be used that wouldn't involve refactoring a lot of this file?

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 14, 2019

I tried to use pretty_verifier_error before but ran into issues

What issues?

where do you think pretty_verifier_error could be used that wouldn't involve refactoring a lot of this file?

Wherever a VerifierErrors is unwrapped. (for example ctx.compile(isa).unwrap())

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 14, 2019

It would also be possible to add a stringified version of the entity to VerifierError, instead of just the id.

@bnjbvr
Copy link
Member

bnjbvr commented Nov 14, 2019

Making errors more discoverable and easy to understand is a valuable goal. Just trying to understand the use case: is it when there's a verifier error in wasmtime itself? Or are you using the clif-util and sometimes verifier errors aren't displayed? In the former case, wasmtime could use the exposed pretty printers that are in print_errors. In the latter case, could we add additional logging?

Also it would be nice to make the verifier error unwraps in for example cranelift_module give a nice panic message, instead of adding this error logging.

Are you suggesting to cause panics whenever there are verifier errors? If so, I think we should let this decision to the users of any cranelift module, and not do this by ourselves. Embedders should be able to handle invalid wasm code (for instance) and not crash.

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 14, 2019

I mean that when somebody already unwraps, it should give a nicer error message by default.

@abrown
Copy link
Member Author

abrown commented Nov 15, 2019

I mean that when somebody already unwraps, it should give a nicer error message by default.

Yeah, in both cases--clif-util and wasmtime--the error messaging could be improved. I think I see what @bjorn3 is saying: if anywhere a CodegenError is used in a Result we use pretty_error we get good, contextual errors, as in

let CodeInfo { total_size, .. } = comp_ctx
.compile(isa)
.map_err(|e| pretty_error(&comp_ctx.func, context.isa, e))?;

This seems to be largely the case as I search around in cranelift but not in wasmtime. In wasmtime the CodegenError ends up wrapped as a CompileError (and later emitted) so there is not currently a good place to use pretty_error on the CodegenError: https://github.com/bytecodealliance/wasmtime/blob/1a0ed6e388d96c5844044f81f91fe2770ed7d857/crates/environ/src/cranelift.rs#L242-L248

When I mentioned having issues with this before, it is because I tried to make VerifierErrors smarter so it could carry around the information it needed to pretty-print later on--something along the lines of PrettyVerifierError. I abandoned because I was running into lifetime issues and re-factoring too much code. I think the idea was in the right direction, though: whether someone can't call pretty_error as in wasmtime (this isn't really true, we could change the error structure there, but you know what I mean) or they just don't know about pretty_error and panic (as @bjorn3 describes) the errors would just print with all necessary context.

Since holding on to references to Function and TargetIsa in the error context is likely an impossibility, I went with the next most general mechanism: logging. (If anyone has any great ideas to re-work the pretty-printing to be both lazy AND have the context, I'm all ears). But if we don't want to log, then I think I will try to re-factor something in wasmtime to get better errors.

@abrown
Copy link
Member Author

abrown commented Nov 15, 2019

And another thought: do we really need to have lazy errors? What if CodegenError::Verifier contained a String instead of VerifierErrors. This forces the pretty-printing work to be done before the error is returned and limits how the user may choose to display these errors, but it would solve the problem above.

pub enum CodegenError {
/// A list of IR verifier errors.
///
/// This always represents a bug, either in the code that generated IR for Cranelift, or a bug
/// in Cranelift itself.
#[error("Verifier errors")]
Verifier(#[from] VerifierErrors),

@bnjbvr
Copy link
Member

bnjbvr commented Nov 15, 2019

do we really need to have lazy errors?

I think this is useful because then we can only provide context once (i.e. printing lines of code around the error location), otherwise we couldn't print two different errors within a single context.

(this isn't really true, we could change the error structure there, but you know what I mean)

I imagine you mean it's complicated to have wasmtime convert from our error type to its own error type, calling the pretty printing methods btw. Otherwise, it might be worth doing this in wasmtime, wouldn't it?

I mean that when somebody already unwraps, it should give a nicer error message by default.

I have looked at the pretty printing, and it seems it's consuming errors as it prints them, so I wonder if @bjorn3 meant that if a verifier errors vector has been emptied, and we're calling the pretty printers again, then we should panic. If so, I agree it'd be nice to signal it to the caller of the pretty print functions.

[edit] Dismiss this last part, the pretty print function takes ownership of the VerifierErrors anyways.

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 15, 2019

No. When somebody uses ctx.compile(isa).unwrap() it should print a nicer error, which includes some context about which entity caused the error. For example inst42 (v0 = iconst.i8 1000), instead of just inst42. This String context can be added as extra field to the VerifierError. The Debug impl will print it, but pretty_verifier_error will just ignore it and show the error in the context of the function.

@abrown
Copy link
Member Author

abrown commented Nov 15, 2019

I think I'm on board with what is suggested:

  • be eager about stringifying the instruction (e.g. inst42 (v0 = iconst.i8 1000)) as the VerifierError is created--this makes VerifierError much more useful on its own even if it has a bit of overhead
  • be lazy about stringifying VerifierErrors; this means changing wasmtime (and any other users of CodegenError to use pretty_error

I can convert this PR to do the first point if we all agree?

@bnjbvr
Copy link
Member

bnjbvr commented Nov 15, 2019

I think this sounds great 👍

@abrown abrown force-pushed the log-verifier-errors branch from abbc3d3 to ccc46bd Compare November 15, 2019 22:20
@abrown abrown requested a review from bnjbvr November 15, 2019 22:25
@abrown abrown force-pushed the log-verifier-errors branch from ccc46bd to 4515e21 Compare November 15, 2019 22:42
@abrown
Copy link
Member Author

abrown commented Nov 15, 2019

An explanation of these commits:

  • the first commit adds context to verifier errors on instructions: inst42: v2 is not... is now inst42 (v3 = not v2): v2 is not.... To do this, I tried hacking away at the existing macros and got frustrated enough that I implemented this behavior separately as methods. Judging from this experience and other anecdotal evidence, I suggest we avoid macros and just use format! wherever needed.
  • the second commit re-adds the log message: as I thought about it more, the only complaint was that the pretty errors would be printed twice (@bjorn3 above). To avoid this, I reduced the log level of the message to DEBUG. This way, if we have cases where we aren't seeing the right errors but we want to without re-compiling, we can enable this. My thought process was "Why not? We only pay the logging cost when we have errors and logging is enabled..." (I will still try to add the pretty-printing to wasmtime, though).

Finally, the build failures are somewhat surprising:

FAIL filetests/verifier/bad_layout.clif: verifier(%test_2):
unexpected error ebb0: block does not end in a terminator instruction

FAIL filetests/cfg/traps_early.clif: verifier(%nonsense):
unexpected error inst5: invalid reference to entry ebb ebb0
unexpected error inst4 (v3 = iadd.i32 v1, v3): uses value v3 from itself

I don't see that I made a mistake in moving over to the new fatal/nonfatal methods (though this is a real possibility) so I wonder if I uncovered some hidden bugs.

@abrown abrown force-pushed the log-verifier-errors branch from 4515e21 to adcaf4c Compare November 15, 2019 22:54
@abrown
Copy link
Member Author

abrown commented Nov 15, 2019

I don't see that I made a mistake in moving over to the new fatal/nonfatal methods (though this is a real possibility) so I wonder if I uncovered some hidden bugs.

Nope 😆. I found it.

abrown added a commit to abrown/wasmtime that referenced this pull request Nov 15, 2019
As discussed in bytecodealliance/cranelift#1226, the context of Cranelift errors is lost after exiting the scope containing the Cranelift function. `CodegenError` then only contains something like `inst2: arg 0 (v4) has type i16x8, expected i8x16`, which is rarely enough information for investigating a codegen failure. This change uses Cranelift's `pretty_error` function to improve the error messages wrapped in `CompileError`; `CompileError` has lost the reference to `CodegenError` due to `pretty_error` taking ownership but this seems preferable since no backtrace is attached and losing the pretty-printed context would be worse (if `CodegenError` gains a `Backtrace` or implements `Clone` we can revisit this).
sunfishcode pushed a commit to bytecodealliance/wasmtime that referenced this pull request Nov 16, 2019
As discussed in bytecodealliance/cranelift#1226, the context of Cranelift errors is lost after exiting the scope containing the Cranelift function. `CodegenError` then only contains something like `inst2: arg 0 (v4) has type i16x8, expected i8x16`, which is rarely enough information for investigating a codegen failure. This change uses Cranelift's `pretty_error` function to improve the error messages wrapped in `CompileError`; `CompileError` has lost the reference to `CodegenError` due to `pretty_error` taking ownership but this seems preferable since no backtrace is attached and losing the pretty-printed context would be worse (if `CodegenError` gains a `Backtrace` or implements `Clone` we can revisit this).
Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

}

/// Diagnose a fatal error, and return `Err`.
fn fatal(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Can the fatal and nonfatal macros be entirely removed now, by replacing other uses in the other files contained in the verifier directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fatal! still has 52 usages, nonfatal! has 4 usages, and report! has 23 usages; I'll create an issue to do that as a separate PR because I think I would like to refactor that stuff on to VerifierErrors: #1248.

Previously, verifier errors looked like `inst42: v2 is not...`; now, when printed, these same errors will have the instruction context: `inst42 (v3 = not v2): v2 is not...`. This should make it easier to troubleshoot verifier errors when the full pretty-printed error string is not available (see `pretty_error`).
@abrown abrown force-pushed the log-verifier-errors branch from adcaf4c to 82a54e5 Compare November 20, 2019 21:07
@abrown abrown merged commit 07e3493 into bytecodealliance:master Nov 20, 2019
@abrown abrown deleted the log-verifier-errors branch November 20, 2019 21:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants