Log verifier errors #1226

abrown · 2019-11-13T22:45:26Z

This has been not been discussed in an issue.
A short description of what this does, why it is needed: when trying to root cause verifier errors in a file with many functions, the printed error message (e.g. inst2: arg 0 (v4) has type i16x8, expected i8x16) is usually not enough to troubleshoot the location of the failure. This change adds logging like the following:

ERROR cranelift_codegen::verifier        > Found verifier errors in function:
function u0:3(i64 vmctx, i8x16) -> i8x16 system_v {
                                ebb0(v0: i64, v1: i8x16):
@0072                               v3 = raw_bitcast.i16x8 v1
@0072                               v4 = ineg v3
@0074                               jump ebb1(v4)

                                ebb1(v2: i8x16):
@0074                               return v2
}
 - at inst2: arg 0 (v4) has type i16x8, expected i8x16

The logging level can be controlled with the RUST_LOG environment variable but due to env_logger having a default log level of ERROR these messages will always be printed. If we don't want that, we can downgrade these logs to WARNING so they can optionally be turned on by something like RUST_LOG=cranelift_codegen=warning.

This PR contains test cases: n/a.
A reviewer from the core maintainer team has been assigned for this PR.

bjorn3 · 2019-11-13T22:56:10Z

I think https://docs.rs/cranelift-codegen/0.50.0/cranelift_codegen/print_errors/fn.pretty_verifier_error.html should be used instead. Also it would be nice to make the verifier error unwraps in for example cranelift_module give a nice panic message, instead of adding this error logging. This error logging makes verifier errors show up twice when using for example clif-util compile, as it already shows the function annnotated with verifier errors.

abrown · 2019-11-13T23:30:17Z

@bjorn3, I tried to use pretty_verifier_error before but ran into issues; where do you think pretty_verifier_error could be used that wouldn't involve refactoring a lot of this file?

bjorn3 · 2019-11-14T08:57:00Z

I tried to use pretty_verifier_error before but ran into issues

What issues?

where do you think pretty_verifier_error could be used that wouldn't involve refactoring a lot of this file?

Wherever a VerifierErrors is unwrapped. (for example ctx.compile(isa).unwrap())

bjorn3 · 2019-11-14T08:59:44Z

It would also be possible to add a stringified version of the entity to VerifierError, instead of just the id.

bnjbvr · 2019-11-14T09:54:53Z

Making errors more discoverable and easy to understand is a valuable goal. Just trying to understand the use case: is it when there's a verifier error in wasmtime itself? Or are you using the clif-util and sometimes verifier errors aren't displayed? In the former case, wasmtime could use the exposed pretty printers that are in print_errors. In the latter case, could we add additional logging?

Also it would be nice to make the verifier error unwraps in for example cranelift_module give a nice panic message, instead of adding this error logging.

Are you suggesting to cause panics whenever there are verifier errors? If so, I think we should let this decision to the users of any cranelift module, and not do this by ourselves. Embedders should be able to handle invalid wasm code (for instance) and not crash.

bjorn3 · 2019-11-14T10:23:22Z

I mean that when somebody already unwraps, it should give a nicer error message by default.

abrown · 2019-11-15T01:10:13Z

I mean that when somebody already unwraps, it should give a nicer error message by default.

Yeah, in both cases--clif-util and wasmtime--the error messaging could be improved. I think I see what @bjorn3 is saying: if anywhere a CodegenError is used in a Result we use pretty_error we get good, contextual errors, as in

cranelift/cranelift-filetests/src/test_compile.rs

Lines 43 to 45 in 8ef665d

    
           let CodeInfo { total_size, .. } = comp_ctx 
        
               .compile(isa) 
        
               .map_err(|e| pretty_error(&comp_ctx.func, context.isa, e))?;

This seems to be largely the case as I search around in cranelift but not in wasmtime. In wasmtime the CodegenError ends up wrapped as a CompileError (and later emitted) so there is not currently a good place to use pretty_error on the CodegenError: https://github.com/bytecodealliance/wasmtime/blob/1a0ed6e388d96c5844044f81f91fe2770ed7d857/crates/environ/src/cranelift.rs#L242-L248

When I mentioned having issues with this before, it is because I tried to make VerifierErrors smarter so it could carry around the information it needed to pretty-print later on--something along the lines of PrettyVerifierError. I abandoned because I was running into lifetime issues and re-factoring too much code. I think the idea was in the right direction, though: whether someone can't call pretty_error as in wasmtime (this isn't really true, we could change the error structure there, but you know what I mean) or they just don't know about pretty_error and panic (as @bjorn3 describes) the errors would just print with all necessary context.

Since holding on to references to Function and TargetIsa in the error context is likely an impossibility, I went with the next most general mechanism: logging. (If anyone has any great ideas to re-work the pretty-printing to be both lazy AND have the context, I'm all ears). But if we don't want to log, then I think I will try to re-factor something in wasmtime to get better errors.

abrown · 2019-11-15T01:15:19Z

And another thought: do we really need to have lazy errors? What if CodegenError::Verifier contained a String instead of VerifierErrors. This forces the pretty-printing work to be done before the error is returned and limits how the user may choose to display these errors, but it would solve the problem above.

cranelift/cranelift-codegen/src/result.rs

Lines 10 to 17 in d23030f

    
           pub enum CodegenError { 
        
               /// A list of IR verifier errors. 
        
               /// 
        
               /// This always represents a bug, either in the code that generated IR for Cranelift, or a bug 
        
               /// in Cranelift itself. 
        
               #[error("Verifier errors")] 
        
               Verifier(#[from] VerifierErrors),

bnjbvr · 2019-11-15T13:32:47Z

do we really need to have lazy errors?

I think this is useful because then we can only provide context once (i.e. printing lines of code around the error location), otherwise we couldn't print two different errors within a single context.

(this isn't really true, we could change the error structure there, but you know what I mean)

I imagine you mean it's complicated to have wasmtime convert from our error type to its own error type, calling the pretty printing methods btw. Otherwise, it might be worth doing this in wasmtime, wouldn't it?

I mean that when somebody already unwraps, it should give a nicer error message by default.

I have looked at the pretty printing, and it seems it's consuming errors as it prints them, so I wonder if @bjorn3 meant that if a verifier errors vector has been emptied, and we're calling the pretty printers again, then we should panic. If so, I agree it'd be nice to signal it to the caller of the pretty print functions.

[edit] Dismiss this last part, the pretty print function takes ownership of the VerifierErrors anyways.

bjorn3 · 2019-11-15T14:07:43Z

No. When somebody uses ctx.compile(isa).unwrap() it should print a nicer error, which includes some context about which entity caused the error. For example inst42 (v0 = iconst.i8 1000), instead of just inst42. This String context can be added as extra field to the VerifierError. The Debug impl will print it, but pretty_verifier_error will just ignore it and show the error in the context of the function.

abrown · 2019-11-15T17:33:55Z

I think I'm on board with what is suggested:

be eager about stringifying the instruction (e.g. inst42 (v0 = iconst.i8 1000)) as the VerifierError is created--this makes VerifierError much more useful on its own even if it has a bit of overhead
be lazy about stringifying VerifierErrors; this means changing wasmtime (and any other users of CodegenError to use pretty_error

I can convert this PR to do the first point if we all agree?

bnjbvr · 2019-11-15T18:09:14Z

I think this sounds great 👍

abrown · 2019-11-15T22:50:34Z

An explanation of these commits:

the first commit adds context to verifier errors on instructions: inst42: v2 is not... is now inst42 (v3 = not v2): v2 is not.... To do this, I tried hacking away at the existing macros and got frustrated enough that I implemented this behavior separately as methods. Judging from this experience and other anecdotal evidence, I suggest we avoid macros and just use format! wherever needed.
the second commit re-adds the log message: as I thought about it more, the only complaint was that the pretty errors would be printed twice (@bjorn3 above). To avoid this, I reduced the log level of the message to DEBUG. This way, if we have cases where we aren't seeing the right errors but we want to without re-compiling, we can enable this. My thought process was "Why not? We only pay the logging cost when we have errors and logging is enabled..." (I will still try to add the pretty-printing to wasmtime, though).

Finally, the build failures are somewhat surprising:

FAIL filetests/verifier/bad_layout.clif: verifier(%test_2):
unexpected error ebb0: block does not end in a terminator instruction

FAIL filetests/cfg/traps_early.clif: verifier(%nonsense):
unexpected error inst5: invalid reference to entry ebb ebb0
unexpected error inst4 (v3 = iadd.i32 v1, v3): uses value v3 from itself

I don't see that I made a mistake in moving over to the new fatal/nonfatal methods (though this is a real possibility) so I wonder if I uncovered some hidden bugs.

abrown · 2019-11-15T22:55:33Z

I don't see that I made a mistake in moving over to the new fatal/nonfatal methods (though this is a real possibility) so I wonder if I uncovered some hidden bugs.

Nope 😆. I found it.

As discussed in bytecodealliance/cranelift#1226, the context of Cranelift errors is lost after exiting the scope containing the Cranelift function. `CodegenError` then only contains something like `inst2: arg 0 (v4) has type i16x8, expected i8x16`, which is rarely enough information for investigating a codegen failure. This change uses Cranelift's `pretty_error` function to improve the error messages wrapped in `CompileError`; `CompileError` has lost the reference to `CodegenError` due to `pretty_error` taking ownership but this seems preferable since no backtrace is attached and losing the pretty-printed context would be worse (if `CodegenError` gains a `Backtrace` or implements `Clone` we can revisit this).

bnjbvr

Makes sense, thanks!

bnjbvr · 2019-11-19T10:31:52Z

cranelift-codegen/src/verifier/mod.rs

+    }
+
+    /// Diagnose a fatal error, and return `Err`.
+    fn fatal(


Nice! Can the fatal and nonfatal macros be entirely removed now, by replacing other uses in the other files contained in the verifier directory?

fatal! still has 52 usages, nonfatal! has 4 usages, and report! has 23 usages; I'll create an issue to do that as a separate PR because I think I would like to refactor that stuff on to VerifierErrors: #1248.

cranelift-codegen/src/verifier/mod.rs

Previously, verifier errors looked like `inst42: v2 is not...`; now, when printed, these same errors will have the instruction context: `inst42 (v3 = not v2): v2 is not...`. This should make it easier to troubleshoot verifier errors when the full pretty-printed error string is not available (see `pretty_error`).

abrown force-pushed the log-verifier-errors branch from abbc3d3 to ccc46bd Compare November 15, 2019 22:20

abrown requested a review from bnjbvr November 15, 2019 22:25

abrown force-pushed the log-verifier-errors branch from ccc46bd to 4515e21 Compare November 15, 2019 22:42

abrown force-pushed the log-verifier-errors branch from 4515e21 to adcaf4c Compare November 15, 2019 22:54

bjorn3 approved these changes Nov 15, 2019

View reviewed changes

abrown mentioned this pull request Nov 15, 2019

Improve error messages received from Cranelift bytecodealliance/wasmtime#583

Merged

bnjbvr approved these changes Nov 19, 2019

View reviewed changes

abrown added 2 commits November 20, 2019 12:51

Log verifier errors

82a54e5

abrown mentioned this pull request Nov 20, 2019

Remove error macros in the verifier #1248

Closed

abrown force-pushed the log-verifier-errors branch from adcaf4c to 82a54e5 Compare November 20, 2019 21:07

abrown merged commit 07e3493 into bytecodealliance:master Nov 20, 2019

abrown deleted the log-verifier-errors branch November 20, 2019 21:30

Log verifier errors #1226

Log verifier errors #1226

Uh oh!

Conversation

abrown commented Nov 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjorn3 commented Nov 13, 2019

Uh oh!

abrown commented Nov 13, 2019

Uh oh!

bjorn3 commented Nov 14, 2019

Uh oh!

bjorn3 commented Nov 14, 2019

Uh oh!

bnjbvr commented Nov 14, 2019

Uh oh!

bjorn3 commented Nov 14, 2019

Uh oh!

abrown commented Nov 15, 2019

Uh oh!

abrown commented Nov 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bnjbvr commented Nov 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjorn3 commented Nov 15, 2019

Uh oh!

abrown commented Nov 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bnjbvr commented Nov 15, 2019

Uh oh!

abrown commented Nov 15, 2019

Uh oh!

abrown commented Nov 15, 2019

Uh oh!

bnjbvr left a comment

Choose a reason for hiding this comment

Uh oh!

bnjbvr Nov 19, 2019

Choose a reason for hiding this comment

Uh oh!

abrown Nov 20, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

abrown commented Nov 13, 2019 •

edited

Loading

abrown commented Nov 15, 2019 •

edited

Loading

bnjbvr commented Nov 15, 2019 •

edited

Loading

abrown commented Nov 15, 2019 •

edited

Loading