Skip to content

Commit

Permalink
Improved wgpu error handling, no more crashes through wgpu validation…
Browse files Browse the repository at this point in the history
… errors (#4509)

### What

* Fixes #4457
* solves the crash part of #4455
* but keeping it open because I still want to check on the error message


There's a bunch of things that probably should have been separate PRs
but by the time I ventured there it got too hard to entangle, so please
bear with me and read this detailed list of changes carefully (if you
understand this list you've done half the review ;-)):

* re_renderer now uses `cfg_aliases` and `cfg_if` in various places to
make cfg decisions more readable
* `ErrorTracker` is now used in all build configurations. Previously, it
was only active on debug native builds
* The first thing `RendererContext` does now is to install global error
handlers that feed into the "top level" `ErrorTracker`. Previously this
was done delayed and only for debug+native
* `ErrorTracker` never panics now
* `ErrorTracker` no longer uses a frame counting heuristic (!!) to
discard _all_ errors, but *instead* relies on precise device timeline
frame counters to discard individual errors
* this fixes issues where previously an error would disappear and
reappear without being logged
* as commented several times in the code: Keep in mind the three
staggered timelines of WebGPU
       * `Content` (your code)
       * `Device` (everything happening on wgpu::Device)
* `Queue` (everything happening on the GPU, represented by wgpu::Queue)
* `Content` and `Device` are mostly in sync on wgpu-core, but NOT on
WebGPU in general!
* split out the wgpu-core specific parts of `ErrorTracker` into its own
module and fall back to simple string hashing for WebGPU
* on WebGPU we're dealing with a JS interface that doesn't provide these
types. Additionally, in a pure WebGPU build
(the only webgpu enabled build we have today), we don't even depend on
wgpu-core!
* Manual test of re_renderer samples shows that we now no longer crash
on WebGPU either!
* the wgpu-core error type handling is unchanged except for a change in
label hashing
* previously, in the absence of debug labels, error de-duplication would
fail. In practice leading to error spam on release builds
* Introduce `WgpuErrorScope` utility for safe & fully covering wgpu
error scopes
* The resulting futures are handled with `now_or_never` on wgpu-core
backend since we know this is safe. On WebGPU we hand off execution to
the browser.
* Each frame now has a top level `WgpuErrorScope` in order to catch all
errors
* this is not a _strict_ requirement on wgpu-core backends since
wgpu-core will always feed the fallback error handler. **But Chrome/Dawn
WebGPU does not**. The spec allows for this behavior



<img width="1623" alt="image"
src="https://github.com/rerun-io/rerun/assets/1220815/be42c6cc-a780-4f48-b24c-8f9dc36b872a">
Screenshot of a viewer in release mode with an intentionally broken mesh
shader. Also tried other breakages like render pipeline format
mismatches.



Things related, but NOT covered by this PR:
* crashes on startup / adapter selection
* this requires further improvements to egui: We need to improve the
hooks that go into creating the egui render state. Need to consider
allowing to create your own renderstate. This would also allow us to
move the actual device/queue/adapter ownership to re_renderer
* #4507
* punting on this one. It's a nice-to-have extension of the work
presented here
* close look into #4455
* the crash is almost certainly fixed now and replaced by a recoverable
error 🥳, but I want to understand what sets the wrong viewport - either
the one in egui we already fixed or the ViewBuilder (which surely could
use some safety check)

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Full build: [app.rerun.io](https://app.rerun.io/pr/4509/index.html)
* Partial build:
[app.rerun.io](https://app.rerun.io/pr/4509/index.html?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
- Useful for quick testing when changes do not affect examples in any
way
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4509)
- [Docs
preview](https://rerun.io/preview/754b7c4109ba687a6192689ef9c8974dd014d770/docs)
<!--DOCS-PREVIEW-->
- [Examples
preview](https://rerun.io/preview/754b7c4109ba687a6192689ef9c8974dd014d770/examples)
<!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)
  • Loading branch information
Wumpf authored Dec 13, 2023
1 parent ac7daa2 commit ac75d50
Show file tree
Hide file tree
Showing 17 changed files with 796 additions and 465 deletions.
113 changes: 96 additions & 17 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ bytemuck = { version = "1.11", features = ["extern_crate_alloc"] }
camino = "1.1"
cargo_metadata = "0.18"
cargo-run-wasm = "0.3.2"
cfg_aliases = "0.1"
cfg-if = "1.0"
clang-format = "0.3"
clap = "4.0"
Expand Down
12 changes: 8 additions & 4 deletions crates/re_renderer/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ import-gltf = ["dep:gltf"]
serde = ["dep:serde"]

## Render using webgl instead of webgpu on wasm builds.
webgl = ["wgpu/webgl"]
webgl = ["wgpu/webgl", "dep:wgpu-core"]

[dependencies]
re_error.workspace = true
Expand All @@ -50,6 +50,7 @@ ahash.workspace = true
anyhow.workspace = true
bitflags.workspace = true
bytemuck.workspace = true
cfg-if.workspace = true
clean-path.workspace = true
document-features.workspace = true
ecolor = { workspace = true, features = ["bytemuck"] }
Expand All @@ -67,32 +68,35 @@ static_assertions.workspace = true
thiserror.workspace = true
type-map.workspace = true
wgpu.workspace = true
wgpu-core.workspace = true

# optional
arrow2 = { workspace = true, optional = true }
gltf = { workspace = true, optional = true }
serde = { workspace = true, features = ["derive"], optional = true }
tobj = { workspace = true, optional = true }
wgpu-core = { workspace = true, optional = true } # Needed for error handling when wgpu-core implemented backend is used.

# native
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
crossbeam.workspace = true
notify.workspace = true
wgpu-core.workspace = true

# For examples:
# webgpu
[target.'cfg(all(target_arch = "wasm32", not(features = "webgl")))'.dependencies]
wasm-bindgen-futures.workspace = true

[dev-dependencies]
unindent.workspace = true

# For build.rs:
[build-dependencies]

# Rerun
re_build_tools.workspace = true

# External
anyhow.workspace = true
cfg_aliases.workspace = true
clean-path.workspace = true
pathdiff.workspace = true
walkdir.workspace = true
18 changes: 18 additions & 0 deletions crates/re_renderer/build.rs
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,24 @@ fn should_run() -> bool {
}

fn main() {
// TODO(andreas): Create an upstream PR `cfg_aliases` to fix this.
// Workaround for CARGO_CFG_DEBUG_ASSERTIONS not being set as expected.
// `cfg_aliases` relies on this.
if std::env::var("PROFILE") == Ok("debug".to_owned()) {
std::env::set_var("CARGO_CFG_DEBUG_ASSERTIONS", "1");
}

#[allow(clippy::str_to_string)]
// TODO(andreas): Create an upstream PR to `cfg_aliases` fix this.
{
cfg_aliases::cfg_aliases! {
native: { not(target_arch = "wasm32") },
webgl: { all(not(native), feature = "webgl") },
webgpu: { all(not(webgl), not(native)) },
load_shaders_from_disk: { all(native, debug_assertions) } // Shader reloading is only supported on native-debug currently.
}
}

if !should_run() {
return;
}
Expand Down
8 changes: 4 additions & 4 deletions crates/re_renderer/src/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -153,10 +153,7 @@ pub struct RenderContextConfig {
///
/// Other backend might work as well, but lack of support isn't regarded as a bug.
pub fn supported_backends() -> wgpu::Backends {
if cfg!(target_arch = "wasm32") {
// Web - WebGL is used automatically when wgpu is compiled with `webgl` feature.
wgpu::Backends::GL | wgpu::Backends::BROWSER_WEBGPU
} else {
if cfg!(native) {
// Native.
// Only use Vulkan & Metal unless explicitly told so since this reduces surfaces and thus surprises.
//
Expand All @@ -167,5 +164,8 @@ pub fn supported_backends() -> wgpu::Backends {
// For changing the backend we use standard wgpu env var, i.e. WGPU_BACKEND.
wgpu::util::backend_bits_from_env()
.unwrap_or(wgpu::Backends::VULKAN | wgpu::Backends::METAL)
} else {
// Web - WebGL is used automatically when wgpu is compiled with `webgl` feature.
wgpu::Backends::GL | wgpu::Backends::BROWSER_WEBGPU
}
}
Loading

0 comments on commit ac75d50

Please sign in to comment.