[encoding] Initial attempt at a BumpEstimator utility #436

armansito · 2024-02-13T22:44:15Z

Several Vello stages dynamically bump-allocate intermediate data structures. Due to graphics API limitations, the backing memory for these data structures must already be allocated at the time of command submission, even though the precise memory requirements are unknown.

Vello currently works around this issue in two ways (see #366):

1. It prescribes a mechanism in which allocation failures get detected by fencing back to the CPU. The client responds to this event by creating larger GPU buffers using the bump allocator state obtained via read-back. The client has the choice of dropping a frame or submitting the fine stage only after any allocation failures have been resolved.
2. The encoding crate hard-codes the buffers to be large enough to be able to render paris-30k, making it unlikely for simple scenes to under-allocate. This comes at the cost of a fixed memory watermark of >50MB.
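
As a rough illustration of the first workaround, here is a minimal CPU-only sketch of the read-back and resize loop. Everything in it is hypothetical: `BumpReadback`, `run_stage`, and `render_with_retry` are invented names that simulate a GPU stage, and they do not mirror Vello's actual types or submission flow.

```rust
/// Hypothetical stand-in for the bump allocator state read back from the GPU.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BumpReadback {
    /// Number of elements the stage tried to allocate this frame.
    requested: u32,
    /// True when `requested` exceeded the capacity of the backing buffer.
    failed: bool,
}

/// Simulates a GPU stage that needs `needed` elements from a buffer that
/// holds `capacity` elements. A real stage would run on the GPU and the
/// result would only be visible after fencing back to the CPU.
fn run_stage(needed: u32, capacity: u32) -> BumpReadback {
    BumpReadback {
        requested: needed,
        failed: needed > capacity,
    }
}

/// Resubmits the stage until it fits, growing the buffer to the size
/// reported by read-back. Returns (final capacity, number of submissions).
fn render_with_retry(needed: u32, mut capacity: u32) -> (u32, u32) {
    let mut submissions = 0;
    loop {
        submissions += 1;
        let readback = run_stage(needed, capacity);
        if !readback.failed {
            return (capacity, submissions);
        }
        // Allocation failed: recreate the buffer with the requested size.
        capacity = readback.requested;
    }
}
```

In this toy model, `render_with_retry(1000, 256)` needs two submissions and ends with a capacity of 1000, while `render_with_retry(100, 256)` succeeds on the first submission. The extra submission is the cost that a conservative CPU-side estimate is meant to avoid.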

There may be situations in which neither of these solutions is desirable, while the cost of additional CPU-side pre-processing is not considered prohibitive for performance. It may also be acceptable to pay the cost of generally allocating more than what's required in order to make under-allocation impossible (except perhaps for OOM situations).

In that spirit, this change introduces the beginnings of a heuristic-based conservative memory estimation utility. It currently estimates only the LineSoup buffer (which contains the curve flattening output) within a factor of 1.1x-3.3x on the Vello test scenes (paris-30k is estimated at 1.5x the actual requirement).

- Curves are estimated using Wang's formula, which is fast to evaluate but produces a looser bound than Vello's analytic approach. The overestimation is more pronounced with increased curvature variation.
- Explicit lines (such as line-tos) get estimated precisely.
- As an initial stage, only the LineSoup buffer is supported. Support for the other buffers will be added as follow-up work, as they require experiments with additional heuristics.
- A BumpEstimator is integrated with the Scene API (gated by a feature flag) but the results are currently unused. Glyph runs are not supported, as the estimator is not yet aware of the path data stored in the glyph cache. Transformed scene fragments are supported by applying fine-grained scaling to curve line counts, skipping explicit lines, which are scale-invariant.
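
To make the curve and line estimates above concrete, here is a minimal sketch of Wang's formula for quadratic and cubic Béziers, combined with exact counting of explicit lines. It is not the code in `estimate.rs`: the `PathEl` enum, the function names, and the tolerance passed by the caller are all assumptions made for illustration. For a degree-d curve, the formula bounds the segment count by `ceil(sqrt(d * (d - 1) / (8 * tol) * M))`, where `M` is the largest second difference of the control points.

```rust
type Point = [f64; 2];

/// Magnitude of the second difference `a - 2b + c` of three control points.
fn second_diff(a: Point, b: Point, c: Point) -> f64 {
    let x = a[0] - 2.0 * b[0] + c[0];
    let y = a[1] - 2.0 * b[1] + c[1];
    (x * x + y * y).sqrt()
}

/// Wang's formula for a quadratic Bezier (d = 2): sqrt(M / (4 * tol)).
fn wang_quadratic(p0: Point, p1: Point, p2: Point, tol: f64) -> u32 {
    let m = second_diff(p0, p1, p2);
    ((m / (4.0 * tol)).sqrt().ceil() as u32).max(1)
}

/// Wang's formula for a cubic Bezier (d = 3): sqrt(3 * M / (4 * tol)).
fn wang_cubic(p0: Point, p1: Point, p2: Point, p3: Point, tol: f64) -> u32 {
    let m = second_diff(p0, p1, p2).max(second_diff(p1, p2, p3));
    ((3.0 * m / (4.0 * tol)).sqrt().ceil() as u32).max(1)
}

/// Hypothetical path element type, used only for this sketch.
enum PathEl {
    LineTo(Point),
    QuadTo(Point, Point),
    CurveTo(Point, Point, Point),
}

/// Conservative line count for a path: explicit lines count as exactly one,
/// curves use the Wang's formula upper bound.
fn estimate_lines(start: Point, path: &[PathEl], tol: f64) -> u32 {
    let mut cur = start;
    let mut total = 0;
    for el in path {
        match el {
            PathEl::LineTo(p) => {
                total += 1;
                cur = *p;
            }
            PathEl::QuadTo(p1, p2) => {
                total += wang_quadratic(cur, *p1, *p2, tol);
                cur = *p2;
            }
            PathEl::CurveTo(p1, p2, p3) => {
                total += wang_cubic(cur, *p1, *p2, *p3, tol);
                cur = *p3;
            }
        }
    }
    total
}
```

Because the bound grows with the square root of the second difference, uniformly scaling the control points by a factor `s` scales a curve's estimate by roughly `sqrt(s)`, whereas a line-to always contributes exactly one line regardless of scale.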

crates/encoding/src/estimate.rs

raphlinus

I think this can go in; minor performance tweak and cleanup suggested. If we weren't planning on iterating toward putting the estimation at resolve time, I'd ask for CI changes, but if this is a reasonably temporary state, I'm ok with them not going in. Just don't be surprised if the feature has some breakage :)

raphlinus · 2024-03-04T23:35:38Z

Cargo.toml

+# Enables GPU memory usage estimation. This performs additional computations
+# in order to estimate the minimum required allocations for buffers backing
+# bump-allocated GPU memory.
+bump_estimate = ["vello_encoding/bump_estimate"]


Because we now have more possibilities for things to break, one of the things I'd like to see is running at least cargo check in CI with the feature enabled and disabled.

However, the longer-term plan is most likely to move estimation to resolve time. That will hopefully make the cost of estimation a runtime choice rather than a compile-time one, so it won't need a feature gate. Given that, I'm not going to ask for the CI changes now.

The CI already runs:

- --no-default-features
- default features
- --all-features

src/scene.rs


armansito requested review from raphlinus and dfrg February 13, 2024 22:47

armansito mentioned this pull request Feb 20, 2024

[encoding] Bump estimate for segments #454

Merged

xStrom reviewed Feb 24, 2024


raphlinus approved these changes Mar 5, 2024


armansito force-pushed the bump-estimate branch 2 times, most recently from 3f56e51 to 7bfca34 Compare March 12, 2024 19:47

armansito force-pushed the bump-estimate branch from 7bfca34 to ac079c1 Compare March 12, 2024 20:07

armansito added this pull request to the merge queue Mar 12, 2024

Merged via the queue into main with commit f55f82f Mar 12, 2024
9 checks passed

armansito deleted the bump-estimate branch March 12, 2024 20:46
