-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[encoding] Initial attempt at a BumpEstimator utility #436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can go in, minor performance tweak and cleanup suggested. If we weren't planning on iterating toward putting the estimation at resolve time, I'd ask for CI changes, but if this is a reasonably temporary state, I'm ok with it not going in. Just don't be surprised if the feature has some breakage :)
# Enables GPU memory usage estimation. This performs additional computations | ||
# in order to estimate the minimum required allocations for buffers backing | ||
# bump-allocated GPU memory. | ||
bump_estimate = ["vello_encoding/bump_estimate"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we now have more possibilities for things to break, one of the things I'd like to see is running at least cargo check in CI with the feature enabled and disabled.
However, since the longer term plan is most likely to move estimation to resolve time, which hopefully will mean that the cost of estimation will be a runtime rather than a compile time choice and so won't need a feature gate, I'm not going to ask for the CI changes now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI already runs:
--no-default-features
- default features
--all-features
3f56e51
to
7bfca34
Compare
Several vello stages dynamically bump allocate intermediate data structures. Due to graphics API limitations the backing memory for these data structures must have been allocated at the time of command submission even though the precise memory requirements are unknown. Vello currently works around this issue in two ways (see #366): 1. Vello currently prescribes a mechanism in which allocation failures get detected by fencing back to the CPU. The client responds to this event by creating larger GPU buffers using the bump allocator state obtained via read-back. The client has the choice of dropping skipping a frame or submitting the fine stage only after any allocations failures get resolved. 2. The encoding crate hard-codes the buffers to be large enough to be able to render paris-30k, making it unlikely for simple scenes to under-allocate. This comes at the cost of a fixed memory watermark of >50MB. There may be situations when neither of these solutions are desirable while the cost of additional CPU-side pre-processing is not considered prohibitive for performance. It may also be acceptable to pay the cost of generally allocating more than what's required in order to make the this problem go away entirely (except perhaps for OOM situations). In that spirit, this change introduces the beginnings of a heuristic-based conservative memory estimation utility. It currently estimates only the LineSoup buffer (which contains the curve flattening output) within a factor of 1.1x-3.3x on the Vello test scenes (paris-30k is estimated at 1.5x the actual requirement). - Curves are estimated using Wang's formula which is fast to evaluate but produces a less optimal result than Vello's analytic approach. The overestimation is more pronounced with increased curvature variation. - Explicit lines (such as line-tos) get estimated precisely - Only the LineSoup buffer is supported. - A BumpEstimator is integrated with the Scene API (gated by a feature flag) but the results are currently unused. Glyph runs are not supported as the estimator is not yet aware of the path data stored in glyph cache.
7bfca34
to
ac079c1
Compare
Several vello stages dynamically bump allocate intermediate data structures. Due to graphics API limitations the backing memory for these data structures must have been allocated at the time of command submission even though the precise memory requirements are unknown.
Vello currently works around this issue in two ways (see #366):
It prescribes a mechanism in which allocation failures get detected by fencing back to the CPU. The client responds to this event by creating larger GPU buffers using the bump allocator state obtained via read-back. The client has the choice of dropping a frame or submitting the fine stage only after any allocation failures have been resolved.
The encoding crate hard-codes the buffers to be large enough to be able to render paris-30k, making it unlikely for simple scenes to under-allocate. This comes at the cost of a fixed memory watermark of >50MB.
There may be situations when neither of these solutions are desirable while the cost of additional CPU-side pre-processing is not considered prohibitive for performance. It may also be acceptable to pay the cost of generally allocating more than what's required in order to make underallocation impossible (except perhaps for OOM situations).
In that spirit, this change introduces the beginnings of a heuristic-based conservative memory estimation utility. It currently estimates only the LineSoup buffer (which contains the curve flattening output) within a factor of 1.1x-3.3x on the Vello test scenes (paris-30k is estimated at 1.5x the actual requirement).
Curves are estimated using Wang's formula which is fast to evaluate but produces a less optimal result than Vello's analytic approach. The overestimation is more pronounced with increased curvature variation.
Explicit lines (such as line-tos) get estimated precisely.
As an initial stage, only the LineSoup buffer is supported. Support for the other buffers will be added as follow-up work, as they require experiments with additional heuristics.
A BumpEstimator is integrated with the Scene API (gated by a feature flag) but the results are currently unused. Glyph runs are not supported as the estimator is not yet aware of the path data stored in the glyph cache. Transformed scene fragments are supported by applying fine-grained scaling to curve line counts, skipping explicit lines which are scale-invariant.