I might change the title to "piet-gpu-hal postmortem" if people think this one is too dramatic.
As of linebender/vello#209, the existing GLSL pipeline of piet-gpu is deleted, and with it piet-gpu-hal. Going forward, the GPU infrastructure is wgpu. There are lots of advantages to this decision, most especially the opportunity to work more closely with other projects in the wgpu ecosystem, but there was also a lot that was promising about piet-gpu-hal, and I learned a lot from developing it. Here, I'll try to share some of that.
Goals
The goals of piet-gpu-hal were admirable, and I think are still valid.
Small runtime, especially no shader compilation at runtime.
Access to advanced features such as subgroups, descriptor indexing, memory model.
Runtime query, including unified memory to elide need for staging buffers.
Unlock full performance potential.
Emphasis on compute.
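To make the runtime query goal concrete, here is a minimal sketch of how a capability flag could decide whether a staging buffer is needed. All names (`GpuInfo`, `UploadPlan`, `plan_upload`) are hypothetical and are not piet-gpu-hal's actual API; the point is only that the decision is made once from queried capabilities rather than hardcoded per backend.

```rust
// Hypothetical sketch: these types are illustrative, not piet-gpu-hal's API.

/// Capabilities reported by a runtime query at device creation.
#[derive(Clone, Copy, Debug)]
pub struct GpuInfo {
    /// True when CPU and GPU share the same physical memory.
    pub unified_memory: bool,
    /// True when subgroup operations are available to shaders.
    pub has_subgroups: bool,
}

/// How the application should get data into a GPU buffer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UploadPlan {
    /// Map the destination buffer and write into it directly.
    MapDirect,
    /// Write into a host-visible staging buffer, then record a GPU copy.
    Staged { staging_bytes: u64 },
}

/// Pick an upload strategy from the queried capabilities.
pub fn plan_upload(info: GpuInfo, bytes: u64) -> UploadPlan {
    if info.unified_memory {
        // No separate device memory, so the staging copy can be elided.
        UploadPlan::MapDirect
    } else {
        UploadPlan::Staged { staging_bytes: bytes }
    }
}
```

On an integrated GPU this would return `UploadPlan::MapDirect`; on a discrete GPU it would fall back to a staging buffer of the same size as the upload.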
Implementation choices
Author shaders in GLSL. This is the only language that supports all Vulkan features (otherwise HLSL is better).
Use spirv-cross for shader compilation.
HAL API was loosely based on gfx-hal, but with more flexibility to diverge from Vulkan semantics.
Talk about timer queries on Metal.
This is fundamentally object-oriented.
Friction
The process of developing on piet-gpu-hal had a lot of friction.
Primitive tooling
Complex build process for shaders.
Desire to have precompiled shaders in git for easy use complicated things.
Shader compilation in GitHub Actions.
Portability concerns: DXC signing; also, we never did get Metal shaders compiled all the way to AIR.
Adding features to HAL was painful.
We never did get render pipeline done; impedance mismatch with GLSL combined samplers
No great story for interop with other GPU applications
Problems are conceptually solvable, but we didn't actually do it.
Probably good to link prototype bevy integration here.
Lots of tricky stuff like using common allocation pools
Experience with WebGPU is much easier, higher velocity.
Other workloads (AI/ML) never materialized, but do link Matt Keeter's experiment.
Ecosystem
Major expected benefit of using wgpu is working more closely with the ecosystem.
wgpu is responsive, and fixes there benefit many more people
Tools like wgpu-analyzer work.
Can deploy to Web using WebGPU.
Lots of interest in integrating 3d content, games.
Very excited about potential collaboration with Bevy.
Going forward
Plan for next few months is to stay in pure WGSL / WebGPU. Main goal is to maintain development velocity and build out needed features in piet-gpu. Take the hit on runtime shader compilation and other performance paper cuts. We're not prioritizing micro performance work anyway right now (subgroups, descriptor indexing / bindless).
On the roadmap, we'd like to win back the performance lost in the move to wgpu. There are two viable paths for this.
One is to enhance wgpu. High order bit is precompiled shaders, and we have an issue for this. These could be firmly based in WGSL, or the infrastructure could support authoring in other languages. (But which one? GLSL would meet requirements but is not pleasant). Staying in WGSL, unlocking advanced features such as subgroups and descriptor indexing would require WGSL / WebGPU extensions. This work will happen anyway, will just take a while. It is appealing to use piet-gpu as a playground for doing WebGPU extension work, as the timelines match up fairly well. However, we could find ourselves blocked.
Two is to reinstate native infra. It will look different than piet-gpu-hal though. New Engine abstraction is more declarative, less object oriented. Types manipulated by application are basically simple value types, without platform dependence and complex lifetimes. Can imagine native players that are not wgpu. These wouldn't be object oriented, not based on dynamic dispatch (or complex trait magic), but would be more specialized to the platform. (I'm wondering how much detail is relevant here; the ideas are not fully baked, and this is supposed to be a retrospective).
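A rough sketch of what "declarative, simple value types" could mean in practice, under the assumption that the application records a list of commands over plain handles and a platform-specific player executes it later. The names here (`BufId`, `ShaderId`, `Command`, `Recording`, `count_workgroups`) are made up for illustration and are not the actual Engine types.

```rust
// Hypothetical sketch; none of these names come from piet-gpu or wgpu.

/// Plain value handle for a buffer: an id and a size, no GPU object inside.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BufId {
    pub id: u32,
    pub size: u64,
}

/// Plain value handle for a shader known to the player.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShaderId(pub u32);

/// One step of GPU work, described as data rather than performed.
#[derive(Clone, Debug, PartialEq)]
pub enum Command {
    Upload { buf: BufId, data: Vec<u8> },
    Dispatch { shader: ShaderId, workgroups: (u32, u32, u32), bindings: Vec<BufId> },
    Download { buf: BufId },
}

/// A recording is just a list of commands; it has no lifetimes tied to a device.
#[derive(Debug, Default)]
pub struct Recording {
    next_id: u32,
    pub commands: Vec<Command>,
}

impl Recording {
    /// Record an upload and return a handle to the buffer it will create.
    pub fn upload(&mut self, data: &[u8]) -> BufId {
        let buf = BufId { id: self.next_id, size: data.len() as u64 };
        self.next_id += 1;
        self.commands.push(Command::Upload { buf, data: data.to_vec() });
        buf
    }

    /// Record a compute dispatch over the given buffers.
    pub fn dispatch(&mut self, shader: ShaderId, workgroups: (u32, u32, u32), bindings: &[BufId]) {
        self.commands.push(Command::Dispatch { shader, workgroups, bindings: bindings.to_vec() });
    }

    /// Record a readback of a buffer.
    pub fn download(&mut self, buf: BufId) {
        self.commands.push(Command::Download { buf });
    }
}

/// Stand-in for a platform "player": it walks the list and interprets it.
/// This one only totals the workgroups that would be dispatched.
pub fn count_workgroups(rec: &Recording) -> u64 {
    rec.commands
        .iter()
        .map(|c| match c {
            Command::Dispatch { workgroups: (x, y, z), .. } => {
                u64::from(*x) * u64::from(*y) * u64::from(*z)
            }
            _ => 0,
        })
        .sum()
}
```

Because the recording is plain data, a wgpu player and a hypothetical native player could consume the same list through an ordinary `match`, with no trait objects or dynamic dispatch in the application-facing types.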
Descriptor indexing / bindless is hard because it's not just WGSL but also implicates the API. Important to avoid copies on image scaling. For now, plan is to have big image atlas and render all images into it; not ideal but viable. Discussed in image issue.
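For the atlas fallback, a minimal shelf-packing allocator gives a feel for what is involved. This is only a sketch under the assumption that images are placed left to right in rows ("shelves") and never freed; `ShelfAtlas` and its methods are hypothetical names, and the real atlas may use a different packing strategy.

```rust
// Hypothetical sketch of a shelf-packing atlas allocator (no deallocation).

/// Tracks free space in a fixed-size atlas texture.
#[derive(Debug)]
pub struct ShelfAtlas {
    width: u32,
    height: u32,
    /// Top edge of the current shelf.
    shelf_y: u32,
    /// Height of the tallest image placed on the current shelf.
    shelf_h: u32,
    /// Next free x position on the current shelf.
    cursor_x: u32,
}

impl ShelfAtlas {
    pub fn new(width: u32, height: u32) -> Self {
        ShelfAtlas { width, height, shelf_y: 0, shelf_h: 0, cursor_x: 0 }
    }

    /// Reserve a `w` x `h` rectangle and return its top-left corner,
    /// or `None` if it does not fit. State is unchanged on failure.
    pub fn alloc(&mut self, w: u32, h: u32) -> Option<(u32, u32)> {
        if w == 0 || h == 0 || w > self.width {
            return None;
        }
        let (mut x, mut y, mut shelf_h) = (self.cursor_x, self.shelf_y, self.shelf_h);
        // Not enough room left on this shelf: open a new one below it.
        if self.width - x < w {
            y += shelf_h;
            x = 0;
            shelf_h = 0;
        }
        // Not enough vertical room left in the atlas.
        if self.height - y < h {
            return None;
        }
        self.cursor_x = x + w;
        self.shelf_y = y;
        self.shelf_h = shelf_h.max(h);
        Some((x, y))
    }
}
```

Each image is rendered into its reserved rectangle, and draws reference it by atlas coordinates instead of a separate texture binding. The cost is the extra copy into the atlas, which is the part that descriptor indexing would avoid.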
Prospect: really advanced GPU capabilities like Metal indirect command encoding and VK_NV_device_generated_commands, useful for hybrid model where some rendering is done in compute shaders, some in rasterization pipeline. Limited version (fixed set of draws, but open ended instances) possible with DX12 ExecuteIndirect. Unlikely to land in WebGPU any time soon, if ever, because huge differences between platforms.
Other things we'd like that are unlikely to land in WebGPU but might be extensions:
Incremental present
Ability to write swapchain surface from compute shader (see Zulip discussion)
TODO... I'm sure there are others
Conclusion
There is no good, general GPU infrastructure. Dealing with the diversity of both hardware and APIs increases the complexity. (Diversity of APIs for similar/same hardware is accidental complexity, but we have to deal with it).
Best bet for right now is WebGPU. There's a performance hit, but the lower friction is worth it. I predict a lot of workloads will be ported to run on WebGPU (cite wonnx for ML), and that will then motivate performance work. Otherwise, for native, "just use CUDA" and "handroll infrastructure for the project" will prevail. (Maybe cite MediaPipe as an example of ML with handrolled native GPU infra, possibly also Kompute as a Vulkan-specific approach. Metal is relevant because the M1/M2 GPU has great power efficiency and a nice ability to access large RAM space.)
The original vision of having lightweight portable GPU runtime for piet-gpu is still valid. It's still on the roadmap, but deferred; that work can happen after core piet-gpu is running well and applications are built.
Next few months, we hope for increased velocity of piet-gpu development, and collaboration with other projects in wgpu ecosystem, plus helping make wgpu and WebGPU better. Exciting times!