Implement push-constants (#574)

* Initial implementation of push_constants
* Initial implementation of push_constants
* Better handling of limits Fix lint errors.
* One more lint error.
* And one more typo.
* Change limits to use hyphens Combine the code that accesses features and limits for adapters and devices, since they are almost identical. Add an error for unknown limit
* Forgot to uncomment some lines
* Removed a couple of more comments
* Fix typo in comment. Minor cleanup.
* Move push_constants stuff to extras.py
* Fix flake and codegen
* Fix failing test
* Linux is failing even though my Mac isn't. I have to figure out what's wrong. :-(
* And one last lint problem
* First pass at documentation.
* First pass at documentation.
* Undo accidental modification
* See
* Found one carryover from move to 22.1 that I forgot to include. Undoing all typo mistakes and moving to a different push.
* Yikes. One more _api change
* Yikes. One more _api change
* Apply suggestions from code review Co-authored-by: Almar Klein <[email protected]>
* Update comments. Comment @create_and_release as requested.
* Tiny change to get tests to run again.
* Apply suggestions from code review Co-authored-by: Almar Klein <[email protected]>

---------

Co-authored-by: Almar Klein <[email protected]>
Co-authored-by: Korijn van Golen <[email protected]>
pygfx · Sep 17, 2024 · 0a243bb
1 parent 466af69
Showing 8 changed files with 538 additions and 98 deletions.
diff --git a/docs/backends.rst b/docs/backends.rst
@@ -59,6 +59,103 @@ The wgpu_native backend provides a few extra functionalities:
     :return: Device
     :rtype: wgpu.GPUDevice
 
+The wgpu_native backend provides support for push constants.
+Since WebGPU does not support this feature, documentation on its use is hard to find.
+A full explanation of push constants and their use in Vulkan can be found
+`here <https://vkguide.dev/docs/chapter-3/push_constants/>`_.
+Using push constants in WGPU closely follows the Vulkan model.
+
+The advantage of push constants is that they are typically faster to update than uniform buffers.
+Modifications to push constants are included in the command encoder; updating a uniform
+buffer involves sending a separate command to the GPU.
+The disadvantage of push constants is that their size limit is much smaller. The limit
+is guaranteed to be at least 128 bytes, and 256 bytes is typical.
+
+Given an adapter, first determine if it supports push constants::
+
+    >>> "push-constants" in adapter.features
+    True
+
+If push constants are supported, determine the maximum number of bytes that can
+be allocated for push constants::
+
+    >>> adapter.limits["max-push-constant-size"]
+    256
+
+You must tell the adapter to create a device that supports push constants,
+and you must tell it the number of bytes of push constants that you are using.
+Overestimating is okay::
+
+    device = adapter.request_device(
+        required_features=["push-constants"],
+        required_limits={"max-push-constant-size": 256},
+    )
+
+Creating a push constant in your shader code is similar to the way you would create
+a uniform buffer.
+Keep the fields used only in the ``@vertex`` shader separate from the fields
+used only in the ``@fragment`` shader, and keep both separate from the fields
+used in both shaders::
+
+    struct PushConstants {
+        // vertex shader
+        vertex_transform: mat4x4f,
+        // fragment shader
+        fragment_transform: mat4x4f,
+        // used in both
+        generic_transform: mat4x4f,
+    }
+    var<push_constant> push_constants: PushConstants;
+
+To create the pipeline layout for this shader, use
+``wgpu.backends.wgpu_native.create_pipeline_layout`` instead of
+``device.create_pipeline_layout``.  It takes an additional argument,
+``push_constant_layouts``, describing
+the layout of the push constants.  For the shader above::
+
+    push_constant_layouts = [
+        {"visibility": ShaderStage.VERTEX, "start": 0, "end": 64},
+        {"visibility": ShaderStage.FRAGMENT, "start": 64, "end": 128},
+        {"visibility": ShaderStage.VERTEX | ShaderStage.FRAGMENT, "start": 128, "end": 192},
+    ]
+
+Finally, you set the values of the push constants by using
+``wgpu.backends.wgpu_native.set_push_constants``.  The fourth argument is the
+number of bytes to write, not the end offset::
+
+    set_push_constants(this_pass, ShaderStage.VERTEX, 0, 64, <64 bytes>)
+    set_push_constants(this_pass, ShaderStage.FRAGMENT, 64, 64, <64 bytes>)
+    set_push_constants(this_pass, ShaderStage.VERTEX | ShaderStage.FRAGMENT, 128, 64, <64 bytes>)
+
+Bytes must be set separately for each of the three visibility ranges.  If the push constant has
+already been set, on the next use you only need to call ``set_push_constants`` on those
+bytes you wish to change.
+
+.. py:function:: wgpu.backends.wgpu_native.create_pipeline_layout(device, *, label="", bind_group_layouts, push_constant_layouts=[])
+
+   This method provides the same functionality as :func:`wgpu.GPUDevice.create_pipeline_layout`,
+   but provides an extra `push_constant_layouts` argument.
+   When using push constants, this argument is a list of dictionaries, where each
+   dictionary has three fields: `visibility`, `start`, and `end`.
+
+   :param device: The device on which we are creating the pipeline layout
+   :param label: An optional label
+   :param bind_group_layouts: The bind group layouts, as for :func:`wgpu.GPUDevice.create_pipeline_layout`
+   :param push_constant_layouts: Described above.
+
+.. py:function:: wgpu.backends.wgpu_native.set_push_constants(render_pass_encoder, visibility, offset, size_in_bytes, data, data_offset=0)
+
+    This function requires that the underlying GPU implement `push_constants`.
+    These push constants are a buffer of bytes available to the `fragment` and `vertex`
+    shaders. They are similar to a bound buffer, but the buffer is set using this
+    function call.
+
+    :param render_pass_encoder: The render pass encoder to which we are pushing constants.
+    :param visibility: The stages (vertex, fragment, or both) to which these constants are visible
+    :param offset: The offset into the push constants at which the bytes are to be written
+    :param size_in_bytes: The number of bytes to copy from the data
+    :param data: The data to copy to the buffer
+    :param data_offset: The starting offset in the data at which to begin copying.
+
 
 The js_webgpu backend
 ---------------------

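Note on the layout arithmetic in the documentation above: each of the three fields is a 4x4 matrix of 32-bit floats, so every range is 64 bytes wide and the ranges sit back to back. The sketch below shows that arithmetic without needing a GPU. It is only an illustration: `make_push_constant_layouts` is a hypothetical helper that is not part of this commit, and `VERTEX` and `FRAGMENT` are integer stand-ins assumed for the `ShaderStage` flags.

```python
import struct

# Stand-ins for the ShaderStage flag values (an assumption, so the sketch runs without wgpu).
VERTEX = 1
FRAGMENT = 2


def make_push_constant_layouts(ranges):
    """Hypothetical helper: turn (visibility, size_in_bytes) pairs into
    consecutive {"visibility", "start", "end"} dictionaries."""
    layouts = []
    start = 0
    for visibility, size in ranges:
        layouts.append({"visibility": visibility, "start": start, "end": start + size})
        start += size
    return layouts


# A 4x4 matrix of 32-bit floats occupies 16 * 4 = 64 bytes.
MAT4_SIZE = struct.calcsize("<16f")

layouts = make_push_constant_layouts(
    [(VERTEX, MAT4_SIZE), (FRAGMENT, MAT4_SIZE), (VERTEX | FRAGMENT, MAT4_SIZE)]
)

# Pack an identity matrix into the 64 bytes that one range expects.
identity = [1.0 if row == col else 0.0 for col in range(4) for row in range(4)]
matrix_bytes = struct.pack("<16f", *identity)
```

The resulting `layouts` list has the same start and end offsets as the documented example (0 to 64, 64 to 128, 128 to 192), and `matrix_bytes` is the kind of 64-byte payload that would be passed as `data` for one range.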
diff --git a/tests/test_set_constant.py b/tests/test_set_constant.py
@@ -0,0 +1,164 @@
+import numpy as np
+import pytest
+
+import wgpu.utils
+from tests.testutils import can_use_wgpu_lib, run_tests
+from wgpu import TextureFormat
+from wgpu.backends.wgpu_native.extras import create_pipeline_layout, set_push_constants
+
+if not can_use_wgpu_lib:
+    pytest.skip("Skipping tests that need the wgpu lib", allow_module_level=True)
+
+
+"""
+This code is an amazingly slow way of adding together two 10-element arrays of 32-bit
+integers defined by push constants and storing the sums in an output buffer.
+
+The first number of the addition is purposely pulled using the vertex stage, and the
+second number from the fragment stage, so that we can ensure that we are
+using stage-separated push constants correctly.
+
+The source code assumes the topology is POINT-LIST, so that each call to vertexMain
+corresponds with one call to fragmentMain.
+"""
+COUNT = 10
+
+SHADER_SOURCE = (
+    f"""
+    const COUNT = {COUNT}u;
+"""
+    """
+    // Put the results here
+    @group(0) @binding(0) var<storage, read_write> data: array<u32, COUNT>;
+
+    struct PushConstants {
+        values1: array<u32, COUNT>, // VERTEX constants
+        values2: array<u32, COUNT>, // FRAGMENT constants
+    }
+    var<push_constant> push_constants: PushConstants;
+
+    struct VertexOutput {
+        @location(0) index: u32,
+        @location(1) value: u32,
+        @builtin(position) position: vec4f,
+    }
+
+    @vertex
+    fn vertexMain(
+        @builtin(vertex_index) index: u32,
+    ) -> VertexOutput {
+        return VertexOutput(index, push_constants.values1[index], vec4f(0, 0, 0, 1));
+    }
+
+    @fragment
+    fn fragmentMain(@location(0) index: u32,
+                    @location(1) value: u32
+    ) -> @location(0) vec4f {
+        data[index] = value + push_constants.values2[index];
+        return vec4f();
+    }
+"""
+)
+
+BIND_GROUP_ENTRIES = [
+    {"binding": 0, "visibility": "FRAGMENT", "buffer": {"type": "storage"}},
+]
+
+
+def setup_pipeline():
+    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
+    device = adapter.request_device(
+        required_features=["push-constants"],
+        required_limits={"max-push-constant-size": 128},
+    )
+    output_texture = device.create_texture(
+        # Actual size is immaterial.  Could just be 1x1
+        size=[128, 128],
+        format=TextureFormat.rgba8unorm,
+        usage="RENDER_ATTACHMENT|COPY_SRC",
+    )
+    shader = device.create_shader_module(code=SHADER_SOURCE)
+    bind_group_layout = device.create_bind_group_layout(entries=BIND_GROUP_ENTRIES)
+    render_pipeline_layout = create_pipeline_layout(
+        device,
+        bind_group_layouts=[bind_group_layout],
+        push_constant_layouts=[
+            {"visibility": "VERTEX", "start": 0, "end": COUNT * 4},
+            {"visibility": "FRAGMENT", "start": COUNT * 4, "end": COUNT * 4 * 2},
+        ],
+    )
+    pipeline = device.create_render_pipeline(
+        layout=render_pipeline_layout,
+        vertex={
+            "module": shader,
+            "entry_point": "vertexMain",
+        },
+        fragment={
+            "module": shader,
+            "entry_point": "fragmentMain",
+            "targets": [{"format": output_texture.format}],
+        },
+        primitive={
+            "topology": "point-list",
+        },
+    )
+    render_pass_descriptor = {
+        "color_attachments": [
+            {
+                "clear_value": (0, 0, 0, 0),  # only first value matters
+                "load_op": "clear",
+                "store_op": "store",
+                "view": output_texture.create_view(),
+            }
+        ],
+    }
+
+    return device, pipeline, render_pass_descriptor
+
+
+def test_normal_push_constants():
+    device, pipeline, render_pass_descriptor = setup_pipeline()
+    vertex_call_buffer = device.create_buffer(size=COUNT * 4, usage="STORAGE|COPY_SRC")
+    bind_group = device.create_bind_group(
+        layout=pipeline.get_bind_group_layout(0),
+        entries=[
+            {"binding": 0, "resource": {"buffer": vertex_call_buffer}},
+        ],
+    )
+
+    encoder = device.create_command_encoder()
+    this_pass = encoder.begin_render_pass(**render_pass_descriptor)
+    this_pass.set_pipeline(pipeline)
+    this_pass.set_bind_group(0, bind_group)
+
+    buffer = np.random.randint(0, 1_000_000, size=(2 * COUNT), dtype=np.uint32)
+    set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, buffer)
+    set_push_constants(this_pass, "FRAGMENT", COUNT * 4, COUNT * 4, buffer, COUNT * 4)
+    this_pass.draw(COUNT)
+    this_pass.end()
+    device.queue.submit([encoder.finish()])
+    info_view = device.queue.read_buffer(vertex_call_buffer)
+    result = np.frombuffer(info_view, dtype=np.uint32)
+    expected_result = buffer[0:COUNT] + buffer[COUNT:]
+    assert all(result == expected_result)
+
+
+def test_bad_set_push_constants():
+    device, pipeline, render_pass_descriptor = setup_pipeline()
+    encoder = device.create_command_encoder()
+    this_pass = encoder.begin_render_pass(**render_pass_descriptor)
+
+    def zeros(n):
+        return np.zeros(n, dtype=np.uint32)
+
+    with pytest.raises(ValueError):
+        # Buffer is too short
+        set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, zeros(COUNT - 1))
+
+    with pytest.raises(ValueError):
+        # Buffer is too short once the data offset is taken into account
+        set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, zeros(COUNT + 1), 8)
+
+
+if __name__ == "__main__":
+    run_tests(globals())
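
Note on `test_bad_set_push_constants`: both cases fail because fewer than `size_in_bytes` bytes remain in the data after `data_offset`. The actual check lives in `extras.py`, which is not shown in this diff, so the following is only a sketch of the arithmetic under that assumption. The names `data_fits` and `check_push_constant_data` are hypothetical.

```python
def data_fits(data_nbytes, size_in_bytes, data_offset=0):
    """Return True if `size_in_bytes` bytes can be read starting at `data_offset`."""
    return data_offset >= 0 and data_offset + size_in_bytes <= data_nbytes


def check_push_constant_data(data, size_in_bytes, data_offset=0):
    """Hypothetical validation: raise ValueError when the data is too short."""
    nbytes = memoryview(data).nbytes
    if not data_fits(nbytes, size_in_bytes, data_offset):
        raise ValueError(
            f"need {size_in_bytes} bytes at offset {data_offset}, but data has {nbytes} bytes"
        )


COUNT = 10
# First case in the test: 9 uint32 values are 36 bytes, but 40 are requested.
first_case_ok = data_fits((COUNT - 1) * 4, COUNT * 4)
# Second case: 11 uint32 values are 44 bytes, but only 36 remain after offset 8.
second_case_ok = data_fits((COUNT + 1) * 4, COUNT * 4, 8)
# The passing call in test_normal_push_constants: 80 bytes, 40 read from offset 40.
normal_case_ok = data_fits(2 * COUNT * 4, COUNT * 4, COUNT * 4)
```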
diff --git a/tests/test_wgpu_native_basics.py b/tests/test_wgpu_native_basics.py
@@ -424,18 +424,48 @@ def test_features_are_legal():
     )
     # We can also use underscore
     assert are_features_wgpu_legal(["push_constants", "vertex_writable_storage"])
+    # We can also use camel case
+    assert are_features_wgpu_legal(["PushConstants", "VertexWritableStorage"])
 
 
 def test_features_are_illegal():
-    # not camel Case
-    assert not are_features_wgpu_legal(["pushConstants"])
     # writable is misspelled
     assert not are_features_wgpu_legal(
         ["multi-draw-indirect", "vertex-writeable-storage"]
     )
     assert not are_features_wgpu_legal(["my-made-up-feature"])
 
 
+def are_limits_wgpu_legal(limits):
+    """Returns true if the dictionary of limits is legal. Determining whether a specific
+    set of limits is supported on a particular device would make the tests fragile,
+    so we only verify that the names are legal limit names."""
+    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
+    try:
+        adapter.request_device(required_limits=limits)
+        return True
+    except RuntimeError as e:
+        assert "Unsupported features were requested" in str(e)
+        return True
+    except KeyError:
+        return False
+
+
+def test_limits_are_legal():
+    # A standard limit.  Probably exists
+    assert are_limits_wgpu_legal({"max-bind-groups": 8})
+    # A common extension limit
+    assert are_limits_wgpu_legal({"max-push-constant-size": 128})
+    # We can also use underscore
+    assert are_limits_wgpu_legal({"max_bind_groups": 8, "max_push_constant_size": 128})
+    # We can also use camel case
+    assert are_limits_wgpu_legal({"maxBindGroups": 8, "maxPushConstantSize": 128})
+
+
+def test_limits_are_not_legal():
+    assert not are_limits_wgpu_legal({"max-bind-group": 8})
+
+
 if __name__ == "__main__":
     run_tests(globals())
 

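Note on the naming tests above: feature and limit names are accepted in hyphenated, underscore, and camel-case spellings. One way such equivalence could be implemented is to normalize every spelling to the hyphenated form before lookup. The sketch below is an assumption about the approach, not the code from this commit; `to_hyphenated` is a hypothetical name, and names containing digits would need extra handling.

```python
import re


def to_hyphenated(name):
    """Hypothetical normalizer: map camelCase, CamelCase, and snake_case
    names to the lower-case hyphenated form."""
    # Insert a hyphen before every capital letter except one at the start.
    name = re.sub(r"(?<!^)(?=[A-Z])", "-", name)
    return name.lower().replace("_", "-")


normalized = [
    to_hyphenated(name)
    for name in ("max-push-constant-size", "max_push_constant_size", "maxPushConstantSize")
]
```

With this scheme a misspelled name such as `max-bind-group` still normalizes to itself and would be rejected by the lookup that follows.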
diff --git a/tests_mem/testutils.py b/tests_mem/testutils.py
@@ -145,7 +145,40 @@ def ob_name_from_test_func(func):
 
 
 def create_and_release(create_objects_func):
-    """Decorator."""
+    """
+    This wrapper goes around a test that takes a single argument n. That test should
+    be a generator function that yields a descriptor followed by
+    n different objects corresponding to the name of the test function.  Hence
+    a test named `test_release_foo_bar` would yield a descriptor followed by
+    n FooBar objects.
+
+    The descriptor is a dictionary with three fields, each optional.
+    In a typical situation, there will be `n` FooBar objects after the test, and after
+    releasing, there will be zero. However, sometimes there are auxiliary objects,
+    in which case it's necessary to provide one or more fields.
+
+    The keys "expected_counts_after_create" and "expected_counts_after_release" each have
+    as their value a sub-dictionary giving the number of still-alive WGPU objects.
+    The key "expected_counts_after_create" gives the expected state after the
+    n objects have been created and put into a list; "expected_counts_after_release"
+    gives the state after the n objects have been released.
+
+    These sub-dictionaries have as their keys the names of WGPU object types, and
+    their value is a tuple of two integers: the first is the number of Python objects
+    expected to exist and the second is the number of native objects. Any type not in
+    the sub-dictionary has an implied value of (0, 0).
+
+    The key "ignore" has as its value a collection of object types that we should ignore
+    in this test. Ideally we should not use this, but currently there are a few cases where
+    we cannot reliably predict the number of objects in wgpu-native.
+
+    If the descriptor doesn't contain an "expected_counts_after_create", then the default
+    is {"FooBar": (n, n)}, where "FooBar" is derived from the name of the test.
+
+    If the descriptor doesn't contain an "expected_counts_after_release", then the
+    default is {}, indicating that creating and removing the objects should completely
+    clean itself up.
+    """
 
     def core_test_func():
         """The core function that does the testing."""