
Add IR instructions for cooperative matrix/vector ops #10643

Merged
jkwak-work merged 5 commits into shader-slang:master from cmarcelo:cmat-cvec-ir-ops
Mar 31, 2026

Conversation

@cmarcelo
Contributor

No description provided.

@cmarcelo cmarcelo requested a review from a team as a code owner March 23, 2026 03:34
@cmarcelo cmarcelo requested review from bmillsNV and Copilot and removed request for a team March 23, 2026 03:34
@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds five cooperative matrix/vector IR opcodes and three new scalar kinds (BFloat16, FloatE4M3, FloatE5M2); implements IR validation, reflection/bindings, HLSL/SPIR-V/CUDA (OptiX) codegen paths, tests, and tooling updates to support these additions.

Changes

Cohort / File(s) Summary
Public C API & Scalar Declarations
include/slang.h, tools/gfx/slang.slang
Added cooperative enums (SlangCooperativeMatrixUse, SlangCooperativeVectorMatrixLayout) and extended public scalar enums with BFLOAT16 / FLOAT_E4M3 / FLOAT_E5M2 (also added INTPTR/UINTPTR in tooling enum).
Scalar Text, Reflection & Bindings
source/core/slang-type-text-util.cpp, source/slang/slang-reflection-api.cpp, source/slang/slang-reflection-json.cpp, source/slang-wasm/slang-wasm-bindings.cpp
Mapped new scalar text names, made reflection recognize and report the three new scalar types, emit them in JSON, and export them to Emscripten bindings.
IR Ops, Stable Names & Version
source/slang/slang-ir-insts.lua, source/slang/slang-ir-insts-stable-names.lua, source/slang/slang-ir.h
Added five IR opcodes (CoopMatMulAdd, CoopVecMatMul, CoopVecMatMulAdd, CoopVecOuterProductAccumulate, CoopVecReduceSumAccumulate) and bumped IR max supported module version (11→12).
IR Validation & Link Integration
source/slang/slang-ir-validate.cpp, source/slang/slang-ir-validate.h, source/slang/slang-emit.cpp
Implemented cooperative-IR validators, exposed validateCooperativeIRModuleIfEnabled, and invoked cooperative validation in link/emit pipeline.
IR Semantics & Folding
source/slang/slang-ir.cpp, source/slang/slang-emit-c-like.cpp
Marked new coop intrinsics as side-effect-free where appropriate and prevented folding; forced statement emission for coop opcodes in C-like emitter.
CUDA / OptiX Prelude & Codegen
prelude/slang-cuda-prelude.h, source/slang/slang-emit-cuda.cpp
Removed Slang→OptiX enum-wrapper indirections for coop-vec templates, switched templates to use OptiX native types, added OptiX name-mapping helpers, validation, diagnostics, and emission for coop-vec/mat ops.
HLSL Meta & Emitter
source/slang/hlsl.meta.slang, source/slang/slang-emit-hlsl.cpp, source/slang/slang-emit-hlsl.h
Refactored HLSL meta to use C API constants, added intrinsic wrappers for coop-mat/coop-vec ops, removed target-specific mutators, and added emitter helpers to map component/layout to HLSL builtin calls; declared two new emitter helpers.
SPIR-V Emission
source/slang/slang-emit-spirv.cpp
Added SPIR-V emission for cooperative-matrix and cooperative-vector ops, capability/extension gating, operand mapping helpers, signedness/packed handling, and diagnostics/fallbacks.
Cooperative IR Emission Helpers (multi-target)
source/slang/slang-emit-cuda.cpp, source/slang/slang-emit-hlsl.cpp, source/slang/slang-emit-spirv.cpp
Introduced helpers to map component/layout/use to target encodings and unified unsupported-intrinsic handling across targets; added target-specific validation and emission logic for coop ops.
Tests: Codegen & Diagnostics
tests/cooperative-matrix/*, tests/cooperative-vector/*, tests/cuda/*, tests/...
Added numerous new codegen and diagnostic tests for cooperative-matrix/vector ops across CUDA/OptiX, HLSL, and SPIR-V; updated existing CUDA optix tests to match new emission formatting.
Tools & Test Utilities
tools/render-test/shader-input-layout.cpp, tools/slang-test/slang-test-main.cpp, tools/slang-unit-test/unit-test-special-scalar-reflection.cpp
Added runtime conversions/printing for new scalars, broadened nearly-equal float comparisons to include them, and added a unit test verifying reflection of the special scalar types and intptr/uintptr vector elements.
Emitter & Prelude Adjustments
prelude/slang-cuda-prelude.h, source/slang/slang-emit.cpp, source/slang/slang-emit-c-like.cpp
Inserted cooperative-IR validation into linking, adjusted cooperative-matrix use handling to the new enum constants, and ensured coop ops are emitted as statements where required.
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.96% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No pull request description was provided by the author, making it impossible to evaluate relevance to the changeset. Add a description explaining the purpose, scope, and motivation for these IR instruction additions.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary change: adding new IR instructions for cooperative matrix and vector operations across multiple files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

Copilot AI left a comment


Pull request overview

Adds new IR instructions and codegen paths for cooperative matrix/vector operations (SPIR-V NV/KHR and CUDA/OptiX), while extending reflection/support tooling to recognize additional “special” scalar types (bfloat16/float8 variants) and adding regression/codegen tests.

Changes:

  • Introduces new cooperative matrix/vector IR ops (mul-add, matrix-mul, outer-product accumulate, reduce-sum accumulate) plus IR validation and serialization/stable-name updates.
  • Extends reflection/type utilities and test infrastructure to support BFloat16, FloatE4M3, and FloatE5M2 scalar types.
  • Adds/updates backend emission for SPIR-V, HLSL, and CUDA/OptiX, with new/updated .slang tests for codegen and diagnostics.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tools/slang-unit-test/unit-test-special-scalar-reflection.cpp New unit test covering reflection for bfloat16/float8 scalar + vector element types.
tools/slang-test/slang-test-main.cpp Enables float comparisons for new special scalar types in output diffing.
tools/render-test/shader-input-layout.cpp Adds print/texture-data handling for bfloat16 and float8 element types.
tools/gfx/slang.slang Extends public SlangScalarType enum for gfx bindings.
tests/cuda/optix-coopvec.slang Updates OptiX coopvec filecheck expectations.
tests/cuda/optix-coopvec-packed-input-diagnostic.slang Adds diagnostic test for unsupported packed-input coopvec matmul on CUDA/OptiX.
tests/cooperative-vector/training-spirv-codegen.slang New SPIR-V codegen test for cooperative vector training ops.
tests/cooperative-vector/training-hlsl-codegen.slang New HLSL codegen test for cooperative vector training ops.
tests/cooperative-vector/training-cuda-codegen.slang New CUDA/OptiX codegen test for cooperative vector training ops.
tests/cooperative-vector/matrix-mul-spirv-codegen.slang New SPIR-V codegen test for coopvec matrix mul/mul-add ops.
tests/cooperative-vector/matrix-mul-hlsl-codegen.slang New HLSL codegen test for coopvec matrix mul/mul-add (packed + non-packed).
tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang New CUDA codegen test for cooperative matrix mul-add.
source/slang/slang-reflection-json.cpp Emits JSON reflection strings for new scalar types.
source/slang/slang-reflection-api.cpp Extends scalar reflection to recognize non-basic scalar types (bfloat16/float8).
source/slang/slang-ir.h Bumps max supported IR module version for new instructions.
source/slang/slang-ir.cpp Marks new cooperative ops as side-effect-free where appropriate.
source/slang/slang-ir-validate.h Declares new cooperative IR validation entrypoint.
source/slang/slang-ir-validate.cpp Implements validation for new cooperative IR instructions.
source/slang/slang-ir-insts.lua Defines new IR instruction opcodes and operand lists.
source/slang/slang-ir-insts-stable-names.lua Assigns stable IDs for new IR instructions.
source/slang/slang-emit.cpp Runs new cooperative IR validation pass during linking/optimization.
source/slang/slang-emit-spirv.cpp Adds SPIR-V emission for new cooperative ops and enum/value mappings.
source/slang/slang-emit-hlsl.h Declares helpers for mapping coopvec enums during HLSL emission.
source/slang/slang-emit-hlsl.cpp Emits HLSL builtins for new cooperative vector operations.
source/slang/slang-emit-cuda.cpp Emits CUDA WMMA coopmat mul-add and OptiX coopvec operations.
source/slang/slang-emit-c-like.cpp Ensures coopvec ops are emitted as statements (not folded).
source/slang/hlsl.meta.slang Adds new intrinsic ops and rewires coopvec/coopmat implementations to IR instructions.
source/slang-wasm/slang-wasm-bindings.cpp Exposes new scalar types to WASM bindings.
source/core/slang-type-text-util.cpp Adds type-name ↔ scalar-type mappings for new scalar types.
prelude/slang-cuda-prelude.h Updates OptiX coopvec wrapper templates (enum types, mapping removal).
include/slang.h Adds public enums for cooperative matrix/vector metadata and extends scalar type enums.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
prelude/slang-cuda-prelude.h (1)

6487-6491: ⚠️ Potential issue | 🔴 Critical

Fix the first slangOptixCoopVecMatMul overload to honor the transpose parameter.

The first overload (line 6487–6505) accepts bool transpose as a runtime parameter, but the implementation hardcodes false when dispatching to optixCoopVecMatMul (line 6501). The CUDA emit code in source/slang/slang-emit-cuda.cpp forwards the transpose operand for non-StructuredBuffer matrices; when transpose=true, the wrapper silently selects the non-transposed code path instead. Dispatch to the appropriate true/false specialization based on the runtime value.

🐛 Proposed fix
-    return optixCoopVecMatMul<
-        VecTOut,
-        VecTIn,
-        inputInterpretation,
-        matrixLayout,
-        false,
-        N,
-        K,
-        matrixInterpretation>(inputVector, matrix, matrixOffset, matrixStride);
+    if (transpose)
+    {
+        return optixCoopVecMatMul<
+            VecTOut,
+            VecTIn,
+            inputInterpretation,
+            matrixLayout,
+            true,
+            N,
+            K,
+            matrixInterpretation>(inputVector, matrix, matrixOffset, matrixStride);
+    }
+    return optixCoopVecMatMul<
+        VecTOut,
+        VecTIn,
+        inputInterpretation,
+        matrixLayout,
+        false,
+        N,
+        K,
+        matrixInterpretation>(inputVector, matrix, matrixOffset, matrixStride);

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 86407eaa-1bb4-4f58-a1ee-8c917b3f600a

📥 Commits

Reviewing files that changed from the base of the PR and between 133ad1d and 493b4aa.

📒 Files selected for processing (31)
  • include/slang.h
  • prelude/slang-cuda-prelude.h
  • source/core/slang-type-text-util.cpp
  • source/slang-wasm/slang-wasm-bindings.cpp
  • source/slang/hlsl.meta.slang
  • source/slang/slang-emit-c-like.cpp
  • source/slang/slang-emit-cuda.cpp
  • source/slang/slang-emit-hlsl.cpp
  • source/slang/slang-emit-hlsl.h
  • source/slang/slang-emit-spirv.cpp
  • source/slang/slang-emit.cpp
  • source/slang/slang-ir-insts-stable-names.lua
  • source/slang/slang-ir-insts.lua
  • source/slang/slang-ir-validate.cpp
  • source/slang/slang-ir-validate.h
  • source/slang/slang-ir.cpp
  • source/slang/slang-ir.h
  • source/slang/slang-reflection-api.cpp
  • source/slang/slang-reflection-json.cpp
  • tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang
  • tests/cooperative-vector/matrix-mul-hlsl-codegen.slang
  • tests/cooperative-vector/matrix-mul-spirv-codegen.slang
  • tests/cooperative-vector/training-cuda-codegen.slang
  • tests/cooperative-vector/training-hlsl-codegen.slang
  • tests/cooperative-vector/training-spirv-codegen.slang
  • tests/cuda/optix-coopvec-packed-input-diagnostic.slang
  • tests/cuda/optix-coopvec.slang
  • tools/gfx/slang.slang
  • tools/render-test/shader-input-layout.cpp
  • tools/slang-test/slang-test-main.cpp
  • tools/slang-unit-test/unit-test-special-scalar-reflection.cpp

@cmarcelo cmarcelo force-pushed the cmat-cvec-ir-ops branch 2 times, most recently from 55caea3 to 5995e4f Compare March 23, 2026 07:07
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
prelude/slang-cuda-prelude.h (2)

6486-6504: ⚠️ Potential issue | 🔴 Critical

Fix the 5-parameter slangOptixCoopVecMatMul overload to respect the runtime transpose parameter.

The function accepts a transpose parameter but hardcodes it to false when calling optixCoopVecMatMul. The CUDA compiler (slang-emit-cuda.cpp) actively validates and extracts this parameter, emitting it at runtime, but the wrapper ignores it entirely—any caller passing transpose=true will silently receive incorrect results. Since OptiX requires transpose as a template parameter, add a runtime branch to instantiate the correct template variant.

The 6-parameter overload (WITH bias) has the same issue and must also be fixed.

🔧 Preserve the current API by branching to the correct OptiX instantiation
 __forceinline__ __device__ VecTOut slangOptixCoopVecMatMul(
     const VecTIn& inputVector,
     CUdeviceptr matrix,
     unsigned matrixOffset,
     bool transpose,
     unsigned matrixStride)
 {
     constexpr unsigned N = OptixCoopVecTraits<VecTOut>::size; // Output vector size
     constexpr unsigned K = OptixCoopVecTraits<VecTIn>::size;  // Input vector size

-    return optixCoopVecMatMul<
-        VecTOut,
-        VecTIn,
-        inputInterpretation,
-        matrixLayout,
-        false,
-        N,
-        K,
-        matrixInterpretation>(inputVector, matrix, matrixOffset, matrixStride);
+    if (transpose)
+    {
+        return optixCoopVecMatMul<
+            VecTOut,
+            VecTIn,
+            inputInterpretation,
+            matrixLayout,
+            true,
+            N,
+            K,
+            matrixInterpretation>(inputVector, matrix, matrixOffset, matrixStride);
+    }
+
+    return optixCoopVecMatMul<
+        VecTOut,
+        VecTIn,
+        inputInterpretation,
+        matrixLayout,
+        false,
+        N,
+        K,
+        matrixInterpretation>(inputVector, matrix, matrixOffset, matrixStride);
 }

6472-6477: ⚠️ Potential issue | 🔴 Critical

Align the cooperative-vector version gating with the wrappers and fix the ignored transpose parameter.

The OptixCoopVecTraits<OptixCoopVec<T, N>> specialization is guarded by OPTIX_VERSION > 90000 (line 6472), but the three slangOptixCoopVecMatMul wrapper functions are guarded by OPTIX_VERSION >= 90000 (line 6462). This creates a breaking mismatch: OptiX 9.0 code can instantiate these wrappers but will fail when accessing OptixCoopVecTraits<VecTOut>::size and OptixCoopVecTraits<VecTIn>::size. Align both guards to >= 90000.

Additionally, the first wrapper (line 6486) accepts a bool transpose parameter but always passes false to the native optixCoopVecMatMul call (line 6497), completely ignoring the input. Either remove the parameter or use it correctly.

🔧 Guard alignment fix
-#if defined(OPTIX_VERSION) && OPTIX_VERSION > 90000
+#if defined(OPTIX_VERSION) && OPTIX_VERSION >= 90000
 template<typename T, unsigned int N>
 struct OptixCoopVecTraits<OptixCoopVec<T, N>>
 {
     static constexpr unsigned int size = N;
 };
 #endif

As per coding guidelines, prelude/**: Built-in language definitions and intrinsics. Changes here affect all Slang programs. Backward compatibility and all target backends must handle these intrinsics correctly.

Also applies to: 6486-6510, 6515-6539, 6547-6568

♻️ Duplicate comments (2)
source/slang/slang-emit-cuda.cpp (1)

824-825: 🧹 Nitpick | 🔵 Trivial

Minor inconsistency: mixed use of as<> + SLANG_ASSERT vs cast<>.

Line 824 uses as<> followed by SLANG_ASSERT, while line 851 uses cast<> directly. The cast<> pattern is preferred when the switch case guarantees the IR opcode, as it provides built-in debug assertions.

Also applies to: 851-851

source/slang/slang-emit-hlsl.cpp (1)

711-724: 🧹 Nitpick | 🔵 Trivial

emitMappedCoopVecMatrixLayout doesn't diagnose unmapped layout values.

Unlike emitMappedCoopVecComponentType which explicitly diagnoses unsupported types (e.g., BFloat16), emitMappedCoopVecMatrixLayout relies solely on SLANG_UNEXPECTED in the mapping function for invalid values. While all current layout enum values are mapped, adding explicit validation would provide better diagnostics for future enum additions.

♻️ Suggested validation pattern
 void HLSLSourceEmitter::emitMappedCoopVecMatrixLayout(IRInst* operand)
 {
     auto intLit = as<IRIntLit>(operand);
     if (!intLit)
     {
         getSink()->diagnose(Diagnostics::UnsupportedTargetIntrinsic{
             .operation = "cooperative vector matrix layout (non-constant operand)",
             .location = operand->sourceLoc});
         m_writer->emit("0");
         return;
     }

+    auto layoutValue = (int32_t)intLit->getValue();
+    switch (layoutValue)
+    {
+    case SLANG_COOPERATIVE_VECTOR_MATRIX_LAYOUT_ROW_MAJOR:
+    case SLANG_COOPERATIVE_VECTOR_MATRIX_LAYOUT_COLUMN_MAJOR:
+    case SLANG_COOPERATIVE_VECTOR_MATRIX_LAYOUT_INFERENCING_OPTIMAL:
+    case SLANG_COOPERATIVE_VECTOR_MATRIX_LAYOUT_TRAINING_OPTIMAL:
+        break;
+    default:
+        getSink()->diagnose(Diagnostics::UnsupportedTargetIntrinsic{
+            .operation = "cooperative vector matrix layout (unsupported value)",
+            .location = operand->sourceLoc});
+        m_writer->emit("0");
+        return;
+    }
+
-    m_writer->emit(_mapSlangCoopVecMatrixLayoutToHLSL((int32_t)intLit->getValue()));
+    m_writer->emit(_mapSlangCoopVecMatrixLayoutToHLSL(layoutValue));
 }

As per coding guidelines, source/slang/**: "(5) Null pointer safety and proper error handling via diagnostics."


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b94505b9-56da-4edc-a1d3-98207cfcdbf3

📥 Commits

Reviewing files that changed from the base of the PR and between 493b4aa and 5995e4f.

📒 Files selected for processing (32)
  • include/slang.h
  • prelude/slang-cuda-prelude.h
  • source/core/slang-type-text-util.cpp
  • source/slang-wasm/slang-wasm-bindings.cpp
  • source/slang/hlsl.meta.slang
  • source/slang/slang-emit-c-like.cpp
  • source/slang/slang-emit-cuda.cpp
  • source/slang/slang-emit-hlsl.cpp
  • source/slang/slang-emit-hlsl.h
  • source/slang/slang-emit-spirv.cpp
  • source/slang/slang-emit.cpp
  • source/slang/slang-ir-insts-stable-names.lua
  • source/slang/slang-ir-insts.lua
  • source/slang/slang-ir-validate.cpp
  • source/slang/slang-ir-validate.h
  • source/slang/slang-ir.cpp
  • source/slang/slang-ir.h
  • source/slang/slang-reflection-api.cpp
  • source/slang/slang-reflection-json.cpp
  • tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang
  • tests/cooperative-vector/matrix-mul-hlsl-codegen.slang
  • tests/cooperative-vector/matrix-mul-spirv-codegen.slang
  • tests/cooperative-vector/training-cuda-codegen.slang
  • tests/cooperative-vector/training-hlsl-codegen.slang
  • tests/cooperative-vector/training-spirv-codegen.slang
  • tests/cuda/optix-coopvec-packed-input-diagnostic.slang
  • tests/cuda/optix-coopvec-transpose-diagnostic.slang
  • tests/cuda/optix-coopvec.slang
  • tools/gfx/slang.slang
  • tools/render-test/shader-input-layout.cpp
  • tools/slang-test/slang-test-main.cpp
  • tools/slang-unit-test/unit-test-special-scalar-reflection.cpp


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tools/slang-test/slang-test-main.cpp (1)

4009-4019: ⚠️ Potential issue | 🟠 Major

Use format-aware epsilon for low-precision float comparisons

Adding BFloat16/FloatE4M3/FloatE5M2 here is correct direction, but using the same differenceThreshold (1e-4) as higher-precision floats is too strict for these formats and can cause flaky/false test failures. Pick epsilon by scalarType (or use ULP-based comparison) before Math::AreNearlyEqual.

Proposed change
-                if (!Math::AreNearlyEqual(valueA, valueB, differenceThreshold))
+                double epsilon = differenceThreshold;
+                switch (scalarType)
+                {
+                case ScalarType::FloatE4M3: epsilon = 1e-1; break;
+                case ScalarType::FloatE5M2: epsilon = 5e-2; break;
+                case ScalarType::BFloat16:  epsilon = 1e-2; break;
+                default: break;
+                }
+                if (!Math::AreNearlyEqual(valueA, valueB, epsilon))
                 {
                     return SLANG_FAIL;
                 }
prelude/slang-cuda-prelude.h (1)

6486-6505: ⚠️ Potential issue | 🔴 Critical

Make transpose a template parameter instead of silently ignoring it.

The function accepts bool transpose as a parameter, but hardcodes false when forwarding to optixCoopVecMatMul. This causes wrong-code behavior: a caller passing transpose=true silently executes the non-transposed operation instead of failing or applying the transpose. OptiX exposes transpose as a real template parameter, so either expose it as a template parameter on the wrapper or reject transpose=true at compile time.

Additionally, the wrapper block is guarded by #if (OPTIX_VERSION >= 90000) while the required OptixCoopVecTraits<OptixCoopVec<T, N>> specialization is guarded by #if defined(OPTIX_VERSION) && OPTIX_VERSION > 90000. This creates a gap at exactly OPTIX_VERSION == 90000 where the wrappers would fail to compile due to undefined traits.

♻️ Duplicate comments (6)
source/slang/slang-reflection-json.cpp (1)

471-473: ⚠️ Potential issue | 🟠 Major

Add missing IntPtr/UIntPtr scalar JSON mappings.

The switch now handles new packed-float scalars, but pointer-sized integer scalars are still not mapped, so they fall through to "unknown" in JSON output.

Proposed fix
         CASE(Float16, float16);
         CASE(Float32, float32);
         CASE(Float64, float64);
+        CASE(IntPtr, intptr);
+        CASE(UIntPtr, uintptr);
         CASE(BFloat16, bfloat16);
         CASE(FloatE4M3, float_e4m3);
         CASE(FloatE5M2, float_e5m2);
As per coding guidelines, "Cross-backend consistency — changes to one emitter may need parallel changes in others."
source/slang/slang-ir-validate.cpp (1)

698-705: ⚠️ Potential issue | 🟡 Minor

Potential null dereference if operand type is null.

The chain getSaturatingAccumulation()->getDataType()->getOp() assumes getDataType() returns non-null. While the operand count is validated, malformed IR could have an operand with a null type. Add a null check for defensive safety.

🛡️ Suggested defensive check
-    if (coopMatMulAdd->getSaturatingAccumulation()->getDataType()->getOp() != kIROp_BoolType)
+    auto satAccumType = coopMatMulAdd->getSaturatingAccumulation()->getDataType();
+    if (!satAccumType || satAccumType->getOp() != kIROp_BoolType)
source/slang/hlsl.meta.slang (2)

30950-30975: ⚠️ Potential issue | 🟠 Major

Packed matrixInterpretation and biasInterpretation are still erased on the IR path.

__getCoopVecComponentScalarType() folds SignedInt8Packed and UnsignedInt8Packed into the same scalar-type IDs as the unpacked forms, but the new intrinsic surface only carries a packed bit for inputInterpretation. Packed matrix or bias interpretations therefore become indistinguishable from unpacked storage once lowered. Please either thread explicit packed flags for matrix/bias too, or reject packed matrix/bias interpretations before calling these intrinsics.

As per coding guidelines, source/slang/**: "IR pass correctness — ensure SSA form and type invariants are maintained."

Also applies to: 31040-31045, 31068-31075, 31724-31729, 31738-31745


31042-31043: ⚠️ Potential issue | 🟠 Major

Don't retype raw matrix and bias storage as Ptr<T[]>.

The pointer overloads still encode matrix and bias storage as Ptr<T[]>, where T is the result element type. That mis-types mixed-element cases like half×half→float or int8×int8→int32, so the IR pointer operand no longer matches the actual backing storage. Keep these operands erased, or add separate matrix/bias pointee-type generics instead of reusing T.

As per coding guidelines, source/slang/**: "IR pass correctness — ensure SSA form and type invariants are maintained."

Also applies to: 31070-31073, 32039-32044, 32101-32112

source/slang/slang-emit-cuda.cpp (2)

824-825: 🧹 Nitpick | 🔵 Trivial

Consider using cast<> for consistency with line 851.

The switch case guarantees the opcode, so the downcast should always succeed. Line 851 already uses cast<IRCoopVecReduceSumAccumulate> without SLANG_ASSERT, while this block uses as<> + SLANG_ASSERT. Using cast<> here would be consistent and provides a built-in debug assertion.

Suggested change
-            auto outerProduct = as<IRCoopVecOuterProductAccumulate>(inst);
-            SLANG_ASSERT(outerProduct);
+            auto outerProduct = cast<IRCoopVecOuterProductAccumulate>(inst);

1349-1375: ⚠️ Potential issue | 🟠 Major

Missing transpose emission in kIROp_CoopVecMatMulAdd.

The kIROp_CoopVecMatMul case (lines 1180-1196) conditionally emits the transpose operand for non-StructuredBuffer matrices, but kIROp_CoopVecMatMulAdd validates the transpose constraint (lines 1305-1323) without ever emitting it in the final call. This creates an inconsistency where transpose is validated but ignored.

Add the same StructuredBuffer check and conditional transpose emission to match MatMul.

Suggested fix
+            bool isStructuredBufferMatrix =
+                as<IRHLSLStructuredBufferTypeBase>(coopVecMatMulAdd->getMatrixPtr()->getDataType()) !=
+                nullptr;
+
             m_writer->emit("(");
             m_writer->emit("slangOptixCoopVecMatMul<");
             emitType(inst->getDataType());
             m_writer->emit(", ");
             emitType(coopVecMatMulAdd->getInput()->getDataType());
             m_writer->emit(", ");
             m_writer->emit(inputInterpretation);
             m_writer->emit(", ");
             m_writer->emit(matrixInterpretation);
             m_writer->emit(", ");
             m_writer->emit(matrixLayout);
             m_writer->emit(", ");
             m_writer->emit(biasInterpretation);
             m_writer->emit(">((");
             emitOperand(coopVecMatMulAdd->getInput(), getInfo(EmitOp::General));
             m_writer->emit("), (CUdeviceptr)(&((");
             emitOperand(coopVecMatMulAdd->getMatrixPtr(), getInfo(EmitOp::General));
             m_writer->emit("))), ");
             emitOperand(coopVecMatMulAdd->getMatrixOffset(), getInfo(EmitOp::General));
+            if (!isStructuredBufferMatrix)
+            {
+                m_writer->emit(", ");
+                emitOperand(coopVecMatMulAdd->getTranspose(), getInfo(EmitOp::General));
+            }
             m_writer->emit(", (CUdeviceptr)(&((");
             emitOperand(coopVecMatMulAdd->getBiasPtr(), getInfo(EmitOp::General));

Verify that the OptiX wrapper template slangOptixCoopVecMatMul with bias parameters accepts a transpose argument:

#!/bin/bash
# Search for the slangOptixCoopVecMatMul template overload that takes bias parameters
rg -n "slangOptixCoopVecMatMul" --type cpp -A 20 | head -100

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cbac7f26-7b5a-4d9e-bdec-2c96ff7c33a4

📥 Commits

Reviewing files that changed from the base of the PR and between 5995e4f and 10f5ad3.

📒 Files selected for processing (32)
  • include/slang.h
  • prelude/slang-cuda-prelude.h
  • source/core/slang-type-text-util.cpp
  • source/slang-wasm/slang-wasm-bindings.cpp
  • source/slang/hlsl.meta.slang
  • source/slang/slang-emit-c-like.cpp
  • source/slang/slang-emit-cuda.cpp
  • source/slang/slang-emit-hlsl.cpp
  • source/slang/slang-emit-hlsl.h
  • source/slang/slang-emit-spirv.cpp
  • source/slang/slang-emit.cpp
  • source/slang/slang-ir-insts-stable-names.lua
  • source/slang/slang-ir-insts.lua
  • source/slang/slang-ir-validate.cpp
  • source/slang/slang-ir-validate.h
  • source/slang/slang-ir.cpp
  • source/slang/slang-ir.h
  • source/slang/slang-reflection-api.cpp
  • source/slang/slang-reflection-json.cpp
  • tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang
  • tests/cooperative-vector/matrix-mul-hlsl-codegen.slang
  • tests/cooperative-vector/matrix-mul-spirv-codegen.slang
  • tests/cooperative-vector/training-cuda-codegen.slang
  • tests/cooperative-vector/training-hlsl-codegen.slang
  • tests/cooperative-vector/training-spirv-codegen.slang
  • tests/cuda/optix-coopvec-packed-input-diagnostic.slang
  • tests/cuda/optix-coopvec-transpose-diagnostic.slang
  • tests/cuda/optix-coopvec.slang
  • tools/gfx/slang.slang
  • tools/render-test/shader-input-layout.cpp
  • tools/slang-test/slang-test-main.cpp
  • tools/slang-unit-test/unit-test-special-scalar-reflection.cpp


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (14)
source/core/slang-type-text-util.cpp (1)

28-31: ⚠️ Potential issue | 🟠 Major

Add missing pointer-sized scalar mappings to keep round-trip behavior complete.

SLANG_SCALAR_TYPES still omits IntPtr/UIntPtr, so scalar-name conversion is incomplete for pointer-sized integer scalar types.

Proposed fix
 #define SLANG_SCALAR_TYPES(x) \
     x(None, none) \
     x(Void, void) \
     x(Bool, bool) \
     x(Float16, half) \
     x(UInt8, uint8_t) \
     x(Int8, int8_t) \
     x(UInt16, uint16_t) \
     x(Int16, int16_t) \
     x(UInt32, uint32_t) \
     x(Int32, int32_t) \
     x(Int64, int64_t) \
     x(UInt64, uint64_t) \
+    x(IntPtr, intptr_t) \
+    x(UIntPtr, uintptr_t) \
     x(Float32, float) \
     x(Float64, double) \
     x(BFloat16, bfloat16) \
     x(FloatE4M3, float_e4m3) \
     x(FloatE5M2, float_e5m2)
source/slang/slang-ir.cpp (1)

9013-9015: 🧹 Nitpick | 🔵 Trivial

Document why accumulate cooperative ops are intentionally excluded from this side-effect-free list.

Line 9013-Line 9015 look correct for pure value-producing ops, but this block is easy to misread as incomplete. Please add a brief comment clarifying that kIROp_CoopVecOuterProductAccumulate and kIROp_CoopVecReduceSumAccumulate are intentionally excluded because they write to memory and must retain side effects.

As per coding guidelines, source/slang/**: “IR pass correctness — ensure SSA form and type invariants are maintained.”

tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang (1)

1-17: 🧹 Nitpick | 🔵 Trivial

Strengthen the FileCheck assertion to validate template arguments.

The current CHECK pattern only verifies the function symbol exists but doesn't validate the critical template arguments (shape parameters and saturating flag). Since the shader specifically uses 16 x 16 x 16 dimensions with saturate=false, the test would still pass even if CUDA emission dropped or mangled these parameters.

Consider tightening the pattern:

-// CHECK: Slang_CUDA_WMMA::coopMatMulAdd<
+// CHECK: Slang_CUDA_WMMA::coopMatMulAdd<{{.*}}, 16, 16, 16, false>(

As per coding guidelines, "tests/**: Check that expected outputs match the intended behavior, not just current behavior."

tests/cooperative-vector/matrix-mul-spirv-codegen.slang (1)

1-41: 🧹 Nitpick | 🔵 Trivial

Consider adding a mixed signedness test case for better coverage.

The test currently only exercises the signed+signed combination (int8_t input with int32_t result), which always produces the same operand mask (MatrixBSignedComponentsKHR|MatrixResultSignedComponentsKHR). The SPIR-V emitter's operand mask logic handles four independent signed flags, but only one combination is validated.

Adding a case with unsigned input and signed result (or vice versa) would improve coverage:

// Unsigned input, signed result → only MatrixResultSignedComponentsKHR
CoopVec<uint8_t, 4> uvec = coopVecLoad<4, uint8_t>(input);
let mixedResult = coopVecMatMul<int32_t, 4, 4>(
    uvec, CoopVecComponentType::UnsignedInt8, ...);
// CHECK: OpCooperativeVectorMatrixMulNV {{.*}} MatrixResultSignedComponentsKHR

As per coding guidelines, "tests/**: Ensure new features have corresponding tests."

tests/cooperative-vector/training-spirv-codegen.slang (1)

1-38: 🧹 Nitpick | 🔵 Trivial

Consider strengthening SPIR-V checks to validate operands.

The test only verifies the capability and instruction opcodes exist but doesn't validate critical operand values like offsets, stride, or the encoded layout/component type constants. This means regressions in operand lowering could go undetected.

Consider extending the CHECK patterns to validate at least some operands:

// CHECK: OpCooperativeVectorOuterProductAccumulateNV {{%[0-9]+}} {{%[0-9]+}} {{%[0-9]+}} {{%[0-9]+}} %int_0
// CHECK: OpCooperativeVectorReduceSumAccumulateNV {{%[0-9]+}} {{%[0-9]+}} %int_0

As per coding guidelines, "tests/**: Check that expected outputs match the intended behavior, not just current behavior."

tests/cuda/optix-coopvec-packed-input-diagnostic.slang (1)

11-27: ⚠️ Potential issue | 🟠 Major

Add packed-input diagnostic coverage for coopVecMatMulAddPacked.

This test only exercises coopVecMatMulPacked; a regression in coopVecMatMulAddPacked packed-input rejection would go uncaught. Please add a second diagnostic case (same file or sibling test) that expects the multiply-add packed-input unsupported diagnostic.

As per coding guidelines, tests/**: “Ensure new features have corresponding tests.”

source/slang/slang-emit-spirv.cpp (2)

9013-9015: ⚠️ Potential issue | 🟠 Major

Normalize cooperative-op metadata before cast<>ing it.

These paths assume raw IRIntLit/IRBoolLit, but same-file constant handling already has to unwrap IRGlobalValueRef before reading literal values. If a wrapped or dynamic metadata operand reaches SPIR-V emission, this turns a user error into an assertion/null-deref instead of a diagnostic. Please unwrap once and diagnose unsupported non-constant metadata for saturatingAccumulation, coop-vector interpretations, inputInterpretationIsPacked, k, and transpose.

As per coding guidelines, source/slang/**: Null pointer safety and proper error handling via diagnostics.

Also applies to: 9136-9153, 9177-9178, 9221-9222


9180-9187: ⚠️ Potential issue | 🟠 Major

Build the cooperative-vector signedness mask from the interpretation operands.

operandsMask is derived from the storage element types of the input/result coop-vectors, while the instruction encodes explicit inputInterpretation / matrixInterpretation / biasInterpretation operands. Packed or reinterpreted cases can make those disagree, so the component-type operands and signedness bits end up describing different math. The mask should be computed from the same interpretation operands that are emitted into the SPIR-V instruction.

As per coding guidelines, source/slang/**: IR pass correctness — ensure SSA form and type invariants are maintained.

Also applies to: 9224-9231

source/slang/slang-emit-cuda.cpp (1)

1238-1362: ⚠️ Potential issue | 🟠 Major

Forward transpose in the OptiX CoopVecMatMulAdd emission.

Lines 1303-1319 validate getTranspose(), but Lines 1336-1361 never serialize that operand into the emitted slangOptixCoopVecMatMul<...> call. That means transposed kIROp_CoopVecMatMulAdd requests are ignored. This path should mirror kIROp_CoopVecMatMul and only omit transpose for the StructuredBuffer overload.

As per coding guidelines, source/slang/**: "(6) Cross-backend consistency — changes to one emitter may need parallel changes in others."

source/slang/slang-emit-hlsl.cpp (1)

32-101: ⚠️ Potential issue | 🟠 Major

Don’t encode unsupported coop-vector tags as 0.

Lines 45-47, 79-81, and 97-99 all use 0 as the fallback mapping value, and Line 678 does the same after diagnosing BFloat16. The new __builtin_* emitters then keep serializing those forged tags into HLSL; for matrix layouts this is especially dangerous because 0 is already RowMajor. Please return success/failure from these helpers and have the statement emitters stop after the diagnostic instead of continuing with a bogus enum value.

As per coding guidelines, source/slang/**: "(5) Null pointer safety and proper error handling via diagnostics."

Also applies to: 667-698, 887-1031

prelude/slang-cuda-prelude.h (1)

6483-6485: ⚠️ Potential issue | 🔴 Critical

Fix the OptiX 9.0 trait guard before instantiating these wrappers.

The adjacent guard at Line 6472 still uses OPTIX_VERSION > 90000, but all three overloads compile for OPTIX_VERSION >= 90000 and immediately dereference OptixCoopVecTraits on Line 6493/6494, Line 6523/6524, and Line 6553/6554. OptiX 9.0 therefore loses the only specialization and these wrappers stop compiling.

Suggested fix
-#if defined(OPTIX_VERSION) && OPTIX_VERSION > 90000
+#if defined(OPTIX_VERSION) && OPTIX_VERSION >= 90000
 template<typename T, unsigned int N>
 struct OptixCoopVecTraits<OptixCoopVec<T, N>>
 {
     static constexpr unsigned int size = N;
 };
 #endif

As per coding guidelines, prelude/**: Built-in language definitions and intrinsics. Changes here affect all Slang programs. Verify backward compatibility and check that all target backends handle new intrinsics.

#!/bin/bash
set -euo pipefail

echo "Version guards and trait uses around the cooperative-vector wrappers:"
rg -n 'OPTIX_VERSION > 90000|OPTIX_VERSION >= 90000|OptixCoopVecTraits<OptixCoopVec|OptixCoopVecTraits<VecT' prelude/slang-cuda-prelude.h

echo
echo "Relevant source window:"
sed -n '6462,6566p' prelude/slang-cuda-prelude.h

Expected result: the trait specialization remains guarded by > 90000 while the three wrapper overloads using OptixCoopVecTraits are compiled under >= 90000.

Also applies to: 6511-6514, 6544-6546

source/slang/slang-ir-validate.cpp (1)

687-704: ⚠️ Potential issue | 🟠 Major

Guard operand/type dereferences before validating.

getOperandCount() does not prove every operand slot is populated, and this file already tolerates null operands in malformed IR. The direct ->getDataType() / ->getOp() chains here can crash the validator instead of emitting IrValidationFailed.

As per coding guidelines, "Null pointer safety and proper error handling via diagnostics."

Also applies to: 793-802, 894-903, 1001-1028, 1048-1060

source/slang/hlsl.meta.slang (2)

30950-31096: ⚠️ Potential issue | 🟠 Major

Don't erase packedness from matrixInterpretation and biasInterpretation.

__getCoopVecComponentScalarType() still collapses SignedInt8Packed/UnsignedInt8Packed into the same scalar IDs as unpacked int8/uint8, while the new __coopVecMatMul* IR surface only preserves packedness for inputInterpretation. Packed matrix/bias operands will therefore lower indistinguishably from unpacked ones. Please thread separate packed flags for matrix/bias too, or reject packed matrix/bias before calling these intrinsics. As per coding guidelines, source/slang/**: "IR pass correctness — ensure SSA form and type invariants are maintained."

Also applies to: 31720-31749


31720-31767: ⚠️ Potential issue | 🟠 Major

Keep the pointer path storage-erased like the buffer path.

The structured-buffer overloads intentionally erase backing element types via Ignored*BufferElementType, but the new array/pointer helpers below immediately retype raw storage as Ptr<T[]>. In mixed-precision cases like half×half→float or int8×int8→int32, that pointee type no longer matches the actual memory layout handed to the IR op. Carry an explicit storage element type on the pointer forms too, or keep them raw/erased. As per coding guidelines, source/slang/**: "IR pass correctness — ensure SSA form and type invariants are maintained."

Also applies to: 31952-32116


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: caac342d-47f8-47ab-852a-b3d5c1570856

📥 Commits

Reviewing files that changed from the base of the PR and between 10f5ad3 and 16e8924.

📒 Files selected for processing (32)
  • include/slang.h
  • prelude/slang-cuda-prelude.h
  • source/core/slang-type-text-util.cpp
  • source/slang-wasm/slang-wasm-bindings.cpp
  • source/slang/hlsl.meta.slang
  • source/slang/slang-emit-c-like.cpp
  • source/slang/slang-emit-cuda.cpp
  • source/slang/slang-emit-hlsl.cpp
  • source/slang/slang-emit-hlsl.h
  • source/slang/slang-emit-spirv.cpp
  • source/slang/slang-emit.cpp
  • source/slang/slang-ir-insts-stable-names.lua
  • source/slang/slang-ir-insts.lua
  • source/slang/slang-ir-validate.cpp
  • source/slang/slang-ir-validate.h
  • source/slang/slang-ir.cpp
  • source/slang/slang-ir.h
  • source/slang/slang-reflection-api.cpp
  • source/slang/slang-reflection-json.cpp
  • tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang
  • tests/cooperative-vector/matrix-mul-hlsl-codegen.slang
  • tests/cooperative-vector/matrix-mul-spirv-codegen.slang
  • tests/cooperative-vector/training-cuda-codegen.slang
  • tests/cooperative-vector/training-hlsl-codegen.slang
  • tests/cooperative-vector/training-spirv-codegen.slang
  • tests/cuda/optix-coopvec-packed-input-diagnostic.slang
  • tests/cuda/optix-coopvec-transpose-diagnostic.slang
  • tests/cuda/optix-coopvec.slang
  • tools/gfx/slang.slang
  • tools/render-test/shader-input-layout.cpp
  • tools/slang-test/slang-test-main.cpp
  • tools/slang-unit-test/unit-test-special-scalar-reflection.cpp

github-actions[bot]

This comment was marked as outdated.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (12)
source/slang/slang-reflection-json.cpp (1)

459-473: ⚠️ Potential issue | 🟠 Major

Include IntPtr/UIntPtr in scalar JSON mapping

This switch still omits pointer-sized integer scalar kinds, so they can fall back to "unknown" (and trigger the debug assert path).

Suggested patch
         CASE(Int64, int64);
         CASE(UInt64, uint64);
+        CASE(IntPtr, intptr);
+        CASE(UIntPtr, uintptr);

         CASE(Float16, float16);
         CASE(Float32, float32);
         CASE(Float64, float64);
         CASE(BFloat16, bfloat16);

As per coding guidelines, "Cross-backend consistency — changes to one emitter may need parallel changes in others."

source/slang/slang-ir.h (1)

2135-2137: ⚠️ Potential issue | 🔴 Critical

Add explicit module-version bounds checks in deserialize path

Line 2136 bumps the max IR module version, but deserialization still appears to accept m_version without enforcing [k_minSupportedModuleVersion, k_maxSupportedModuleVersion] before consuming IR. Please reject out-of-range versions with a diagnostic in the read path (readSerializedModuleInfo/readSerializedModuleIR_).

As per coding guidelines, "Null pointer safety and proper error handling via diagnostics."

source/slang/slang-ir.cpp (1)

9013-9015: 🧹 Nitpick | 🔵 Trivial

Re-raise: clarify intentional omission of accumulate coop ops.

Please add a short comment here stating that kIROp_CoopVecOuterProductAccumulate and kIROp_CoopVecReduceSumAccumulate are intentionally excluded because they write memory and must remain side-effecting.

Suggested patch
+    // `kIROp_CoopVecOuterProductAccumulate` and `kIROp_CoopVecReduceSumAccumulate`
+    // are intentionally excluded: they write to memory and must be side-effecting.
     case kIROp_CoopMatMulAdd:
     case kIROp_CoopVecMatMul:
     case kIROp_CoopVecMatMulAdd:

As per coding guidelines, "IR pass correctness — ensure SSA form and type invariants are maintained."

source/core/slang-type-text-util.cpp (1)

14-31: ⚠️ Potential issue | 🟠 Major

IntPtr/UIntPtr are still missing from scalar text mappings.

SLANG_SCALAR_TYPES (Line 14) is the single source for both findScalarType() and getScalarTypeName(). IntPtr/UIntPtr are present in the public scalar enum but still not mapped here, so intptr_t/uintptr_t won’t round-trip through text utilities.

🔧 Proposed fix
 #define SLANG_SCALAR_TYPES(x) \
     x(None, none) \
     x(Void, void) \
     x(Bool, bool) \
     x(Float16, half) \
     x(UInt8, uint8_t) \
     x(Int8, int8_t) \
     x(UInt16, uint16_t) \
     x(Int16, int16_t) \
     x(UInt32, uint32_t) \
     x(Int32, int32_t) \
     x(Int64, int64_t) \
     x(UInt64, uint64_t) \
+    x(IntPtr, intptr_t) \
+    x(UIntPtr, uintptr_t) \
     x(Float32, float) \
     x(Float64, double) \
     x(BFloat16, bfloat16) \
     x(FloatE4M3, float_e4m3) \
     x(FloatE5M2, float_e5m2)
As per coding guidelines `source/core/**`: "Core utilities shared across the compiler. Check for: ... (3) Platform portability (Windows, Linux, macOS)."
tests/cooperative-vector/training-spirv-codegen.slang (1)

3-5: ⚠️ Potential issue | 🟠 Major

The SPIR-V check is still too weak.

These assertions only prove that the training capability and opcodes appear. They still won't catch regressions in the encoded offset/stride/layout/component operands, and they also don't assert the base CooperativeVectorNV capability that becomes necessary once OpTypeCooperativeVectorNV is emitted.

As per coding guidelines, "tests/**: Verify test correctness. Tests use //TEST directives: COMPARE_COMPUTE for GPU compute tests, INTERPRET for CPU interpreter tests. Ensure new features have corresponding tests. Check that expected outputs match the intended behavior, not just current behavior."

tests/cuda/optix-coopvec-packed-input-diagnostic.slang (1)

11-27: ⚠️ Potential issue | 🟠 Major

Add the packed MatMulAdd diagnostic case too.

This file only exercises coopVecMatMulPacked, but the OptiX CUDA path has a parallel packed-input rejection for coopVecMatMulAddPacked as well. If the add variant stops diagnosing packed input, this test still passes.

As per coding guidelines, "tests/**: Verify test correctness. Tests use //TEST directives: COMPARE_COMPUTE for GPU compute tests, INTERPRET for CPU interpreter tests. Ensure new features have corresponding tests. Check that expected outputs match the intended behavior, not just current behavior."

tests/cooperative-vector/matrix-mul-hlsl-codegen.slang (1)

5-8: ⚠️ Potential issue | 🟠 Major

These checks still can't detect a swapped elementCount/k.

Every expectation uses int(4), int(4), so an emission bug that flips those two arguments still passes FileCheck. Make at least one matmul and one matmul-add case use distinct values so the argument order is actually verified.

As per coding guidelines, "tests/**: Verify test correctness. Tests use //TEST directives: COMPARE_COMPUTE for GPU compute tests, INTERPRET for CPU interpreter tests. Ensure new features have corresponding tests. Check that expected outputs match the intended behavior, not just current behavior."

tests/cooperative-vector/training-cuda-codegen.slang (1)

3-4: ⚠️ Potential issue | 🟡 Minor

Loosen the literal matching in these CHECKs.

These assertions still depend on one pretty-printing of the offset and stride. A harmless change from int(0) to 0, or from 32U to 32, would fail the test without changing the generated call semantics.

♻️ Suggested CHECK update
-// CHECK: optixCoopVecOuterProductAccumulate({{.*}}, {{.*}}, (CUdeviceptr)(&({{.*}})), int(0), 32U)
-// CHECK: optixCoopVecReduceSumAccumulate({{.*}}, (CUdeviceptr)(&({{.*}})), int(0))
+// CHECK: optixCoopVecOuterProductAccumulate({{.*}}, {{.*}}, (CUdeviceptr)(&({{.*}})), {{(int\()?0\)?}}, {{32U?}})
+// CHECK: optixCoopVecReduceSumAccumulate({{.*}}, (CUdeviceptr)(&({{.*}})), {{(int\()?0\)?}})

As per coding guidelines, tests/**: “Check that expected outputs match the intended behavior, not just current behavior.”

tests/cooperative-vector/matrix-mul-spirv-codegen.slang (1)

15-38: ⚠️ Potential issue | 🟡 Minor

Add a mixed-signedness operand-mask case.

This setup only exercises the fully signed path, so it never validates the operand-mask logic when input, matrix, bias, and result signedness differ. A bug there would still pass this test. A second case like unsigned input with signed result would cover the missing branch.

As per coding guidelines, tests/**: “Ensure new features have corresponding tests.”

source/slang/slang-emit-spirv.cpp (2)

9132-9163: ⚠️ Potential issue | 🟡 Minor

Unwrap constant refs before remapping coop-vector enums.

These helpers still assume raw IRIntLit/IRBoolLit. If legalization leaves a wrapped constant here (for example via IRGlobalValueRef), SPIR-V emission hits the cast even though the operand is compile-time constant. Normalize once up front so the mapped enum operands and packed flag follow the same constant-handling path as the rest of the emitter.

Patch sketch
+    IRInst* unwrapCoopVecEnumOperand(IRInst* operand)
+    {
+        if (auto globalValueRef = as<IRGlobalValueRef>(operand))
+            return globalValueRef->getValue();
+        return operand;
+    }
+
     void emitMappedCoopVecMatrixLayoutOperand(IRInst* operand)
     {
+        operand = unwrapCoopVecEnumOperand(operand);
         auto intLit = cast<IRIntLit>(operand);
         emitOperand(emitIntConstant(
             mapSlangCoopVecMatrixLayoutToSpv(intLit->getValue()),
             operand->getDataType()));
     }
@@
     void emitMappedCoopVecComponentTypeOperand(
         IRInst* operand,
         IRInst* inputInterpretationIsPacked = nullptr)
     {
+        operand = unwrapCoopVecEnumOperand(operand);
         auto intLit = cast<IRIntLit>(operand);
 
         bool isPacked = false;
         if (inputInterpretationIsPacked)
         {
-            isPacked = cast<IRBoolLit>(inputInterpretationIsPacked)->getValue();
+            auto packedOperand = unwrapCoopVecEnumOperand(inputInterpretationIsPacked);
+            isPacked = cast<IRBoolLit>(packedOperand)->getValue();
         }

As per coding guidelines, source/slang/**: IR pass correctness — ensure SSA form and type invariants are maintained.


9177-9184: ⚠️ Potential issue | 🟠 Major

Build the cooperative-vector signedness mask from the interpretation operands.

operandsMask is still derived from the storage element types, while the instruction explicitly emits inputInterpretation, matrixInterpretation, and biasInterpretation. Packed/reinterpreted paths can make those disagree, and biasInterpretation never contributes to MatrixC signedness here. Use the interpretation operands as the source of truth so the enum operands and signedness bits describe the same math.

As per coding guidelines, source/slang/**: IR pass correctness — ensure SSA form and type invariants are maintained.

Also applies to: 9221-9228

prelude/slang-cuda-prelude.h (1)

6483-6504: ⚠️ Potential issue | 🔴 Critical

Keep the OptixCoopVecTraits specialization visible in OptiX 9.0.

Line 6472 still guards the only OptixCoopVecTraits<OptixCoopVec<T, N>> specialization with OPTIX_VERSION > 90000, while all three wrappers here are enabled from OPTIX_VERSION >= 90000. On OptiX 9.0, OptixCoopVecTraits<VecTOut>::size / OptixCoopVecTraits<VecTIn>::size therefore remain unspecialized, so any instantiation of these overloads fails.

Suggested fix
-#if defined(OPTIX_VERSION) && OPTIX_VERSION > 90000
+#if defined(OPTIX_VERSION) && OPTIX_VERSION >= 90000
 template<typename T, unsigned int N>
 struct OptixCoopVecTraits<OptixCoopVec<T, N>>
 {
     static constexpr unsigned int size = N;
 };
 #endif

As per coding guidelines, prelude/**: "Built-in language definitions and intrinsics. Changes here affect all Slang programs. Verify backward compatibility and check that all target backends handle new intrinsics."

#!/bin/bash
set -euo pipefail

sed -n '6464,6566p' prelude/slang-cuda-prelude.h

Also applies to: 6511-6536, 6544-6565


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b0b9e224-0c88-4d6b-beb7-95d796faec58

📥 Commits

Reviewing files that changed from the base of the PR and between 16e8924 and c6f8975.

📒 Files selected for processing (32)
  • include/slang.h
  • prelude/slang-cuda-prelude.h
  • source/core/slang-type-text-util.cpp
  • source/slang-wasm/slang-wasm-bindings.cpp
  • source/slang/hlsl.meta.slang
  • source/slang/slang-emit-c-like.cpp
  • source/slang/slang-emit-cuda.cpp
  • source/slang/slang-emit-hlsl.cpp
  • source/slang/slang-emit-hlsl.h
  • source/slang/slang-emit-spirv.cpp
  • source/slang/slang-emit.cpp
  • source/slang/slang-ir-insts-stable-names.lua
  • source/slang/slang-ir-insts.lua
  • source/slang/slang-ir-validate.cpp
  • source/slang/slang-ir-validate.h
  • source/slang/slang-ir.cpp
  • source/slang/slang-ir.h
  • source/slang/slang-reflection-api.cpp
  • source/slang/slang-reflection-json.cpp
  • tests/cooperative-matrix/mat-mul-add-cuda-codegen.slang
  • tests/cooperative-vector/matrix-mul-hlsl-codegen.slang
  • tests/cooperative-vector/matrix-mul-spirv-codegen.slang
  • tests/cooperative-vector/training-cuda-codegen.slang
  • tests/cooperative-vector/training-hlsl-codegen.slang
  • tests/cooperative-vector/training-spirv-codegen.slang
  • tests/cuda/optix-coopvec-packed-input-diagnostic.slang
  • tests/cuda/optix-coopvec-transpose-diagnostic.slang
  • tests/cuda/optix-coopvec.slang
  • tools/gfx/slang.slang
  • tools/render-test/shader-input-layout.cpp
  • tools/slang-test/slang-test-main.cpp
  • tools/slang-unit-test/unit-test-special-scalar-reflection.cpp

Collaborator

@csyonghe csyonghe left a comment


Looks like we also need a new test to verify that the metadata reporting is correct.

Collaborator

@jkwak-work jkwak-work left a comment


Good job overall.
I left a few comments.

@jkwak-work
Collaborator

@cmarcelo , can you share your update or ETA for this PR?
I have other PRs that should be rebased on this PR.

Please let us know if you need help.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

@jkwak-work
Collaborator

I see the following compilation error from the slangpy test:

C:\actions-runner\_work\slangpy\slangpy\slang\external\slang-rhi\src\wgpu\wgpu-shader-object-layout.cpp(55): error C2220: the following warning is treated as an error
C:\actions-runner\_work\slangpy\slangpy\slang\external\slang-rhi\src\wgpu\wgpu-shader-object-layout.cpp(55): warning C4062: enumerator 'slang::TypeReflection::IntPtr' in switch of enum 'slang::TypeReflection::ScalarType' is not handled
C:\actions-runner\_work\slangpy\slangpy\slang\include\slang.h(2356): note: see declaration of 'slang::TypeReflection::ScalarType'
C:\actions-runner\_work\slangpy\slangpy\slang\external\slang-rhi\src\wgpu\wgpu-shader-object-layout.cpp(55): warning C4062: enumerator 'slang::TypeReflection::UIntPtr' in switch of enum 'slang::TypeReflection::ScalarType' is not handled

@cmarcelo , can you address this problem?

@jkwak-work
Collaborator

The Falcor test shows the following compilation error as well:

C:\actions-runner-2\_work\slang\slang\source\slang\slang-emit-cuda.cpp(828): error C2220: the following warning is treated as an error
C:\actions-runner-2\_work\slang\slang\source\slang\slang-emit-cuda.cpp(828): warning C4189: 'coopVecMatMulAdd': local variable is initialized but not referenced

@jkwak-work
Collaborator

Things I need help:

  • Get some confirmation that this is the public API we want for Add API to list coopMat/coopVec types and combinations #10076. In particular, my understanding is that we need a new separate struct; otherwise it wouldn't be ABI-compatible according to COM standards. Is there someone we should ping here about this?

I think you are referring to my comment here.
And I think what you implement is good enough.
As you pointed out, if we modify and add a new function, it will break backward compatibility.

  • slang-rhi fails compilation due to promoting warnings to errors and the fact that it doesn't yet handle IntPtr and other types in some switches. So there's a bit of a "practical" circular dependency in the CI here. My current idea is to see if we can make the slang-rhi code OK with the addition of new enum values, so that we are able to land this. However, if there's a different way to sort this out, let me know.

I will pull and build to reproduce the compilation error you are talking about.
It is unclear from the CI result what the problem is.

jkwak-work added a commit to jkwak-work/slang-rhi that referenced this pull request Mar 30, 2026
We are trying to add more types to ScalarType: IntPtr, UIntPtr and
BFloat16; in shader-slang/slang#10643
And slang-rhi build prints a warning that the new types are not handled
in a switch-statement for WGPU.

Because WGPU cannot represent those types, this PR converts the new
types to Undefined and avoids the compilation warnings, which are treated
as errors on CI machines.
@jkwak-work
Collaborator

I have a fix on slang-rhi side build warning:

jkwak-work added a commit to shader-slang/slang-rhi that referenced this pull request Mar 30, 2026
We are trying to add more types to ScalarType: IntPtr, UIntPtr and
BFloat16; in shader-slang/slang#10643
And slang-rhi build prints a warning that the new types are not handled
in a switch-statement for WGPU.

Because WGPU cannot represent those types, this PR converts the new
types to Undefined and avoids the compilation warnings, which are treated
as errors on CI machines.
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

@jkwak-work
Collaborator

jkwak-work commented Mar 31, 2026

@cmarcelo , can you merge my change to fix the compiler warning and the failing tests?

@cmarcelo
Contributor Author

@cmarcelo , can you merge my change to fix the compiler warning and the failing tests?

* [Fix SPIRV validation: use array ptr for cooperative vector ops cmarcelo/slang#2](https://github.com/cmarcelo/slang/pull/2)

Done.

github-actions[bot]

This comment was marked as outdated.

cmarcelo and others added 5 commits March 31, 2026 09:32
OpCooperativeVectorMatrixMulNV (and related NV ops) require the matrix/buffer
operand to be a pointer to an array type. After SPIRV legalization,
ByteAddressBuffer/StructuredBuffer global params become ptr-to-struct, which
fails the SPIRV validator with "Pointer's Type must be an array type".

Add emitBufferPtrAsArrayPtr() that emits OpAccessChain index 0 to pierce through
the wrapper struct when needed. For Ptr<T[]> inputs that already point directly to
an unsized array, the pointer is returned as-is.

Apply the helper in emitCoopVecMatMulAdd (matrix and bias operands),
emitCoopVecOuterProductAccumulate (matrix operand), and
emitCoopVecReduceSumAccumulate (buffer operand).
Contributor

@github-actions github-actions bot left a comment


Verdict: ✅ Clean — no significant issues found. 2 minor gaps noted below.

This PR replaces inline __target_switch / spirv_asm / __intrinsic_asm blocks in hlsl.meta.slang with 4 new IR instructions (CoopMatMulAdd, CoopVecMatMulAdd, CoopVecOuterProductAccumulate, CoopVecReduceSumAccumulate) and moves target-specific codegen into C++ emitters. The architecture is sound — enum mappings are centralized through SLANG_SCALAR_TYPE_* / SLANG_COOPERATIVE_* constants, side-effect analysis correctly marks the two accumulation ops as having side effects while the two multiply ops are pure, and a new ~400-line IR validation pass catches malformed cooperative operations before they reach backend emitters.

Changes Overview

New IR Instructions (slang-ir-insts.lua, slang-ir-insts-stable-names.lua, slang-ir.cpp, slang-ir.h)

  • What changed: 4 new instruction definitions with named operands (including optional bias params for CoopVecMatMulAdd), stable name entries 779–782, mightHaveSideEffects() updated to mark CoopMatMulAdd/CoopVecMatMulAdd as side-effect-free, module version bumped 12→13.

Standard Library Refactor (hlsl.meta.slang)

  • What changed: Removed ~500 lines of per-target __target_switch / spirv_asm / __intrinsic_asm blocks for cooperative matrix/vector multiply, outer product, and reduce-sum operations. Replaced with __intrinsic_op declarations that lower to the new IR instructions. Added __getCoopVecComponentScalarType to map CoopVecComponentType to SLANG_SCALAR_TYPE_*. Renamed __inputInterpretationPackingFactor to __componentPackingFactor.

SPIRV Backend (slang-emit-spirv.cpp)

  • What changed: Added emitCoopMatMulAdd (KHR), emitCoopVecMatMulAdd (NV), emitCoopVecOuterProductAccumulate (NV), emitCoopVecReduceSumAccumulate (NV). Added emitBufferPtrAsArrayPtr helper for SPIRV buffer→array-ptr legalization. Added enum mapping functions (mapSlangCooperativeMatrixUseToSpv, mapSlangCoopVecMatrixLayoutToSpv, mapSlangCoopVecComponentTypeToSpv). Cooperative matrix use values now mapped through mapSlangCooperativeMatrixUseToSpv instead of passing raw enum values.

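The shape of the new mapping functions can be sketched as follows. The SPIR-V side uses the `CooperativeMatrixUse` encoding from SPV_KHR_cooperative_matrix (MatrixAKHR = 0, MatrixBKHR = 1, MatrixAccumulatorKHR = 2); the Slang-side enum below is a hypothetical stand-in for the `SLANG_COOPERATIVE_MATRIX_USE_*` constants this PR adds:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Slang-side constants added in slang.h.
enum SlangCoopMatUse : uint32_t
{
    UseA = 0,
    UseB = 1,
    UseAccumulator = 2,
};

// Map the Slang-side value to the SPV_KHR_cooperative_matrix encoding.
// Even when the numeric values happen to coincide, routing through an
// explicit switch decouples the public API from the SPIR-V spec.
uint32_t mapUseToSpv(SlangCoopMatUse use)
{
    switch (use)
    {
    case UseA:           return 0; // MatrixAKHR
    case UseB:           return 1; // MatrixBKHR
    case UseAccumulator: return 2; // MatrixAccumulatorKHR
    }
    return 0;
}
```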
HLSL Backend (slang-emit-hlsl.cpp, slang-emit-hlsl.h)

  • What changed: Added handlers for CoopVecMatMulAdd (emitting __builtin_MatVecMul/__builtin_MatVecMulAdd), CoopVecOuterProductAccumulate (__builtin_OuterProductAccumulate), and CoopVecReduceSumAccumulate (__builtin_VectorAccumulate). Added _mapSlangCoopVecComponentTypeToHLSL and _mapSlangCoopVecMatrixLayoutToHLSL mapping functions.

CUDA/OptiX Backend (slang-emit-cuda.cpp, slang-cuda-prelude.h)

  • What changed: Removed runtime slangToOptixComponentType/slangToOptixMatrixLayout constexpr mappers from prelude — mapping now happens at compile time in the C++ emitter. Template parameters changed from unsigned to direct OptiX enum types. Added getOptixCoopVecComponentTypeName/getOptixCoopVecMatrixLayoutName mappers. Added handlers for all 4 IR ops with OptiX-specific constraint validation (training-optimal layout, float16 interpretation for outer product).

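Moving the mapping out of the prelude means the emitter picks the OptiX enum spelling at compile time and emits it directly, instead of emitting an integer that a constexpr function converts at runtime. A sketch of that idea; the `OPTIX_COOP_VEC_MATRIX_LAYOUT_*` spellings follow the OptiX 9 cooperative-vector API naming but should be treated as assumptions here:

```cpp
#include <cassert>
#include <string>

// Hypothetical emitter-side mapper: given the layout value carried on the IR
// instruction, return the OptiX enum name to splice into generated CUDA code.
std::string getOptixMatrixLayoutName(int layout)
{
    switch (layout)
    {
    case 0:  return "OPTIX_COOP_VEC_MATRIX_LAYOUT_ROW_MAJOR";
    case 1:  return "OPTIX_COOP_VEC_MATRIX_LAYOUT_COLUMN_MAJOR";
    default: return "OPTIX_COOP_VEC_MATRIX_LAYOUT_TRAINING_OPTIMAL";
    }
}
```

A side benefit: an unsupported value can be diagnosed in the emitter with a source location, rather than surfacing as a cryptic CUDA compile error in the prelude.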
IR Validation (slang-ir-validate.cpp, slang-ir-validate.h, slang-emit.cpp)

  • What changed: Added validateCooperativeOperations pass (~400 lines) checking operand counts, type compatibility, dimension constraints, and input interpretation validity for all 4 new IR ops. Registered as SLANG_PASS(validateCooperativeOperations, sink) in the emit pipeline.

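The dimension constraints the pass enforces follow from the algebra: for result = A(MxK) * B(KxN) + C(MxN), the inner dimensions of A and B must agree and the accumulator must have the result's shape. A standalone sketch of that check, with illustrative names (not the actual `validateCooperativeOperations` code):

```cpp
#include <cassert>

// Hypothetical shape check for a cooperative matrix multiply-add:
//   result(m x n) = A(m x k) * B(kB x n) + Acc(accM x accN)
bool validateMatMulAddShapes(int m, int k, int kB, int n, int accM, int accN)
{
    if (k != kB)
        return false; // inner dimensions of A and B must match
    if (accM != m || accN != n)
        return false; // accumulator must have the result's shape
    return true;
}
```

Running checks like this in a validation pass, before the backend emitters, turns a malformed IR module into a diagnostic instead of a crash or bad SPIR-V.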
Public API & Reflection (slang.h, slang-reflection-api.cpp, slang-reflection-json.cpp, slang-type-text-util.cpp, slang-wasm-bindings.cpp)

  • What changed: Added SlangCooperativeMatrixUse and SlangCooperativeVectorMatrixLayout enums. Added SLANG_SCALAR_TYPE_BFLOAT16, SLANG_SCALAR_TYPE_FLOAT_E4M3, SLANG_SCALAR_TYPE_FLOAT_E5M2 scalar types. Extended TypeReflection::ScalarType with IntPtr, UIntPtr, BFloat16, FloatE4M3, FloatE5M2. Updated reflection API to recognize special scalar types.

Tests (10 new test files)

  • What changed: Added codegen tests for SPIRV, HLSL, CUDA across cooperative matrix and vector operations. Added 2 CUDA diagnostic tests for packed-input and transpose constraints. Added unit test for special scalar type reflection.
Findings (2 total)

  • 🟡 Gap (hlsl.meta.slang:~30985): BFloat16 is missing from __getCoopVecComponentScalarType, so the backend-specific diagnostics for it in the SPIRV/HLSL emitters are unreachable dead code.
  • 🟡 Gap (slang-ir-validate.cpp:~1196): the validation pass's error paths have no negative/diagnostic test coverage.

Collaborator

@jkwak-work jkwak-work left a comment


Looks good to me.

It is a big change but I reviewed it multiple times over multiple days.

@jkwak-work jkwak-work added this pull request to the merge queue Mar 31, 2026
Merged via the queue into shader-slang:master with commit d22eebf Mar 31, 2026
42 checks passed

Labels

pr: non-breaking PRs without breaking changes

4 participants