[SYCL][DOC] Update SPV_INTEL_joint_matrix#12497
Conversation
The PR adds checked load/store and construct instructions.
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Author: Levytskyy, Vyacheslav
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Force-pushed from c687190 to e132331.
They were incorrectly named and had incorrect operands. See intel#12497.
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.
The spec is available here: intel/llvm#12497
The PR doesn't add the OpCooperativeMatrixApplyFunctionINTEL instruction, as it's still experimental and not properly tested E2E. The PR also fixes a few bugs in the related code:
1. The CooperativeMatrixMulAddKHR optional operand must be a literal, not a constant.
2. Fixed creation of the available capabilities table for the case where a single extension adds several capabilities that occupy non-contiguous opcodes.
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
This pull request was closed because it has been stalled for 30 days with no activity.
Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
dkhaldi left a comment:
LGTM with a minor comment
|14|2023-10-11|Dmitry Sidorov|Add matrix prefetch instruction
|15|2023-11-06|Dmitry Sidorov|Put deprecation note on OpCooperativeMatrixGetElementCoordINTEL
|16|2023-11-06|Dmitry Sidorov|Add checked load, store and construct instructions
|17|2024-12-16|Dounia Khaldi|Add and store with offset
you missed "load" here --> "Add load and store with offset instructions".
bashbaug left a comment:
The meaning of "checked" and "offset" isn't immediately apparent, so consider updating the Overview section to describe these new instructions.
Note that many of the comments for OpCooperativeMatrixLoadCheckedINTEL also apply to OpCooperativeMatrixStoreCheckedINTEL.
instructions. +
+
*{main_capability_name}* +
I think we're missing a description of CooperativeMatrixOffsetInstructionsINTEL here.
Load a cooperative matrix through a pointer. Global matrix size might be not multiple the size of
the two-dimentional region that is being loaded, in this case the out-of-bounds elements are
set to 0. +
It would be useful to define what is meant by the "global matrix" somewhere.
It's not clear to me why being a multiple of the size of the 2D region being loaded is relevant (or not, in the case of this instruction). Isn't this really about going "off the edge", which could happen even if the block size evenly divides the matrix size, if the X offset or Y offset is set appropriately?
Also, there's a minor typo here:
Suggested change
-Load a cooperative matrix through a pointer. Global matrix size might be not multiple the size of
-the two-dimentional region that is being loaded, in this case the out-of-bounds elements are
-set to 0. +
+Load a cooperative matrix through a pointer. Global matrix size might be not multiple the size of
+the two-dimensional region that is being loaded, in this case the out-of-bounds elements are
+set to 0. +
'X offset' must be a scalar 32-bit integer type. It specifies offset in number of elements
along X axis from the 'Pointer' where the loaded memory region starts from. +
+
'Y offset' must be a scalar 32-bit integer type. It specifies offset in number of elements
along Y axis from the 'Pointer' where the loaded memory region starts from. +
Note: the "offset" instructions use different names for these operands ("Rows Offset" and "Columns Offset"). More importantly, for the "offset" instructions the "Rows Offset" comes first, so I think the meaning is transposed compared to these "checked" instructions. There's no requirement that they match, but I think it would be less confusing to use a similar convention.
'Pointer' is a pointer. Its type must be an *OpTypePointer* whose 'Type' operand
is a scalar or vector type. If the *Shader* capability was declared, 'Pointer'
must point into an array and any *ArrayStride* decoration on 'Pointer' is ignored. +
Minor: Consider updating this text so it works with untyped pointers also (assuming that's what we want to do).
'Width' is the width (number of columns of a big matrix) of the two-dimensional
region to load the matrix from. It must be a scalar 'integer type'. +
We should describe the units for the width, specifically whether it is in units of bytes or elements (I think it's bytes?).
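To make the out-of-bounds behavior discussed in this review concrete, here is a hypothetical reference model of a checked load in C++. It is a sketch under stated assumptions, not the spec's definition: the global matrix is row-major; `Height`, `Width`, the stride and both offsets are counted in elements (the open bytes-versus-elements question is not settled here); 'X offset' selects the starting column and 'Y offset' the starting row; and the name `checked_load` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model of a "checked" cooperative-matrix load.
// Assumptions (not from the spec): row-major global matrix, all sizes and
// offsets in elements, x_offset = starting column, y_offset = starting row.
template <typename T>
std::vector<T> checked_load(const std::vector<T>& matrix,
                            std::int32_t height, std::int32_t width,
                            std::int32_t stride,
                            std::int32_t y_offset, std::int32_t x_offset,
                            std::int32_t tile_rows, std::int32_t tile_cols) {
  assert(height >= 0 && width >= 0 && stride >= width);
  assert(tile_rows >= 0 && tile_cols >= 0);
  assert(matrix.size() >= static_cast<std::size_t>(height) * stride);

  // Every tile element starts as 0; only in-bounds elements are overwritten.
  std::vector<T> tile(static_cast<std::size_t>(tile_rows) * tile_cols, T{0});
  for (std::int32_t r = 0; r < tile_rows; ++r) {
    for (std::int32_t c = 0; c < tile_cols; ++c) {
      const std::int32_t gy = y_offset + r;  // row in the global matrix
      const std::int32_t gx = x_offset + c;  // column in the global matrix
      if (gy < 0 || gy >= height || gx < 0 || gx >= width) {
        continue;  // out of bounds: element stays 0
      }
      tile[static_cast<std::size_t>(r) * tile_cols + c] =
          matrix[static_cast<std::size_t>(gy) * stride + gx];
    }
  }
  return tile;
}
```

With a 4 x 4 matrix holding 1..16 and a 2 x 2 tile, the tile size divides the matrix size evenly, yet `checked_load(g, 4, 4, 4, 3, 3, 2, 2)` still runs off the edge: only the top-left tile element (16) is in bounds and the other three come back as 0, which is the case raised in the review.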
Add a new form of load/store operations for cooperative matrices that accepts two separate arguments: the row index and the column index. Unlike the original approach, which requires a pointer to the matrix base, this new form of load/store operations is expected to yield better-optimized code using 2D block read/write instructions on PVC.
CapabilityCooperativeMatrixOffsetInstructionsINTEL = 6238
OpCooperativeMatrixLoadOffsetINTEL = 6239
OpCooperativeMatrixStoreOffsetINTEL = 6240
Spec: intel/llvm#12497
Add a new form of load/store operations for cooperative matrices that accepts two separate arguments: the row index and the column index. Unlike the original approach, which requires a pointer to the matrix base, this new form of load/store operations is expected to yield better-optimized code using 2D block read/write instructions on PVC.
CapabilityCooperativeMatrixOffsetInstructionsINTEL = 6238
OpCooperativeMatrixLoadOffsetINTEL = 6239
OpCooperativeMatrixStoreOffsetINTEL = 6240
Spec: #12497
Original commit: KhronosGroup/SPIRV-LLVM-Translator@193661c352de3ca
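The commit message describes the offset loads only at a high level. A small C++ model of the addressing may help: the base pointer stays at the start of the matrix and the tile origin arrives as separate row and column offsets, whereas the pointer-based form expects the caller to fold those offsets into the pointer. The names `offset_load` and `pointer_load`, the row-major layout and the element units are assumptions for illustration, not the instructions' actual operand list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an "offset" load: the base pointer stays fixed at
// the start of the matrix and the tile origin is passed separately as
// (rows_offset, columns_offset). Assumes row-major layout, element units.
template <typename T>
std::vector<T> offset_load(const T* base, std::size_t stride,
                           std::size_t rows_offset, std::size_t columns_offset,
                           std::size_t tile_rows, std::size_t tile_cols) {
  assert(base != nullptr);
  std::vector<T> tile(tile_rows * tile_cols);
  for (std::size_t r = 0; r < tile_rows; ++r) {
    for (std::size_t c = 0; c < tile_cols; ++c) {
      tile[r * tile_cols + c] =
          base[(rows_offset + r) * stride + (columns_offset + c)];
    }
  }
  return tile;
}

// Hypothetical pointer-based form: the caller has already folded the
// offsets into tile_origin, so the load itself uses zero offsets.
template <typename T>
std::vector<T> pointer_load(const T* tile_origin, std::size_t stride,
                            std::size_t tile_rows, std::size_t tile_cols) {
  return offset_load(tile_origin, stride, 0, 0, tile_rows, tile_cols);
}
```

Both forms read the same elements, for example `offset_load(g.data(), 4, 1, 2, 2, 2)` and `pointer_load(g.data() + 1 * 4 + 2, 4, 2, 2)`. Keeping the row and column offsets as separate values, rather than pre-combining them into an address, is presumably what lets a backend map the load directly onto 2D block read/write instructions, which address a tile by coordinates within a 2D surface.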
Brand-new spec: intel/llvm#12497
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
Spec: intel/llvm#12497
Change summary:
* Added the Packed matrix layout to support Intel VNNI instructions.
* Removed `JointMatrixGetElementCoord` in favor of the corresponding cooperative matrix instruction.
* Added support for new Cooperative Matrix Operands that specify the component type interpretation for tf32 and bfloat16 types.
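As a rough illustration of what a VNNI-style packed layout means, the sketch below packs a row-major K x N matrix of 16-bit elements with a packing factor of 2, so each pair of consecutive K rows is interleaved into one packed row. The factor of 2 for 16-bit types and the function name `pack_vnni` are assumptions for this example; the spec text, not this sketch, defines the Packed layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of VNNI-style packing for a K x N row-major matrix of
// 16-bit elements, assuming a packing factor of 2: element (k, n) moves to
// packed row k / 2, packed column 2 * n + k % 2.
std::vector<std::uint16_t> pack_vnni(const std::vector<std::uint16_t>& b,
                                     std::size_t k_rows, std::size_t n_cols) {
  constexpr std::size_t vnni_factor = 2;
  assert(k_rows % vnni_factor == 0);
  assert(b.size() == k_rows * n_cols);

  std::vector<std::uint16_t> packed(b.size());
  const std::size_t packed_cols = n_cols * vnni_factor;
  for (std::size_t k = 0; k < k_rows; ++k) {
    for (std::size_t n = 0; n < n_cols; ++n) {
      packed[(k / vnni_factor) * packed_cols + n * vnni_factor +
             k % vnni_factor] = b[k * n_cols + n];
    }
  }
  return packed;
}
```

For example, packing the 2 x 3 matrix {0, 1, 2; 10, 11, 12} yields the single packed row {0, 10, 1, 11, 2, 12}, so the two K-adjacent values for each column sit next to each other in memory.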
Brand-new spec: #12497
Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Co-authored-by: Viktoria Maximova <viktoria.maksimova@intel.com>
Original commit: KhronosGroup/SPIRV-LLVM-Translator@60d78aa6d1d98cb
Closing the PR. @vmaksimo, please re-create it when the spec updates are ready.
The PR adds checked load/store and construct instructions