Undefined behavior of DML_OPERATOR_GATHER operator on Qualcomm Snapdragon X GPU and NPU #685

SID37 · 2025-01-25T12:59:18Z

I'm using an Asus ProArt PZ13 with a Snapdragon X Plus (X1P-42-100). It exposes two DirectX 12 adapters: a Qualcomm Adreno X1-45 GPU and a Qualcomm Hexagon NPU.

I ran the example from the documentation (slightly modified) and got the correct result on both the NPU and the GPU:

Axis: 1
IndexDimensions : 1

input tensor {3, 3}:
 [[1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]]

 index tensor {1, 2}
 [[0, 2]]

 output[y, x] = input[y, indices[x]]
 output tensor {3, 2} :
 [[1, 3],
  [4, 6],
  [7, 9]]

But if I change the dimensionality of the index tensor to match the number of rows of the input tensor:

Axis: 1
IndexDimensions : 1

input tensor {3, 3}:
 [[1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]]

 index tensor {3, 2}
 [[0, 2],
  [1, 0],
  [2, 1]]

 output[y, x] = input[y, indices[x]]
 output tensor {3, 2}

Then I get the following result on GPU:

[[1, 3],
 [5, 4],
 [9, 8]]

And the following result on NPU:

[[1, 3],
 [4, 6],
 [7, 9]]

I'm not sure which of the results is correct, since the documentation is not very precise about this case. However, when I tried an index tensor of size {2, 2}, both devices returned error 0x80070057 (The parameter is incorrect), which is more in line with the GPU behavior. So I think this is an NPU bug.
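To make the difference concrete, here is a minimal CPU sketch of the two readings of the {3, 2} index tensor that would produce the two outputs above. It is plain C++ with no DirectML calls; the function names `gather_per_row` and `gather_first_row_only` are hypothetical, and neither function is claimed to be what the drivers actually do internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<float>>;
using IndexMatrix = std::vector<std::vector<std::uint32_t>>;

// Reading A: each input row y uses its own row of indices,
// output[y][x] = input[y][indices[y][x]].
// This reproduces the result reported for the GPU.
Matrix gather_per_row(const Matrix& input, const IndexMatrix& indices) {
    assert(indices.size() == input.size());
    Matrix output(input.size());
    for (std::size_t y = 0; y < input.size(); ++y) {
        for (std::uint32_t column : indices[y]) {
            output[y].push_back(input[y].at(column));
        }
    }
    return output;
}

// Reading B: only the first row of indices is used for every input row,
// output[y][x] = input[y][indices[0][x]].
// This reproduces the result reported for the NPU.
Matrix gather_first_row_only(const Matrix& input, const IndexMatrix& indices) {
    assert(!indices.empty());
    Matrix output(input.size());
    for (std::size_t y = 0; y < input.size(); ++y) {
        for (std::uint32_t column : indices[0]) {
            output[y].push_back(input[y].at(column));
        }
    }
    return output;
}
```

With the 3x3 input and the {3, 2} index tensor from this report, reading A gives [[1, 3], [5, 4], [9, 8]] and reading B gives [[1, 3], [4, 6], [7, 9]].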

Attached are the JSON files for DxDispatch: gather-examples.zip

Processor: Qualcomm Snapdragon X Plus - X1P-42-100
DirectML: 1.15.4.0
Hexagon NPU driver: 1.0.0.10
Adreno X1-45 GPU driver: v31.0.82.0


SID37 · 2025-01-25T13:16:40Z

Actually, I'm just trying to use the ArgMax result to sample tensor results by index. I need to do something like this:

    // ...
    dml::Graph graph(dml_device);

    auto value_tensor = dml::InputTensor(graph, 0, dml::TensorDesc(DML_TENSOR_DATA_TYPE_FLOAT32, { 64, 64 }));
    auto range_tensor = dml::InputTensor(graph, 1, dml::TensorDesc(DML_TENSOR_DATA_TYPE_FLOAT32, { 64, 64 }));

    auto index_tensor  = dml::ArgMax(range_tensor, { 1 });
    auto sample_tensor = dml::Gather(value_tensor, index_tensor, 1, 1);

    auto compiled_graph = graph.Compile(DML_EXECUTION_FLAG_NONE, { sample_tensor });

DML_OPERATOR_GATHER_ELEMENTS exists for this, but it doesn't work at all on the NPU: it returns error 887A0004 (The specified device interface or feature level is not supported on this system). So I tried Gather instead and found that on the GPU it works the way I need, but on the NPU the result is different.
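For checking device output, the result I expect from the ArgMax + Gather graph can be sketched on the CPU as follows. This is a plain C++ reference loop, not DirectML code; `sample_by_argmax` is a hypothetical helper name, and picking the first maximum on ties is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// For each row y: find the column of the largest value in range[y]
// (first occurrence on ties, an assumption of this sketch), then
// read value[y] at that column. Rows are assumed to be non-empty.
std::vector<float> sample_by_argmax(const std::vector<std::vector<float>>& value,
                                    const std::vector<std::vector<float>>& range) {
    std::vector<float> samples;
    samples.reserve(range.size());
    for (std::size_t y = 0; y < range.size(); ++y) {
        const std::vector<float>& row = range[y];
        // std::max_element returns an iterator to the first largest element.
        const auto best = std::max_element(row.begin(), row.end());
        const auto column = static_cast<std::size_t>(std::distance(row.begin(), best));
        samples.push_back(value.at(y).at(column));
    }
    return samples;
}
```

Comparing this reference against the tensor read back from each adapter shows whether the device applied a separate index per row.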
