Undefined behavior of DML_OPERATOR_GATHER operator on Qualcomm Snapdragon X GPU and NPU #685

Open
SID37 opened this issue Jan 25, 2025 · 1 comment

SID37 commented Jan 25, 2025

I'm using an Asus ProArt PZ13 with a Snapdragon X Plus (X1P-42-100). It has two DirectX 12 adapters: a Qualcomm Adreno X1-45 GPU and a Qualcomm Hexagon NPU.

I ran the example from the documentation (slightly modified) and got the correct results on both the NPU and the GPU:

Axis: 1
IndexDimensions : 1

input tensor {3, 3}:
 [[1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]]

 index tensor {1, 2}
 [[0, 2]]

 output[y, x] = input[y, indices[x]]
 output tensor {3, 2} :
 [[1, 3],
  [4, 6],
  [7, 9]]
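
For reference, the documented example above can be checked on the CPU. This is only a minimal sketch of the formula `output[y, x] = input[y, indices[x]]` for a row-major 2D input; `gather_axis1_reference` is a hypothetical helper name, not part of DirectML.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not a DirectML API) for the documented case:
// axis = 1 with a one-dimensional list of indices.
// output[y, x] = input[y, indices[x]], input stored row-major as rows x cols.
std::vector<float> gather_axis1_reference(const std::vector<float>& input,
                                          std::size_t rows, std::size_t cols,
                                          const std::vector<std::uint32_t>& indices)
{
    assert(input.size() == rows * cols);
    std::vector<float> output;
    output.reserve(rows * indices.size());
    for (std::size_t y = 0; y < rows; ++y) {
        for (std::uint32_t index : indices) {
            assert(index < cols);  // every index must address a valid column
            output.push_back(input[y * cols + index]);
        }
    }
    return output;
}
```

Calling it with the 3x3 input above and indices `{0, 2}` yields the same six values as the expected output tensor, flattened row by row.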

But if I change the dimensionality of the index tensor to match the number of rows of the input tensor:

Axis: 1
IndexDimensions : 1

input tensor {3, 3}:
 [[1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]]

 index tensor {3, 2}
 [[0, 2],
  [1, 0],
  [2, 1]]

 output[y, x] = input[y, indices[x]]
 output tensor {3, 2}

Then I get the following result on GPU:

[[1, 3],
 [5, 4],
 [9, 8]]

And the following result on NPU:

[[1, 3],
 [4, 6],
 [7, 9]]

I'm not sure which of the results is correct, since the documentation is not very precise about this case. However, when I tried an index tensor of {2, 2}, I got error 0x80070057 (The parameter is incorrect) on both devices, which is more in line with the GPU behavior. So I think this is an NPU bug.

The JSON files for DxDispatch are attached: gather-examples.zip
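
To make the difference concrete, here is a small CPU sketch of two interpretations that happen to reproduce the two outputs above. The helper names are hypothetical and this is not a claim about what either driver does internally; it only shows which formula matches which result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU helpers (not DirectML APIs).
// input is rows x cols, indices is rows x k, both stored row-major.

// Per-row lookup: output[y, x] = input[y, indices[y, x]].
// This reproduces the GPU output from the issue.
std::vector<int> gather_per_row(const std::vector<int>& input,
                                std::size_t rows, std::size_t cols,
                                const std::vector<std::uint32_t>& indices,
                                std::size_t k)
{
    assert(input.size() == rows * cols && indices.size() == rows * k);
    std::vector<int> output(rows * k);
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < k; ++x)
            output[y * k + x] = input[y * cols + indices[y * k + x]];
    return output;
}

// First index row reused for every input row:
// output[y, x] = input[y, indices[0, x]].
// This reproduces the NPU output from the issue.
std::vector<int> gather_first_index_row(const std::vector<int>& input,
                                        std::size_t rows, std::size_t cols,
                                        const std::vector<std::uint32_t>& indices,
                                        std::size_t k)
{
    assert(input.size() == rows * cols && indices.size() >= k);
    std::vector<int> output(rows * k);
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < k; ++x)
            output[y * k + x] = input[y * cols + indices[x]];
    return output;
}
```

With the 3x3 input and the {3, 2} index tensor from this report, the first helper gives the GPU rows and the second gives the NPU rows, which suggests the NPU result ignores every index row after the first.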


Processor: Qualcomm Snapdragon X Plus - X1P-42-100
DirectML: 1.15.4.0
Hexagon NPU driver: 1.0.0.10
Adreno X1-45 GPU driver: v31.0.82.0

SID37 commented Jan 25, 2025

Actually, I'm just trying to use the ArgMax result to sample values from another tensor by index. I need to do something like this:

    // ...
    dml::Graph graph(dml_device);

    // Two 64x64 inputs: the values to sample and the tensor whose per-row maximum selects the column.
    auto value_tensor = dml::InputTensor(graph, 0, dml::TensorDesc(DML_TENSOR_DATA_TYPE_FLOAT32, { 64, 64 }));
    auto range_tensor = dml::InputTensor(graph, 1, dml::TensorDesc(DML_TENSOR_DATA_TYPE_FLOAT32, { 64, 64 }));

    // Index of the maximum along axis 1, then gather along axis 1 with indexDimensions = 1.
    auto index_tensor  = dml::ArgMax(range_tensor, { 1 });
    auto sample_tensor = dml::Gather(value_tensor, index_tensor, 1, 1);

    auto compiled_graph = graph.Compile(DML_EXECUTION_FLAG_NONE, { sample_tensor });

There is DML_OPERATOR_GATHER_ELEMENTS for this, but it doesn't work at all on the NPU: it returns error 0x887A0004 (The specified device interface or feature level is not supported on this system). So I tried using Gather instead and found that it works as I need on the GPU, but on the NPU the result is different.
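
For clarity, the result I expect from the graph can be written as a plain CPU loop. This is a minimal sketch with a hypothetical helper name (`sample_by_argmax`), useful only for comparing device output against a known-good answer. It assumes row-major storage and that ties resolve to the lowest column index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a DirectML API):
// for each row y, find the column where range[y, :] is largest
// and return values[y, that column]. Output has one element per row.
std::vector<float> sample_by_argmax(const std::vector<float>& values,
                                    const std::vector<float>& range,
                                    std::size_t rows, std::size_t cols)
{
    assert(cols > 0);
    assert(values.size() == rows * cols && range.size() == rows * cols);
    std::vector<float> output(rows);
    for (std::size_t y = 0; y < rows; ++y) {
        std::size_t best = 0;
        for (std::size_t x = 1; x < cols; ++x) {
            // Strict comparison keeps the first maximum on ties (assumption of this sketch).
            if (range[y * cols + x] > range[y * cols + best])
                best = x;
        }
        output[y] = values[y * cols + best];
    }
    return output;
}
```

Running the same inputs through this loop and through the compiled graph on each adapter makes it easy to tell which device deviates.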
