Another DxDispatch issue with QCOM Hexagon NPU on Windows (CompileOperator) #666

Open
fobrs opened this issue Nov 14, 2024 · 6 comments

@fobrs

fobrs commented Nov 14, 2024

Using NPU driver version 30.0.32.1000 (10/9/2024) on a Snapdragon Dev Box.

dxdispatch.exe .\models\dml_reduce.json -a Adreno
Running on 'Snapdragon(R) X Elite - X1E001DE - Qualcomm(R) Adreno(TM) GPU'
Compile Op
Dispatch 'sum': 1 iterations, 0.5781 ms median (CPU), 0.014063 ms median (GPU)
Resource 'input': 1, 2, 3, 4, 5, 6, 7, 8, 9
Resource 'output': 6, 15, 24

dxdispatch.exe .\models\dml_reduce.json -a Hexagon
Running on 'Qualcomm(R) Hexagon(TM) NPU'
Compile Op
Failed to execute the model: ERROR while initializing 'sum': C:\projects\DirectML\DxDispatch\src\dxdispatch\DmlDispatchable.cpp(404)\dxdispatchImpl.dll!00007FFA4C17BDB8: (caller: 00007FFA4C2B9884) Exception(1) tid(3754) 887A0004 The specified device interface or feature level is not supported on this system.
[DmlDispatchable::Initialize(m_device->DML()->CompileOperator( m_operator.Get(), dmlDesc.executionFlags, IID_PPV_ARGS(m_compiledOperator.ReleaseAndGetAddressOf())))]
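
For reference, the 887A0004 value in the log above is a Windows HRESULT. A minimal Python sketch of splitting it into its fields is shown here; the two symbolic names in the table are the standard DXGI error names for those values, while the helper name `decode_hresult` is hypothetical and only used for illustration.

```python
# Symbolic names for two DXGI HRESULTs (values as defined in the Windows SDK headers).
KNOWN_HRESULTS = {
    0x887A0004: "DXGI_ERROR_UNSUPPORTED",
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
}

FACILITY_DXGI = 0x87A  # facility code carried by DXGI errors


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    hr &= 0xFFFFFFFF
    return {
        "failed": bool(hr >> 31),         # severity bit: 1 means failure
        "facility": (hr >> 16) & 0x1FFF,  # same mask as the HRESULT_FACILITY macro
        "code": hr & 0xFFFF,
        "name": KNOWN_HRESULTS.get(hr, "unknown"),
    }


info = decode_hresult(0x887A0004)
print(info["name"], hex(info["facility"]), info["code"])
```

The facility field identifies DXGI as the origin of the error, which suggests the rejection comes from the device or driver layer rather than from parsing the model file.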

dxdispatch.exe -s
Snapdragon(R) X Elite - X1E001DE - Qualcomm(R) Adreno(TM) GPU
-Version: 31.0.56.0
-Hardware: true
-Integrated: true
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
-Shared System Memory: 15.79 GB
-D3D12_GRAPHICS: true
-CORE_COMPUTE: true
-GENERIC_ML: true

Qualcomm(R) Hexagon(TM) NPU
-Version: 30.0.32.1000
-Hardware: true
-Integrated: false
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
-Shared System Memory: 15.79 GB
-D3D12_GRAPHICS: false
-CORE_COMPUTE: false
-GENERIC_ML: true

Microsoft Basic Render Driver
-Version: 10.0.26100.2314
-Hardware: false
-Integrated: false
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes

@ashumish-QCOM

Hi @fobrs

Thank you for reporting the issue with the Qualcomm(R) Hexagon(TM) NPU on Windows. We have noted the error during the initialization of the 'sum' operation.

To address this, you might want to ensure that you are using the latest version of the DirectML library and that your system meets all the necessary hardware and software requirements. Additionally, you can try running the model with different execution flags or configurations to see if that resolves the issue.

@fobrs
Author

fobrs commented Dec 10, 2024

Hi @ashumish-QCOM

I do run the latest version of DirectML and have installed the latest Qualcomm NPU driver.

The only thing I did was run the first example from the DirectML Readme, and it failed. What should I tweak?

 **Getting Started**
See the [guide](https://github.com/microsoft/DirectML/blob/master/DxDispatch/doc/Guide.md) for detailed usage instructions.
The [models](https://github.com/microsoft/DirectML/blob/master/DxDispatch/models) directory contains some
simple examples to get started. For example, here's an example that invokes DML's reduction operator:

> dxdispatch.exe models/dml_reduce.json

Running on 'NVIDIA GeForce RTX 2070 SUPER'
Resource 'input': 1, 2, 3, 4, 5, 6, 7, 8, 9
Resource 'output': 6, 15, 24

I hope Qualcomm gets its NPU drivers 100% compatible without users requiring tweaking...

@ashumish-QCOM

Hi @fobrs

Here are some steps you can look into:

Model Compatibility: Ensure the model dml_reduce.json is fully compatible with the Qualcomm Hexagon NPU. Some models may require specific optimizations.
Execution Flags: Experiment with different execution flags in your dxdispatch command to see if that resolves the issue.
DirectML Version: Ensure you are using DirectML version 1.15.4 or higher, as this version includes optimizations for the Qualcomm Hexagon NPU.

@fobrs
Author

fobrs commented Dec 12, 2024

The model consists of exactly one operator:

"sum": 
{
    "type": "DML_OPERATOR_REDUCE",
    "desc": 
    {
        "InputTensor": { "DataType": "FLOAT32", "Sizes": [1,1,3,3] },
        "OutputTensor": { "DataType": "FLOAT32", "Sizes": [1,1,3,1] },
        "Function": "DML_REDUCE_FUNCTION_SUM",
        "AxisCount": 1,
        "Axes": [3]
    }
}

The error occurs during compilation, before anything is executed, so tweaking execution parameters makes no sense here.
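
As a sanity check on the model itself, the operator's expected result can be reproduced on the CPU. This is a minimal pure-Python sketch, not DirectML code: it assumes row-major layout, and the helper name `reduce_sum_last_axis` is hypothetical. With Sizes [1,1,3,3] and Axes [3], the reduction runs over the last axis, so each row of three values collapses to one sum.

```python
def reduce_sum_last_axis(values, sizes):
    """Sum a flat, row-major tensor over its last axis."""
    width = sizes[-1]  # length of the axis being reduced (axis 3 here)
    return [sum(values[i:i + width]) for i in range(0, len(values), width)]


inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # the 'input' resource from the Adreno run
print(reduce_sum_last_axis(inputs, [1, 1, 3, 3]))  # prints [6, 15, 24]
```

This matches the 'output' resource reported for the Adreno GPU, so the model definition is consistent and the failure is specific to the Hexagon adapter.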

@SID37

SID37 commented Jan 25, 2025

@fobrs, I have a Snapdragon X Plus (X1P-42-100), and I noticed that the NPU currently does not support IDMLDevice::CreateOperator at all: it always returns error 0x887A0004 (The specified device interface or feature level is not supported on this system). However, IDMLDevice1::CompileGraph works, so in this case you can make it work by compiling the operator as a graph:

    "sum":
    {
        "type": "DML_OPERATOR_REDUCE",
        "desc":
        {
            "InputTensor": { "DataType": "FLOAT32",  "Sizes": [1,1,3,3] },
            "OutputTensor": { "DataType": "FLOAT32", "Sizes": [1,1,3,1] },
            "Function": "DML_REDUCE_FUNCTION_SUM",
            "AxisCount": 1,
            "Axes": [3]
        },
        "dmlCompileType": "DmlCompileGraph",
        "executionFlags": "DML_EXECUTION_FLAG_NONE"
    }

But even so, not all operators will work.
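
For models with several operators, the same change could be applied with a script instead of by hand. The following is a rough Python sketch under stated assumptions: that the model's top level has a "dispatchables" object mapping names to definitions (as in the DxDispatch guide), and that the file is strict JSON without comments. The function name `use_compile_graph` is hypothetical; the two fields it sets are the ones shown in the snippet above.

```python
import copy
import json


def use_compile_graph(model: dict) -> dict:
    """Return a copy of the model with DML operator dispatchables compiled as graphs."""
    patched = copy.deepcopy(model)
    # Assumption: "dispatchables" maps each dispatchable name to its definition.
    for dispatchable in patched.get("dispatchables", {}).values():
        if dispatchable.get("type", "").startswith("DML_OPERATOR_"):
            # setdefault keeps any value the model author already chose.
            dispatchable.setdefault("dmlCompileType", "DmlCompileGraph")
            dispatchable.setdefault("executionFlags", "DML_EXECUTION_FLAG_NONE")
    return patched


model = {
    "dispatchables": {
        "sum": {
            "type": "DML_OPERATOR_REDUCE",
            "desc": {
                "InputTensor": {"DataType": "FLOAT32", "Sizes": [1, 1, 3, 3]},
                "OutputTensor": {"DataType": "FLOAT32", "Sizes": [1, 1, 3, 1]},
                "Function": "DML_REDUCE_FUNCTION_SUM",
                "AxisCount": 1,
                "Axes": [3],
            },
        }
    }
}
print(json.dumps(use_compile_graph(model), indent=4))
```

The function works on a copy so the original model stays available for comparison runs on adapters where the default compile path already works.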

@fobrs
Author

fobrs commented Jan 25, 2025

Wow, thanks, that works! Although only sometimes.

Most of the time now I get this failed assertion:

Assertion failed!

Program: ...0.inf_arm64_71cbfe001b559788\qcnspdx12arm64xum.dll
File: Z:\b\WP\qcnspmcdm\rel\10.9\d3d12dml\qnnp...\HtpModel.cpp
Line: 1404

Expression: nErr == 0

For information on how your program can cause an assertion
failure, see the Visual C++ documentation on asserts

(Press Retry to debug the application - JIT must be enabled)

And then:
Failed to execute dispatchable: C:\projects\DirectML\DxDispatch\src\dxdispatch\Device.cpp(615)\dxdispatchImpl.dll!00007FFA2F9429FC: (caller: 00007FFA2F954C48) Exception(1) tid(b58) 887A0005 The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.
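
Since the failure is intermittent, it may help to quantify how often it happens. Below is a hedged Python sketch that runs a command repeatedly and tallies the outcomes. It assumes, without confirmation from this thread, that dxdispatch.exe returns a non-zero exit code on failure; the timeout is there because a modal assertion dialog could otherwise block the run. The function name `count_outcomes` is hypothetical, and a stand-in command is used so the sketch runs on any machine.

```python
import subprocess
import sys


def count_outcomes(command, runs=10, timeout_s=60):
    """Run `command` repeatedly and tally ok / failed / timed-out runs."""
    tally = {"ok": 0, "failed": 0, "timeout": 0}
    for _ in range(runs):
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            tally["timeout"] += 1  # e.g. the process is stuck behind a dialog
            continue
        # Assumption: exit code 0 means the dispatch succeeded.
        tally["ok" if result.returncode == 0 else "failed"] += 1
    return tally


# Stand-in command; on the affected machine this would be something like
# ["dxdispatch.exe", r".\models\dml_reduce.json", "-a", "Hexagon"].
print(count_outcomes([sys.executable, "-c", "raise SystemExit(0)"], runs=3))
```

A tally over a fixed number of runs gives the driver maintainers a concrete failure rate to compare across driver versions.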
