onnxruntime_genai.onnxruntime_genai.OrtException when running Phi-3-Vision ONNX model #849
Can you upgrade your version of ONNX Runtime? The error shows that the model's GroupQueryAttention node receives 9 inputs (including cos_cache and sin_cache), but the GroupQueryAttention implementation in ONNX Runtime 1.16.3 only accepts 7, so a newer ONNX Runtime release is needed to load this model.
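You can confirm which ONNX Runtime version is actually being picked up at runtime with:

python3 -c "import onnxruntime; print(onnxruntime.__version__)"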
Thanks! Since I am using JetPack 5.1.1 with CUDA 11.4, I couldn't find a pre-compiled onnxruntime-gpu tarball newer than 1.16.3 that supports gpu-linux-aarch64. Is there any alternative other than compiling and building a supported onnxruntime-gpu tarball from source? I use the extracted tar as ort_home to build onnxruntime-genai. However, I was able to find a newer compiled gpu aarch64 version of onnxruntime as a .whl. When building onnxruntime-genai from source (build.py), instead of pointing ort_home at the onnxruntime-gpu directory from the tarball, is there any other option here?
ONNX Runtime GenAI requires the shared libraries and header files from ONNX Runtime. To get the shared libraries, you can install the newer .whl you found and reuse what it ships instead of the tarball. For example: 1) download and install the ONNX Runtime wheel, 2) copy its shared libraries into ort/lib, and 3) copy the matching header files into ort/include, then point build.py at that ort directory as before. A rough sketch of these steps is below.
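A minimal sketch of assembling an ort_home from a wheel; the wheel filename and version tag are placeholders, and whether a given build bundles all of the libonnxruntime*.so* files GenAI needs varies, so adjust to what your .whl actually contains:

# Hypothetical wheel name -- substitute the aarch64 .whl you found
WHL=onnxruntime_gpu-<version>-linux_aarch64.whl

# A wheel is a zip archive; unpack it to reach the bundled shared libraries
unzip "$WHL" -d ort_whl
mkdir -p ort/lib ort/include

# Copy whatever libonnxruntime*.so* files the wheel ships into ort/lib
# (assumption: this particular build bundles the shared libraries GenAI needs)
cp ort_whl/onnxruntime/capi/libonnxruntime*.so* ort/lib/

# The wheel does not ship headers; fetch them from the matching source tag
git clone --depth 1 --branch v<version> https://github.com/microsoft/onnxruntime ort_src
cp ort_src/include/onnxruntime/core/session/*.h ort/include/

# Build ONNX Runtime GenAI against this ort_home as before
python3 build.py --use_cuda --cuda_home /usr/local/cuda-11.4 --skip_tests --skip_csharp --parallel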
Thanks for the very detailed response. Will try this out and update here. @kunal-vaishnavi Is there any estimated release date for the Phi-3.5-vision ONNX models?
The work is in progress and we are working to complete it soon, but there's no estimated release date because the Phi-3.5 vision ONNX models will need to undergo Microsoft's Responsible AI evaluations before they can be published officially. If the evaluations take a while, I can publish a tutorial once all of the work is merged into ONNX Runtime GenAI so that you can generate your own ONNX models locally and run them.
The new Phi-3 vision and Phi-3.5 vision ONNX models have now been released. The new models support no-image, single-image, and multi-image scenarios.
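For anyone landing here, a minimal sketch of the multi-image scenario, following the pattern of examples/python/phi3v.py (the model path, image filenames, and prompt text are placeholders, not a definitive reference):

import onnxruntime_genai as og

# Load the vision ONNX model (directory path is a placeholder)
model = og.Model("cuda-int4-rtn-block-32")
processor = og.MultiModalProcessor(model)
tokenizer_stream = processor.create_stream()

# Multi-image input: each image is referenced by an <|image_N|> tag in the prompt
images = og.Images.open("photo1.png", "photo2.png")
prompt = "<|user|>\n<|image_1|>\n<|image_2|>\nWhat differs between these two images?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=images)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

# Stream the generated tokens
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)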
Describe the bug
Running python3 phi3v.py -m cuda-int4-rtn-block-32 fails with the following error:
Loading model...
Traceback (most recent call last):
  File "phi3v.py", line 66, in <module>
    run(args)
  File "phi3v.py", line 16, in run
    model = og.Model(args.model_path)
onnxruntime_genai.onnxruntime_genai.OrtException: Load model from cuda-int4-rtn-block-32/phi-3-v-128k-instruct-text.onnx failed:This is an invalid model. In Node, ("/model/layers.0/attn/GroupQueryAttention", GroupQueryAttention, "com.microsoft", -1) : ("/model/layers.0/attn/qkv_proj/MatMul/output_0": tensor(float16),"","","past_key_values.0.key": tensor(float16),"past_key_values.0.value": tensor(float16),"/model/attn_mask_reformat/attn_mask_subgraph/Sub/Cast/output_0": tensor(int32),"/model/attn_mask_reformat/attn_mask_subgraph/Gather/Cast/output_0": tensor(int32),"cos_cache": tensor(float16),"sin_cache": tensor(float16),) -> ("/model/layers.0/attn/GroupQueryAttention/output_0": tensor(float16),"present.0.key": tensor(float16),"present.0.value": tensor(float16),) , Error Node (/model/layers.0/attn/GroupQueryAttention) has input size 9 not in range [min=7, max=7].
To Reproduce
Steps to run the quantized Phi-3-vision ONNX model on the Jetson Orin:
# Download a CUDA 11.4 build of ONNX Runtime for JetPack 5 and unpack it as ort_home
wget http://jetson.webredirect.org:8000/jp5/cu114/onnxruntime-gpu-1.16.3.tar.gz
mkdir ort
tar -xvf onnxruntime-gpu-1.16.3.tar.gz -C ort
# Flatten the header layout into the ort/include structure build.py expects
mv ort/include/onnxruntime/onnxruntime_c_api.h ort/include/
rm -rf ort/include/onnxruntime/
# Build ONNX Runtime GenAI from source with CUDA support and install the wheel
python3 build.py --use_cuda --cuda_home /usr/local/cuda-11.4 --skip_tests --skip_csharp --parallel
pip3 install *.whl
# Download the quantized Phi-3-vision ONNX model and the example script, then run it
pip3 install huggingface-hub[cli]
huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cuda --include cuda-int4-rtn-block-32/* --local-dir .
wget https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py
python3 phi3v.py -m cuda-int4-rtn-block-32
Platform: Jetson Orin
Additional context
onnxruntime-genai was built from source without encountering any CUDA-related problems. However, when loading the model I get the error above. I would appreciate any assistance in diagnosing and correcting this problem.