
Issue with building TensorRT engine for the mask decoder #12

Open
zunction opened this issue Nov 21, 2023 · 2 comments

@zunction

I am trying to set up nanosam using the NGC Docker image nvcr.io/nvidia/pytorch:23.10-py3.

At the step of building the TensorRT engine for the mask decoder, I encounter this error:

&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10
[11/21/2023-02:04:10] [I] === Model Options ===
[11/21/2023-02:04:10] [I] Format: ONNX
[11/21/2023-02:04:10] [I] Model: data/mobile_sam_mask_decoder.onnx
[11/21/2023-02:04:10] [I] Output:
[11/21/2023-02:04:10] [I] === Build Options ===
[11/21/2023-02:04:10] [I] Max batch: explicit batch
[11/21/2023-02:04:10] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/21/2023-02:04:10] [I] minTiming: 1
[11/21/2023-02:04:10] [I] avgTiming: 8
[11/21/2023-02:04:10] [I] Precision: FP32
[11/21/2023-02:04:10] [I] LayerPrecisions: 
[11/21/2023-02:04:10] [I] Layer Device Types: 
[11/21/2023-02:04:10] [I] Calibration: 
[11/21/2023-02:04:10] [I] Refit: Disabled
[11/21/2023-02:04:10] [I] Version Compatible: Disabled
[11/21/2023-02:04:10] [I] TensorRT runtime: full
[11/21/2023-02:04:10] [I] Lean DLL Path: 
[11/21/2023-02:04:10] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[11/21/2023-02:04:10] [I] Exclude Lean Runtime: Disabled
[11/21/2023-02:04:10] [I] Sparsity: Disabled
[11/21/2023-02:04:10] [I] Safe mode: Disabled
[11/21/2023-02:04:10] [I] Build DLA standalone loadable: Disabled
[11/21/2023-02:04:10] [I] Allow GPU fallback for DLA: Disabled
[11/21/2023-02:04:10] [I] DirectIO mode: Disabled
[11/21/2023-02:04:10] [I] Restricted mode: Disabled
[11/21/2023-02:04:10] [I] Skip inference: Disabled
[11/21/2023-02:04:10] [I] Save engine: data/mobile_sam_mask_decoder.engine
[11/21/2023-02:04:10] [I] Load engine: 
[11/21/2023-02:04:10] [I] Profiling verbosity: 0
[11/21/2023-02:04:10] [I] Tactic sources: Using default tactic sources
[11/21/2023-02:04:10] [I] timingCacheMode: local
[11/21/2023-02:04:10] [I] timingCacheFile: 
[11/21/2023-02:04:10] [I] Heuristic: Disabled
[11/21/2023-02:04:10] [I] Preview Features: Use default preview flags.
[11/21/2023-02:04:10] [I] MaxAuxStreams: -1
[11/21/2023-02:04:10] [I] BuilderOptimizationLevel: -1
[11/21/2023-02:04:10] [I] Input(s)s format: fp32:CHW
[11/21/2023-02:04:10] [I] Output(s)s format: fp32:CHW
[11/21/2023-02:04:10] [I] Input build shape: point_coords=1x1x2+1x1x2+1x10x2
[11/21/2023-02:04:10] [I] Input build shape: point_labels=1x1+1x1+1x10
[11/21/2023-02:04:10] [I] Input calibration shapes: model
[11/21/2023-02:04:10] [I] === System Options ===
[11/21/2023-02:04:10] [I] Device: 0
[11/21/2023-02:04:10] [I] DLACore: 
[11/21/2023-02:04:10] [I] Plugins:
[11/21/2023-02:04:10] [I] setPluginsToSerialize:
[11/21/2023-02:04:10] [I] dynamicPlugins:
[11/21/2023-02:04:10] [I] ignoreParsedPluginLibs: 0
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] === Inference Options ===
[11/21/2023-02:04:10] [I] Batch: Explicit
[11/21/2023-02:04:10] [I] Input inference shape: point_labels=1x1
[11/21/2023-02:04:10] [I] Input inference shape: point_coords=1x1x2
[11/21/2023-02:04:10] [I] Iterations: 10
[11/21/2023-02:04:10] [I] Duration: 3s (+ 200ms warm up)
[11/21/2023-02:04:10] [I] Sleep time: 0ms
[11/21/2023-02:04:10] [I] Idle time: 0ms
[11/21/2023-02:04:10] [I] Inference Streams: 1
[11/21/2023-02:04:10] [I] ExposeDMA: Disabled
[11/21/2023-02:04:10] [I] Data transfers: Enabled
[11/21/2023-02:04:10] [I] Spin-wait: Disabled
[11/21/2023-02:04:10] [I] Multithreading: Disabled
[11/21/2023-02:04:10] [I] CUDA Graph: Disabled
[11/21/2023-02:04:10] [I] Separate profiling: Disabled
[11/21/2023-02:04:10] [I] Time Deserialize: Disabled
[11/21/2023-02:04:10] [I] Time Refit: Disabled
[11/21/2023-02:04:10] [I] NVTX verbosity: 0
[11/21/2023-02:04:10] [I] Persistent Cache Ratio: 0
[11/21/2023-02:04:10] [I] Inputs:
[11/21/2023-02:04:10] [I] === Reporting Options ===
[11/21/2023-02:04:10] [I] Verbose: Disabled
[11/21/2023-02:04:10] [I] Averages: 10 inferences
[11/21/2023-02:04:10] [I] Percentiles: 90,95,99
[11/21/2023-02:04:10] [I] Dump refittable layers:Disabled
[11/21/2023-02:04:10] [I] Dump output: Disabled
[11/21/2023-02:04:10] [I] Profile: Disabled
[11/21/2023-02:04:10] [I] Export timing to JSON file: 
[11/21/2023-02:04:10] [I] Export output to JSON file: 
[11/21/2023-02:04:10] [I] Export profile to JSON file: 
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] === Device Information ===
[11/21/2023-02:04:10] [I] Selected Device: Quadro RTX 4000 with Max-Q Design
[11/21/2023-02:04:10] [I] Compute Capability: 7.5
[11/21/2023-02:04:10] [I] SMs: 40
[11/21/2023-02:04:10] [I] Device Global Memory: 7802 MiB
[11/21/2023-02:04:10] [I] Shared Memory per SM: 64 KiB
[11/21/2023-02:04:10] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/21/2023-02:04:10] [I] Application Compute Clock Rate: 1.38 GHz
[11/21/2023-02:04:10] [I] Application Memory Clock Rate: 6.001 GHz
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[11/21/2023-02:04:10] [I] 
[11/21/2023-02:04:10] [I] TensorRT version: 8.6.1
[11/21/2023-02:04:10] [I] Loading standard plugins
[11/21/2023-02:04:10] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 22, GPU 114 (MiB)
[11/21/2023-02:04:15] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +889, GPU +174, now: CPU 987, GPU 288 (MiB)
[11/21/2023-02:04:15] [I] Start parsing network model.
[11/21/2023-02:04:15] [I] [TRT] ----------------------------------------------------------------
[11/21/2023-02:04:15] [I] [TRT] Input filename:   data/mobile_sam_mask_decoder.onnx
[11/21/2023-02:04:15] [I] [TRT] ONNX IR version:  0.0.8
[11/21/2023-02:04:15] [I] [TRT] Opset version:    16
[11/21/2023-02:04:15] [I] [TRT] Producer name:    pytorch
[11/21/2023-02:04:15] [I] [TRT] Producer version: 2.1.0
[11/21/2023-02:04:15] [I] [TRT] Domain:           
[11/21/2023-02:04:15] [I] [TRT] Model version:    0
[11/21/2023-02:04:15] [I] [TRT] Doc string:       
[11/21/2023-02:04:15] [I] [TRT] ----------------------------------------------------------------
[11/21/2023-02:04:15] [I] Finished parsing network model. Parse time: 0.0541873
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=data/mobile_sam_mask_decoder.onnx --saveEngine=data/mobile_sam_mask_decoder.engine --minShapes=point_coords:1x1x2,point_labels:1x1 --optShapes=point_coords:1x1x2,point_labels:1x1 --maxShapes=point_coords:1x10x2,point_labels:1x10
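
For reference, I launch the container roughly like this (the mount path here is illustrative, not my exact setup):

docker run --gpus all -it --rm -v $(pwd)/nanosam:/opt/nanosam nvcr.io/nvidia/pytorch:23.10-py3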

I am not sure how to proceed. In fact, trying to set up nanosam (1) with the installation instructions in the README and (2) using the containers from `jetson-containers` also did not work out for me. I suspect the jetson-containers images only work on Jetson devices, as I got a hardware error message; my GPU is a Quadro RTX 4000.

I would appreciate some help in getting nanosam up and running, thanks!

@jaybdub
Contributor

jaybdub commented Nov 21, 2023

Hi @zunction,

Thanks for reaching out!

Could you try running trtexec with the --verbose flag enabled to see if this provides any additional information?
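
For example, the same command as above with --verbose appended:

trtexec --onnx=data/mobile_sam_mask_decoder.onnx \
    --saveEngine=data/mobile_sam_mask_decoder.engine \
    --minShapes=point_coords:1x1x2,point_labels:1x1 \
    --optShapes=point_coords:1x1x2,point_labels:1x1 \
    --maxShapes=point_coords:1x10x2,point_labels:1x10 \
    --verbose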

You may also be able to use nanosam outside of a container environment by installing the dependencies listed in the README.
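
Roughly, that looks like the following (just a sketch; the exact dependency steps for PyTorch, TensorRT, and torch2trt are in the README):

# after installing the dependencies from the README
git clone https://github.com/NVIDIA-AI-IOT/nanosam
cd nanosam
python3 setup.py develop --user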

Let me know if this helps or if you run into issues.

Best,
John

@csoham96

I am getting the same issue while running trtexec to build the TensorRT engine from the ONNX file. I have TensorRT 8.6.1 installed with CUDA 11.8.
