Performance YOLOv7 vs YOLOv9 Series using TensorRT engine #143

levipereira · 2024-03-01T18:58:45Z

Performance Test using GPU RTX 4090 on AMD Ryzen 7 3700X 8-Core / 16GB RAM

Model Performance using TensorRT engine

All models were sourced from the original repository and subsequently converted to ONNX format with dynamic batching enabled. Profiling was conducted using TensorRT Engine Explorer (TREx).

Detailed reports will be made available in the coming days, providing comprehensive insights into the performance metrics and optimizations achieved.

All models were converted (re-parameterized) and optimized for inference.

TensorRT version: 8.6.1

Device Properties:

Selected Device: NVIDIA GeForce RTX 4090
Compute Capability: 8.9
SMs: 128.0
Compute Clock Rate: 2.58 GHz
Device Global Memory: 24208 MiB
Shared Memory per SM: 100 KiB
Memory Bus Width: 384.0 bits
Memory Clock Rate: 10.501 GHz

YOLO v7 vs v9 Series Models Performance Results

Average time: Represents the total sum of layer latencies when profiling layers individually.
Latency: Refers to the minimum, maximum, mean, median, and 99th percentile of the engine latency measurements, captured without profiling layers.
Throughput: Measured in inferences per second (IPS).
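As a rough illustration of the latency statistics defined above, the following Python sketch computes the minimum, maximum, mean, median, and 99th percentile from a list of per-inference timings. The sample values are made up for illustration (they are not measurements from this report), `latency_summary` is a hypothetical helper, and the nearest-rank percentile is an assumed convention; the profiling tool may interpolate differently.

```python
import math
import statistics

def latency_summary(samples_ms):
    """Summarize per-inference latencies given in milliseconds."""
    ordered = sorted(samples_ms)
    # Nearest-rank 99th percentile: the smallest sample such that at
    # least 99% of all samples are less than or equal to it.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
    }

# Hypothetical timings, for illustration only.
samples = [1.0, 1.5, 2.0, 2.5, 3.0]
summary = latency_summary(samples)
print(summary)
```

With only a handful of samples the 99th percentile collapses to the maximum; a real run would use many more measurements.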

Performance Summary Tables

Throughput and Average Time

Model Name	Throughput (IPS)	Average Time (ms)
YOLOv7	978	1.441
YOLOv7x	609	2.065
YOLOv9-c	798	2.049
YOLOv9-e	353	4.261

Latency Summary

Model Name	Min Latency (ms)	Max Latency (ms)	Mean Latency (ms)	Median Latency (ms)	99th Percentile Latency (ms)
YOLOv7	1.012	1.104	1.020	1.018	1.024
YOLOv7x	1.613	1.751	1.640	1.636	1.664
YOLOv9-c	1.246	1.359	1.251	1.250	1.251
YOLOv9-e	2.807	3.032	2.823	2.814	2.817
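The two tables appear to be consistent with each other: for every model, the reported throughput is close to the reciprocal of the mean latency. The small Python check below uses only the figures from the tables above; the 1% tolerance is an arbitrary choice for this sketch, and `implied_ips` is a hypothetical helper name.

```python
# (throughput in IPS, mean latency in ms), copied from the tables above.
results = {
    "YOLOv7": (978, 1.020),
    "YOLOv7x": (609, 1.640),
    "YOLOv9-c": (798, 1.251),
    "YOLOv9-e": (353, 2.823),
}

def implied_ips(mean_latency_ms):
    """Throughput implied by running one inference at a time, back to back."""
    return 1000.0 / mean_latency_ms

relative_diffs = {}
for name, (ips, mean_ms) in results.items():
    implied = implied_ips(mean_ms)
    relative_diffs[name] = abs(implied - ips) / ips
    print(f"{name}: reported {ips} IPS, implied {implied:.1f} IPS, "
          f"diff {relative_diffs[name]:.2%}")

# Every model agrees within the (arbitrary) 1% tolerance.
assert all(diff < 0.01 for diff in relative_diffs.values())
```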

Full Report
https://github.com/levipereira/triton-server-yolo/tree/master/perfomance

WongKinYiu · 2024-03-01T23:18:40Z

@levipereira
Thanks for providing the TRT performance reports.
I notice that you used yolov9-c.pt for exporting and testing performance.
Actually, yolov9-c.pt contains the PGI auxiliary branch, which can be removed at the inference stage.
Could you help by using yolov9-c-converted.pt and yolov9-e-converted.pt to get more performance reports?
Their architectures are the same as gelan-c.pt and gelan-e.pt, respectively.

The converted weights are provided here:
yolov9-c-converted.pt
yolov9-e-converted.pt

WongKinYiu · 2024-03-02T12:53:36Z

https://github.com/levipereira/yolov9/blob/main/models/experimental.py#L140

output = output[0] gets the output of the auxiliary branch.
output = output[1] gets the output of the main branch, which is the correct one.
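To make the indexing concrete, here is a minimal Python sketch of picking the main-branch output. It assumes, as described above, that a model exported with the auxiliary branch returns a two-element sequence ordered as auxiliary then main. `select_main_output` is a hypothetical helper, not a function from the repository, and plain strings stand in for tensors.

```python
def select_main_output(output):
    """Return the main-branch prediction from a model's raw output.

    Assumption: a model that still carries the auxiliary branch returns
    [auxiliary, main]; anything else is passed through unchanged.
    """
    if isinstance(output, (list, tuple)) and len(output) == 2:
        return output[1]  # index 0 would be the auxiliary branch
    return output

# Stand-in values instead of real tensors.
dual = ["aux_pred", "main_pred"]
single = "main_pred"
print(select_main_output(dual))
print(select_main_output(single))
```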

levipereira · 2024-03-03T01:04:30Z

https://github.com/levipereira/yolov9/blob/main/models/experimental.py#L140

output = output[0] gets the output of the auxiliary branch. output = output[1] gets the output of the main branch, which is the correct one.

#130 (comment)

laugh12321 · 2024-03-03T05:39:03Z

Performance Test using GPU RTX 2080Ti 2GB on AMD Ryzen 7 5700X 8-Core / 128GB RAM

All models were converted to ONNX models with the EfficientNMS plugin. The conversion was done using the TensorRT-YOLO tool, with the trtyolo CLI installed via pip install tensorrt-yolo==3.0.1. The batch size is 1 and the image size is 640.

Model Export and Performance Testing

Use the following commands to export the model and perform performance testing with trtexec:

trtyolo export -v yolov9 -w yolov9-converted.pt --imgsz 640 -o ./
trtexec --onnx=yolov9-converted.onnx --saveEngine=yolov9-converted.engine --fp16
trtexec --fp16 --avgRuns=1000 --useSpinWait --loadEngine=yolov9-converted.engine
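For repeated runs over several weight files, the three commands above can be generated programmatically. The Python sketch below only builds the command lines (it does not execute them) and uses exactly the flags shown above. The helper name `benchmark_commands` is hypothetical, and it assumes that the exported ONNX file keeps the stem of the weight file.

```python
import shlex
from pathlib import Path

def benchmark_commands(weights, imgsz=640, out_dir="./"):
    """Build the export, engine-build, and timing commands for one weight file."""
    # Assumption: the export keeps the file stem, e.g. foo.pt -> foo.onnx.
    stem = Path(weights).stem
    onnx, engine = f"{stem}.onnx", f"{stem}.engine"
    return [
        ["trtyolo", "export", "-v", "yolov9", "-w", weights,
         "--imgsz", str(imgsz), "-o", out_dir],
        ["trtexec", f"--onnx={onnx}", f"--saveEngine={engine}", "--fp16"],
        ["trtexec", "--fp16", "--avgRuns=1000", "--useSpinWait",
         f"--loadEngine={engine}"],
    ]

for cmd in benchmark_commands("yolov9-converted.pt"):
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` later without shell quoting issues.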

Performance testing was conducted using the TensorRT-YOLO inference on the coco128 dataset.

YOLOv9 Series

Tool (mean latency, ms)	YOLOv9-T-Converted	YOLOv9-S-Converted	YOLOv9-M-Converted	YOLOv9-C-Converted	YOLOv9-E-Converted
trtexec (infer)	3.51857	3.67899	4.19460	4.25964	8.95429
TensorRT-YOLO Python (infer)	10.19576	10.15226	9.29918	9.60093	21.85042
TensorRT-YOLO C++ (pre + infer)	3.44162	3.66080	4.10519	4.12471	8.98964

Tool (mean latency, ms)	Gelan-S2	Gelan-S	Gelan-M	Gelan-C	Gelan-E
trtexec (infer)	3.42082	3.78578	4.16447	4.27485	8.91479
TensorRT-YOLO Python (infer)	9.96435	10.35934	9.14044	9.33843	21.42764
TensorRT-YOLO C++ (pre + infer)	3.60857	3.93528	4.25084	4.35533	9.23654

YOLOv8 Series

Tool (mean latency, ms)	YOLOv8n	YOLOv8s	YOLOv8m	YOLOv8l	YOLOv8x
trtexec (infer)	1.90273	2.34166	3.58595	4.83306	7.12179
TensorRT-YOLO Python (infer)	7.03217	7.52751	8.75298	10.56914	12.45605
TensorRT-YOLO C++ (pre + infer)	2.02848	2.15021	3.57631	4.78318	6.96686
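One way to read these tables is to subtract the trtexec latency from the TensorRT-YOLO Python latency, which gives a rough per-call cost of the Python path. The Python sketch below does this for the YOLOv8 figures copied from the table above. Attributing the whole difference to Python-side overhead is an assumption, since the measurements are not broken down any further.

```python
# Mean latency in ms, copied from the YOLOv8 table above.
trtexec_ms = {"YOLOv8n": 1.90273, "YOLOv8s": 2.34166, "YOLOv8m": 3.58595,
              "YOLOv8l": 4.83306, "YOLOv8x": 7.12179}
python_ms = {"YOLOv8n": 7.03217, "YOLOv8s": 7.52751, "YOLOv8m": 8.75298,
             "YOLOv8l": 10.56914, "YOLOv8x": 12.45605}

# Extra time per inference when going through the Python path.
overhead_ms = {name: python_ms[name] - trtexec_ms[name] for name in trtexec_ms}

for name, extra in overhead_ms.items():
    ratio = python_ms[name] / trtexec_ms[name]
    print(f"{name}: +{extra:.2f} ms ({ratio:.1f}x trtexec)")
```

For these five models the difference stays between roughly 5.1 and 5.7 ms, which suggests a mostly fixed cost rather than one that grows with model size.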

levipereira · 2024-03-04T01:30:47Z

Hi @WongKinYiu,
I apologize for the delay in responding; my work has been taking up a lot of my time. I'm deeply involved in assessing the performance of YOLOv9. I've managed to gather some valuable performance information comparing YOLOv9 to YOLOv7. I'll be sharing these findings in the next few days. I'm sending a more detailed report and need to highlight the differences accurately.

The original post had results from many variables that shouldn't have been included in measuring the model's performance. That's why I made changes to the original post.

WongKinYiu · 2024-06-06T04:15:35Z

@laugh12321

Could you help test the speed of yolov9-t-converted.pt, yolov9-s-converted.pt, and yolov9-m-converted.pt?

Thanks.

laugh12321 · 2024-06-06T04:26:20Z

@laugh12321

Could you help test the speed of yolov9-t-converted.pt, yolov9-s-converted.pt, and yolov9-m-converted.pt?

Thanks.

@WongKinYiu Should I use trtexec or TensorRT-YOLO to test the model speed with the NMS plugin?

WongKinYiu · 2024-06-06T05:48:08Z

The same testing method as the table in #143 (comment).
Were those results tested with the NMS plugin?
Thanks.

laugh12321 · 2024-06-06T06:36:35Z

@WongKinYiu Yes, these results were tested with the NMS plugin. In #143 (comment), we performed performance testing using the Python code of TensorRT-YOLO. We noticed that the results from the Python code were slightly lower compared to the tests conducted with the C++ code and the trtexec tool. To provide a more comprehensive comparison, we will conduct separate performance tests using the TensorRT-YOLO Python API, the TensorRT-YOLO C++ API, and the trtexec tool.

WongKinYiu · 2024-06-06T06:42:13Z

If it isn't too much trouble, conducting performance tests using different protocols would be nice.

laugh12321 · 2024-06-08T14:02:39Z

@WongKinYiu Update at #143 (comment)

WongKinYiu · 2024-06-08T14:42:16Z

Thanks.

It seems you have the same results as @levipereira.
yolov9-m has a similar speed to yolov9-c,
and yolov9 t/s/m are very slow on TensorRT-YOLO Python.

WongKinYiu · 2024-06-08T15:40:42Z

Could you help test gelan-s2.pt too?
Thanks.

laugh12321 · 2024-06-09T02:57:45Z

Could you help test gelan-s2.pt too? Thanks.

@WongKinYiu Update at #143 (comment)

WongKinYiu · 2024-06-09T03:01:24Z

Thanks.

By the way, gelan-s2.pt is different from gelan-s.pt.
gelan-s2 stacks 2 blocks in csp, while gelan-s stacks 3 blocks in csp.

laugh12321 · 2024-06-09T03:05:08Z

Thanks.

By the way, gelan-s2.pt is different from gelan-s.pt. gelan-s2 stacks 2 blocks in csp, while gelan-s stacks 3 blocks in csp.

@WongKinYiu Thank you very much for your reminder. I overlooked gelan-s2.pt and will update it shortly. Thanks again for your correction!

laugh12321 · 2024-06-09T03:10:32Z

Thanks.
By the way, gelan-s2.pt is different from gelan-s.pt. gelan-s2 stacks 2 blocks in csp, while gelan-s stacks 3 blocks in csp.

@WongKinYiu Thank you very much for your reminder. I overlooked gelan-s2.pt and will update it shortly. Thanks again for your correction!

@WongKinYiu Update at #143 (comment)

WongKinYiu · 2024-06-09T03:13:07Z

Thanks.

agentfuzzy · 2024-11-13T23:24:59Z

Hi, I was able to run at ~36 fps on an Nvidia Xavier AGX using yolov9-c-converted exported to a TensorRT engine with FP16 inference and onnxsim. Very impressive.

WongKinYiu closed this as completed Mar 4, 2024

levipereira changed the title ~~Performance YOLOv7 vs YOLOv9-C vs YOLOv9-E over TensorRT engine~~ Performance YOLOv7 vs YOLOv9 TensorRT engine Mar 4, 2024

levipereira changed the title ~~Performance YOLOv7 vs YOLOv9 TensorRT engine~~ Performance YOLOv7 vs YOLOv9 Series using TensorRT engine Mar 4, 2024

trivedisarthak mentioned this issue Mar 19, 2024

YOLOv9-QAT TensorRT Q/DQ: Improved Speed and Zero Loss Accuracy #253

Closed
