👋 Hi, I’m ThomasVonWu. I'd like to introduce a simple and practical TensorRT-based deployment repository that uses an end-to-end perception paradigm with a sparse transformer to detect 3D obstacles. This repository has no complex dependencies for Training | Inference | Deployment (which means we don't need to install MMDetection3D, mmcv, mmcv-full, mmdeploy, etc.), so it's easy to set up on your local workstation or on supercomputing GPU clusters. This repository also provides x86 (NVIDIA RTX series GPU) | ARM (NVIDIA Orin) deployment solutions. Finally, you can happily deploy your e2e model onboard through this repository.
👀 I guess you are interested in:
- how to define a PyTorch custom operator, DeformableAttentionAggr, and register the corresponding ONNX node (see the first sketch after this list).
- how to build a custom operator plugin, DeformableAttentionAggr, for the TensorRT engine with Makefile or CMake.
- how to convert an ONNX file containing a custom operator into a TensorRT engine and make the operator part of the whole engine (see the second sketch after this list).
- how to validate the consistency of inference results: PyTorch results vs. ONNX Runtime results vs. TensorRT results (see the third sketch after this list).
- how to convert a PyTorch model with a temporal-fusion transformer head to ONNX.
- how to accurately locate the TensorRT layer where overflow occurs when using FP16 quantization for model parameters (see the last sketch after this list).
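To give a flavor of the first point, here is a minimal sketch of the custom-operator pattern, with illustrative argument names (the repository's actual signature may differ): wrap the kernel in a `torch.autograd.Function` and give it a `symbolic` method so that `torch.onnx.export` emits a single custom node instead of tracing Python fallback code.

```python
import torch
from torch.autograd import Function


class DeformableAttentionAggr(Function):
    """Custom op wrapper; `symbolic` maps it to a single ONNX node."""

    @staticmethod
    def forward(ctx, value, spatial_shapes, sampling_loc, attn_weight):
        # Shape-correct placeholder for tracing only; onboard, the TensorRT
        # plugin (and, in training, a CUDA extension) supplies the real kernel.
        bs, num_keys, num_heads, head_dim = value.shape
        num_queries = sampling_loc.shape[1]
        return value.new_zeros(bs, num_queries, num_heads * head_dim)

    @staticmethod
    def symbolic(g, value, spatial_shapes, sampling_loc, attn_weight):
        # The node name must match the plugin name registered with TensorRT.
        return g.op("custom_ops::DeformableAttentionAggr",
                    value, spatial_shapes, sampling_loc, attn_weight)
```

In the model's forward pass the op is invoked as `DeformableAttentionAggr.apply(...)`, so both tracing and export go through the same code path.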
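Once the plugin is compiled into a shared library, converting the ONNX file into an engine can be done from Python. This is a sketch assuming TensorRT 8.x and placeholder file names, not the repository's exact script:

```python
import ctypes
import tensorrt as trt

# Loading the .so runs the plugin's static registration code,
# so the parser can resolve the custom DeformableAttentionAggr node.
ctypes.CDLL("./lib/libdeformable_attention_aggr.so")

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("sparse4dv3_head.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

with open("sparse4dv3_head.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```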
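For the consistency check, the same inputs are fed to every backend and the maximum absolute error and cosine similarity are compared. A self-contained sketch, with a toy network standing in for the exported model:

```python
import numpy as np
import onnxruntime as ort
import torch

model = torch.nn.Linear(8, 4).eval()   # toy stand-in for the real network
dummy = torch.randn(1, 8)
torch.onnx.export(model, dummy, "check.onnx",
                  input_names=["input"], output_names=["output"])

with torch.no_grad():
    torch_out = model(dummy).numpy()

sess = ort.InferenceSession("check.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"input": dummy.numpy()})[0]

print("max abs err:", np.abs(torch_out - ort_out).max())
cos = np.dot(torch_out.ravel(), ort_out.ravel()) / (
    np.linalg.norm(torch_out) * np.linalg.norm(ort_out))
print("cosine sim :", cos)
# The TensorRT engine output is compared against torch_out the same way,
# after deserializing the engine and running one context.execute_v2 call.
```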
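To localize an FP16 overflow, one common tactic (a sketch assuming TensorRT >= 8.4; the layer-name match rule below is purely illustrative) is to build with FP16 enabled but pin suspect layers back to FP32, bisecting until the engine's outputs match the FP32 baseline again:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("sparse4dv3_head.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Force TensorRT to honor the per-layer precisions set below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Pin one range of layers at a time; when the outputs match the FP32
    # baseline again, the overflowing layer is inside the pinned range.
    if "norm" in layer.name.lower():        # illustrative match rule
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine = builder.build_serialized_network(network, config)
```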
Model | ImgSize | Backbone | Framework | Precision | mAP | NDS | FPS | GPU | config | ckpt | onnx | engine |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sparse4Dv3 | 256x704 | ResNet50 | PyTorch | FP32 | 56.37 | 70.97 | 19.8 | NVIDIA GeForce RTX 3090 | config | ckpt | -- | -- |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP32 | TBD | TBD | TBD | NVIDIA GeForce RTX 3090 | config | ckpt | onnx | engine |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP16 | TBD | TBD | TBD | NVIDIA GeForce RTX 3090 | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | INT8+FP16 | TBD | TBD | TBD | NVIDIA GeForce RTX 3090 | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP32 | TBD | TBD | TBD | NVIDIA Orin | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | FP16 | TBD | TBD | TBD | NVIDIA Orin | config | ckpt | TBD | TBD |
Sparse4Dv3 | 256x704 | ResNet50 | TensorRT | INT8+FP16 | TBD | TBD | TBD | NVIDIA Orin | config | ckpt | TBD | TBD |
24 Sep, 2024
: The complete deployment solution of SparseEnd2End was released.

25 Aug, 2024
: I released the SparseEnd2End repository. The complete deployment solution will be released as soon as possible. Please stay tuned!

- Register the custom operator DeformableAttentionAggr and export the ONNX node and TensorRT engine. (25 Aug, 2024)
- Verify the consistency of inference results: DeformableAttentionAggr PyTorch implementation vs. TensorRT plugin implementation. (25 Aug, 2024)
- Export the SparseTransFormer backbone ONNX & TensorRT engine. (8 Sep, 2024)
- Verify the consistency of inference results: SparseTransFormer backbone PyTorch implementation vs. ONNX Runtime vs. TensorRT engine. (8 Sep, 2024)
- Export the SparseTransFormer head ONNX and TensorRT engine. (24 Sep, 2024)
- Verify the consistency of inference results: SparseTransFormer head PyTorch implementation vs. TensorRT engine. (24 Sep, 2024)
- Inference acceleration using CUDA shared memory and CUDA FP16 in the DeformableAttentionAggr plugin implementation.
- Inference acceleration using FlashAttention in place of MultiheadAttention.
- Inference acceleration using FP16/INT8 in place of FP32 in the TensorRT engine.
- Inference acceleration: image pre-processing, InstanceBank caching, and model post-processing implemented with CUDA.
- Image pre-processing, InstanceBank caching, and model post-processing implemented in C++.
- Onboard: full-pipeline inference using CUDA, TensorRT, and C++.
SparseEnd2End is a Sparse-Centric paradigm for end-to-end autonomous driving perception.
If you find SparseEnd2End useful in your research or applications, please consider giving it a star 🌟
08/25/2024: [v1.0.0] This repository now supports Training | Inference with NuscenesDataset. It includes: data dumping in JSON, Training | Inference log caching, TensorBoard hooking, and so on.