Merge pull request #13 from fabio-sim/feat/depth-anything-v2
feat: add depth anything v2, torchdynamo
fabio-sim authored Jun 21, 2024
2 parents 9475cb1 + c79f969 commit 40ed316
Showing 16 changed files with 1,848 additions and 10 deletions.
87 changes: 78 additions & 9 deletions README.md
@@ -4,16 +4,75 @@

# Depth Anything ONNX

Open Neural Network Exchange (ONNX) compatible implementation of [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://github.com/LiheYoung/Depth-Anything).
Open Neural Network Exchange (ONNX) compatible implementation of [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://github.com/LiheYoung/Depth-Anything) and [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2). Supports PyTorch 2 Export via TorchDynamo.

<p align="center"><img src="assets/sample.png" width=90%></p>

<details>
<summary>Changelog</summary>

#### Changelog:
- **22 June 2024**: Support Depth Anything V2 and TorchDynamo Export.
- **22 January 2024**: Release.
</details>

## 🔥 ONNX Export & Inference

We provide a simple command-line tool [`dynamo.py`](dynamo.py) based on [Typer](https://github.com/tiangolo/typer) to export Depth Anything V2 to ONNX and PyTorch 2 programs. Please install the [requirements](/requirements.txt) first.

```shell
$ python dynamo.py --help

Usage: dynamo.py [OPTIONS] COMMAND [ARGS]...

Depth-Anything Dynamo CLI

╭─ Commands ───────────────────────────────────────────────╮
│ export   Export Depth-Anything V2 using TorchDynamo.     │
│ infer    Depth-Anything V2 inference using ONNXRuntime.  │
│          No dependency on PyTorch.                       │
╰──────────────────────────────────────────────────────────╯
```

If you would like to try out inference right away, you can download ONNX models that have already been exported [here](https://github.com/fabio-sim/Depth-Anything-ONNX/releases).

We observe the following average latencies using the CUDA Execution Provider:

| Device | Encoder | Input Shape | Average Latency (ms) |
| --- | --- | --- | --- |
| RTX4080 12GB | ViT-S | `(1, 3, 518, 518)` | 13.3 |
| RTX4080 12GB | ViT-B | `(1, 3, 518, 518)` | 29.3 |
| RTX4080 12GB | ViT-L | `(1, 3, 518, 518)` | 83.2 |

Relevant framework versions:
```text
CUDA==12.1
cuDNN==8.9.2
onnxruntime-gpu==1.18.0
torch==2.3.1
```
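The averages above come from the repository's own runs. For reproducing such numbers on other hardware, a minimal timing harness along these lines is usually enough; the helper name and the warmup/iteration counts are illustrative, not part of the repo:

```python
import statistics
import time


def average_latency_ms(run, warmup: int = 10, iters: int = 100) -> float:
    """Measure the average wall-clock latency of a zero-argument callable in ms."""
    for _ in range(warmup):  # discard initial runs so caches and kernels settle
        run()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1e3)  # seconds -> ms
    return statistics.mean(samples)
```

With ONNXRuntime, `run` would be something like `lambda: session.run(None, {input_name: x})` for an already-created `InferenceSession`, so that session setup is excluded from the measurement.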

### Export Example

> [!TIP]
> You can view the available options at any time by passing `--help`.
```bash
python dynamo.py export --encoder vitb --output weights/vitb.onnx --opset 18
```

> [!CAUTION]
> The TorchDynamo-based ONNX Exporter is a new beta feature that may undergo breaking changes in the future. Currently, only opset version 18 is supported. Specifying a smaller opset version will fall back to the legacy TorchScript-based Exporter.

### Inference Example

```bash
python dynamo.py infer weights/vitb.onnx -i assets/sacre_coeur1.jpg
```

This function serves as an implementation reference for performing inference with only ONNXRuntime and OpenCV as dependencies.
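For orientation, the usual ViT-style preprocessing such a pipeline performs — resizing to the 518×518 network input, scaling to [0, 1], ImageNet normalization, and HWC→NCHW layout — can be sketched with NumPy alone. Nearest-neighbour resizing stands in for OpenCV's interpolation here, and the function and constant names are illustrative:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(img: np.ndarray, size: int = 518) -> np.ndarray:
    """HxWx3 uint8 RGB image -> (1, 3, size, size) float32 input tensor."""
    h, w = img.shape[:2]
    # Nearest-neighbour resize via index maps (cv2.resize would interpolate).
    rows = (np.arange(size) * h // size).clip(0, h - 1)
    cols = (np.arange(size) * w // size).clip(0, w - 1)
    resized = img[rows[:, None], cols[None, :]]
    x = resized.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW, add batch dim
```

The resulting array can be fed straight to `InferenceSession.run`; the model's depth output is then typically min-max normalized for visualization.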

---

### Legacy
<details>
<summary> V1 </summary>
## 🔥 ONNX Export

Prior to exporting the ONNX models, please install the [requirements](/requirements.txt).
@@ -40,9 +99,6 @@ python infer.py --img assets/DSC_0410.JPG --model weights/depth_anything_vits14.
</pre>
</details>

## 🚀 TensorRT Support

(To be investigated)

## ⏱️ Inference Time Comparison

@@ -59,8 +115,21 @@ All experiments are conducted on an i9-12900HX CPU and RTX4080 12GB GPU with `CU
- Currently, the inference speed is bottlenecked by Conv operations.
- ONNXRuntime performs slightly (20-25%) faster for the ViT-L model variant.

</details>

## Credits
If you use any ideas from the papers or code in this repo, please consider citing the authors of [Depth Anything](https://arxiv.org/abs/2401.10891) and [DINOv2](https://arxiv.org/abs/2304.07193). Lastly, if the ONNX versions helped you in any way, please also consider starring this repository.
If you use any ideas from the papers or code in this repo, please consider citing the authors of [Depth Anything](https://arxiv.org/abs/2401.10891), [Depth Anything V2](https://arxiv.org/abs/2406.09414) and [DINOv2](https://arxiv.org/abs/2304.07193). Lastly, if the ONNX versions helped you in any way, please also consider starring this repository.

```bibtex
@article{yang2024depth,
  title={Depth Anything V2},
  author={Lihe Yang and Bingyi Kang and Zilong Huang and Zhen Zhao and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
  year={2024},
  eprint={2406.09414},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

```bibtex
@article{depthanything,
41 changes: 41 additions & 0 deletions depth_anything_v2/config.py
@@ -0,0 +1,41 @@
from dataclasses import dataclass
from enum import StrEnum, auto


@dataclass
class Config:
    url: str
    features: int
    out_channels: list[int]


class Encoder(StrEnum):
    vits = auto()
    vitb = auto()
    vitl = auto()
    vitg = auto()

    @property
    def config(self) -> Config:
        return {
            Encoder.vits: Config(
                url="https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true",
                features=64,
                out_channels=[48, 96, 192, 384],
            ),
            Encoder.vitb: Config(
                url="https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true",
                features=128,
                out_channels=[96, 192, 384, 768],
            ),
            Encoder.vitl: Config(
                url="https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true",
                features=256,
                out_channels=[256, 512, 1024, 1024],
            ),
            Encoder.vitg: Config(
                url="Coming Soon",  # TODO
                features=512,
                out_channels=[1536, 1536, 1536, 1536],
            ),
        }[self]
