NVIDIA RTX 2070 runs SqueezeNet in 1.95ms #65
-
Should we perhaps add some sort of performance showcase to the README? Some more data points:

NVIDIA GeForce GTX 1080 Ti (11 GiB RAM) in a 12-core ThreadRipper 1920X build running Ubuntu:

```
cargo run --release --example squeeze
time: first_prediction: 17.689169ms
```

Note that in the SqueezeNet WASM example the times improve a bit when you run more than one inference over time, probably due to some sort of caching effect that isn't there on the first run. You can use the CLI to perform a more interesting benchmark: it performs 100 inferences, so divide the total time below by 100 to get the average time per frame. It also performs CPU inference using tract and shows the speedup factor. On the 1080 Ti machine:

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=1692ms, cpu=2116ms, 1.25x)
```

Looks like tract can employ the ThreadRipper's 24 logical cores and manages to come close to the GPU result! The GPU comes close to 60 inferences/s on average.

M1 Max MacBook 14":

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=554ms, cpu=1315ms, 2.37x)

cargo run --release --example squeeze
time: first_prediction: 30.064416ms
```

Although the first run is slow at 30ms, the benchmark is really fast at an average of 5.54ms per inference, or about 180/s. Even the CPU outruns the (admittedly three years older) ThreadRipper. (Note that the laptop's fans don't even come on. The thing is a beast.)

i3-9100T with Intel UHD 630: just for fun, my shitty work desktop machine (Dell, i3-9100T with Intel UHD 630, Windows 11; I wrote all this while the poor thing was compiling wonnx):

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=2343ms, cpu=3010ms, 1.28x)
```

So... just shy of 43/s on GPU.
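(For anyone who wants to reproduce these numbers from Rust directly rather than through the CLI, here is a minimal sketch of such a timing loop. It assumes the `Session::from_path`/`Session::run` API shown in the wonnx README, the input name `data` used by `opt-squeeze.onnx`, zeros instead of a real image, and an async executor such as `pollster`; exact input/output types may differ between wonnx versions.)

```rust
use std::collections::HashMap;
use std::time::Instant;

// Sketch only: assumes wonnx's `Session::from_path`/`Session::run` API and an
// input tensor named "data" (as in opt-squeeze.onnx). Types may vary per version.
async fn benchmark() -> Result<(), Box<dyn std::error::Error>> {
    let session = wonnx::Session::from_path("./data/models/opt-squeeze.onnx").await?;

    // Dummy 1x3x224x224 input; a real run would use a preprocessed image (e.g. pelican.jpeg).
    let input: Vec<f32> = vec![0.0; 3 * 224 * 224];
    let mut inputs = HashMap::new();
    inputs.insert("data".to_string(), input.as_slice().into());

    // Time the first prediction separately, since the thread above observes it
    // is noticeably slower than subsequent runs.
    let start = Instant::now();
    session.run(&inputs).await?;
    println!("first_prediction: {:?}", start.elapsed());

    // Then average over 100 runs, like the CLI's --benchmark flag does.
    const N: u32 = 100;
    let start = Instant::now();
    for _ in 0..N {
        session.run(&inputs).await?;
    }
    let total = start.elapsed();
    println!(
        "total: {:?}, avg: {:?}, ~{:.0} inferences/s",
        total,
        total / N,
        f64::from(N) / total.as_secs_f64()
    );
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    pollster::block_on(benchmark())
}
```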
-
Haha, I find it great that we manage to "beat" tract on almost all machines. I'll try to improve performance in the near future.

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=299ms, cpu=2736ms, 9.14x)
```

CPU: AMD Ryzen 5 2600 Six-Core Processor
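(For context on the CPU number: `--compare` runs the same model on the CPU through tract. The sketch below shows roughly what such a CPU timing loop looks like, using the `tract_onnx` API from tract's own examples with a fixed 1x3x224x224 input fact and zeros instead of a real image; it is illustrative only, not necessarily what the wonnx CLI does internally.)

```rust
use std::time::Instant;
use tract_onnx::prelude::*;

// Rough sketch of a tract-based CPU timing loop, assuming the tract_onnx API
// from tract's examples; not necessarily what the wonnx CLI does internally.
fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("./data/models/opt-squeeze.onnx")?
        // Pin the input shape so tract can fully optimize the graph.
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec![1, 3, 224, 224]))?
        .into_optimized()?
        .into_runnable()?;

    // Dummy input; a real comparison would feed the same preprocessed image as the GPU run.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();

    let start = Instant::now();
    for _ in 0..100 {
        model.run(tvec![input.clone().into()])?;
    }
    println!("cpu total for 100 inferences: {:?}", start.elapsed());
    Ok(())
}
```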
-
New GPU at the office: an RTX 3090 Ti (24 GB RAM), in the same machine I previously tested the 1080 Ti in:

```
cargo run --release --example squeeze
time: first_prediction: 1.3061ms

cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=96ms, cpu=4080ms, 42.46x)
```

So a cool 1041 inferences per second :-) (I'm also curious why the CPU performance is so much lower here. Nothing else seems to be running on the CPU, and I can't imagine it being thermally throttled.)
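(Spelling out the arithmetic behind those figures, with the benchmark times being totals over 100 inferences:)

$$
\frac{96\ \text{ms}}{100} \approx 0.96\ \text{ms per inference},\qquad
\frac{100}{0.096\ \text{s}} \approx 1041\ \text{inferences/s},\qquad
\frac{4080\ \text{ms}}{96\ \text{ms}} \approx 42.5\times
$$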
-
Using a newly built tower with an RTX 2070, the SqueezeNet example runs in 1.95ms. On my previous GTX 1050, the time was 24ms.
I believe the difference mainly comes from memory allocation on the GPU, but I could be wrong.