NVIDIA RTX 2070 runs SqueezeNet in 1.95ms #65
-
Should we perhaps add some sort of performance showcase to the README? Some more data points:

NVIDIA GeForce GTX 1080 Ti (11 GiB RAM) in a 12-core ThreadRipper 1920X build running Ubuntu:

```
cargo run --release --example squeeze
time: first_prediction: 17.689169ms
```

Note that in the SqueezeNet WASM example the times improve a bit when you run more than one inference over time, probably due to some sort of caching effect that isn't there on the first run. You can use the CLI to perform a more interesting benchmark: it performs 100 inferences, so divide the total time below by 100 to get the average time per frame. It also performs CPU inference using tract and shows the speedup factor. On the 1080 Ti machine:

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=1692ms, cpu=2116ms, 1.25x)
```

Looks like tract can employ the ThreadRipper's 24 logical cores and manages to come close to the GPU result! The GPU comes close to 60 inferences/s on average.

M1 Max MacBook 14":

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=554ms, cpu=1315ms, 2.37x)

cargo run --release --example squeeze
time: first_prediction: 30.064416ms
```

Although the first run is slow at 30ms, the benchmark is really fast at an average of 5.54ms per inference, or about 180/s. Even the CPU outruns the (admittedly three years older) ThreadRipper. (Note that the laptop's fans don't even come on. The thing is a beast.)

i3-9100T with Intel UHD 630: just for fun, my shitty work desktop machine (Dell, i3-9100T with Intel UHD 630, Windows 11; I wrote all this while the poor thing was compiling wonnx):

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=2343ms, cpu=3010ms, 1.28x)
```

So... just shy of 43/s on GPU.
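(For anyone who wants to reproduce these numbers from Rust directly rather than through the CLI, here is a minimal sketch of such a timing loop. It assumes the `Session::from_path`/`Session::run` API shown in the wonnx README, the input name `data` used by `opt-squeeze.onnx`, zeros instead of a real image, and an async executor such as `pollster`; exact input/output types may differ between wonnx versions.)

```rust
use std::collections::HashMap;
use std::time::Instant;

// Sketch only: assumes wonnx's `Session::from_path`/`Session::run` API and an
// input tensor named "data" (as in opt-squeeze.onnx). Types may vary per version.
async fn benchmark() -> Result<(), Box<dyn std::error::Error>> {
    let session = wonnx::Session::from_path("./data/models/opt-squeeze.onnx").await?;

    // Dummy 1x3x224x224 input; a real run would use a preprocessed image (e.g. pelican.jpeg).
    let input: Vec<f32> = vec![0.0; 3 * 224 * 224];
    let mut inputs = HashMap::new();
    inputs.insert("data".to_string(), input.as_slice().into());

    // Time the first prediction separately, since the thread above observes it
    // is noticeably slower than subsequent runs.
    let start = Instant::now();
    session.run(&inputs).await?;
    println!("first_prediction: {:?}", start.elapsed());

    // Then average over 100 runs, like the CLI's --benchmark flag does.
    const N: u32 = 100;
    let start = Instant::now();
    for _ in 0..N {
        session.run(&inputs).await?;
    }
    let total = start.elapsed();
    println!(
        "total: {:?}, avg: {:?}, ~{:.0} inferences/s",
        total,
        total / N,
        f64::from(N) / total.as_secs_f64()
    );
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    pollster::block_on(benchmark())
}
```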
-
Haha, I find it great that we manage to "beat" tract on almost all machines. I'll try to improve performance in the near future.

```
cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=299ms, cpu=2736ms, 9.14x)
```

CPU: AMD Ryzen 5 2600 Six-Core Processor
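(For context on the CPU number: `--compare` runs the same model on the CPU through tract. The sketch below shows roughly what such a CPU timing loop looks like, using the `tract_onnx` API from tract's own examples with a fixed 1x3x224x224 input fact and zeros instead of a real image; it is illustrative only, not necessarily what the wonnx CLI does internally.)

```rust
use std::time::Instant;
use tract_onnx::prelude::*;

// Rough sketch of a tract-based CPU timing loop, assuming the tract_onnx API
// from tract's examples; not necessarily what the wonnx CLI does internally.
fn main() -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path("./data/models/opt-squeeze.onnx")?
        // Pin the input shape so tract can fully optimize the graph.
        .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), tvec![1, 3, 224, 224]))?
        .into_optimized()?
        .into_runnable()?;

    // Dummy input; a real comparison would feed the same preprocessed image as the GPU run.
    let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();

    let start = Instant::now();
    for _ in 0..100 {
        model.run(tvec![input.clone().into()])?;
    }
    println!("cpu total for 100 inferences: {:?}", start.elapsed());
    Ok(())
}
```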
-
New GPU at the office: an RTX 3090 Ti (24 GB RAM), in the same machine I previously tested the 1080 Ti in:

```
cargo run --release --example squeeze
time: first_prediction: 1.3061ms

cargo run --release --features=cpu -- infer ./data/models/opt-squeeze.onnx -i data=./data/images/pelican.jpeg --compare --benchmark
OK (gpu=96ms, cpu=4080ms, 42.46x)
```

So a cool 1041 inferences per second :-) (I'm also curious why the CPU performance is so much lower here. Nothing else seems to be running on the CPU, and I can't imagine it being thermally throttled.)
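(Spelling out the arithmetic behind those figures, with the benchmark times being totals over 100 inferences:)

$$
\frac{96\ \text{ms}}{100} \approx 0.96\ \text{ms per inference},\qquad
\frac{100}{0.096\ \text{s}} \approx 1041\ \text{inferences/s},\qquad
\frac{4080\ \text{ms}}{96\ \text{ms}} \approx 42.5\times
$$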
-
Using a newly built tower with an RTX 2070, the SqueezeNet example runs in 1.95ms. On my previous GTX 1050, the time was 24ms.
I believe the difference mainly comes from memory allocation on the GPU, but I could be wrong.