Hi, I tried using SaT as a drop-in replacement for WtP (wtp-canine-s-1l-no-adapters).
However, no matter which variant I try (1l or 3l), inference always takes nearly a second, versus 0.013 s (CPU) and 0.005 s (GPU) with WtP. For me there is also no difference between CPU and GPU in the SaT runtime.
System: Docker
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
Python 3.10.12
Pip
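Roughly what the swap looks like on my end (the input string and the sat-1l checkpoint name below are illustrative; I tried both the 1l and 3l variants):

```python
from wtpsplit import WtP, SaT

text = "This is a test This is another test."  # illustrative input

# previous setup: WtP
wtp = WtP("wtp-canine-s-1l-no-adapters")
print(wtp.split(text))

# drop-in replacement: SaT (1l variant; the 3l one would be "sat-3l")
sat = SaT("sat-1l")
print(sat.split(text))
```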
Hi, thanks for catching this! There was a small issue with the tokenizer. We fixed it in wtpsplit==2.0.4; please upgrade.
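To pick up the fix from the same IPython session used for the timings below:

```python
# wtpsplit >= 2.0.4 includes the tokenizer fix
%pip install -U wtpsplit
```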
With this, I get (both 1L):
SaT:
%timeit sat.split(SENTENCE * 100)
801 ms ± 351 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
SaT + GPU:
%timeit sat.split(SENTENCE * 100)
65.5 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
WtP:
%timeit wtp.split(SENTENCE * 100)
6.08 s ± 1.49 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
WtP + GPU:
%timeit wtp.split(SENTENCE * 100)
370 ms ± 9.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(Note: for very short sequences and small models, WtP may still be slightly faster. But you should not use WtP on short sequences regardless: others have reported problematic inconsistencies there, and our paper also shows its poor performance.)
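For anyone reproducing the numbers above, a minimal sketch of the setup: SENTENCE stands in for a short test string (its exact value is an assumption here), and the GPU runs assume each model was first moved to CUDA in half precision via .half().to("cuda"):

```python
from wtpsplit import SaT, WtP

# placeholder test string; the one used for the timings above is not shown
SENTENCE = "This is a test This is another test. "

sat = SaT("sat-1l")
wtp = WtP("wtp-canine-s-1l-no-adapters")

# CPU timings
%timeit sat.split(SENTENCE * 100)
%timeit wtp.split(SENTENCE * 100)

# move both models to GPU (half precision) for the GPU timings
sat.half().to("cuda")
wtp.half().to("cuda")
%timeit sat.split(SENTENCE * 100)
%timeit wtp.split(SENTENCE * 100)
```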