inference time cpu vs gpu #42
You may experience improved speed if you load the model in bfloat16, e.g.:

```python
import time

import torch
from span_marker import SpanMarkerModel

# Load the model in bfloat16 on the GPU; the commented-out line is the default float32 variant for comparison.
model = SpanMarkerModel.from_pretrained(
    "tomaarsen/span-marker-roberta-large-fewnerd-fine-super",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
# model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super", device_map="cuda")

text = [
    "Leonardo da Vinci recently published a scientific paper on combatting Mitocromulent disease. Leonardo da Vinci painted the most famous painting in existence: the Mona Lisa.",
    "Leonardo da Vinci scored a critical goal towards the end of the second half. Leonardo da Vinci controversially veto'd a bill regarding public health care last friday. Leonardo da Vinci was promoted to Sergeant after his outstanding work in the war.",
]
BS = 64
N = 500

# Warm up before timing.
model.predict(text * 50, batch_size=BS)

start_t = time.time()
model.predict(text * N, batch_size=BS)
print(f"{time.time() - start_t:8f}s for {N * 2} samples with batch_size={BS} and torch_dtype={model.dtype}.")
```

This gave me:
[timing outputs for the bfloat16 and default float32 runs not shown]
Note that float16 is not available on CPU, though! I'm not sure about bfloat16 there. If you have a Linux (or Mac?) device, then you can also use …

Beyond that, the steps to increase inference speed become pretty challenging. Hope this helps a bit. Also, you can process about 8 sentences per second on CPU and about 110 sentences per second on GPU; is that not sufficiently fast yet?
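If you want to check whether bfloat16 works on your CPU, here is a minimal sketch (not from the original comment; it reuses the model name from the snippet above and falls back to the default float32 if the dtype is rejected):

```python
import torch
from span_marker import SpanMarkerModel

MODEL_ID = "tomaarsen/span-marker-roberta-large-fewnerd-fine-super"

# Attempt bfloat16 on CPU; fall back to the default float32 if the
# dtype (or one of its ops) is not supported on this hardware.
try:
    model = SpanMarkerModel.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.predict("Leonardo da Vinci painted the Mona Lisa.")
    print("bfloat16 inference works on this CPU.")
except (RuntimeError, ValueError):
    model = SpanMarkerModel.from_pretrained(MODEL_ID)
    print("bfloat16 failed; using the default float32 instead.")
```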
Thank you @tomaarsen
@polodealvarado started working on ONNX support here: #26 (comment)
I have used gte-tiny embeddings for my custom NER model and need to speed up the inference time. Below are the stats for different batch sizes. Is there any specific method to improve it? @tomaarsen
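For reference, one way to gather such stats is to time SpanMarkerModel.predict over a few batch sizes; this is only a sketch (the model name and sentence are placeholders, substitute your own custom model and data):

```python
import time

from span_marker import SpanMarkerModel

# Placeholder model ID; substitute your own custom NER model.
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")

# Placeholder data; substitute a representative sample of your own sentences.
sentences = ["Leonardo da Vinci painted the Mona Lisa."] * 256

# Warm up once so the first measured batch size is not penalized.
model.predict(sentences[:32], batch_size=32)

for batch_size in (8, 16, 32, 64):
    start = time.time()
    model.predict(sentences, batch_size=batch_size)
    elapsed = time.time() - start
    print(f"batch_size={batch_size}: {len(sentences) / elapsed:.1f} sentences/second")
```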