Hi,
I want to build my own models but use ternary BitLinear layers in place of standard nn.Linear layers. I see there is still an F.linear or torch.bmm operation in your BitLinear code, and because of the quantization overhead at each step it is slower. Where can I find the inference code for just the simple BitLinear operation that is matmul-free?
Thanks
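For context, here is a rough sketch of the per-call quantization overhead I mean, assuming the absmean ternary weight / absmax activation scheme from the BitNet b1.58 paper; the exact details in your BitLinear may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean ternary quantization: scale by the mean absolute value,
    # round to {-1, 0, +1}, then rescale back to float.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale

def activation_quant(x: torch.Tensor) -> torch.Tensor:
    # Per-token absmax quantization of activations to the int8 range.
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale

class BitLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weights and activations are re-quantized on every forward call,
        # then fed to an ordinary float matmul -- this is the overhead I mean.
        w_q = weight_quant(self.weight)
        x_q = activation_quant(x)
        return F.linear(x_q, w_q, self.bias)
```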
Thank you for your question. If you would like to use BitLinear for fast inference, as mentioned in our paper, we use BitBLAS for GPU acceleration. You can also refer to our branch to use our BitBLAS version once you have installed BitBLAS.
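Roughly, the idea is to quantize and pack the weights once offline and let a BitBLAS low-bit kernel do the matmul, instead of re-quantizing inside every forward call. A minimal sketch using the public BitBLAS API (the dtypes, shapes, and scaling below are assumptions for illustration, not our exact branch code):

```python
import torch
import bitblas

# Build a BitBLAS matmul kernel for a ternary (2-bit) weight layer.
# Shapes and dtypes are illustrative; adjust them to match the model.
config = bitblas.MatmulConfig(
    M=1,                 # token dimension; 1 for single-token decode
    N=4096,              # output features
    K=4096,              # input features
    A_dtype="float16",   # activation dtype
    W_dtype="int2",      # ternary weights stored as 2-bit integers
    accum_dtype="float16",
    out_dtype="float16",
    layout="nt",
    with_bias=False,
)
matmul = bitblas.Matmul(config=config)

# Quantize and pack the weights once, offline, rather than on every call.
weight_fp = torch.randn(4096, 4096, dtype=torch.float16)
scale = weight_fp.abs().mean().clamp(min=1e-5)
weight_ternary = (weight_fp / scale).round().clamp(-1, 1).to(torch.int8).cuda()
weight_packed = matmul.transform_weight(weight_ternary)

# At inference time only the packed low-bit kernel runs.
x = torch.randn(1, 4096, dtype=torch.float16).cuda()
y = matmul(x, weight_packed) * scale  # rescale output by the weight scale
```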
Hi, thanks for your reply. Can you provide more details on what you mean by your version of BitBLAS? Where can I find your code that uses BitBLAS for inference?