
Support for T5 Architecture #384

Open
niranjanakella opened this issue Jun 5, 2024 · 6 comments

Comments


niranjanakella commented Jun 5, 2024

Hello @EricLBuehler, opening this issue to track support for the T5 Seq2Seq model architecture in mistral.rs, as discussed.

Relates to: #156

@EricLBuehler added the "new feature" and "models" labels on Jun 5, 2024
EricLBuehler (Owner) commented

Hi @niranjanakella!

Thank you for opening this issue. Just to clarify, would this be a quantized or non-quantized implementation?

niranjanakella (Author) commented Jun 6, 2024

@EricLBuehler A non-quantized f16/f32 implementation takes precedence for now, but if possible I would like a quantized implementation as well.

I would also like to know whether LoRA adapters can be loaded at runtime without merging them into the model. That would be a huge game changer for most applications, since many developers train multiple adapters; being able to attach several adapters at runtime would be great.

EricLBuehler (Owner) commented Jun 6, 2024

> A non-quantized f16/f32 implementation takes precedence for now, but if possible I would like a quantized implementation as well.

Sounds great, I'll get started on an implementation.

> I would also like to know whether LoRA adapters can be loaded at runtime without merging them into the model. That would be a huge game changer for most applications, since many developers train multiple adapters; being able to attach several adapters at runtime would be great.

We actually have this feature already! There are two ways to do it:

  1. Activate adapters at runtime: preload a set of adapters and then send requests that activate the ones you want.
  2. Use per-request adapter specification for granular control (see the sketch below).

Docs: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/ADAPTER_MODELS.md#adapter-model-dynamic-adapter-activation
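
For illustration, here is a minimal sketch of option 2 (per-request adapter specification) against the HTTP server. It assumes an OpenAI-compatible /v1/chat/completions route and a hypothetical "adapters" request field; the port, model id, and adapter name are placeholders as well, so check the linked docs for the actual request schema.

```python
# Minimal sketch of per-request adapter selection over the HTTP API.
# The endpoint path, "adapters" field, model id, and adapter name are
# assumptions for illustration; see docs/ADAPTER_MODELS.md for the real schema.
import requests

BASE_URL = "http://localhost:1234"  # wherever the mistral.rs server is listening

response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "t5-with-lora",  # hypothetical model id
        "messages": [
            {"role": "user", "content": "Summarize: LoRA adapters are small ..."}
        ],
        # Hypothetical field: name the preloaded adapter(s) to apply to this request only.
        "adapters": ["summarization-adapter"],
    },
    timeout=60,
)
print(response.json())
```

Swapping the adapters list from one request to the next would let a single server cover several task-specific adapters without ever merging them into the base weights.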

EricLBuehler (Owner) commented

Hi @niranjanakella! Sorry for the delay; I have been busy with the Idefics 2 implementation (#309). I should have a prototype ready tonight, though!

niranjanakella (Author) commented

@EricLBuehler No problem, sounds good. I am looking forward to trying it out soon.

EricLBuehler (Owner) commented

See: #432.
