Hi everyone, long time no see! Starting this week, I will spend about four weeks gradually pushing AutoGPTQ to v1.0.0. In the meantime, there will be two or three minor versions released as optimization or feature previews, so that you can try those updates as soon as they are finished and I can hear more community voices and gather more feedback.
My vision is that by the time v1.0.0 is released, AutoGPTQ can serve as an automatic, extensible, and flexible quantization backend for all language models written in PyTorch.
I'm opening this issue to list everything that will be done (optimizations, new features, bug fixes, etc.) and to record development progress, so the contents below will be updated frequently.
Feel free to comment in the thread to give your opinions and suggestions!
Optimizations
refactor the code framework for future extensions while maintaining the important interfaces.
separate quantization logic into a standalone module that serves as a mixin (a rough sketch of this idea follows the list below).
design an automatic structure-recognition strategy to better support different models (hopefully even multi-modal and diffusion models; see the second sketch below).
speed up model packing after quantization.
support kernel fusion for more models to further speed up inference.
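To make the mixin idea above concrete, here is a minimal sketch of what a standalone quantization mixin might look like; the class and method names are hypothetical, not AutoGPTQ's actual API. The point of the mixin is that the quantization logic stays orthogonal to any particular model wrapper:

```python
import torch.nn as nn


class QuantizeMixin:
    """Hypothetical mixin bundling quantization logic, independent of any model class."""

    def quantize(self, examples, bits: int = 4, group_size: int = 128):
        # Walk the wrapped model and quantize each eligible linear layer in turn.
        for name, module in self.model.named_modules():
            if isinstance(module, nn.Linear):
                self._quantize_layer(name, module, examples, bits, group_size)

    def _quantize_layer(self, name, module, examples, bits, group_size):
        # Backend-specific GPTQ math would live here; omitted in this sketch.
        raise NotImplementedError


class CausalLMQuantizer(QuantizeMixin):
    """Hypothetical model wrapper that gains quantization by mixing in QuantizeMixin."""

    def __init__(self, model: nn.Module):
        self.model = model
```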
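Similarly, automatic structure recognition could work roughly as sketched below: instead of hard-coding layer names per model, walk the module tree to find the repeated transformer blocks and the quantizable linear layers inside them. This is only an illustration of the idea, not the planned implementation:

```python
import torch.nn as nn


def find_block_list(model: nn.Module):
    """Return the first homogeneous ModuleList, usually the stack of decoder blocks."""
    for module in model.modules():
        if isinstance(module, nn.ModuleList) and len(module) > 1:
            if len({type(m) for m in module}) == 1:
                return module
    return None


def find_quantizable_layers(block: nn.Module):
    """Collect the qualified names of nn.Linear layers inside one block."""
    return [name for name, m in block.named_modules() if isinstance(m, nn.Linear)]
```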
New Features
Bug Fixes
@PanQiWei Can you rejoin @fxmarty and be more active in code reviews? It feels like the project needs at least two active maintainers to keep it up to speed and not overload any single person.