OOM when using InternVL2_5-1B-MPO #3143
Comments
Can you share the following information?
I get the error when I run the code below.
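Judging from the call the maintainer quotes back in the next reply, the code was presumably along these lines (the model path and engine settings below are taken from that quote, not verbatim from this comment):

```python
# Presumed reproduction: load the 4-bit AWQ model with the TurboMind engine.
# Reconstructed from the snippet quoted in the next reply.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "./InternVL2_5-1B-MPO-4bit/",
    backend_config=TurbomindEngineConfig(model_format="awq", session_len=2048),
)
```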
Can you enable the INFO log level? Let's check what the log indicates.

```python
from lmdeploy import pipeline, TurbomindEngineConfig, PytorchEngineConfig

pipe = pipeline(
    "./InternVL2_5-1B-MPO-4bit/",
    backend_config=TurbomindEngineConfig(model_format="awq", session_len=2048),
    log_level='INFO',
)
```
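With log_level='INFO', lmdeploy should print engine-initialization details such as weight loading and memory allocation, which helps narrow down where the OOM happens.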
What's the memory size of the Jetson Orin?
The GPU memory size is 16 GB, and I have uploaded the log file: log.txt
I followed the installation guide to build lmdeploy (0.7.0.post3) from source.
Inference using the PyTorch engine works fine.
However, after quantizing the model to 4-bit with AWQ, I encountered an OOM error when loading the model with the TurboMind engine.
I tried setting session_len=2048 in TurbomindEngineConfig.
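For reference, a common knob on memory-constrained boards like the 16 GB Jetson Orin (where CPU and GPU share the same memory) is cache_max_entry_count, the fraction of free GPU memory TurboMind reserves for the KV cache. A minimal sketch, assuming the same model path as above and an illustrative value of 0.2; this is not a fix confirmed in this thread:

```python
# Sketch: reduce TurboMind's KV-cache reservation alongside a short session.
# cache_max_entry_count is the fraction of free GPU memory set aside for the
# KV cache (lmdeploy's default is 0.8); 0.2 here is an illustrative value.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "./InternVL2_5-1B-MPO-4bit/",
    backend_config=TurbomindEngineConfig(
        model_format="awq",
        session_len=2048,
        cache_max_entry_count=0.2,
    ),
    log_level='INFO',
)
```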