[BUG] Qwen-7B-Chat 4-bit quantization following the Auto-GPTQ example fails with: ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?) #646
Comments
Same error here.

Your model may have been partially loaded into CPU memory; the approach below may help.

I tested on a single card (cuda:0) but gave up because I didn't have enough GPU memory. Can this run on multiple GPUs?

If I quantize on CPU only, I likewise get an error: AttributeError: 'QWenLMHeadModel' object has no attribute 'quantize'

@lonngxiang You still need a GPU; we haven't tried multi-GPU quantization. After loading across multiple cards (device_map='auto'), if VRAM is sufficient no parameters should remain in CPU memory (if any do, print model.hf_device_map to see which modules ended up on CPU).
Code:
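The suggestion above can be sketched as a small helper. This is hypothetical code, assuming `hf_device_map` is the plain dict (module name → device) that accelerate attaches to a model loaded with `device_map='auto'`:

```python
def layers_on_cpu(hf_device_map):
    """Return names of modules placed on CPU (or offloaded to disk)
    instead of a GPU; these are the ones that trigger the Triton
    'cpu tensor' error during quantization."""
    return [name for name, dev in hf_device_map.items()
            if str(dev) in ("cpu", "disk")]

# Hypothetical device map where the last block overflowed to CPU memory.
example_map = {"transformer.wte": 0, "transformer.h.0": 0, "transformer.h.31": "cpu"}
print(layers_on_cpu(example_map))  # → ['transformer.h.31']
```

If the list is non-empty, either free up VRAM or pin the whole model to one card before calling `quantize`.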
Following both suggested solutions, I still get the same error: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"

I printed model.hf_device_map and loaded the model with device_map="cuda:1", so why does the error still point at cuda:0 and cpu? "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"

Modify modeling_qwen.py:

Same problem here: quantizing my own fine-tuned model throws this error.
I tested this method under the official AutoGPTQ example, and it worked for me.

After the change, quantization works, but loading the quantized weights for inference fails: FileNotFoundError: Could not find model in ./lora_finetune/qwen_7b_chat_q. Some files seem to be missing.
The error is raised in apply_rotary_emb. AutoGPTQ migrates tensors between devices on its own, but the coverage is incomplete, so some tensors are never moved. See AutoGPTQ/AutoGPTQ#370 (comment).
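The workaround referenced above amounts to a device guard inside the rotary embedding. A sketch of the idea follows; the rotation here is a generic rotary embedding, not Qwen's exact layout in modeling_qwen.py, and the function name is illustrative:

```python
import torch

def apply_rotary_emb_guarded(t, cos, sin):
    """Apply a rotary embedding after moving the cached cos/sin tensors
    onto the activation's device, since AutoGPTQ's automatic device
    migration may not cover them (the cause of the mixed cuda/cpu error)."""
    cos = cos.to(t.device)  # the guard: follow t, whichever card it is on
    sin = sin.to(t.device)
    t1, t2 = t.chunk(2, dim=-1)
    rotated = torch.cat((-t2, t1), dim=-1)
    return t * torch.cat((cos, cos), dim=-1) + rotated * torch.cat((sin, sin), dim=-1)
```

The two `.to(t.device)` calls are the entire fix; everything else matches the unguarded version.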
See the reply below. The influence of the calibration data cannot be ignored: it needs to follow the same distribution as your application scenario, because GPTQ minimizes quantization error with respect to the calibration set.
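Assembling in-domain calibration examples can be sketched as below. The `encode` callable stands in for a real tokenizer (e.g. the Qwen tokenizer), and the function name is illustrative; only the dict shape (`input_ids` plus `attention_mask`) mirrors what AutoGPTQ's `model.quantize(examples)` consumes:

```python
def build_calibration_examples(texts, encode, max_len=2048):
    """Shape in-domain texts into {'input_ids', 'attention_mask'} dicts
    for GPTQ calibration. Texts should come from the same distribution
    as the deployment workload, since GPTQ minimizes quantization error
    on exactly this set."""
    examples = []
    for text in texts:
        ids = encode(text)[:max_len]
        examples.append({"input_ids": ids, "attention_mask": [1] * len(ids)})
    return examples

# Toy character-level encoder for illustration only.
print(build_calibration_examples(["hi"], lambda s: [ord(c) for c in s]))
# → [{'input_ids': [104, 105], 'attention_mask': [1, 1]}]
```

In practice, pass the real tokenizer's ids (as tensors if your AutoGPTQ version expects them) and a few hundred representative samples.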
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
当前行为 | Current Behavior
Execution fails at model.quantize(examples) with: ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?). What could be the cause?
期望行为 | Expected Behavior
No response
复现方法 | Steps To Reproduce
No response
运行环境 | Environment
备注 | Anything else?
No response