[Feature] Using the w8a8 model for inference, it should be automatically routed to the pytorch backend without adding the backend parameter. #2595

zhulinJulia24 · 2024-10-12T03:32:41Z

Motivation

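Currently, a w8a8 model is not routed to the pytorch backend unless the backend parameter is given explicitly; inference goes to TurboMind instead, which fails on the smooth_quant config.
A w8a8 model should be routed to the pytorch backend automatically.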

Steps to reproduce:

1. Quantize the model with lmdeploy lite (smooth_quant):
lmdeploy lite smooth_quant /nvme/qa_test_models/internlm/internlm2_5-7b-chat --work-dir /nvme/qa_test_models/internlm/internlm2_5-7b-chat-inner-w8a8 --batch-size 32
2. Chat with the quantized model, without specifying a backend:
lmdeploy chat /nvme/qa_test_models/internlm/internlm2_5-7b-chat-inner-w8a8 --session-len 4096 --tp 1

Error occurs:

  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/__w/lmdeploy/lmdeploy/autotest/utils/pipeline_chat.py", line 52, in run_pipeline_chat_test
    pipe = pipeline(hf_path, backend_config=backend_config)
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/api.py", line 81, in pipeline
    return pipeline_class(model_path,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 158, in __init__
    self._build_turbomind(model_path=model_path,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 197, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 302, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 112, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 214, in _from_hf
    tm_model = get_tm_model(model_path, self.model_name,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/turbomind/deploy/converter.py", line 241, in get_tm_model
    assert 0, f'unsupported quant_config: {quant_config}'
AssertionError: unsupported quant_config: {'quant_method': 'smooth_quant'}
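
The traceback shows TurboMind rejecting a quant_config whose quant_method is smooth_quant. A minimal sketch of the requested routing, not lmdeploy's actual implementation, might inspect the model's config before choosing an engine. The helper name choose_backend is hypothetical, and reading the quantization metadata from a "quantization_config" key in config.json is an assumption based on the common Hugging Face layout:

```python
import json
from pathlib import Path


def choose_backend(model_dir: str) -> str:
    """Hypothetical helper: return a backend name based on the model's config.json."""
    config_file = Path(model_dir) / "config.json"
    if not config_file.is_file():
        # No config to inspect, so keep the default engine.
        return "turbomind"

    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Assumption: quantization metadata lives under "quantization_config".
    quant_config = config.get("quantization_config") or {}

    # smooth_quant (w8a8) is the method TurboMind rejects in the traceback,
    # so such models are sent to the pytorch engine instead.
    if quant_config.get("quant_method") == "smooth_quant":
        return "pytorch"
    return "turbomind"
```

With a check like this in front of engine construction, the user would no longer need to pass the backend parameter by hand for w8a8 models; other models would keep the default path.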

Related resources

No response

Additional context

No response

lvhan028 self-assigned this Oct 14, 2024

zhulinJulia24 mentioned this issue Dec 25, 2024

Fallback to pytorch engine when the model is quantized by smooth quant #2953

Merged

zhulinJulia24 closed this as completed Dec 30, 2024
