
Doesn't work with llama3.1 8b #38

Open
ShuaiShao93 opened this issue Dec 21, 2024 · 3 comments

@ShuaiShao93
When I ran

python -m deepcompressor.app.llm.ptq examples/llm/configs/qoq-g128.yaml --model-name ~/Meta-Llama-3.1-8B-Instruct

I got this error:

Traceback (most recent call last):
  File "/home/ss/deepcompressor/deepcompressor/app/llm/ptq.py", line 384, in <module>
    main(config, logging_level=tools.logging.DEBUG)
  File "/home/ss/deepcompressor/deepcompressor/app/llm/ptq.py", line 352, in main
    model = ptq(
            ^^^^
  File "/home/ss/deepcompressor/deepcompressor/app/llm/ptq.py", line 154, in ptq
    reorder_llm(model, config, tokenizer, reorder_cache=reorder_cache)
  File "/home/ss/.cache/pypoetry/virtualenvs/deepcompressor-_YvnLDOG-py3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ss/deepcompressor/deepcompressor/app/llm/quant/reorder.py", line 268, in reorder_llm
    if "residual" not in reorder_cache and not config.reorder.dynamic and config.reorder.is_enabled_for("residual"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ss/.cache/pypoetry/virtualenvs/deepcompressor-_YvnLDOG-py3.12/lib/python3.12/site-packages/torch/_tensor.py", line 1114, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.
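
The check that fails is the membership test "residual" not in reorder_cache; if the cached reorder object was loaded as a plain tensor rather than a dict (for example from a stale or partially written cache file), PyTorch's Tensor.__contains__ rejects the string key. A minimal standalone reproduction of that RuntimeError, illustrative only and not the deepcompressor code path:

import torch

cache = torch.zeros(4)   # stands in for a reorder cache entry that is a tensor, not a dict
try:
    "residual" in cache  # Tensor.__contains__ only supports Tensor or scalar
except RuntimeError as err:
    print(err)           # same message as in the traceback above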
@bobboli (Contributor) commented Jan 3, 2025

Hi,
I can't reproduce the error. Could you try to clean up the cache runs/llm/llama-3.1/* and retry:

python -m deepcompressor.app.llm.ptq configs/qoq-g128.yaml --model-name llama-3.1-8b-instruct --model-path /path/to/Llama-3.1-8B-Instruct/
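
For example, removing the stale cache before rerunning (assuming the default runs/ output directory):

rm -rf runs/llm/llama-3.1/*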

@ShuaiShao93 (Author)

Thanks, this works!

However, when I further convert the checkpoint with trtllm 0.15.0:

export TRTLLM_DISABLE_UNIFIED_CONVERTER=1
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_8b_checkpoint_1gpu_w4a8  --dtype float16 --quant_ckpt_path ./Llama-3.1-8B-QServe-g128 --use_qserve --per_group

It fails with

[01/03/2025-23:43:00] [TRT-LLM] [I] Processing weights in layer: 0
Traceback (most recent call last):
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 555, in <module>
    main()
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 547, in main
    convert_and_save_hf(args)
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 488, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 495, in execute
    f(args, rank)
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 472, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 419, in from_hugging_face
    weights = load_weights_from_lmquant(quant_ckpt_path, config)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/convert.py", line 2072, in load_weights_from_lmquant
    v = [
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/convert.py", line 2073, in <listcomp>
    load(f'{prefix}.self_attn.{comp}_proj.{suffix}')
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/convert.py", line 1951, in load
    v = quant_params[key]
KeyError: 'model.layers.0.self_attn.q_proj.weight.zero'

Is this related?
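
One way to narrow this down is to check whether the exported QServe checkpoint contains any zero-point tensors at all. The snippet below is a rough diagnostic sketch; the *.pt glob is an assumption about the export layout, not a confirmed deepcompressor artifact, so adjust it to match the files the exporter actually wrote:

import glob
import torch

# List zero-point keys found in the exported checkpoint files (layout assumed; adjust as needed).
for path in glob.glob("Llama-3.1-8B-QServe-g128/*.pt"):
    state = torch.load(path, map_location="cpu")
    if isinstance(state, dict):
        zeros = [k for k in state if isinstance(k, str) and k.endswith(".zero")]
        print(path, len(zeros), zeros[:3])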

@ShuaiShao93 (Author)

Ah, I had to upgrade trtllm to 0.16.0.

Now it fails with:

Traceback (most recent call last):
  File "/opt/conda/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 627, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
    engine = build_model(build_config,
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
    return build(model, build_config)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1237, in build
    model(**inputs)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 988, in forward
    hidden_states = self.transformer.forward(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 311, in forward
    hidden_states = self.layers.forward(
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 543, in forward
    hidden_states = layer(
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 163, in forward
    attention_output = self.attention(
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
    output = self.forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/quantization/layers.py", line 2793, in forward
    assert lora_layer_params is None, "lora is not supported on SmoothQuantAttention now"
AssertionError: lora is not supported on SmoothQuantAttention now
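
The assertion comes from SmoothQuantAttention.forward being handed lora_layer_params, which should only be populated when LoRA support is enabled for the engine build. So one thing to check is whether the trtllm-build invocation (or a default in the build config) turns on the LoRA plugin; a plain build without any LoRA options would look roughly like this (the output directory is a placeholder, and this is a sketch rather than a confirmed fix):

trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_w4a8 --output_dir ./tllm_8b_engine_1gpu_w4a8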
