
Fix check_share_embedding #2232

Closed
wants to merge 1 commit into from

Conversation

lkm2835
Contributor

@lkm2835 lkm2835 commented Sep 17, 2024

Related to #2226: the `use_embedding_sharing` option is not working for the llama model.

Reproduce

I used the open Hugging Face model HuggingFaceTB/SmolLM-1.7B (`tie_word_embeddings=True`).

python /app/tensorrt_llm/examples/llama/convert_checkpoint.py \
                            --model_dir ${MODEL_DIR} \
                            --output_dir ${MODEL_DIR}/tensorrt/${TP_SIZE}-gpu \
                            --tp_size 1 \
                            --use_embedding_sharing \
                            --load_model_on_cpu \
                            --dtype float16
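
Whether a checkpoint ties its embeddings can be read from its `config.json`. A minimal self-contained sketch (a toy config is written to a temp directory here so the example runs on its own; in practice you would read `${MODEL_DIR}/config.json`):

```python
import json
import os
import tempfile

# Toy config standing in for a downloaded checkpoint's config.json.
config = {"model_type": "llama", "tie_word_embeddings": True}
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(config, f)

# Read the flag back; it defaults to False when absent.
with open(path) as f:
    tie = json.load(f).get("tie_word_embeddings", False)
print(tie)  # True for SmolLM-1.7B-style tied checkpoints
```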

Error Message

[TensorRT-LLM] TensorRT-LLM version: 0.14.0.dev2024091000
0.14.0.dev2024091000
[09/17/2024-15:30:41] [TRT-LLM] [I] Loading weights from Huggingface Llama safetensors...
[09/17/2024-15:30:44] [TRT-LLM] [I] Weights loaded. Total time: 00:00:02
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 497, in <module>
    main()
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 489, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 431, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 438, in execute
    f(args, rank)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 417, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 373, in from_hugging_face
    check_share_embedding(weights, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 1290, in check_share_embedding
    if (weights["lm_head.weight"] -
TypeError: unsupported operand type(s) for -: 'NoneType' and 'Tensor'

The error occurs because, when `tie_word_embeddings` is True, `model.safetensors` does not contain `lm_head.weight`:
https://huggingface.co/HuggingFaceTB/SmolLM-1.7B/blob/main/model.safetensors.index.json
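
The failure mode can be sketched in a few lines. This is a simplified, hypothetical stand-in for `check_share_embedding` (plain lists replace torch tensors, and the key names mirror the traceback rather than the actual TensorRT-LLM source):

```python
def check_share_embedding(weights):
    """Return True when lm_head can share the vocab embedding weights.

    Hypothetical simplification: the real function compares torch tensors;
    plain lists stand in here so the sketch is self-contained.
    """
    lm_head = weights.get("lm_head.weight")
    vocab_emb = weights.get("vocab_embedding.weight")
    # The original code subtracted the two tensors unconditionally, so a
    # checkpoint saved with tie_word_embeddings=True (lm_head missing, i.e.
    # None) raised:
    #   TypeError: unsupported operand type(s) for -: 'NoneType' and 'Tensor'
    # Guarding for None treats such a checkpoint as already shared.
    if lm_head is None:
        return True
    if vocab_emb is None:
        return False
    return lm_head == vocab_emb
```

With the guard in place, a tied checkpoint no longer crashes the conversion, and untied checkpoints still get the element-wise comparison.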

@Barry-Delaney
Collaborator

Hi @lkm2835, thanks for the PR!
We are going to unify the conversion scripts with the ModelWeightsLoader, and this PR can help with the legacy path to convert checkpoints with lm_head and without vocab_embedding.
We will merge it first, and thanks for your contribution!

@kaiyux kaiyux mentioned this pull request Sep 24, 2024
@lkm2835 lkm2835 closed this Sep 26, 2024
@lkm2835 lkm2835 deleted the fix branch October 27, 2024 09:45