
Support for Mistral Nemo #1985

Open
hongjunchoi92 opened this issue Jul 18, 2024 · 9 comments
Assignees
Labels
feature request, new model

Comments

@hongjunchoi92

https://mistral.ai/news/mistral-nemo/

Will the Mistral Nemo models be supported in TensorRT-LLM in the near future?

@byshiue byshiue added the feature request label Jul 22, 2024
@byshiue byshiue removed the feature request label Jul 22, 2024
@fan-niu

fan-niu commented Jul 22, 2024

@byshiue Looking forward to any progress

@hongjunchoi92
Author

hongjunchoi92 commented Jul 22, 2024

Hello @byshiue

It seems like the Mistral 7B model is already supported:

BASE_MISTRAL_MODEL=komt-mistral-7b-v1/

If the model architecture is the same, does that mean we can also use the existing scripts/code for Mistral-Nemo?
Or would an architecture difference require new code changes?

We would be happy to try it out with the existing scripts. Please let us know.

cc: @AdamzNV @ncomly-nvidia as well.
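
One quick way to check whether the existing Mistral scripts could apply is to diff the architecture-relevant fields of the two HF configs. A minimal sketch, with the literal values hand-copied (treat them as assumptions that may drift from newer config revisions):

```python
# Architecture-relevant fields of the two HF configs.
# NOTE: these literal values are hand-copied assumptions, not fetched live.
mistral_7b = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 32000,
}
mistral_nemo = {
    "hidden_size": 5120,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 131072,
}

# Fields whose values differ between the two models.
diff = {k: (mistral_7b[k], mistral_nemo[k])
        for k in mistral_7b if mistral_7b[k] != mistral_nemo[k]}
print(diff)  # {'hidden_size': (4096, 5120), 'vocab_size': (32000, 131072)}

# The subtle difference: for 7B, head_dim == hidden_size // num_attention_heads
# (4096 // 32 == 128), so code that infers head_dim happens to work. For Nemo,
# 5120 // 32 == 160 != 128, so any script that infers head_dim will mis-size
# the attention weights.
```

If the diff only touched sizes the converter reads directly from the config, the existing scripts might work unchanged; a head_dim that can no longer be inferred from hidden_size is the kind of difference that typically requires converter changes.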

@fan-niu

fan-niu commented Jul 23, 2024

@byshiue @AdamzNV @ncomly-nvidia Can you help solve this problem? Yesterday I tried to convert and compile a Mistral Nemo 12B engine directly with the Mistral workflow, but an error occurred during the conversion phase. I used the SmoothQuant conversion method. The conversion script and error log are below. CC: @hongjunchoi92

Convert script:
tensorrtllm commit : ab49b93 (use this commit for llama3 + rope scaling)
tensorrtllm backend commit: 97feb8f
python3 ./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir ${model_path} --output_dir ${convert_model_path} --dtype float16 --smoothquant 0.5 --per_token --per_channel --tp_size 1

Error log:
[TensorRT-LLM] TensorRT-LLM version: 0.11.0
Loading checkpoint shards:   0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/code/./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 461, in <module>
    main()
  File "/code/./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 453, in main
    convert_and_save_hf(args)
  File "/code/./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 339, in convert_and_save_hf
    LLaMAForCausalLM.quantize(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 411, in quantize
    convert.quantize(hf_model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1226, in quantize
    hf_model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3838, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4298, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 895, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 362, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 5120]) in "weight" (which has shape torch.Size([1280, 5120])), this look incorrect.
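
A plausible reading of the shape mismatch, assuming the published Mistral-Nemo-12B config values (hidden_size 5120, 32 attention heads, 8 KV heads, explicit head_dim 128): the loader appears to size the k/v projection by inferring head_dim as hidden_size // num_attention_heads instead of reading the explicit value. A small arithmetic check:

```python
# Assumed Mistral-Nemo-12B config values (hand-copied, may drift).
hidden_size = 5120
num_attention_heads = 32
num_key_value_heads = 8
explicit_head_dim = 128  # set explicitly in config.json

# Rows of a k/v projection weight as stored in the checkpoint.
actual_kv_rows = num_key_value_heads * explicit_head_dim    # 8 * 128
# Rows expected by code that infers head_dim from hidden_size.
inferred_head_dim = hidden_size // num_attention_heads      # 160
expected_kv_rows = num_key_value_heads * inferred_head_dim  # 8 * 160

print(actual_kv_rows, expected_kv_rows)  # 1024 1280
```

Those two row counts match the torch.Size([1024, 5120]) vs. torch.Size([1280, 5120]) in the ValueError above, which is why this would need a fix rather than just rerunning the script.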

@eleapttn

eleapttn commented Aug 1, 2024

Hello everyone!

Same issue here. Any news about the integration of this model?
Is it related to transformers version and this PR? huggingface/transformers#32050

The logs are the following (pp_size and tp_size set to 1):

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 465, in load
    param.value = weights[name]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 133, in value
    assert v.shape == self.shape, \
AssertionError: The value updated is not the same shape as the original. Updated: (6144, 5120), original: (7680, 5120)
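
The numbers in this assertion also line up with a head_dim-inference mismatch, if one assumes Mistral-Nemo's published config values (hidden_size 5120, 32 attention heads, 8 KV heads, explicit head_dim 128) and a fused QKV weight of shape (q_rows + 2 * kv_rows, hidden_size) — this is a sketch under those assumptions, not a confirmed diagnosis:

```python
# Assumed Mistral-Nemo-12B config values (hand-copied, may drift).
hidden_size = 5120
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128                                          # explicit in config.json
inferred_head_dim = hidden_size // num_attention_heads  # 160

# Fused QKV rows: q_rows + 2 * kv_rows.
checkpoint_rows = (num_attention_heads * head_dim
                   + 2 * num_key_value_heads * head_dim)       # 4096 + 2048
engine_rows = (num_attention_heads * inferred_head_dim
               + 2 * num_key_value_heads * inferred_head_dim)  # 5120 + 2560

print(checkpoint_rows, engine_rows)  # 6144 7680
```

The computed 6144 vs. 7680 matches "Updated: (6144, 5120), original: (7680, 5120)" in the AssertionError, suggesting the same root cause as the earlier SmoothQuant conversion failure.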

@QiJune
Collaborator

QiJune commented Aug 4, 2024

@nv-guomingz Could you please take a look? Thanks

@QiJune QiJune added the feature request label Aug 4, 2024
@nv-guomingz
Collaborator

Hi @eleapttn, we've fixed this issue internally, and the corresponding fix will be pushed to the main branch in an upcoming weekly update.

@eleapttn

eleapttn commented Aug 5, 2024

Hi @QiJune, @nv-guomingz,
Thanks a lot for your quick reply. I can't wait to test it!

@MatthewPeyrard

This is working in 0.12. Good job!
Does anyone have any advice or documentation that can help to optimize engine builds for Mistral Nemo?
I am currently experimenting with fp8 quants on an H100 and finding them to be about 1/3 the speed of a similar quant of Llama 3.1 8B. I expected Nemo to be a bit slower, but not that much slower.

@AdamzNV
Collaborator

AdamzNV commented Oct 31, 2024

As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting to new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iterations.

Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.
