@ibsidorenko

This is a refactoring to follow the new rule: "keep mlc-llm and tvm as clean as possible".
This commit removes several helper functions that were moved to the slm/interface part.

@ibsidorenko ibsidorenko requested a review from sunggg April 30, 2024 13:54
@ponytaill

Hi, I tested it and want to ask whether the SmoothQuant quantization really works. I ran `mlc_llm convert_weight ./dist/llama2 --quantization smq_q8i8f16_0 -o dist/$llama2-smq-MLC/` and got an unquantized model. Weird.
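An aside on the command itself: in most shells, `$llama2` inside an unquoted or double-quoted argument is parsed as a variable reference, so if no `llama2` variable is set the output path silently collapses to `dist/-smq-MLC/`. This does not explain the unquantized weights, but it is worth checking. A minimal sketch of the expansion (the directory names are only illustrative):

```shell
# $llama2 is parsed as a variable name; -smq-MLC is literal text after it.
# With llama2 unset, the variable expands to the empty string:
unset llama2
out="dist/$llama2-smq-MLC/"
echo "$out"          # prints dist/-smq-MLC/ -- not the intended directory

# Single quotes (or simply dropping the $) keep the literal name:
out='dist/llama2-smq-MLC/'
echo "$out"          # prints dist/llama2-smq-MLC/
```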

@ibsidorenko
Author

> Hi, I tested it and want to ask whether the SmoothQuant quantization really works. I ran `mlc_llm convert_weight ./dist/llama2 --quantization smq_q8i8f16_0 -o dist/$llama2-smq-MLC/` and got an unquantized model. Weird.

Hi @ponytaill! It works, but not through mlc-llm.

@ponytaill

> Hi, I tested it and want to ask whether the SmoothQuant quantization really works. I ran `mlc_llm convert_weight ./dist/llama2 --quantization smq_q8i8f16_0 -o dist/$llama2-smq-MLC/` and got an unquantized model. Weird.

> Hi @ponytaill! It works, but not through mlc-llm.
Hi, thanks for your reply. Could you explain a little about how to use the SmoothQuant code without going through mlc-llm? I have just started learning mlc-llm.

@ibsidorenko
Author

ibsidorenko commented May 28, 2024

> Hi, I tested it and want to ask whether the SmoothQuant quantization really works. I ran `mlc_llm convert_weight ./dist/llama2 --quantization smq_q8i8f16_0 -o dist/$llama2-smq-MLC/` and got an unquantized model. Weird.

> Hi @ponytaill! It works, but not through mlc-llm.

> Hi, thanks for your reply. Could you explain a little about how to use the SmoothQuant code without going through mlc-llm? I have just started learning mlc-llm.

You need access to the private repo. If you don't have access, there is no way to run SmoothQuant in this case.

@sunggg sunggg merged commit 0739fab into mlc-serve-v0.2.0 Jun 14, 2024
@ibsidorenko ibsidorenko deleted the ibsidorenko/smq-refactoring branch June 14, 2024 05:57
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this pull request Jul 25, 2024
This PR refactors mlc-chat into a formal package.
Some follow-up TODOs remain on cleaning up
the rest and the gradio API.