
Commit 8beed7a

[REFACTOR] rename mlc_chat => mlc_llm (#1932)
This PR renames the mlc_chat package to the mlc_llm package, now that this is the new official flow. We also update all locations that reference the package.
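
For downstream users, the most visible effect is the changed Python import path. A minimal caller-side sketch (illustrative only; it assumes the renamed package keeps the same public classes, as the documentation diffs below suggest):

    # Before this commit the package was named mlc_chat:
    #   from mlc_chat import ChatModule
    # After this commit the same class is imported from mlc_llm:
    from mlc_llm import ChatModule
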
Parent: 4290a05

231 files changed: +754 additions, -788 deletions


ci/task/build_clean.sh

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@ set -x
 rm -rf ${WORKSPACE_CWD}/build/ \
        ${WORKSPACE_CWD}/python/dist/ \
        ${WORKSPACE_CWD}/python/build/ \
-       ${WORKSPACE_CWD}/python/mlc_chat.egg-info
+       ${WORKSPACE_CWD}/python/mlc_llm.egg-info

cpp/llm_chat.cc

Lines changed: 1 addition & 2 deletions
@@ -127,8 +127,7 @@ struct FunctionTable {
       device_ids[i] = i;
     }
     this->use_disco = true;
-    this->sess =
-        Session::ProcessSession(num_shards, f_create_process_pool, "mlc_chat.cli.worker");
+    this->sess = Session::ProcessSession(num_shards, f_create_process_pool, "mlc_llm.cli.worker");
     this->sess->InitCCL(ccl, ShapeTuple(device_ids));
     this->disco_mod = sess->CallPacked(sess->GetGlobalFunc("runtime.disco.load_vm_module"),
                                        lib_path, null_device);

cpp/serve/function_table.cc

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ void FunctionTable::Init(TVMArgValue reload_lib, Device device, picojson::object
       device_ids[i] = i;
     }
     this->use_disco = true;
-    this->sess = Session::ProcessSession(num_shards, f_create_process_pool, "mlc_chat.cli.worker");
+    this->sess = Session::ProcessSession(num_shards, f_create_process_pool, "mlc_llm.cli.worker");
     this->sess->InitCCL(ccl, ShapeTuple(device_ids));
     this->disco_mod = sess->CallPacked(sess->GetGlobalFunc("runtime.disco.load_vm_module"),
                                        lib_path, null_device);
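
Both C++ call sites now point the disco process pool at the renamed worker module. A standalone sanity check, sketched here rather than taken from this commit, that the new module path resolves in an installed build:

    import importlib.util

    def worker_module_available(name: str = "mlc_llm.cli.worker") -> bool:
        """Return True if the renamed disco worker module can be located."""
        try:
            return importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # The parent package (mlc_llm) is not installed at all.
            return False

    print(worker_module_available())
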

docs/compilation/compile_models.rst

Lines changed: 112 additions & 112 deletions
Large diffs are not rendered by default.

docs/compilation/convert_weights.rst

Lines changed: 15 additions & 15 deletions
@@ -8,8 +8,8 @@ To run a model with MLC LLM in any platform, you need:
 1. **Model weights** converted to MLC format (e.g. `RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC <https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/tree/main>`_.)
 2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__).
 
-In many cases, we only need to convert weights and reuse existing model library.
-This page demonstrates adding a model variant with ``mlc_chat convert_weight``, which
+In many cases, we only need to convert weights and reuse existing model library.
+This page demonstrates adding a model variant with ``mlc_llm convert_weight``, which
 takes a hugginface model as input and converts/quantizes into MLC-compatible weights.
 
 Specifically, we add RedPjama-INCITE-**Instruct**-3B-v1, while MLC already
@@ -23,7 +23,7 @@ This can be extended to, e.g.:
 .. note::
     Before you proceed, make sure you followed :ref:`install-tvm-unity`, a required
     backend to compile models with MLC LLM.
-
+
     Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python` to obtain
     the CLI app / Python API that can be used to chat with the compiled model.
     Finally, we strongly recommend you to read :ref:`project-overview` first to get
@@ -38,20 +38,20 @@ This can be extended to, e.g.:
 0. Verify installation
 ----------------------
 
-**Step 1. Verify mlc_chat**
+**Step 1. Verify mlc_llm**
 
-We use the python package ``mlc_chat`` to compile models. This can be installed by
+We use the python package ``mlc_llm`` to compile models. This can be installed by
 following :ref:`install-mlc-packages`, either by building from source, or by
-installing the prebuilt package. Verify ``mlc_chat`` installation in command line via:
+installing the prebuilt package. Verify ``mlc_llm`` installation in command line via:
 
 .. code:: bash
 
-   $ mlc_chat --help
+   $ mlc_llm --help
    # You should see help information with this line
    usage: MLC LLM Command Line Interface. [-h] {compile,convert_weight,gen_config}
 
 .. note::
-   If it runs into error ``command not found: mlc_chat``, try ``python -m mlc_chat --help``.
+   If it runs into error ``command not found: mlc_llm``, try ``python -m mlc_llm --help``.
 
 **Step 2. Verify TVM**
 
@@ -80,7 +80,7 @@ for specification of ``convert_weight``.
    git clone https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1
    cd ../..
    # Convert weight
-   mlc_chat convert_weight ./dist/models/RedPajama-INCITE-Instruct-3B-v1/ \
+   mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Instruct-3B-v1/ \
        --quantization q4f16_1 \
        -o dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC
 
@@ -89,20 +89,20 @@ for specification of ``convert_weight``.
 2. Generate MLC Chat Config
 ---------------------------
 
-Use ``mlc_chat gen_config`` to generate ``mlc-chat-config.json`` and process tokenizers.
+Use ``mlc_llm gen_config`` to generate ``mlc-chat-config.json`` and process tokenizers.
 See :ref:`compile-command-specification` for specification of ``gen_config``.
 
 .. code:: shell
 
-   mlc_chat gen_config ./dist/models/RedPajama-INCITE-Instruct-3B-v1/ \
+   mlc_llm gen_config ./dist/models/RedPajama-INCITE-Instruct-3B-v1/ \
        --quantization q4f16_1 --conv-template redpajama_chat \
        -o dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC/
 
 
 .. note::
     The file ``mlc-chat-config.json`` is crucial in both model compilation
     and runtime chatting. Here we only care about the latter case.
-
+
     You can **optionally** customize
     ``dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC/mlc-chat-config.json`` (checkout :ref:`configure-mlc-chat-json` for more detailed instructions).
     You can also simply use the default configuration.
@@ -111,7 +111,7 @@ See :ref:`compile-command-specification` for specification of ``gen_config``.
    contains a full list of conversation templates that MLC provides. If the model you are adding
    requires a new conversation template, you would need to add your own.
    Follow `this PR <https://github.com/mlc-ai/mlc-llm/pull/1402>`__ as an example. However,
-   adding your own template would require you :ref:`build mlc_chat from source <mlcchat_build_from_source>` in order for it
+   adding your own template would require you :ref:`build mlc_llm from source <mlcchat_build_from_source>` in order for it
    to be recognized by the runtime.
 
 By now, you should have the following files.
@@ -132,7 +132,7 @@ By now, you should have the following files.
 (Optional) 3. Upload weights to HF
 ----------------------------------
 
-Optionally, you can upload what we have to huggingface.
+Optionally, you can upload what we have to huggingface.
 
 .. code:: shell
 
@@ -175,7 +175,7 @@ Running the distributed models are similar to running prebuilt model weights and
 
   # Run the model in Python; note that we reuse `-Chat` model library
   python
-  >>> from mlc_chat import ChatModule
+  >>> from mlc_llm import ChatModule
   >>> cm = ChatModule(model="dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC", \
       model_lib_path="dist/prebuilt_libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-cuda.so") # Adjust based on backend
  >>> cm.generate("hi")

docs/deploy/android.rst

Lines changed: 9 additions & 9 deletions
@@ -37,8 +37,8 @@ Prerequisite
 
 **JDK**, such as OpenJDK >= 17, to compile Java bindings of TVM Unity runtime. It could be installed via Homebrew on macOS, apt on Ubuntu or other package managers. Set up the following environment variable:
 
-- ``JAVA_HOME`` so that Java is available in ``$JAVA_HOME/bin/java``.
-
+- ``JAVA_HOME`` so that Java is available in ``$JAVA_HOME/bin/java``.
+
 Please ensure that the JDK versions for Android Studio and JAVA_HOME are the same. We recommended setting the `JAVA_HOME` to the JDK bundled with Android Studio. e.g. `export JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home` for macOS.
 
 **TVM Unity runtime** is placed under `3rdparty/tvm <https://github.com/mlc-ai/mlc-llm/tree/main/3rdparty>`__ in MLC LLM, so there is no need to install anything extra. Set up the following environment variable:
@@ -92,14 +92,14 @@ To deploy models on Android with reasonable performance, one has to cross-compil
 .. code-block:: bash
 
    # convert weights
-   mlc_chat convert_weight ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION -o dist/$MODEL_NAME-$QUANTIZATION-MLC/
+   mlc_llm convert_weight ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION -o dist/$MODEL_NAME-$QUANTIZATION-MLC/
 
    # create mlc-chat-config.json
-   mlc_chat gen_config ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION \
+   mlc_llm gen_config ./dist/models/$MODEL_NAME/ --quantization $QUANTIZATION \
        --conv-template llama-2 --context-window-size 768 -o dist/${MODEL_NAME}-${QUANTIZATION}-MLC/
 
    # 2. compile: compile model library with specification in mlc-chat-config.json
-   mlc_chat compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json \
+   mlc_llm compile ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/mlc-chat-config.json \
        --device android -o ./dist/${MODEL_NAME}-${QUANTIZATION}-MLC/${MODEL_NAME}-${QUANTIZATION}-android.tar
 
 This generates the directory ``./dist/$MODEL_NAME-$QUANTIZATION-MLC`` which contains the necessary components to run the model, as explained below.
@@ -131,19 +131,19 @@ The source code for MLC LLM is available under ``android/``, including scripts t
    (Required) Unique local identifier to identify the model.
 
 ``model_lib``
-   (Required) Matches the system-lib-prefix, generally set during ``mlc_chat compile`` which can be specified using
-   ``--system-lib-prefix`` argument. By default, it is set to ``"${model_type}_${quantization}"`` e.g. ``gpt_neox_q4f16_1`` for the RedPajama-INCITE-Chat-3B-v1 model. If the ``--system-lib-prefix`` argument is manually specified during ``mlc_chat compile``, the ``model_lib`` field should be updated accordingly.
+   (Required) Matches the system-lib-prefix, generally set during ``mlc_llm compile`` which can be specified using
+   ``--system-lib-prefix`` argument. By default, it is set to ``"${model_type}_${quantization}"`` e.g. ``gpt_neox_q4f16_1`` for the RedPajama-INCITE-Chat-3B-v1 model. If the ``--system-lib-prefix`` argument is manually specified during ``mlc_llm compile``, the ``model_lib`` field should be updated accordingly.
 
 ``estimated_vram_bytes``
    (Optional) Estimated requirements of VRAM to run the model.
-
+
 To change the configuration, edit ``app-config.json``:
 
 .. code-block:: bash
 
    vim ./src/main/assets/app-config.json
 
-Then bundle the android library ``${MODEL_NAME}-${QUANTIZATION}-android.tar`` compiled from ``mlc_chat compile`` in the previous steps, with TVM Unity's Java runtime by running the commands below:
+Then bundle the android library ``${MODEL_NAME}-${QUANTIZATION}-android.tar`` compiled from ``mlc_llm compile`` in the previous steps, with TVM Unity's Java runtime by running the commands below:
 
 .. code-block:: bash
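
The fields documented in the hunk above live in ``app-config.json``. A hypothetical entry, sketched as a Python dict purely for illustration (the ``model_id`` key name and all values are assumptions, not text from this commit):

    # Hypothetical app-config.json model entry, shown as a Python dict.
    app_config_model_entry = {
        "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",  # assumed key: unique local identifier
        "model_lib": "gpt_neox_q4f16_1",  # default "${model_type}_${quantization}" prefix from `mlc_llm compile`
        "estimated_vram_bytes": 4000000000,  # optional VRAM estimate (placeholder value)
    }
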
docs/deploy/cli.rst

Lines changed: 12 additions & 12 deletions
@@ -19,8 +19,8 @@ To use other GPU runtimes, e.g. CUDA, please instead :ref:`build it from source
 .. code:: shell
 
    conda activate your-environment
-   python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-chat-nightly mlc-ai-nightly
-   mlc_chat chat -h
+   python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly mlc-ai-nightly
+   mlc_llm chat -h
 
 .. note::
    The prebuilt package supports **Metal** on macOS and **Vulkan** on Linux and Windows. It is possible to use other GPU runtimes such as **CUDA** by compiling MLCChat CLI from the source.
@@ -29,7 +29,7 @@ To use other GPU runtimes, e.g. CUDA, please instead :ref:`build it from source
 Option 2. Build MLC Runtime from Source
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We also provide options to build mlc runtime libraries and ``mlc_chat`` from source.
+We also provide options to build mlc runtime libraries and ``mlc_llm`` from source.
 This step is useful if the prebuilt is unavailable on your platform, or if you would like to build a runtime
 that supports other GPU runtime than the prebuilt version. We can build a customized version
 of mlc chat runtime. You only need to do this if you choose not to use the prebuilt.
@@ -44,7 +44,7 @@ Then please follow the instructions in :ref:`mlcchat_build_from_source` to build
 Run Models through MLCChat CLI
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Once ``mlc_chat`` is installed, you are able to run any MLC-compiled model on the command line.
+Once ``mlc_llm`` is installed, you are able to run any MLC-compiled model on the command line.
 
 To run a model with MLC LLM in any platform, you can either:
 
@@ -53,14 +53,14 @@ To run a model with MLC LLM in any platform, you can either:
 
 **Option 1: Use model prebuilts**
 
-To run ``mlc_chat``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
+To run ``mlc_llm``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
 For example, to run the MLC Llama 2 7B Q4F16_1 model (`Repo link <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC>`_),
 simply use ``HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC``. The model weights and library will be downloaded
 automatically from Huggingface.
 
 .. code:: shell
 
-   mlc_chat chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC --device "cuda:0" --overrides context_window_size=1024
+   mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC --device "cuda:0" --overrides context_window_size=1024
 
 .. code:: shell
 
@@ -75,10 +75,10 @@ automatically from Huggingface.
   Multi-line input: Use escape+enter to start a new line.
 
   [INST]: What's the meaning of life
-  [/INST]:
-  Ah, a question that has puzzled philosophers and theologians for centuries! The meaning
-  of life is a deeply personal and subjective topic, and there are many different
-  perspectives on what it might be. However, here are some possible answers that have been
+  [/INST]:
+  Ah, a question that has puzzled philosophers and theologians for centuries! The meaning
+  of life is a deeply personal and subjective topic, and there are many different
+  perspectives on what it might be. However, here are some possible answers that have been
   proposed by various thinkers and cultures:
   ...
 
@@ -91,14 +91,14 @@ For models other than the prebuilt ones we provided:
    follow :ref:`convert-weights-via-MLC` to convert the weights and reuse existing model libraries.
 2. Otherwise, follow :ref:`compile-model-libraries` to compile both the model library and weights.
 
-Once you have the model locally compiled with a model library and model weights, to run ``mlc_chat``, simply
+Once you have the model locally compiled with a model library and model weights, to run ``mlc_llm``, simply
 
 - Specify the path to ``mlc-chat-config.json`` and the converted model weights to ``--model``
 - Specify the path to the compiled model library (e.g. a .so file) to ``--model-lib-path``
 
 .. code:: shell
 
-   mlc_chat chat dist/Llama-2-7b-chat-hf-q4f16_1-MLC \
+   mlc_llm chat dist/Llama-2-7b-chat-hf-q4f16_1-MLC \
        --device "cuda:0" --overrides context_window_size=1024 \
        --model-lib-path dist/prebuilt_libs/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-vulkan.so
    # CUDA on Linux: dist/prebuilt_libs/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-cuda.so
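
The renamed flow is also usable from Python. A sketch mirroring the CLI example above, based on the ``ChatModule`` usage shown in the convert_weights.rst diff (the paths are placeholders for your own converted weights and compiled library):

    from mlc_llm import ChatModule  # was: from mlc_chat import ChatModule

    # Placeholder paths; point these at your converted weights and model library.
    cm = ChatModule(
        model="dist/Llama-2-7b-chat-hf-q4f16_1-MLC",
        model_lib_path="dist/prebuilt_libs/Llama-2-7b-chat-hf/Llama-2-7b-chat-hf-q4f16_1-cuda.so",
    )
    print(cm.generate("What's the meaning of life"))
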
