This PR renames the mlc_chat package to the mlc_llm package,
now that this is the new official flow. We also update the
locations that reference the package.
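In user-facing terms, every mlc_chat entry point becomes mlc_llm. A minimal sketch, based on the commands updated in the diffs below:

    # Before this PR
    python -m mlc_chat --help
    # After this PR
    python -m mlc_llm --help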
docs/compilation/convert_weights.rst (15 additions, 15 deletions)
@@ -8,8 +8,8 @@ To run a model with MLC LLM in any platform, you need:
 1. **Model weights** converted to MLC format (e.g. `RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC <https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/tree/main>`_.)
 2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__).
 
-In many cases, we only need to convert weights and reuse existing model library.
-This page demonstrates adding a model variant with ``mlc_chat convert_weight``, which
+In many cases, we only need to convert weights and reuse existing model library.
+This page demonstrates adding a model variant with ``mlc_llm convert_weight``, which
 takes a hugginface model as input and converts/quantizes into MLC-compatible weights.
 
 Specifically, we add RedPjama-INCITE-**Instruct**-3B-v1, while MLC already
@@ -23,7 +23,7 @@ This can be extended to, e.g.:
 .. note::
     Before you proceed, make sure you followed :ref:`install-tvm-unity`, a required
     backend to compile models with MLC LLM.
-
+
     Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python` to obtain
    the CLI app / Python API that can be used to chat with the compiled model.
    Finally, we strongly recommend you to read :ref:`project-overview` first to get
@@ -38,20 +38,20 @@ This can be extended to, e.g.:
 0. Verify installation
 ----------------------
 
-**Step 1. Verify mlc_chat**
+**Step 1. Verify mlc_llm**
 
-We use the python package ``mlc_chat`` to compile models. This can be installed by
+We use the python package ``mlc_llm`` to compile models. This can be installed by
 following :ref:`install-mlc-packages`, either by building from source, or by
-installing the prebuilt package. Verify ``mlc_chat`` installation in command line via:
+installing the prebuilt package. Verify ``mlc_llm`` installation in command line via:
 
 .. code:: bash
 
-    $ mlc_chat --help
+    $ mlc_llm --help
     # You should see help information with this line
     usage: MLC LLM Command Line Interface. [-h] {compile,convert_weight,gen_config}
 
 .. note::
-    If it runs into error ``command not found: mlc_chat``, try ``python -m mlc_chat --help``.
+    If it runs into error ``command not found: mlc_llm``, try ``python -m mlc_llm --help``.
 
 **Step 2. Verify TVM**
@@ -80,7 +80,7 @@ for specification of ``convert_weight``.
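As a concrete sketch of the renamed flow this page documents, a weight-conversion run would look roughly like the following; the input and output paths are illustrative assumptions, not part of this diff:

    # Sketch: convert/quantize a Hugging Face checkpoint into MLC weights.
    # Both paths below are illustrative.
    mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Instruct-3B-v1 \
        --quantization q4f16_1 \
        -o ./dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC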
docs/deploy/android.rst (9 additions, 9 deletions)
@@ -37,8 +37,8 @@ Prerequisite
 
 **JDK**, such as OpenJDK >= 17, to compile Java bindings of TVM Unity runtime. It could be installed via Homebrew on macOS, apt on Ubuntu or other package managers. Set up the following environment variable:
 
-- ``JAVA_HOME`` so that Java is available in ``$JAVA_HOME/bin/java``.
-
+- ``JAVA_HOME`` so that Java is available in ``$JAVA_HOME/bin/java``.
+
 Please ensure that the JDK versions for Android Studio and JAVA_HOME are the same. We recommended setting the `JAVA_HOME` to the JDK bundled with Android Studio. e.g. `export JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home` for macOS.
 
 **TVM Unity runtime** is placed under `3rdparty/tvm <https://github.com/mlc-ai/mlc-llm/tree/main/3rdparty>`__ in MLC LLM, so there is no need to install anything extra. Set up the following environment variable:
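A quick sanity check for the JDK setup described in this hunk (macOS path as given in the doc; the version check itself is illustrative):

    # Sketch: point JAVA_HOME at the JDK bundled with Android Studio (macOS).
    export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"
    "$JAVA_HOME/bin/java" -version  # Java should resolve under $JAVA_HOME/bin/java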
@@ -92,14 +92,14 @@ To deploy models on Android with reasonable performance, one has to cross-compil
 This generates the directory ``./dist/$MODEL_NAME-$QUANTIZATION-MLC`` which contains the necessary components to run the model, as explained below.
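For orientation, the cross-compilation step this hunk belongs to would be invoked roughly as below under the new name; the flags here are assumptions, only the output directory layout is stated in the diff:

    # Sketch: compile a model library targeting Android (flags are assumed).
    mlc_llm compile ./dist/$MODEL_NAME-$QUANTIZATION-MLC/mlc-chat-config.json \
        --device android -o ./dist/libs/$MODEL_NAME-$QUANTIZATION-android.tar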
@@ -131,19 +131,19 @@ The source code for MLC LLM is available under ``android/``, including scripts t
     (Required) Unique local identifier to identify the model.
 
 ``model_lib``
-    (Required) Matches the system-lib-prefix, generally set during ``mlc_chat compile`` which can be specified using
-    ``--system-lib-prefix`` argument. By default, it is set to ``"${model_type}_${quantization}"`` e.g. ``gpt_neox_q4f16_1`` for the RedPajama-INCITE-Chat-3B-v1 model. If the ``--system-lib-prefix`` argument is manually specified during ``mlc_chat compile``, the ``model_lib`` field should be updated accordingly.
+    (Required) Matches the system-lib-prefix, generally set during ``mlc_llm compile`` which can be specified using
+    ``--system-lib-prefix`` argument. By default, it is set to ``"${model_type}_${quantization}"`` e.g. ``gpt_neox_q4f16_1`` for the RedPajama-INCITE-Chat-3B-v1 model. If the ``--system-lib-prefix`` argument is manually specified during ``mlc_llm compile``, the ``model_lib`` field should be updated accordingly.
 
 ``estimated_vram_bytes``
     (Optional) Estimated requirements of VRAM to run the model.
-
+
 To change the configuration, edit ``app-config.json``:
 
 .. code-block:: bash
 
     vim ./src/main/assets/app-config.json
 
-Then bundle the android library ``${MODEL_NAME}-${QUANTIZATION}-android.tar`` compiled from ``mlc_chat compile`` in the previous steps, with TVM Unity's Java runtime by running the commands below:
+Then bundle the android library ``${MODEL_NAME}-${QUANTIZATION}-android.tar`` compiled from ``mlc_llm compile`` in the previous steps, with TVM Unity's Java runtime by running the commands below:
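To make the ``model_lib`` contract concrete, a single entry in ``app-config.json`` might look like the sketch below. Only ``model_lib`` and ``estimated_vram_bytes`` are named in this hunk; the ``model_id`` field name and the byte count are illustrative assumptions:

    {
      "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
      "model_lib": "gpt_neox_q4f16_1",
      "estimated_vram_bytes": 4000000000
    }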
The prebuilt package supports **Metal** on macOS and **Vulkan** on Linux and Windows. It is possible to use other GPU runtimes such as **CUDA** by compiling MLCChat CLI from the source.
@@ -29,7 +29,7 @@ To use other GPU runtimes, e.g. CUDA, please instead :ref:`build it from source
 Option 2. Build MLC Runtime from Source
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We also provide options to build mlc runtime libraries and ``mlc_chat`` from source.
+We also provide options to build mlc runtime libraries and ``mlc_llm`` from source.
 This step is useful if the prebuilt is unavailable on your platform, or if you would like to build a runtime
 that supports other GPU runtime than the prebuilt version. We can build a customized version
 of mlc chat runtime. You only need to do this if you choose not to use the prebuilt.
@@ -44,7 +44,7 @@ Then please follow the instructions in :ref:`mlcchat_build_from_source` to build
 Run Models through MLCChat CLI
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Once ``mlc_chat`` is installed, you are able to run any MLC-compiled model on the command line.
+Once ``mlc_llm`` is installed, you are able to run any MLC-compiled model on the command line.
 
 To run a model with MLC LLM in any platform, you can either:
@@ -53,14 +53,14 @@ To run a model with MLC LLM in any platform, you can either:
 
 **Option 1: Use model prebuilts**
 
-To run ``mlc_chat``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
+To run ``mlc_llm``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
 For example, to run the MLC Llama 2 7B Q4F16_1 model (`Repo link <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC>`_),
 simply use ``HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC``. The model weights and library will be downloaded
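As a usage sketch for the prebuilt path (the exact subcommand is not visible in this hunk; ``chat`` is assumed here):

    # Sketch: run a prebuilt model fetched directly from Hugging Face.
    # The "chat" subcommand is an assumption, not confirmed by this diff.
    mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC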