This PR renames the mlc_chat package to the mlc_llm package,
now that this is the new official flow. We also update the
locations that reference the package.
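In user-facing terms, every mlc_chat entry point becomes mlc_llm. A minimal sketch, based on the commands updated in the diffs below:

    # Before this PR
    python -m mlc_chat --help
    # After this PR
    python -m mlc_llm --help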
docs/compilation/convert_weights.rst (15 additions, 15 deletions)
@@ -8,8 +8,8 @@ To run a model with MLC LLM in any platform, you need:
 1. **Model weights** converted to MLC format (e.g. `RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC <https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/tree/main>`_.)
 2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__).
 
-In many cases, we only need to convert weights and reuse existing model library.
-This page demonstrates adding a model variant with ``mlc_chat convert_weight``, which
+In many cases, we only need to convert weights and reuse existing model library.
+This page demonstrates adding a model variant with ``mlc_llm convert_weight``, which
 takes a hugginface model as input and converts/quantizes into MLC-compatible weights.
 
 Specifically, we add RedPjama-INCITE-**Instruct**-3B-v1, while MLC already
@@ -23,7 +23,7 @@ This can be extended to, e.g.:
 .. note::
     Before you proceed, make sure you followed :ref:`install-tvm-unity`, a required
     backend to compile models with MLC LLM.
-
+
     Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python` to obtain
    the CLI app / Python API that can be used to chat with the compiled model.
    Finally, we strongly recommend you to read :ref:`project-overview` first to get
@@ -38,20 +38,20 @@ This can be extended to, e.g.:
 0. Verify installation
 ----------------------
 
-**Step 1. Verify mlc_chat**
+**Step 1. Verify mlc_llm**
 
-We use the python package ``mlc_chat`` to compile models. This can be installed by
+We use the python package ``mlc_llm`` to compile models. This can be installed by
 following :ref:`install-mlc-packages`, either by building from source, or by
-installing the prebuilt package. Verify ``mlc_chat`` installation in command line via:
+installing the prebuilt package. Verify ``mlc_llm`` installation in command line via:
 
 .. code:: bash
 
-    $ mlc_chat --help
+    $ mlc_llm --help
     # You should see help information with this line
     usage: MLC LLM Command Line Interface. [-h] {compile,convert_weight,gen_config}
 
 .. note::
-    If it runs into error ``command not found: mlc_chat``, try ``python -m mlc_chat --help``.
+    If it runs into error ``command not found: mlc_llm``, try ``python -m mlc_llm --help``.
 
 **Step 2. Verify TVM**
@@ -80,7 +80,7 @@ for specification of ``convert_weight``.
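As a concrete sketch of the renamed flow this page documents, a weight-conversion run would look roughly like the following; the input and output paths are illustrative assumptions, not part of this diff:

    # Sketch: convert/quantize a Hugging Face checkpoint into MLC weights.
    # Both paths below are illustrative.
    mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Instruct-3B-v1 \
        --quantization q4f16_1 \
        -o ./dist/RedPajama-INCITE-Instruct-3B-v1-q4f16_1-MLC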
docs/deploy/android.rst (9 additions, 9 deletions)
@@ -37,8 +37,8 @@ Prerequisite
 
 **JDK**, such as OpenJDK >= 17, to compile Java bindings of TVM Unity runtime. It could be installed via Homebrew on macOS, apt on Ubuntu or other package managers. Set up the following environment variable:
 
-- ``JAVA_HOME`` so that Java is available in ``$JAVA_HOME/bin/java``.
-
+- ``JAVA_HOME`` so that Java is available in ``$JAVA_HOME/bin/java``.
+
 Please ensure that the JDK versions for Android Studio and JAVA_HOME are the same. We recommended setting the `JAVA_HOME` to the JDK bundled with Android Studio. e.g. `export JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home` for macOS.
 
 **TVM Unity runtime** is placed under `3rdparty/tvm <https://github.com/mlc-ai/mlc-llm/tree/main/3rdparty>`__ in MLC LLM, so there is no need to install anything extra. Set up the following environment variable:
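A quick sanity check for the JDK setup described in this hunk (macOS path as given in the doc; the version check itself is illustrative):

    # Sketch: point JAVA_HOME at the JDK bundled with Android Studio (macOS).
    export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"
    "$JAVA_HOME/bin/java" -version  # Java should resolve under $JAVA_HOME/bin/java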
@@ -92,14 +92,14 @@ To deploy models on Android with reasonable performance, one has to cross-compil
 This generates the directory ``./dist/$MODEL_NAME-$QUANTIZATION-MLC`` which contains the necessary components to run the model, as explained below.
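For orientation, the cross-compilation step this hunk belongs to would be invoked roughly as below under the new name; the flags here are assumptions, only the output directory layout is stated in the diff:

    # Sketch: compile a model library targeting Android (flags are assumed).
    mlc_llm compile ./dist/$MODEL_NAME-$QUANTIZATION-MLC/mlc-chat-config.json \
        --device android -o ./dist/libs/$MODEL_NAME-$QUANTIZATION-android.tar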
@@ -131,19 +131,19 @@ The source code for MLC LLM is available under ``android/``, including scripts t
     (Required) Unique local identifier to identify the model.
 
 ``model_lib``
-    (Required) Matches the system-lib-prefix, generally set during ``mlc_chat compile`` which can be specified using
-    ``--system-lib-prefix`` argument. By default, it is set to ``"${model_type}_${quantization}"`` e.g. ``gpt_neox_q4f16_1`` for the RedPajama-INCITE-Chat-3B-v1 model. If the ``--system-lib-prefix`` argument is manually specified during ``mlc_chat compile``, the ``model_lib`` field should be updated accordingly.
+    (Required) Matches the system-lib-prefix, generally set during ``mlc_llm compile`` which can be specified using
+    ``--system-lib-prefix`` argument. By default, it is set to ``"${model_type}_${quantization}"`` e.g. ``gpt_neox_q4f16_1`` for the RedPajama-INCITE-Chat-3B-v1 model. If the ``--system-lib-prefix`` argument is manually specified during ``mlc_llm compile``, the ``model_lib`` field should be updated accordingly.
 
 ``estimated_vram_bytes``
     (Optional) Estimated requirements of VRAM to run the model.
-
+
 To change the configuration, edit ``app-config.json``:
 
 .. code-block:: bash
 
     vim ./src/main/assets/app-config.json
 
-Then bundle the android library ``${MODEL_NAME}-${QUANTIZATION}-android.tar`` compiled from ``mlc_chat compile`` in the previous steps, with TVM Unity's Java runtime by running the commands below:
+Then bundle the android library ``${MODEL_NAME}-${QUANTIZATION}-android.tar`` compiled from ``mlc_llm compile`` in the previous steps, with TVM Unity's Java runtime by running the commands below:
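To make the ``model_lib`` contract concrete, a single entry in ``app-config.json`` might look like the sketch below. Only ``model_lib`` and ``estimated_vram_bytes`` are named in this hunk; the ``model_id`` field name and the byte count are illustrative assumptions:

    {
      "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
      "model_lib": "gpt_neox_q4f16_1",
      "estimated_vram_bytes": 4000000000
    }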
The prebuilt package supports **Metal** on macOS and **Vulkan** on Linux and Windows. It is possible to use other GPU runtimes such as **CUDA** by compiling MLCChat CLI from the source.
@@ -29,7 +29,7 @@ To use other GPU runtimes, e.g. CUDA, please instead :ref:`build it from source
 Option 2. Build MLC Runtime from Source
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-We also provide options to build mlc runtime libraries and ``mlc_chat`` from source.
+We also provide options to build mlc runtime libraries and ``mlc_llm`` from source.
 This step is useful if the prebuilt is unavailable on your platform, or if you would like to build a runtime
 that supports other GPU runtime than the prebuilt version. We can build a customized version
 of mlc chat runtime. You only need to do this if you choose not to use the prebuilt.
@@ -44,7 +44,7 @@ Then please follow the instructions in :ref:`mlcchat_build_from_source` to build
 Run Models through MLCChat CLI
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Once ``mlc_chat`` is installed, you are able to run any MLC-compiled model on the command line.
+Once ``mlc_llm`` is installed, you are able to run any MLC-compiled model on the command line.
 
 To run a model with MLC LLM in any platform, you can either:
@@ -53,14 +53,14 @@ To run a model with MLC LLM in any platform, you can either:
 
 **Option 1: Use model prebuilts**
 
-To run ``mlc_chat``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
+To run ``mlc_llm``, you can specify the Huggingface MLC prebuilt model repo path with the prefix ``HF://``.
 For example, to run the MLC Llama 2 7B Q4F16_1 model (`Repo link <https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC>`_),
 simply use ``HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC``. The model weights and library will be downloaded
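As a usage sketch for the prebuilt path (the exact subcommand is not visible in this hunk; ``chat`` is assumed here):

    # Sketch: run a prebuilt model fetched directly from Hugging Face.
    # The "chat" subcommand is an assumption, not confirmed by this diff.
    mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC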