Commit 0bfb6c0

Support Mixtral on macOS (#1558)
A follow-up to my previous PR (#1529). This PR makes Mixtral work on the Metal GPUs that macOS ships with. Honestly, not much change was needed, except that Metal doesn't support fp64 data types.

A Python script to run Mixtral:

```python
from mlc_chat import ChatConfig, ChatModule, callback
from mlc_chat.support import logging

logging.enable_logging()

MODEL = "HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"
NUM_GPU = 1


def main():
    cm = ChatModule(
        MODEL,
        chat_config=ChatConfig(
            sliding_window_size=1024,
            tensor_parallel_shards=NUM_GPU,
        ),
    )
    cm.generate(
        "What is the meaning of life?",
        progress_callback=callback.StreamToStdout(callback_interval=2),
    )


if __name__ == "__main__":
    main()
```

Quantization formats:

- 3-bit (19.662 GB): ["HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC"](https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q3f16_1-MLC)
- 4-bit (24.466 GB): ["HF://junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC"](https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC)
1 parent e32c6c9 commit 0bfb6c0

File tree

1 file changed: +1 −1 lines changed


python/mlc_chat/op/moe_misc.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -186,7 +186,7 @@ def moe_cumsum(expert_indices: Tensor, num_local_experts: int) -> Tensor:
         .permute_dims(1, 0)
         .reshape(batch_size * num_local_experts)
     )
-    with Target(
+    with Target.current(allow_none=True) or Target(
         {
             "kind": "cuda",
             "max_num_threads": 1024,
```
