vLLM Worker for the AIME API Server, for scalable serving of large language models on CUDA or ROCm with various quantizations.
Supported models:
- Llama / Mistral / Mixtral / Bloom / Falcon / Gemma / GPT-NeoX / InternLM / Mamba / Nemotron / Phi / Qwen / Starcoder
For a full list of currently supported models, see the vLLM documentation: https://docs.vllm.ai/en/latest/models/supported_models.html
Create and open an AIME ML container with PyTorch 2.4.0:
mlc-create vllm Pytorch 2.4.0
mlc-open vllm
Clone the repository and install the requirements:
git clone https://github.com/aime-labs/vllm_worker.git
cd vllm_worker
pip3 install -r requirements.txt
Download the model weights:
python download_weights.py -m {model name} -o {directory to store model}
e.g. to download the Llama 3.1 70B Instruct model:
python download_weights.py -m meta-llama/Llama-3.1-70B-Instruct -o ./models
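If you prefer to fetch the weights without the helper script, the same download can be done with huggingface_hub directly. This is a minimal sketch, assuming the weights live in the gated meta-llama/Llama-3.1-70B-Instruct repository and you have logged in with an approved access token (huggingface-cli login):

```python
from huggingface_hub import snapshot_download

# Download all files of the model repo into ./models/Llama-3.1-70B-Instruct.
# Gated repo: requires a Hugging Face token with access to the Llama weights.
snapshot_download(
    repo_id="meta-llama/Llama-3.1-70B-Instruct",
    local_dir="./models/Llama-3.1-70B-Instruct",
)
```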
Start the Llama 3.1 70B worker with FP8 quantization on 2x RTX 6000 Ada 48GB GPUs:
python main.py --api_server http://127.0.0.1:7777 --api_auth_key b07e305b50505ca2b3284b4ae5f65d1 --job_type llama3_1 --model ./models/Llama-3.1-70B-Instruct-fp8 --max_batch_size 8 --max-seq-len 16192 --max-model-len 16192 --tensor-parallel-size 2
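For reference, the engine-related flags above map onto vLLM's own engine arguments. The following is an illustrative sketch using vLLM's offline LLM class, not the worker's actual startup code, showing what tensor-parallel FP8 serving with a 16192-token context looks like in plain vLLM:

```python
from vllm import LLM, SamplingParams

# Rough offline equivalent of the engine flags passed to main.py:
# tensor_parallel_size=2 shards the model across both GPUs,
# max_model_len bounds the context window,
# quantization="fp8" selects the FP8 weight format.
llm = LLM(
    model="./models/Llama-3.1-70B-Instruct-fp8",
    tensor_parallel_size=2,
    max_model_len=16192,
    quantization="fp8",
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```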
Start the Mixtral 8x7B worker with FP8 quantization on 2x RTX 6000 Ada 48GB GPUs:
python3 main.py --api_server http://127.0.0.1:7777 --api_auth_key b07e305b50505ca2b3284b4ae5f65d1 --model /data/models/Mixtral-8x7B-Instruct-v0.1-hf --job_type mixtral --max_batch_size 8 --max-seq-len 16192 --max-model-len 16192 --tensor-parallel-size 2
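Once a worker has registered, clients talk to the AIME API Server rather than to the worker directly. The snippet below is only an illustration: the route name ("llama3_1") and the payload fields are assumptions and depend on the endpoint configuration of your API server.

```python
import requests

# Hypothetical client request -- endpoint path and JSON schema are
# assumptions; the real route and fields come from the AIME API Server's
# endpoint configuration, not from this worker.
resp = requests.post(
    "http://127.0.0.1:7777/llama3_1",
    json={"prompt_input": "What is tensor parallelism?"},
)
print(resp.json())
```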
Copyright (c) AIME GmbH and affiliates. Find more info at https://www.aime.info/api
This software may be used and distributed according to the terms of the MIT license.