This repository has been archived by the owner on Jul 17, 2024. It is now read-only.
Releases · instill-ai/model-mistral-7b-dvc
for-test
fp16-7b-vllm-a100
fp16-7b-vllm-p80-2gpu
Supports the Mistral-7B text completion task via vLLM in Triton Inference Server's Python operator, running two GPU model instances in parallel, each utilizing 80% of GPU memory.
fp16-7b-vllm-p80-1gpu
Supports the Mistral-7B text completion task via vLLM in Triton Inference Server's Python operator, running a single GPU model instance (no parallelism) utilizing 80% of GPU memory.
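Both p80 releases above wrap vLLM inside a Triton Python-backend model with its GPU memory utilization capped at 80%. Below is a minimal sketch of what such a `model.py` might look like; the tensor names (`prompt`, `text`), the Hugging Face model id, and the sampling settings are illustrative assumptions, not taken from this repository.

```python
# Sketch: Triton Python-backend model serving Mistral-7B text completion via vLLM.
import numpy as np
import triton_python_backend_utils as pb_utils
from vllm import LLM, SamplingParams


class TritonPythonModel:
    def initialize(self, args):
        # Cap vLLM's memory allocation at 80% of the GPU, matching the "p80" releases.
        self.llm = LLM(
            model="mistralai/Mistral-7B-v0.1",  # assumed model id, not from this repo
            dtype="float16",
            gpu_memory_utilization=0.8,
        )
        self.sampling_params = SamplingParams(max_tokens=256)

    def execute(self, requests):
        responses = []
        for request in requests:
            # Assumes a single BYTES "prompt" input per request.
            prompt_tensor = pb_utils.get_input_tensor_by_name(request, "prompt")
            prompt = prompt_tensor.as_numpy().flatten()[0].decode("utf-8")

            # Run completion and return the generated text as a BYTES "text" output.
            outputs = self.llm.generate([prompt], self.sampling_params)
            completion = outputs[0].outputs[0].text
            out_tensor = pb_utils.Tensor(
                "text", np.array([completion.encode("utf-8")], dtype=np.object_)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        self.llm = None
```

The difference between the -2gpu and -1gpu releases would map to the instance_group count in the model's config.pbtxt (two KIND_GPU instances versus one), assuming Triton's standard instance-group mechanism is used for the parallelism described above.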