Closed
Describe the bug
I'd like to deploy the Mistral 0.2 LLM on SageMaker, and it seems this requires version 1.3.3 of the Hugging Face LLM container. For now, the huggingface-llm image lookup is limited to a set of versions that does not include 1.3.3.
To reproduce
Run the following code:
```python
#!/usr/bin/env python3
import json
import re

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="exec-role")["Role"]["Arn"]

# Hub Model configuration. https://huggingface.co/models
hub = {
    "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",
    "SM_NUM_GPUS": json.dumps(1),
    # "HF_MODEL_QUANTIZE": "gptq",
    # "HF_TASK": "question-answering",
    # Enable a long input length, overriding default SageMaker values.
    # See https://github.com/facebookresearch/llama/issues/450#issuecomment-1645247796
    "MAX_INPUT_LENGTH": json.dumps(4095),
    "MAX_TOTAL_TOKENS": json.dumps(4096),
}

# Ensure the endpoint name will be compliant with AWS naming rules.
regex = r"[^\-a-zA-Z0-9]+"
compliant_name = re.sub(regex, "-", hub["HF_MODEL_ID"])

# Create the Hugging Face Model class.
huggingface_model = HuggingFaceModel(
    # Here we'd like to have at least 1.3.3.
    # See https://github.com/huggingface/text-generation-inference/issues/1342
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.3.3"),
    env=hub,
    role=role,
    name=compliant_name,
)

# Deploy the model to SageMaker Inference.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
    endpoint_name=compliant_name,
)
```
Expected behavior
Being able to deploy with Hugging Face LLM container version 1.3.3.
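As a possible interim workaround (a sketch only, not verified): `image_uri` accepts any explicit ECR URI, so the SDK's version table could be bypassed by constructing the TGI image URI by hand. Both the account ID and the image tag below are assumptions and should be checked against the AWS Deep Learning Containers release notes for the target region.

```python
# Sketch of a possible workaround: build the TGI container URI manually instead
# of relying on get_huggingface_llm_image_uri's built-in version table.
# The account ID (763104351884) and the image tag are assumptions here;
# verify them against the AWS Deep Learning Containers release notes.
def tgi_image_uri(region: str, tag: str) -> str:
    return (
        f"763104351884.dkr.ecr.{region}.amazonaws.com/"
        f"huggingface-pytorch-tgi-inference:{tag}"
    )

# Hypothetical tag for a TGI 1.3.3 image; confirm before use.
uri = tgi_image_uri("us-east-1", "2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04")
print(uri)
```

The resulting string could then be passed directly as `image_uri` to `HuggingFaceModel`, skipping the version lookup entirely.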
Screenshots or logs
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.200.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): huggingface-llm (PyTorch TGI inference)
- Framework version:
- Python version: 3.10
- CPU or GPU: GPU
- Custom Docker image (Y/N): N
Additional context
If it's a quick fix, I could probably help with a PR if needed.
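As an aside, the endpoint-name sanitization step from the repro script can be checked in isolation. SageMaker endpoint names only allow alphanumerics and hyphens, so the slash and dot in the model ID must be replaced:

```python
import re

# Same regex as in the repro script: collapse every run of characters that is
# not a hyphen, letter, or digit into a single hyphen.
regex = r"[^\-a-zA-Z0-9]+"
compliant_name = re.sub(regex, "-", "mistralai/Mistral-7B-Instruct-v0.2")
print(compliant_name)  # -> mistralai-Mistral-7B-Instruct-v0-2
```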
