Unable to deploy huggingface-llm 1.3.3 #4332

@LvffY

Describe the bug

I'd like to deploy the Mistral 0.2 LLM on SageMaker, which appears to require version 1.3.3 of the Hugging Face LLM container. At the moment, get_huggingface_llm_image_uri only accepts a limited set of versions that does not include 1.3.3.

To reproduce

Run the following code:

#!/usr/bin/env python3
import json
import re

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="exec-role")["Role"]["Arn"]

# Hub Model configuration. https://huggingface.co/models
hub = {
    "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",
    "SM_NUM_GPUS": json.dumps(1),
    # "HF_MODEL_QUANTIZE": "gptq",
    # 'HF_TASK':'question-answering',
    # Enable to have long input length, and override default sagemaker values
    # See https://github.com/facebookresearch/llama/issues/450#issuecomment-1645247796
    "MAX_INPUT_LENGTH": json.dumps(4095),
    "MAX_TOTAL_TOKENS": json.dumps(4096),
}

# Ensure endpoint name will be compliant for AWS
regex = r"[^\-a-zA-Z0-9]+"

compliant_name = re.sub(regex, "-", hub["HF_MODEL_ID"])

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    # Here we'd like to have at least 1.3.3
    # See https://github.com/huggingface/text-generation-inference/issues/1342
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.3.3"),
    env=hub,
    role=role,
    name=compliant_name,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
    endpoint_name=compliant_name,
)
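As a side note, the endpoint-name sanitization step can be checked in isolation. This standalone sketch reuses the same regex as the reproduction script to show what the compliant name looks like:

```python
import re

# Same pattern as in the reproduction script: collapse any run of
# characters that are not alphanumeric or "-" into a single "-",
# since SageMaker endpoint names only allow [a-zA-Z0-9-].
regex = r"[^\-a-zA-Z0-9]+"

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
compliant_name = re.sub(regex, "-", model_id)
print(compliant_name)  # mistralai-Mistral-7B-Instruct-v0-2
```

Both the "/" and the "." in the model ID are replaced, which is why the "v0.2" suffix becomes "v0-2".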

Expected behavior

Being able to deploy with the Hugging Face LLM container version 1.3.3.
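Until the SDK's version list catches up, one possible workaround is to construct the DLC image URI by hand and pass it directly as image_uri= to HuggingFaceModel. This is an untested sketch: the account ID, region, and tag below follow the public Hugging Face TGI DLC naming convention and should be verified against the aws/deep-learning-containers release list for your region:

```python
# Hypothetical workaround: build the TGI DLC image URI manually instead of
# calling get_huggingface_llm_image_uri. All values below are assumptions
# and must be checked against aws/deep-learning-containers for your region.
account = "763104351884"  # public DLC account commonly used in us-east-1
region = "us-east-1"
tag = "2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04"  # assumed 1.3.3 tag

image_uri = (
    f"{account}.dkr.ecr.{region}.amazonaws.com/"
    f"huggingface-pytorch-tgi-inference:{tag}"
)
print(image_uri)
```

The resulting string would then replace the get_huggingface_llm_image_uri(...) call in the reproduction script above.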

Screenshots or logs

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.200.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): huggingface-llm (PyTorch TGI inference)
  • Framework version:
  • Python version: 3.10
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

If it's a quick fix, I could probably help with a PR if needed.
