Terraform module for easily deploying Hugging Face Transformers models to Amazon SageMaker real-time endpoints. The module creates all the resources needed to deploy a model to Amazon SageMaker: an IAM role (if none is provided), a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint.
With this module you can deploy Hugging Face Transformers models directly from the Model Hub or from Amazon S3 to Amazon SageMaker, for both PyTorch- and TensorFlow-based models.
basic example
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  instance_count       = 1 # default is 1
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
}
advanced example with autoscaling
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  autoscaling = {
    max_capacity               = 4   # The max capacity of the scalable target
    scaling_target_invocations = 200 # The scaling target invocations (requests/minute)
  }
}
examples:
- Deploy Model from hf.co/models
- Deploy Model from Amazon S3
- Deploy Private Models from hf.co/models
- Autoscaling Endpoint
- Asynchronous Inference
- Serverless Inference
- TensorFlow example
- Deploy Model with existing IAM role
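Two of the scenarios listed above, deploying from Amazon S3 and deploying a private Hub model, can be sketched with the module's documented inputs model_data, hf_api_token, and hf_model_revision. This is a minimal sketch rather than a copy of the linked examples: the S3 path, the model id my-org/my-private-model, the revision value, and the module labels are hypothetical placeholders.

```hcl
# Hugging Face token, supplied as a sensitive variable instead of being hard-coded.
variable "hf_api_token" {
  type      = string
  sensitive = true
}

# Variant 1: load a model archive from Amazon S3 instead of the Model Hub.
module "from_s3" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-s3"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  model_data           = "s3://my-bucket/distilbert/model.tar.gz" # hypothetical S3 path
  hf_task              = "text-classification"
}

# Variant 2: load a private model from hf.co/models and pin its revision.
module "private_hub_model" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "private-model"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "my-org/my-private-model" # hypothetical model id
  hf_model_revision    = "main"                    # hypothetical; a tag or commit hash pins more strictly
  hf_api_token         = var.hf_api_token
  hf_task              = "text-classification"
}
```

Passing the token through a variable marked sensitive keeps it out of the configuration files and redacts it in plan output, although it is still stored in the Terraform state.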
Requirements
Name | Version |
---|---|
terraform | >= 1.0.0 |
aws | ~> 4.0 |
Providers
Name | Version |
---|---|
aws | 3.74.0 |
random | n/a |
No modules.
Resources
Name | Type |
---|---|
aws_appautoscaling_policy.sagemaker_policy | resource |
aws_appautoscaling_target.sagemaker_target | resource |
aws_iam_role.new_role | resource |
aws_sagemaker_endpoint.huggingface | resource |
aws_sagemaker_endpoint_configuration.huggingface | resource |
aws_sagemaker_endpoint_configuration.huggingface_async | resource |
aws_sagemaker_endpoint_configuration.huggingface_serverless | resource |
aws_sagemaker_model.model_with_hub_model | resource |
aws_sagemaker_model.model_with_model_artifact | resource |
random_string.resource_id | resource |
aws_iam_role.get_role | data source |
aws_sagemaker_prebuilt_ecr_image.deploy_image | data source |
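The resource list contains both an aws_iam_role resource (new_role) and an aws_iam_role data source (get_role), which suggests that the module either creates a role or looks up an existing one. A minimal sketch of the second case passes the role name through the sagemaker_execution_role input. It assumes that a role named my-sagemaker-execution-role (a hypothetical name) already exists in the account with the permissions SageMaker needs.

```hcl
module "with_existing_role" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"

  # Name (not ARN) of a pre-existing IAM role; hypothetical value.
  sagemaker_execution_role = "my-sagemaker-execution-role"
}
```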
Inputs
Name | Description | Type | Default | Required |
---|---|---|---|---|
async_config | (Optional) Specifies how an endpoint performs asynchronous inference. The required key is s3_output_path, which is the S3 bucket used for async inference. | object({ | { | no |
autoscaling | An object that defines the autoscaling target and policy for the SageMaker Endpoint. Required keys are max_capacity and scaling_target_invocations. | object({ | { | no |
hf_api_token | The HF_API_TOKEN environment variable defines your Hugging Face authorization token. The HF_API_TOKEN is used as an HTTP bearer authorization for remote files, such as private models. You can find your token on your settings page. | string | null | no |
hf_model_id | The HF_MODEL_ID environment variable defines the model id, which is automatically loaded from hf.co/models when creating the SageMaker Endpoint. | string | null | no |
hf_model_revision | HF_MODEL_REVISION is an extension to HF_MODEL_ID that lets you pin a revision of the model, so that the same model is always loaded on your SageMaker Endpoint. | string | null | no |
hf_task | The HF_TASK environment variable defines the task for the 🤗 Transformers pipeline in use. A full list of tasks can be found here. | string | n/a | yes |
image_tag | The image tag to use for the container. Defaults to None. The module tries to derive the image_tag from pytorch_version, tensorflow_version, and instance_type. To override this, provide image_tag as a variable. | string | null | no |
instance_count | The initial number of instances to run in the Endpoint created from this Model. Defaults to 1. | number | 1 | no |
instance_type | The EC2 instance type to deploy this Model to, for example ml.p2.xlarge. | string | null | no |
model_data | The S3 location of a SageMaker model data .tar.gz file (default: None). Not needed when using hf_model_id. | string | null | no |
name_prefix | A prefix used for naming resources. | string | n/a | yes |
pytorch_version | PyTorch version to use for executing your inference code. Defaults to None. Required unless tensorflow_version is provided. List of supported versions | string | null | no |
sagemaker_execution_role | An AWS IAM role name used to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access AWS resources. If not specified, a role is created with the CreateModel permissions from the documentation. | string | null | no |
serverless_config | (Optional) Specifies how an endpoint performs serverless inference. Required keys are max_concurrency and memory_size_in_mb. | object({ | { | no |
tags | A map of tags (key-value pairs) passed to resources. | map(string) | {} | no |
tensorflow_version | TensorFlow version to use for executing your inference code. Defaults to None. Required unless pytorch_version is provided. List of supported versions | string | null | no |
transformers_version | Transformers version you want to use for executing your model training code. Defaults to None. List of supported versions | string | n/a | yes |
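The async_config and serverless_config inputs above only name their required keys, so a short sketch may help show how they fit into a module call. The values below are assumptions for illustration: the S3 output path and module labels are hypothetical, and the concurrency and memory settings are arbitrary picks within what SageMaker serverless inference accepts. Because the module defines a separate endpoint configuration for each mode, presumably only one of the two is set per module call.

```hcl
# Asynchronous inference: results are written to the given S3 location.
module "async_endpoint" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-async"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"

  async_config = {
    s3_output_path = "s3://my-bucket/async-output" # hypothetical S3 location
  }
}

# Serverless inference: capacity is described by concurrency and memory
# rather than by an instance type. Omitting instance_type here is an
# assumption based on its default of null.
module "serverless_endpoint" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-serverless"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"

  serverless_config = {
    max_concurrency   = 1    # illustrative value
    memory_size_in_mb = 1024 # illustrative value
  }
}
```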
Outputs
Name | Description |
---|---|
iam_role | IAM role used in the endpoint |
sagemaker_endpoint | The created Amazon SageMaker endpoint resource |
sagemaker_endpoint_configuration | The created Amazon SageMaker endpoint configuration resource |
sagemaker_endpoint_name | Name of the created Amazon SageMaker endpoint, used to invoke the endpoint from SDKs |
sagemaker_model | The created Amazon SageMaker model resource |
tags | n/a |
used_container | Used container for creating the endpoint |
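The outputs above can be re-exported from a root configuration so that other tooling can read them. The following is a minimal sketch that reuses the basic example; the output names endpoint_name and container_image are arbitrary choices, not part of the module.

```hcl
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
}

# Endpoint name, needed by SDK clients to invoke the endpoint.
output "endpoint_name" {
  value = module.sagemaker-huggingface.sagemaker_endpoint_name
}

# Container image the module selected for the endpoint.
output "container_image" {
  value = module.sagemaker-huggingface.used_container
}
```

After `terraform apply`, running `terraform output -raw endpoint_name` prints the endpoint name in a form that can be passed to a script or SDK call.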
MIT License. See LICENSE for full details.