Hugging Face Inference SageMaker Module

Terraform module for easy deployment of Hugging Face Transformer models to Amazon SageMaker real-time endpoints. This module creates all the resources needed to deploy a model to Amazon SageMaker: an IAM role (if one is not provided), a SageMaker Model, a SageMaker Endpoint Configuration and a SageMaker Endpoint.

With this module you can deploy PyTorch- and TensorFlow-based Hugging Face Transformer models to Amazon SageMaker, either directly from the Model Hub or from Amazon S3.
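To deploy from Amazon S3 instead of the Model Hub, the Inputs table lists `model_data` as the alternative to `hf_model_id`. The following is a minimal sketch that assumes a model archive has already been uploaded to a hypothetical location, `s3://my-bucket/path/to/model.tar.gz`. The remaining values mirror the basic example.

```hcl
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-s3"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  # Hypothetical bucket and key: point this at your own model.tar.gz.
  model_data = "s3://my-bucket/path/to/model.tar.gz"
  # hf_model_id is omitted because the model is loaded from model_data.
  hf_task = "text-classification"
}
```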

Usage

basic example

module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  instance_count       = 1 # default is 1
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
}
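The basic example loads a public model. For a private model, the Inputs table provides `hf_api_token` for authorization and `hf_model_revision` for pinning a revision. The sketch below makes several assumptions. The repository `my-org/my-private-model` is hypothetical, and the revision is a placeholder that you must replace with a real commit hash. The token is passed through a sensitive variable so that it is not hard-coded.

```hcl
# Sensitive input so the token is not written into the configuration itself.
variable "hf_api_token" {
  type      = string
  sensitive = true
}

module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "private-model"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "my-org/my-private-model" # hypothetical private repository
  hf_model_revision    = "<commit-sha>"            # placeholder: replace with a real commit hash
  hf_task              = "text-classification"
  hf_api_token         = var.hf_api_token
}
```

You can supply the token through the `TF_VAR_hf_api_token` environment variable, among other methods. Marking the variable `sensitive` only hides it from plan and apply output. The value is still stored in the Terraform state.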

advanced example with autoscaling

module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  autoscaling = {
    max_capacity               = 4   # The max capacity of the scalable target
    scaling_target_invocations = 200 # The scaling target invocations (requests/minute)
  }
}
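The advanced example sets only the two keys that the `autoscaling` description calls required. As a sketch, here is the same object with every key from its type signature spelled out. It uses the defaults listed in the Inputs table for `min_capacity` and the two cooldowns. The cooldowns are presumably in seconds, as they are for Application Auto Scaling target tracking policies.

```hcl
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  autoscaling = {
    min_capacity               = 1   # default from the Inputs table
    max_capacity               = 4   # upper bound of the scalable target
    scaling_target_invocations = 200 # target invocations used by the scaling policy
    scale_in_cooldown          = 300 # default from the Inputs table
    scale_out_cooldown         = 66  # default from the Inputs table
  }
}
```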

More examples are available in the `examples` folder of the repository.
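Another minimal sketch: a TensorFlow-based deployment that reuses an existing IAM role through `sagemaker_execution_role` instead of letting the module create one. The role name `my-existing-sagemaker-role` and the tag are hypothetical. The TensorFlow version is assumed to be one that the Hugging Face inference containers support for Transformers 4.12.3, so check it against the supported versions list before applying.

```hcl
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-tf"
  tensorflow_version   = "2.5.1" # assumption: verify against the supported versions list
  transformers_version = "4.12.3"
  instance_type        = "ml.m5.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  # Hypothetical role name: when set, the module uses this role instead of creating one.
  sagemaker_execution_role = "my-existing-sagemaker-role"
  tags = {
    Project = "demo" # hypothetical tag
  }
}
```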

Requirements

Name	Version
terraform	>= 1.0.0
aws	~> 4.0
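The Requirements table corresponds roughly to the following `terraform` block in the calling configuration. This is a sketch: `hashicorp/aws` is the standard registry address for the AWS provider, and the region is a hypothetical choice.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

# Hypothetical region: use the region where the endpoint should run.
provider "aws" {
  region = "us-east-1"
}
```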

Providers

Name	Version
aws	3.74.0
random	n/a

Modules

No modules.

Resources

Name	Type
aws_appautoscaling_policy.sagemaker_policy	resource
aws_appautoscaling_target.sagemaker_target	resource
aws_iam_role.new_role	resource
aws_sagemaker_endpoint.huggingface	resource
aws_sagemaker_endpoint_configuration.huggingface	resource
aws_sagemaker_endpoint_configuration.huggingface_async	resource
aws_sagemaker_endpoint_configuration.huggingface_serverless	resource
aws_sagemaker_model.model_with_hub_model	resource
aws_sagemaker_model.model_with_model_artifact	resource
random_string.resource_id	resource
aws_iam_role.get_role	data source
aws_sagemaker_prebuilt_ecr_image.deploy_image	data source
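The resource list contains a separate `huggingface_async` endpoint configuration. Judging by the `async_config` input, the module uses it when asynchronous inference is configured. The minimal sketch below assumes a hypothetical output bucket, `my-bucket`. Only `s3_output_path` is required, so the optional keys are left unset. The execution role also needs permission to write to that location.

```hcl
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-async"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge" # async endpoints still run on instances
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  async_config = {
    # Hypothetical bucket: results of async invocations are written here.
    s3_output_path = "s3://my-bucket/async-inference/output"
  }
}
```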

Inputs

Name	Description	Type	Default	Required
async_config	(Optional) Specifies configuration for how an endpoint performs asynchronous inference. The required key is `s3_output_path`, the S3 location where asynchronous inference results are written.	object({ s3_output_path = string, s3_failure_path = optional(string), kms_key_id = optional(string), sns_error_topic = optional(string), sns_success_topic = optional(string), })	{ "kms_key_id": null, "s3_output_path": null, "s3_failure_path": null, "sns_error_topic": null, "sns_success_topic": null }	no
autoscaling	An object that defines the autoscaling target and policy for the SageMaker Endpoint. Required keys are `max_capacity` and `scaling_target_invocations`.	object({ min_capacity = optional(number), max_capacity = number, scaling_target_invocations = optional(number), scale_in_cooldown = optional(number), scale_out_cooldown = optional(number), })	{ "max_capacity": null, "min_capacity": 1, "scale_in_cooldown": 300, "scale_out_cooldown": 66, "scaling_target_invocations": null }	no
hf_api_token	The HF_API_TOKEN environment variable defines your Hugging Face authorization token. The HF_API_TOKEN is used as an HTTP bearer authorization for remote files, such as private models. You can find your token on your Hugging Face settings page.	`string`	`null`	no
hf_model_id	The HF_MODEL_ID environment variable defines the model id, which is automatically loaded from hf.co/models when the SageMaker Endpoint is created.	`string`	`null`	no
hf_model_revision	The HF_MODEL_REVISION is an extension to HF_MODEL_ID and allows you to define/pin a revision of the model to make sure you always load the same model on your SageMaker Endpoint.	`string`	`null`	no
hf_task	The HF_TASK environment variable defines the task for the 🤗 Transformers pipeline. A full list of tasks can be found in the 🤗 Transformers pipelines documentation.	`string`	n/a	yes
image_tag	The image tag of the container to use. Defaults to `None`. The module tries to derive the `image_tag` from `pytorch_version`, `tensorflow_version` and `instance_type`. To override this, provide `image_tag` as a variable.	`string`	`null`	no
instance_count	The initial number of instances to run in the Endpoint created from this Model. Defaults to 1.	`number`	`1`	no
instance_type	The EC2 instance type to deploy this Model to. For example, `ml.p2.xlarge`.	`string`	`null`	no
model_data	The S3 location of a SageMaker model data .tar.gz file (default: None). Not needed when using `hf_model_id`.	`string`	`null`	no
name_prefix	A prefix used for naming resources.	`string`	n/a	yes
pytorch_version	PyTorch version you want to use for executing your inference code. Defaults to `None`. Required unless `tensorflow_version` is provided. List of supported versions	`string`	`null`	no
sagemaker_execution_role	The name of an AWS IAM role used to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access other AWS resources. If not specified, a new role is created with the `CreateModel` permissions from the SageMaker documentation.	`string`	`null`	no
serverless_config	(Optional) Specifies configuration for how an endpoint performs serverless inference. Required keys are `max_concurrency` and `memory_size_in_mb`	object({ max_concurrency = number, memory_size_in_mb = number })	{ "max_concurrency": null, "memory_size_in_mb": null }	no
tags	A map of tags (key-value pairs) passed to resources.	`map(string)`	`{}`	no
tensorflow_version	TensorFlow version you want to use for executing your inference code. Defaults to `None`. Required unless `pytorch_version` is provided. List of supported versions	`string`	`null`	no
transformers_version	Transformers version you want to use for executing your inference code. List of supported versions	`string`	n/a	yes
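According to the `serverless_config` row, serverless inference needs `max_concurrency` and `memory_size_in_mb`. The sketch below omits `instance_type` and `instance_count` because serverless endpoints do not run on provisioned instances. It is an assumption that the module can derive a suitable (CPU) image without `instance_type`. If it cannot, set `image_tag` explicitly. SageMaker Serverless Inference accepts memory sizes from 1024 to 6144 MB in 1 GB steps.

```hcl
module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-serverless"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  serverless_config = {
    max_concurrency   = 5    # concurrent invocations before requests are throttled
    memory_size_in_mb = 2048 # memory available to the serverless endpoint
  }
}
```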

Outputs

Name	Description
iam_role	IAM role used by the endpoint
sagemaker_endpoint	Created Amazon SageMaker endpoint resource
sagemaker_endpoint_configuration	Created Amazon SageMaker endpoint configuration resource
sagemaker_endpoint_name	Name of the created Amazon SageMaker endpoint, used for invoking the endpoint with the AWS SDKs
sagemaker_model	Created Amazon SageMaker model resource
tags	n/a
used_container	Container used for creating the endpoint
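These outputs can be re-exported from the calling configuration. The minimal sketch below assumes that the module is instantiated as `sagemaker-huggingface`, as in the Usage examples. The root output names are arbitrary choices.

```hcl
# Re-export selected module outputs from the root configuration.
output "sagemaker_endpoint_name" {
  description = "Endpoint name to pass to the SageMaker Runtime API or the AWS SDKs"
  value       = module.sagemaker-huggingface.sagemaker_endpoint_name
}

output "used_container" {
  description = "Container image the module selected for the endpoint"
  value       = module.sagemaker-huggingface.used_container
}
```

After `terraform apply`, `terraform output -raw sagemaker_endpoint_name` should print the endpoint name on its own, assuming the output is a plain string. This form is convenient for scripts that invoke the endpoint.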

License

MIT License. See LICENSE for full details.

philschmid/terraform-aws-sagemaker-huggingface
