Commit 0a6c870 by Qing Lan, authored Oct 12, 2023 (1 parent: 4761d1b)
Showing 2 changed files with 251 additions and 0 deletions.
250 changes: 250 additions & 0 deletions
aws/sagemaker/large-model-inference/sample-llm/rollingbatch_deploy_llama2-13b-gptq.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "71a329f0", | ||
"metadata": {}, | ||
"source": [ | ||
"# LLAMA2-13B GPTQ rollingbatch deployment guide\n", | ||
"In this tutorial, you will use LMI container from DLC to SageMaker and run inference with it.\n", | ||
"\n", | ||
"Please make sure the following permission granted before running the notebook:\n", | ||
"\n", | ||
"- S3 bucket push access\n", | ||
"- SageMaker access\n", | ||
"\n", | ||
"## Step 1: Let's bump up SageMaker and import stuff" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "67fa3208", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install sagemaker --upgrade --quiet" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "ec9ac353", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import boto3\n", | ||
"import sagemaker\n", | ||
"from sagemaker import Model, image_uris, serializers, deserializers\n", | ||
"\n", | ||
"role = sagemaker.get_execution_role() # execution role for the endpoint\n", | ||
"sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n", | ||
"region = sess._region_name # region name of the current SageMaker Studio environment\n", | ||
"account_id = sess.account_id() # account_id of the current SageMaker Studio environment" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "81deac79", | ||
"metadata": {}, | ||
"source": [ | ||
"## Step 2: Start preparing model artifacts\n", | ||
"In LMI contianer, we expect some artifacts to help setting up the model\n", | ||
"- serving.properties (required): Defines the model server settings\n", | ||
"- model.py (optional): A python file to define the core inference logic\n", | ||
"- requirements.txt (optional): Any additional pip wheel need to install\n", | ||
"\n", | ||
"**Note** The original codeGen 2.5 model will not work only if [this change](https://huggingface.co/Salesforce/codegen25-7b-multi/discussions/8/files) has applied. Please apply to your own tokenizer.py when you downloaded the model or wait SalesForce to merge it." | ||
] | ||
}, | ||
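{ | ||
"cell_type": "markdown", | ||
"id": "9c41a7d0", | ||
"metadata": {}, | ||
"source": [ | ||
"For reference only, here is a minimal sketch of what an optional model.py handler could look like, assuming the djl_python Input/Output interface used by the LMI container. This notebook does not ship a model.py (the container's default handler is used), and the echo logic below is a placeholder:\n", | ||
"\n", | ||
"```python\n", | ||
"from djl_python import Input, Output\n", | ||
"\n", | ||
"def handle(inputs: Input) -> Output:\n", | ||
"    if inputs.is_empty():\n", | ||
"        # Warm-up request from the model server\n", | ||
"        return None\n", | ||
"    payload = inputs.get_as_json()\n", | ||
"    prompt = payload.get(\"inputs\", \"\")\n", | ||
"    # Real inference logic would go here; this placeholder echoes the prompt back\n", | ||
"    return Output().add_as_json({\"generated_text\": prompt})\n", | ||
"```" | ||
] | ||
}, | ||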
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b011bf5f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%writefile serving.properties\n", | ||
"engine=MPI\n", | ||
"option.model_id=TheBloke/Llama-2-13B-GPTQ\n", | ||
"option.tensor_parallel_degree=1\n", | ||
"option.max_rolling_batch_size=32\n", | ||
"option.rolling_batch=auto\n", | ||
"option.quantize=gptq" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b0142973", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sh\n", | ||
"mkdir mymodel\n", | ||
"mv serving.properties mymodel/\n", | ||
"tar czvf mymodel.tar.gz mymodel/\n", | ||
"rm -rf mymodel" | ||
] | ||
}, | ||
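{ | ||
"cell_type": "markdown", | ||
"id": "5d8a2b77", | ||
"metadata": {}, | ||
"source": [ | ||
"Optionally, sanity-check the archive layout before uploading it; for this notebook it should contain only mymodel/serving.properties." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6f3c1e92", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sh\n", | ||
"# Optional: list the contents of the packaged model artifact\n", | ||
"tar tzvf mymodel.tar.gz" | ||
] | ||
}, | ||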
{ | ||
"cell_type": "markdown", | ||
"id": "2e58cf33", | ||
"metadata": {}, | ||
"source": [ | ||
"## Step 3: Start building SageMaker endpoint\n", | ||
"In this step, we will build SageMaker endpoint from scratch" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4d955679", | ||
"metadata": {}, | ||
"source": [ | ||
"### Getting the container image URI\n", | ||
"\n", | ||
"[Large Model Inference available DLC](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7a174b36", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"image_uri = image_uris.retrieve(\n", | ||
" framework=\"djl-deepspeed\",\n", | ||
" region=sess.boto_session.region_name,\n", | ||
" version=\"0.24.0\"\n", | ||
" )" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "11601839", | ||
"metadata": {}, | ||
"source": [ | ||
"### Upload artifact on S3 and create SageMaker model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "38b1e5ca", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"s3_code_prefix = \"large-model-lmi/code\"\n", | ||
"bucket = sess.default_bucket() # bucket to house artifacts\n", | ||
"code_artifact = sess.upload_data(\"mymodel.tar.gz\", bucket, s3_code_prefix)\n", | ||
"print(f\"S3 Code or Model tar ball uploaded to --- > {code_artifact}\")\n", | ||
"\n", | ||
"model = Model(image_uri=image_uri, model_data=code_artifact, role=role)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "004f39f6", | ||
"metadata": {}, | ||
"source": [ | ||
"### 4.2 Create SageMaker endpoint\n", | ||
"\n", | ||
"You need to specify the instance to use and endpoint names" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8e0e61cd", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"instance_type = \"ml.g5.2xlarge\"\n", | ||
"endpoint_name = sagemaker.utils.name_from_base(\"lmi-model\")\n", | ||
"\n", | ||
"model.deploy(initial_instance_count=1,\n", | ||
" instance_type=instance_type,\n", | ||
" endpoint_name=endpoint_name,\n", | ||
" # container_startup_health_check_timeout=3600\n", | ||
" )\n", | ||
"\n", | ||
"# our requests and responses will be in json format so we specify the serializer and the deserializer\n", | ||
"predictor = sagemaker.Predictor(\n", | ||
" endpoint_name=endpoint_name,\n", | ||
" sagemaker_session=sess,\n", | ||
" serializer=serializers.JSONSerializer(),\n", | ||
")" | ||
] | ||
}, | ||
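{ | ||
"cell_type": "markdown", | ||
"id": "e7b4c9a1", | ||
"metadata": {}, | ||
"source": [ | ||
"As an optional alternative to the Predictor object above, the endpoint can also be invoked with the low-level sagemaker-runtime client. This is an illustrative sketch; the prompt and parameter values are placeholders." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f1a6d2c8", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import json\n", | ||
"\n", | ||
"# Invoke the endpoint through the low-level runtime client instead of the Predictor\n", | ||
"smr_client = boto3.client(\"sagemaker-runtime\")\n", | ||
"response = smr_client.invoke_endpoint(\n", | ||
"    EndpointName=endpoint_name,\n", | ||
"    ContentType=\"application/json\",\n", | ||
"    Body=json.dumps({\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\": 64}}),\n", | ||
")\n", | ||
"print(response[\"Body\"].read().decode(\"utf-8\"))" | ||
] | ||
}, | ||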
{ | ||
"cell_type": "markdown", | ||
"id": "bb63ee65", | ||
"metadata": {}, | ||
"source": [ | ||
"## Step 5: Test and benchmark the inference" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "79786708", | ||
"metadata": {}, | ||
"source": [ | ||
"Firstly let's try to run with a wrong inputs" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2bcef095", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"predictor.predict(\n", | ||
" {\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\":128, \"do_sample\":true}}\n", | ||
")" | ||
] | ||
}, | ||
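{ | ||
"cell_type": "markdown", | ||
"id": "a3c5f8d4", | ||
"metadata": {}, | ||
"source": [ | ||
"For a rough latency number (a minimal sketch, not a rigorous benchmark), time a few sequential requests from the notebook; the prompt and request count below are arbitrary placeholders." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b9d7e0a2", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import time\n", | ||
"\n", | ||
"# Time a handful of sequential requests and report the average end-to-end latency\n", | ||
"payload = {\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\": 128}}\n", | ||
"latencies = []\n", | ||
"for _ in range(5):\n", | ||
"    start = time.perf_counter()\n", | ||
"    predictor.predict(payload)\n", | ||
"    latencies.append(time.perf_counter() - start)\n", | ||
"print(f\"Average latency over {len(latencies)} requests: {sum(latencies) / len(latencies):.2f} s\")" | ||
] | ||
}, | ||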
{ | ||
"cell_type": "markdown", | ||
"id": "c1cd9042", | ||
"metadata": {}, | ||
"source": [ | ||
"## Clean up the environment" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "3d674b41", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sess.delete_endpoint(endpoint_name)\n", | ||
"sess.delete_endpoint_config(endpoint_name)\n", | ||
"model.delete_model()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |