[feat] Add deepspeed smoothquant notebook (#375)
tosterberg authored Oct 18, 2023
1 parent 945a709 commit ccb6b56
Showing 2 changed files with 307 additions and 0 deletions.
4 changes: 4 additions & 0 deletions aws/sagemaker/large-model-inference/README.md
@@ -32,6 +32,10 @@ For all the serving.prorperties options you could set on DJLServing, click [here

- [Mistral 7B](sample-llm/hf_acc_deploy_mistral_7b.ipynb)

### DeepSpeed

- [LLAMA-2-13B-SmoothQuant](sample-llm/ds_deploy_llama2-13b-smoothquant.ipynb)

### FasterTransformer

- [OpenAssistant GPTNeoX](sample-llm/fastertransformer_deploy_pythia12b_triton_mode.ipynb)
@@ -0,0 +1,303 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "71a329f0",
"metadata": {},
"source": [
"# LLAMA2-13B SmoothQuant deployment guide\n",
"In this tutorial, you will use LMI container from DLC to SageMaker and run inference with it.\n",
"\n",
"Please make sure the following permission granted before running the notebook:\n",
"\n",
"- S3 bucket push access\n",
"- SageMaker access\n",
"\n",
"## Step 1: Let's bump up SageMaker and import stuff"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "67fa3208",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install sagemaker --upgrade --quiet"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ec9ac353",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml\n",
"sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml\n",
"sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml\n",
"sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml\n",
"sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml\n",
"sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml\n"
]
}
],
"source": [
"import boto3\n",
"import sagemaker\n",
"from sagemaker import Model, image_uris, serializers, deserializers\n",
"\n",
"role = sagemaker.get_execution_role() # execution role for the endpoint\n",
"sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n",
"region = sess._region_name # region name of the current SageMaker Studio environment\n",
"account_id = sess.account_id() # account_id of the current SageMaker Studio environment"
]
},
{
"cell_type": "markdown",
"id": "81deac79",
"metadata": {},
"source": [
"## Step 2: Start preparing model artifacts\n",
"In LMI contianer, we expect some artifacts to help setting up the model\n",
"- serving.properties (required): Defines the model server settings\n",
"- model.py (optional): A python file to define the core inference logic\n",
"- requirements.txt (optional): Any additional pip wheel need to install"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b011bf5f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing serving.properties\n"
]
}
],
"source": [
"%%writefile serving.properties\n",
"engine=DeepSpeed\n",
"option.model_id=TheBloke/Llama-2-13B-fp16\n",
"option.tensor_parallel_degree=1\n",
"option.dtype=fp16\n",
"option.quantize=smoothquant\n",
"batch_size=32\n",
"max_batch_delay=100"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b0142973",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mymodel/\n",
"mymodel/serving.properties\n"
]
}
],
"source": [
"%%sh\n",
"mkdir mymodel\n",
"mv serving.properties mymodel/\n",
"tar czvf mymodel.tar.gz mymodel/\n",
"rm -rf mymodel"
]
},
{
"cell_type": "markdown",
"id": "2e58cf33",
"metadata": {},
"source": [
"## Step 3: Start building SageMaker endpoint\n",
"In this step, we will build SageMaker endpoint from scratch"
]
},
{
"cell_type": "markdown",
"id": "4d955679",
"metadata": {},
"source": [
"### Getting the container image URI\n",
"\n",
"[Large Model Inference available DLC](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers)\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7a174b36",
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "Unsupported djl-deepspeed version: 0.24.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer djl-deepspeed versions. Supported djl-deepspeed version(s): 0.23.0, 0.22.1, 0.21.0, 0.20.0, 0.19.0.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[5], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m image_uri \u001b[38;5;241m=\u001b[39m \u001b[43mimage_uris\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mretrieve\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 2\u001b[0m \u001b[43m \u001b[49m\u001b[43mframework\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdjl-deepspeed\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 3\u001b[0m \u001b[43m \u001b[49m\u001b[43mregion\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msess\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mboto_session\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mregion_name\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 4\u001b[0m \u001b[43m \u001b[49m\u001b[43mversion\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m0.24.0\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\n\u001b[1;32m 5\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/sagemaker/workflow/utilities.py:417\u001b[0m, in \u001b[0;36moverride_pipeline_parameter_var.<locals>.wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 415\u001b[0m logger\u001b[38;5;241m.\u001b[39mwarning(warning_msg_template, arg_name, func_name, \u001b[38;5;28mtype\u001b[39m(value))\n\u001b[1;32m 416\u001b[0m kwargs[arg_name] \u001b[38;5;241m=\u001b[39m value\u001b[38;5;241m.\u001b[39mdefault_value\n\u001b[0;32m--> 417\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/sagemaker/image_uris.py:176\u001b[0m, in \u001b[0;36mretrieve\u001b[0;34m(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config, sagemaker_session)\u001b[0m\n\u001b[1;32m 173\u001b[0m config \u001b[38;5;241m=\u001b[39m _config_for_framework_and_scope(_framework, final_image_scope, accelerator_type)\n\u001b[1;32m 175\u001b[0m original_version \u001b[38;5;241m=\u001b[39m version\n\u001b[0;32m--> 176\u001b[0m version \u001b[38;5;241m=\u001b[39m \u001b[43m_validate_version_and_set_if_needed\u001b[49m\u001b[43m(\u001b[49m\u001b[43mversion\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mframework\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 177\u001b[0m version_config \u001b[38;5;241m=\u001b[39m config[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mversions\u001b[39m\u001b[38;5;124m\"\u001b[39m][_version_for_config(version, config)]\n\u001b[1;32m 179\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m framework \u001b[38;5;241m==\u001b[39m HUGGING_FACE_FRAMEWORK:\n",
"File \u001b[0;32m~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/sagemaker/image_uris.py:473\u001b[0m, in \u001b[0;36m_validate_version_and_set_if_needed\u001b[0;34m(version, config, framework)\u001b[0m\n\u001b[1;32m 466\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m version \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m framework \u001b[38;5;129;01min\u001b[39;00m [\n\u001b[1;32m 467\u001b[0m DATA_WRANGLER_FRAMEWORK,\n\u001b[1;32m 468\u001b[0m HUGGING_FACE_LLM_FRAMEWORK,\n\u001b[1;32m 469\u001b[0m STABILITYAI_FRAMEWORK,\n\u001b[1;32m 470\u001b[0m ]:\n\u001b[1;32m 471\u001b[0m version \u001b[38;5;241m=\u001b[39m _get_latest_versions(available_versions)\n\u001b[0;32m--> 473\u001b[0m \u001b[43m_validate_arg\u001b[49m\u001b[43m(\u001b[49m\u001b[43mversion\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mavailable_versions\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43maliased_versions\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;132;43;01m{}\u001b[39;49;00m\u001b[38;5;124;43m version\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mformat\u001b[49m\u001b[43m(\u001b[49m\u001b[43mframework\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 474\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m version\n",
"File \u001b[0;32m~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/sagemaker/image_uris.py:585\u001b[0m, in \u001b[0;36m_validate_arg\u001b[0;34m(arg, available_options, arg_name)\u001b[0m\n\u001b[1;32m 583\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Checks if the arg is in the available options, and raises a ``ValueError`` if not.\"\"\"\u001b[39;00m\n\u001b[1;32m 584\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m arg \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m available_options:\n\u001b[0;32m--> 585\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 586\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUnsupported \u001b[39m\u001b[38;5;132;01m{arg_name}\u001b[39;00m\u001b[38;5;124m: \u001b[39m\u001b[38;5;132;01m{arg}\u001b[39;00m\u001b[38;5;124m. You may need to upgrade your SDK version \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 587\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(pip install -U sagemaker) for newer \u001b[39m\u001b[38;5;132;01m{arg_name}\u001b[39;00m\u001b[38;5;124ms. Supported \u001b[39m\u001b[38;5;132;01m{arg_name}\u001b[39;00m\u001b[38;5;124m(s): \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 588\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{options}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mformat(arg_name\u001b[38;5;241m=\u001b[39marg_name, arg\u001b[38;5;241m=\u001b[39marg, options\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(available_options))\n\u001b[1;32m 589\u001b[0m )\n",
"\u001b[0;31mValueError\u001b[0m: Unsupported djl-deepspeed version: 0.24.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer djl-deepspeed versions. Supported djl-deepspeed version(s): 0.23.0, 0.22.1, 0.21.0, 0.20.0, 0.19.0."
]
}
],
"source": [
"image_uri = image_uris.retrieve(\n",
" framework=\"djl-deepspeed\",\n",
" region=sess.boto_session.region_name,\n",
" version=\"0.24.0\"\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "11601839",
"metadata": {},
"source": [
"### Upload artifact on S3 and create SageMaker model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38b1e5ca",
"metadata": {},
"outputs": [],
"source": [
"s3_code_prefix = \"large-model-lmi/code\"\n",
"bucket = sess.default_bucket() # bucket to house artifacts\n",
"code_artifact = sess.upload_data(\"mymodel.tar.gz\", bucket, s3_code_prefix)\n",
"print(f\"S3 Code or Model tar ball uploaded to --- > {code_artifact}\")\n",
"\n",
"model = Model(image_uri=image_uri, model_data=code_artifact, role=role)"
]
},
{
"cell_type": "markdown",
"id": "004f39f6",
"metadata": {},
"source": [
"### 4.2 Create SageMaker endpoint\n",
"\n",
"You need to specify the instance to use and endpoint names"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e0e61cd",
"metadata": {},
"outputs": [],
"source": [
"instance_type = \"ml.g5.2xlarge\"\n",
"endpoint_name = sagemaker.utils.name_from_base(\"lmi-model\")\n",
"\n",
"model.deploy(initial_instance_count=1,\n",
" instance_type=instance_type,\n",
" endpoint_name=endpoint_name,\n",
" # container_startup_health_check_timeout=3600\n",
" )\n",
"\n",
"# our requests and responses will be in json format so we specify the serializer and the deserializer\n",
"predictor = sagemaker.Predictor(\n",
" endpoint_name=endpoint_name,\n",
" sagemaker_session=sess,\n",
" serializer=serializers.JSONSerializer(),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "bb63ee65",
"metadata": {},
"source": [
"## Step 5: Test and benchmark the inference"
]
},
{
"cell_type": "markdown",
"id": "79786708",
"metadata": {},
"source": [
"Firstly let's try to run with a wrong inputs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2bcef095",
"metadata": {},
"outputs": [],
"source": [
"predictor.predict(\n",
" {\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\":128, \"do_sample\":true}}\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c1cd9042",
"metadata": {},
"source": [
"## Clean up the environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d674b41",
"metadata": {},
"outputs": [],
"source": [
"sess.delete_endpoint(endpoint_name)\n",
"sess.delete_endpoint_config(endpoint_name)\n",
"model.delete_model()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p310",
"language": "python",
"name": "conda_pytorch_p310"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
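
The packaging cells in Step 2 of the notebook (`%%writefile` followed by `%%sh`) could also be expressed in plain Python, which may be convenient when running outside a notebook. The following is a minimal sketch and not part of the committed notebook: the function name `build_model_archive` and the `SERVING_PROPERTIES` dictionary are hypothetical, while the property values are copied from the notebook's `serving.properties` cell.

```python
import tarfile
import tempfile
from pathlib import Path

# Settings copied from the notebook's serving.properties cell.
SERVING_PROPERTIES = {
    "engine": "DeepSpeed",
    "option.model_id": "TheBloke/Llama-2-13B-fp16",
    "option.tensor_parallel_degree": "1",
    "option.dtype": "fp16",
    "option.quantize": "smoothquant",
    "batch_size": "32",
    "max_batch_delay": "100",
}


def build_model_archive(workdir: Path, properties: dict) -> Path:
    """Write mymodel/serving.properties under workdir and pack it as mymodel.tar.gz.

    Hypothetical helper: it mirrors the mkdir / mv / tar steps of the %%sh cell.
    """
    model_dir = workdir / "mymodel"
    model_dir.mkdir(parents=True, exist_ok=True)

    # One key=value pair per line, as in the %%writefile cell.
    lines = [f"{key}={value}" for key, value in properties.items()]
    (model_dir / "serving.properties").write_text("\n".join(lines) + "\n")

    archive = workdir / "mymodel.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the same layout as `tar czvf mymodel.tar.gz mymodel/`.
        tar.add(model_dir, arcname="mymodel")
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = build_model_archive(Path(tmp), SERVING_PROPERTIES)
        with tarfile.open(archive_path) as tar:
            print(tar.getnames())
```

The resulting `mymodel.tar.gz` has the same layout as the archive produced by the shell cell, so it could be passed to `sess.upload_data` in the same way.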

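Because the notebook's `Predictor` is created without a deserializer, `predictor.predict` returns the raw response body as bytes. The sketch below shows one way the body might be decoded. The response shape is an assumption, not something the notebook confirms: it is taken to be either `{"generated_text": ...}` or a list of such objects. The helper name `extract_generated_text` is hypothetical, and the sample bytes are hand-written rather than real endpoint output.

```python
import json


def extract_generated_text(raw_response: bytes) -> str:
    """Decode a JSON response body and return its generated text.

    Assumed (not confirmed by the notebook) response shapes:
    {"generated_text": "..."} or [{"generated_text": "..."}].
    """
    payload = json.loads(raw_response.decode("utf-8"))
    if isinstance(payload, list):
        if not payload:
            raise ValueError("empty response list")
        payload = payload[0]  # take the first candidate
    if not isinstance(payload, dict) or "generated_text" not in payload:
        raise ValueError(f"unexpected response shape: {payload!r}")
    return payload["generated_text"]


if __name__ == "__main__":
    # Hand-written sample bytes, not real endpoint output.
    sample = b'[{"generated_text": "Deep Learning is a subfield of machine learning."}]'
    print(extract_generated_text(sample))
```

If the actual response differs from the assumed shape, the helper raises `ValueError` instead of silently returning a wrong value, which makes a mismatch easy to spot while testing the endpoint.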