Commit 0a6c870 by Qing Lan, authored Oct 12, 2023 (1 parent: 4761d1b)
Showing 2 changed files with 251 additions and 0 deletions.
250 changes: 250 additions & 0 deletions
aws/sagemaker/large-model-inference/sample-llm/rollingbatch_deploy_llama2-13b-gptq.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "71a329f0", | ||
"metadata": {}, | ||
"source": [ | ||
"# LLAMA2-13B GPTQ rollingbatch deployment guide\n", | ||
"In this tutorial, you will use LMI container from DLC to SageMaker and run inference with it.\n", | ||
"\n", | ||
"Please make sure the following permission granted before running the notebook:\n", | ||
"\n", | ||
"- S3 bucket push access\n", | ||
"- SageMaker access\n", | ||
"\n", | ||
"## Step 1: Let's bump up SageMaker and import stuff" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "67fa3208", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install sagemaker --upgrade --quiet" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "ec9ac353", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import boto3\n", | ||
"import sagemaker\n", | ||
"from sagemaker import Model, image_uris, serializers, deserializers\n", | ||
"\n", | ||
"role = sagemaker.get_execution_role() # execution role for the endpoint\n", | ||
"sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n", | ||
"region = sess._region_name # region name of the current SageMaker Studio environment\n", | ||
"account_id = sess.account_id() # account_id of the current SageMaker Studio environment" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "81deac79", | ||
"metadata": {}, | ||
"source": [ | ||
"## Step 2: Start preparing model artifacts\n", | ||
"In LMI contianer, we expect some artifacts to help setting up the model\n", | ||
"- serving.properties (required): Defines the model server settings\n", | ||
"- model.py (optional): A python file to define the core inference logic\n", | ||
"- requirements.txt (optional): Any additional pip wheel need to install\n", | ||
"\n", | ||
"**Note** The original codeGen 2.5 model will not work only if [this change](https://huggingface.co/Salesforce/codegen25-7b-multi/discussions/8/files) has applied. Please apply to your own tokenizer.py when you downloaded the model or wait SalesForce to merge it." | ||
] | ||
}, | ||
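{ | ||
"cell_type": "markdown", | ||
"id": "9c41a7d0", | ||
"metadata": {}, | ||
"source": [ | ||
"For reference only, here is a minimal sketch of what an optional model.py handler could look like, assuming the djl_python Input/Output interface used by the LMI container. This notebook does not ship a model.py (the container's default handler is used), and the echo logic below is a placeholder:\n", | ||
"\n", | ||
"```python\n", | ||
"from djl_python import Input, Output\n", | ||
"\n", | ||
"def handle(inputs: Input) -> Output:\n", | ||
"    if inputs.is_empty():\n", | ||
"        # Warm-up request from the model server\n", | ||
"        return None\n", | ||
"    payload = inputs.get_as_json()\n", | ||
"    prompt = payload.get(\"inputs\", \"\")\n", | ||
"    # Real inference logic would go here; this placeholder echoes the prompt back\n", | ||
"    return Output().add_as_json({\"generated_text\": prompt})\n", | ||
"```" | ||
] | ||
}, | ||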
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b011bf5f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%writefile serving.properties\n", | ||
"engine=MPI\n", | ||
"option.model_id=TheBloke/Llama-2-13B-GPTQ\n", | ||
"option.tensor_parallel_degree=1\n", | ||
"option.max_rolling_batch_size=32\n", | ||
"option.rolling_batch=auto\n", | ||
"option.quantize=gptq" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b0142973", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sh\n", | ||
"mkdir mymodel\n", | ||
"mv serving.properties mymodel/\n", | ||
"tar czvf mymodel.tar.gz mymodel/\n", | ||
"rm -rf mymodel" | ||
] | ||
}, | ||
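{ | ||
"cell_type": "markdown", | ||
"id": "5d8a2b77", | ||
"metadata": {}, | ||
"source": [ | ||
"Optionally, sanity-check the archive layout before uploading it; for this notebook it should contain only mymodel/serving.properties." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6f3c1e92", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%sh\n", | ||
"# Optional: list the contents of the packaged model artifact\n", | ||
"tar tzvf mymodel.tar.gz" | ||
] | ||
}, | ||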
{ | ||
"cell_type": "markdown", | ||
"id": "2e58cf33", | ||
"metadata": {}, | ||
"source": [ | ||
"## Step 3: Start building SageMaker endpoint\n", | ||
"In this step, we will build SageMaker endpoint from scratch" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4d955679", | ||
"metadata": {}, | ||
"source": [ | ||
"### Getting the container image URI\n", | ||
"\n", | ||
"[Large Model Inference available DLC](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7a174b36", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"image_uri = image_uris.retrieve(\n", | ||
" framework=\"djl-deepspeed\",\n", | ||
" region=sess.boto_session.region_name,\n", | ||
" version=\"0.24.0\"\n", | ||
" )" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "11601839", | ||
"metadata": {}, | ||
"source": [ | ||
"### Upload artifact on S3 and create SageMaker model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "38b1e5ca", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"s3_code_prefix = \"large-model-lmi/code\"\n", | ||
"bucket = sess.default_bucket() # bucket to house artifacts\n", | ||
"code_artifact = sess.upload_data(\"mymodel.tar.gz\", bucket, s3_code_prefix)\n", | ||
"print(f\"S3 Code or Model tar ball uploaded to --- > {code_artifact}\")\n", | ||
"\n", | ||
"model = Model(image_uri=image_uri, model_data=code_artifact, role=role)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "004f39f6", | ||
"metadata": {}, | ||
"source": [ | ||
"### 4.2 Create SageMaker endpoint\n", | ||
"\n", | ||
"You need to specify the instance to use and endpoint names" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8e0e61cd", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"instance_type = \"ml.g5.2xlarge\"\n", | ||
"endpoint_name = sagemaker.utils.name_from_base(\"lmi-model\")\n", | ||
"\n", | ||
"model.deploy(initial_instance_count=1,\n", | ||
" instance_type=instance_type,\n", | ||
" endpoint_name=endpoint_name,\n", | ||
" # container_startup_health_check_timeout=3600\n", | ||
" )\n", | ||
"\n", | ||
"# our requests and responses will be in json format so we specify the serializer and the deserializer\n", | ||
"predictor = sagemaker.Predictor(\n", | ||
" endpoint_name=endpoint_name,\n", | ||
" sagemaker_session=sess,\n", | ||
" serializer=serializers.JSONSerializer(),\n", | ||
")" | ||
] | ||
}, | ||
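{ | ||
"cell_type": "markdown", | ||
"id": "e7b4c9a1", | ||
"metadata": {}, | ||
"source": [ | ||
"As an optional alternative to the Predictor object above, the endpoint can also be invoked with the low-level sagemaker-runtime client. This is an illustrative sketch; the prompt and parameter values are placeholders." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f1a6d2c8", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import json\n", | ||
"\n", | ||
"# Invoke the endpoint through the low-level runtime client instead of the Predictor\n", | ||
"smr_client = boto3.client(\"sagemaker-runtime\")\n", | ||
"response = smr_client.invoke_endpoint(\n", | ||
"    EndpointName=endpoint_name,\n", | ||
"    ContentType=\"application/json\",\n", | ||
"    Body=json.dumps({\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\": 64}}),\n", | ||
")\n", | ||
"print(response[\"Body\"].read().decode(\"utf-8\"))" | ||
] | ||
}, | ||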
{ | ||
"cell_type": "markdown", | ||
"id": "bb63ee65", | ||
"metadata": {}, | ||
"source": [ | ||
"## Step 5: Test and benchmark the inference" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "79786708", | ||
"metadata": {}, | ||
"source": [ | ||
"Firstly let's try to run with a wrong inputs" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2bcef095", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"predictor.predict(\n", | ||
" {\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\":128, \"do_sample\":true}}\n", | ||
")" | ||
] | ||
}, | ||
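{ | ||
"cell_type": "markdown", | ||
"id": "a3c5f8d4", | ||
"metadata": {}, | ||
"source": [ | ||
"For a rough latency number (a minimal sketch, not a rigorous benchmark), time a few sequential requests from the notebook; the prompt and request count below are arbitrary placeholders." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b9d7e0a2", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import time\n", | ||
"\n", | ||
"# Time a handful of sequential requests and report the average end-to-end latency\n", | ||
"payload = {\"inputs\": \"Deep Learning is\", \"parameters\": {\"max_new_tokens\": 128}}\n", | ||
"latencies = []\n", | ||
"for _ in range(5):\n", | ||
"    start = time.perf_counter()\n", | ||
"    predictor.predict(payload)\n", | ||
"    latencies.append(time.perf_counter() - start)\n", | ||
"print(f\"Average latency over {len(latencies)} requests: {sum(latencies) / len(latencies):.2f} s\")" | ||
] | ||
}, | ||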
{ | ||
"cell_type": "markdown", | ||
"id": "c1cd9042", | ||
"metadata": {}, | ||
"source": [ | ||
"## Clean up the environment" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "3d674b41", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sess.delete_endpoint(endpoint_name)\n", | ||
"sess.delete_endpoint_config(endpoint_name)\n", | ||
"model.delete_model()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |