# [IN Review] Added Docs for AQUA Stacked Deployments #663
@@ -28,14 +28,24 @@ For fine-tuned models, requests specifying the base model name (ex. model: meta-
- [Setup](#setup)
- [For AQUA CLI](#for-aqua-cli)
- [Using AQUA UI Interface for Multi-Model Deployment](#using-aqua-ui-interface-for-multi-model-deployment)
- [Select the 'Create deployment' Button](#select-the-create-deployment-button)
> **Contributor:** In several places we mentioned that only base models are supported; this needs to be extended to cover fine-tuned models as well.
- [Select 'Deploy Multi Model'](#select-deploy-multi-model)
- [Inferencing with Multi-Model Deployment](#inferencing-with-multi-model-deployment)
- [Using AQUA CLI for Multi-Model Deployment](#using-aqua-cli-for-multi-model-deployment)
- [1. Obtain Model OCIDs](#1-obtain-model-ocids)
- [Service Managed Models](#service-managed-models)
- [Fine-Tuned Models](#fine-tuned-models)
- [Custom Models](#custom-models)
- [Multi-Model Deployment](#multi-model-deployment)
- [List Available Shapes](#list-available-shapes)
- [Get Multi-Model Configuration](#get-multi-model-configuration)
- [Manage Multi-Model Deployments](#manage-multi-model-deployments)
- [List Multi-Model Deployments](#list-multi-model-deployments)
- [Edit Multi-Model Deployments](#edit-multi-model-deployments)
- [Multi-Model Inferencing](#multi-model-inferencing)
- [Multi-Model Evaluation](#multi-model-evaluation)
- [Create Model Evaluation](#create-model-evaluations)
- [Limitation](#limitations)
- [2. Before Deployment, Check Resource Limits](#2-before-deployment-check-resource-limits)
- [List Available Shapes](#list-available-shapes)
- [Usage](#usage)
@@ -110,7 +120,7 @@ Only Multi-Model Deployments with **base service LLM models (text-generation)**
There are two ways to send inference requests to models within a Multi-Model Deployment:

1. Python SDK (recommended) - see [here](#multi-model-inferencing)
2. Using AQUA UI - see [here](#create-multi-model-deployment)

Once the Deployment is Active, view the model deployment details and inferencing form by clicking on the 'Deployments' Tab and selecting the model within the Model Deployment list.
@@ -470,22 +480,25 @@ ads aqua deployment get_multimodel_deployment_config --model_ids '["ocid1.datasc
}
```
## 3. Create Multi-Model Deployment

### ADS CLI

Only **base service LLM models** are supported for MultiModel Deployment. All selected models will run on the same **GPU shape**, sharing the available compute resources. Make sure to choose a shape that meets the needs of all models in your deployment using the [MultiModel Configuration command](#get-multimodel-configuration).

#### Description

You'll need the latest version of ADS to create a new Aqua MultiModel deployment. Installation instructions are available [here](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/quickstart.html).
> **Contributor:** We also need to mention that if they use a NB session, then AQUA 2.0 should be used.
### Description

Only fine-tuned models with version `V2` can be deployed as weights in a Multi-Model Deployment. To deploy an older fine-tuned model's weights, run the following command to convert it to version `V2`, then use the new fine-tuned model OCID when creating the deployment. By default, this command deletes the old fine-tuned model after conversion; add `--delete_model False` to keep it instead.

> **Contributor:** Let's make a separate title for this section, something like "Deploying fine-tuned models". This title will be used as a reference from the AQUA UI.
```bash
ads aqua model convert_fine_tune --model_id [FT_OCID]
```
#### Usage

```bash
ads aqua deployment create [OPTIONS]
```
#### Required Parameters

`--models [str]`
@@ -553,7 +566,7 @@ The URI of the inference container associated with the model being registered. I

Example: `dsmc://odsc-vllm-serving:0.6.4.post1.2` or `dsmc://odsc-vllm-serving:0.8.1.2`
#### Optional Parameters

`--compartment_id [str]`
@@ -604,28 +617,28 @@ Environment variable for the model deployment, defaults to None.

The private endpoint id of model deployment.

#### Example
##### Create Multi-Model deployment with `/v1/completions`

```bash
ads aqua deployment create \
  --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
  --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}, {"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}]' \
  --instance_shape "VM.GPU.A10.2" \
  --display_name "modelDeployment_multmodel_model1_model2" \
  --env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions"}'
```
###### CLI Output

```json
{
  "id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "display_name": "modelDeployment_multmodel_model1_model2",
  "aqua_service_model": false,
  "model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
  "models": [
    {
      "model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
@@ -656,22 +669,23 @@ ads aqua deployment create \
    "memory_in_gbs": null
  },
  "tags": {
    "aqua_model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
    "aqua_multimodel": "true",
    "OCI_AQUA": "active"
  },
  "environment_variables": {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions",
    "MULTI_MODEL_CONFIG": "{\"models\": [{\"params\": \"--served-model-name mistralai/Mistral-7B-v0.1 --seed 42 --tensor-parallel-size 1 --max-model-len 4096\", \"model_path\": \"service_models/Mistral-7B-v0.1/78814a9/artifact\"}, {\"params\": \"--served-model-name tiiuae/falcon-7b --seed 42 --tensor-parallel-size 1 --trust-remote-code\", \"model_path\": \"service_models/falcon-7b/f779652/artifact\"}]}",
    "MODEL_DEPLOY_ENABLE_STREAMING": "true"
  }
}
```
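Since `MULTI_MODEL_CONFIG` is itself a JSON string nested inside the deployment's environment variables, a short sketch of unpacking it may help; the config string below is copied from the sample output above, and the name-extraction logic is an illustration, not part of the ADS API:

```python
import json

# MULTI_MODEL_CONFIG arrives as a JSON string inside the deployment's
# environment variables; each entry carries the vLLM launch params and
# artifact path for one model in the multi-model deployment.
multi_model_config = (
    '{"models": [{"params": "--served-model-name mistralai/Mistral-7B-v0.1 '
    '--seed 42 --tensor-parallel-size 1 --max-model-len 4096", '
    '"model_path": "service_models/Mistral-7B-v0.1/78814a9/artifact"}, '
    '{"params": "--served-model-name tiiuae/falcon-7b --seed 42 '
    '--tensor-parallel-size 1 --trust-remote-code", '
    '"model_path": "service_models/falcon-7b/f779652/artifact"}]}'
)

config = json.loads(multi_model_config)

# Pull out the served model names: these are the values you pass as
# "model" in inference request payloads.
served_names = [
    entry["params"].split("--served-model-name ")[1].split(" ")[0]
    for entry in config["models"]
]
print(served_names)  # ['mistralai/Mistral-7B-v0.1', 'tiiuae/falcon-7b']
```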
##### Create Multi-Model deployment with `/v1/chat/completions`

```bash
ads aqua deployment create \
  --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
  --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}, {"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}]' \
  --env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT":"/v1/chat/completions"}' \
  --instance_shape "VM.GPU.A10.2" \
  --display_name "modelDeployment_multmodel_model1_model2"
```

@@ -684,9 +698,9 @@ ads aqua deployment create \
```json
{
  "id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "display_name": "modelDeployment_multmodel_model1_model2",
  "aqua_service_model": false,
  "model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
  "models": [
    {
      "model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
@@ -717,14 +731,15 @@ ads aqua deployment create \
    "memory_in_gbs": null
  },
  "tags": {
    "aqua_model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
    "aqua_multimodel": "true",
    "OCI_AQUA": "active"
  },
  "environment_variables": {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "MULTI_MODEL_CONFIG": "{\"models\": [{\"params\": \"--served-model-name mistralai/Mistral-7B-v0.1 --seed 42 --tensor-parallel-size 1 --max-model-len 4096\", \"model_path\": \"service_models/Mistral-7B-v0.1/78814a9/artifact\"}, {\"params\": \"--served-model-name tiiuae/falcon-7b --seed 42 --tensor-parallel-size 1 --trust-remote-code\", \"model_path\": \"service_models/falcon-7b/f779652/artifact\"}]}",
    "MODEL_DEPLOY_ENABLE_STREAMING": "true"
  }
}
```

##### Create Multi-Model (1 Embedding Model, 1 LLM) deployment with `/v1/completions`
@@ -741,7 +756,6 @@ ads aqua deployment create \

```bash
  --instance_shape "VM.GPU.A10.2" \
  --display_name "modelDeployment_multmodel_model1_model2" \
  --env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions"}'
```
## Manage Multi-Model Deployments

@@ -750,10 +764,143 @@ To list all AQUA deployments (both Multi-Model and single-model) within a specif

Note: Multi-Model deployments are identified by the tag `"aqua_multimodel": "true"` associated with them.
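As a sketch of how that tag can be used programmatically, the snippet below filters a list of deployment records down to Multi-Model ones; the deployment dicts are illustrative stand-ins for `ads aqua deployment list` output, using the field names shown in the CLI output samples in this document:

```python
# Separate Multi-Model deployments from single-model ones by the
# "aqua_multimodel" tag. The records here are hypothetical examples
# shaped like the deployment JSON shown elsewhere in this doc.
deployments = [
    {"display_name": "multi_a", "tags": {"aqua_multimodel": "true", "OCI_AQUA": "active"}},
    {"display_name": "single_b", "tags": {"OCI_AQUA": "active"}},
]

multi_model = [
    d for d in deployments
    if d.get("tags", {}).get("aqua_multimodel") == "true"
]
print([d["display_name"] for d in multi_model])  # ['multi_a']
```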
### Edit Multi-Model Deployments

> **Contributor:** Let's add a bit more information here to describe what can be edited and which strategy is used for updating the deployment.
#### Usage

```bash
ads aqua deployment update [OPTIONS]
```

#### Required Parameters

`--model_deployment_id [str]`

The model deployment OCID to be updated.

#### Optional Parameters

`--models [str]`

The string representation of a JSON array, where each object defines a model's OCID and the number of GPUs assigned to it. The GPU count should always be a **power of two (e.g., 1, 2, 4, 8)**.

Example: `'[{"model_id":"<model_ocid>", "gpu_count":1},{"model_id":"<model_ocid>", "gpu_count":1}]'` for the `VM.GPU.A10.2` shape.
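The power-of-two constraint can be pre-checked before calling the CLI; the helper below is a hypothetical client-side validation sketch, not part of ADS:

```python
import json

def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two iff it has exactly one set bit.
    return n > 0 and (n & (n - 1)) == 0

def validate_models_arg(models_json: str, total_gpus: int) -> list:
    """Hypothetical pre-flight check for the --models JSON array:
    every gpu_count must be a power of two, and the total must fit
    the GPU count of the chosen instance shape."""
    models = json.loads(models_json)
    for m in models:
        if not is_power_of_two(m["gpu_count"]):
            raise ValueError(f"gpu_count must be a power of two: {m}")
    if sum(m["gpu_count"] for m in models) > total_gpus:
        raise ValueError("requested GPUs exceed the instance shape capacity")
    return models

# e.g. VM.GPU.A10.2 exposes 2 GPUs
validate_models_arg(
    '[{"model_id":"<model_ocid>", "gpu_count":1},'
    '{"model_id":"<model_ocid>", "gpu_count":1}]',
    total_gpus=2,
)
```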
`--display_name [str]`

The name of the model deployment.

`--description [str]`

The description of the model deployment. Defaults to None.

`--instance_count [int]`

The number of instances used for the model deployment. Defaults to 1.

`--log_group_id [str]`

The OCI logging group id. The access log and predict log share the same log group.

`--access_log_id [str]`

The access log OCID for the access logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.

`--predict_log_id [str]`

The predict log OCID for the predict logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.

`--web_concurrency [int]`

The number of worker processes/threads to handle incoming requests.

`--bandwidth_mbps [int]`

The bandwidth limit on the load balancer in Mbps.

`--memory_in_gbs [float]`

Memory (in GB) for the selected shape.

`--ocpus [float]`

OCPU count for the selected shape.

`--freeform_tags [dict]`

Freeform tags for the model deployment.

`--defined_tags [dict]`

Defined tags for the model deployment.
#### Example

##### Edit Multi-Model deployment with `/v1/completions`

```bash
ads aqua deployment update \
  --model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>" \
  --models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "model_name":"test_updated_model_name", "gpu_count":2}]' \
  --display_name "updated_modelDeployment_multmodel_model1_model2"
```
##### CLI Output

```json
{
  "id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "display_name": "updated_modelDeployment_multmodel_model1_model2",
  "aqua_service_model": false,
  "model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
  "models": [
    {
      "model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
      "model_name": "mistralai/Mistral-7B-v0.1",
      "gpu_count": 1,
      "env_var": {}
    },
    {
      "model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
      "model_name": "tiiuae/falcon-7b",
      "gpu_count": 1,
      "env_var": {}
    }
  ],
  "aqua_model_name": "",
  "state": "UPDATING",
  "description": null,
  "created_on": "2025-03-10 19:09:40.793000+00:00",
  "created_by": "ocid1.user.oc1..<ocid>",
  "endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "private_endpoint_id": null,
  "console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
  "lifecycle_details": null,
  "shape_info": {
    "instance_shape": "VM.GPU.A10.2",
    "instance_count": 1,
    "ocpus": null,
    "memory_in_gbs": null
  },
  "tags": {
    "aqua_model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
    "aqua_multimodel": "true",
    "OCI_AQUA": "active"
  },
  "environment_variables": {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "MODEL_DEPLOY_ENABLE_STREAMING": "true"
  }
}
```
# Multi-Model Inferencing

The only change required to infer against a specific model in a Multi-Model deployment is to update the value of the `"model"` parameter in the request payload. The values for this parameter can be found in the Model Deployment details, under the field name `"model_name"`. This parameter segregates the request flow, ensuring that the inference request is directed to the correct model within the MultiModel deployment.
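A minimal Python sketch of that request follows. The endpoint, model name, prompt, and OCI config profile are placeholders or assumptions, and request signing uses the standard `oci.signer.Signer` pattern from the OCI Python SDK rather than any AQUA-specific API:

```python
import json

def build_payload(model_name: str, prompt: str, max_tokens: int = 250) -> dict:
    # The "model" value must match the "model_name" field from the
    # Model Deployment details; it routes the request to the right model.
    return {"model": model_name, "prompt": prompt, "max_tokens": max_tokens}

# Placeholder endpoint in the same form as the CLI output samples above.
endpoint = (
    "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/"
    "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>/predict"
)

payload = build_payload("mistralai/Mistral-7B-v0.1", "What is OCI Data Science?")

if __name__ == "__main__":
    # Assumes a standard local OCI CLI config; imports kept here so the
    # payload helper above stays dependency-free.
    import oci
    import requests

    config = oci.config.from_file("~/.oci/config", "DEFAULT")
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )
    response = requests.post(endpoint, json=payload, auth=signer)
    print(json.dumps(response.json(), indent=2))
```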
## Using AQUA UI

![image](https://github.com/user-attachments/assets/92b7804c-972d-4be7-a3c3-36dcb284aa8a)
## Using oci-cli

A request can be sent with `oci raw-request`; the endpoint OCID and payload values below are illustrative:

```bash
oci raw-request \
  --http-method POST \
  --target-uri "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>/predict" \
  --request-body '{"model": "mistralai/Mistral-7B-v0.1", "prompt": "Hello", "max_tokens": 100}'
```

> **Contributor:** "Multi Modal" was the correct one. This is different from multi-model. We actually need both here.