5 changes: 3 additions & 2 deletions ai-quick-actions/model-deployment-tips.md
@@ -8,7 +8,8 @@ Table of Contents:
- [Model Fine Tuning](fine-tuning-tips.md)
- [Model Evaluation](evaluation-tips.md)
- [Model Registration](register-tips.md)
- [Multi Modal Inferencing](multimodal-models-tips.md)
- [Multi Model Inferencing](multimodal-models-tips.md)
> **Reviewer comment (Contributor):** Multi Modal was the correct one. This is different from multi-model. We actually need both here.

- [Stacked Model Inferencing](stacked-deployment-tips.md)
- [Private_Endpoints](model-deployment-private-endpoint-tips.md)
- [Tool Calling](model-deployment-tool-calling-tips.md)

@@ -918,4 +919,4 @@ Table of Contents:
- [Model Registration](register-tips.md)
- [Multi Modal Inferencing](multimodal-models-tips.md)
- [Private_Endpoints](model-deployment-private-endpoint-tips.md)
- [Tool Calling](model-deployment-tool-calling-tips.md)
- [Tool Calling](model-deployment-tool-calling-tips.md)
199 changes: 173 additions & 26 deletions ai-quick-actions/multimodel-deployment-tips.md
@@ -28,14 +28,24 @@ For fine-tuned models, requests specifying the base model name (ex. model: meta-
- [Setup](#setup)
- [For AQUA CLI](#for-aqua-cli)
- [Using AQUA UI Interface for Multi-Model Deployment](#using-aqua-ui-interface-for-multi-model-deployment)
- [Select the 'Create deployment' Button](#select-the-create-deployment-button)
- [Select 'Deploy Multi Model'](#select-deploy-multi-model)
- [Inferencing with Multi-Model Deployment](#inferencing-with-multi-model-deployment)
- [Select the 'Create deployment' Button](#select-the-create-deployment-button)
> **Reviewer comment (Contributor):** In several places we mentioned that only base models are supported; this needs to be extended to cover fine-tuned models as well.

- [Select 'Deploy Multi Model'](#select-deploy-multi-model)
- [Inferencing with Multi-Model Deployment](#inferencing-with-multi-model-deployment)
- [Using AQUA CLI for Multi-Model Deployment](#using-aqua-cli-for-multi-model-deployment)
- [1. Obtain Model OCIDs](#1-obtain-model-ocids)
- [Service Managed Models](#service-managed-models)
- [Fine-Tuned Models](#fine-tuned-models)
- [Custom Models](#custom-models)
- [Multi-Model Deployment](#multi-model-deployment)
- [List Available Shapes](#list-available-shapes)
- [Get Multi-Model Configuration](#get-multi-model-configuration)
- [Manage Multi-Model Deployments](#manage-multi-model-deployments)
- [List Multi-Model Deployments](#list-multi-model-deployments)
- [Edit Multi-Model Deployments](#edit-multi-model-deployments)
- [Multi-Model Inferencing](#multi-model-inferencing)
- [Multi-Model Evaluation](#multi-model-evaluation)
- [Create Model Evaluation](#create-model-evaluations)
- [Limitation](#limitations)
- [2. Before Deployment, Check Resource Limits](#2-before-deployment-check-resource-limits)
- [List Available Shapes](#list-available-shapes)
- [Usage](#usage)
@@ -110,7 +120,7 @@ Only Multi-Model Deployments with **base service LLM models (text-generation)**
There are two ways to send inference requests to models within a Multi-Model Deployment

1. Python SDK (recommended)- see [here](#Multi-Model-Inferencing)
2. Using AQUA UI (see below, ok for testing)
2. Using AQUA UI - see [here](#Create-Multi-Model-Deployment)

Once the Deployment is Active, view the model deployment details and inferencing form by clicking on the 'Deployments' Tab and selecting the model within the Model Deployment list.

@@ -470,22 +480,25 @@ ads aqua deployment get_multimodel_deployment_config --model_ids '["ocid1.datasc
}
```

## 3. Create Multi-Model Deployment
### ADS CLI

Only **base service LLM models** are supported for MultiModel Deployment. All selected models will run on the same **GPU shape**, sharing the available compute resources. Make sure to choose a shape that meets the needs of all models in your deployment using the [MultiModel Configuration command](#get-multimodel-configuration).
#### Description

You'll need the latest version of ADS to create a new Aqua MultiModel deployment. Installation instructions are available [here](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/quickstart.html).
> **Reviewer comment (Contributor):** We also need to mention that if they use a notebook session, then AQUA 2.0 should be used.


### Description
Only fine-tuned models with version `V2` can be deployed as weights in a Multi-Model Deployment. To deploy an older fine-tuned model weight, run the following command to convert it to version `V2`, then use the new fine-tuned model OCID when creating the deployment. By default, this command deletes the old fine-tuned model after conversion; add `--delete_model False` to keep it instead.
> **Reviewer comment (Contributor):** Let's make a separate title for this section, something like 'Deploying fine-tuned models'. This title will be used as a reference from the AQUA UI.


Creates a new Aqua MultiModel deployment.
```bash
ads aqua model convert_fine_tune --model_id [FT_OCID]
```

### Usage
#### Usage

```bash
ads aqua deployment create [OPTIONS]
```

### Required Parameters
#### Required Parameters

`--models [str]`

@@ -553,7 +566,7 @@ The URI of the inference container associated with the model being registered. I
Example: `dsmc://odsc-vllm-serving:0.6.4.post1.2` or `dsmc://odsc-vllm-serving:0.8.1.2`


### Optional Parameters
#### Optional Parameters

`--compartment_id [str]`

@@ -604,28 +617,28 @@ Environment variable for the model deployment, defaults to None.
The private endpoint id of model deployment.


### Example
#### Example

#### Create Multi-Model deployment with `/v1/completions`
##### Create Multi-Model deployment with `/v1/completions`

```bash
ads aqua deployment create \
--container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
--models '[{"model_id":"ocid1.log.oc1.iad.<ocid>", "gpu_count":1}, {"model_id":"ocid1.log.oc1.iad.<ocid>", "gpu_count":1}]' \
--models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}, {"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}]' \
--instance_shape "VM.GPU.A10.2" \
--display_name "modelDeployment_multmodel_model1_model2" \
--env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions"}'

```

##### CLI Output
###### CLI Output

```json
{
"id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
"display_name": "Multi model deployment of Mistral-7B-v0.1 and falcon-7b on A10.2",
"display_name": "modelDeployment_multmodel_model1_model2",
"aqua_service_model": false,
"model_id": "ocid1.datasciencemodel.oc1.<ocid>",
"model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
"models": [
{
"model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
@@ -656,22 +669,23 @@ ads aqua deployment create \
"memory_in_gbs": null
},
"tags": {
"aqua_model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
"aqua_model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
"aqua_multimodel": "true",
"OCI_AQUA": "active"
},
"environment_variables": {
"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions",
"MULTI_MODEL_CONFIG": "{\"models\": [{\"params\": \"--served-model-name mistralai/Mistral-7B-v0.1 --seed 42 --tensor-parallel-size 1 --max-model-len 4096\", \"model_path\": \"service_models/Mistral-7B-v0.1/78814a9/artifact\"}, {\"params\": \"--served-model-name tiiuae/falcon-7b --seed 42 --tensor-parallel-size 1 --trust-remote-code\", \"model_path\": \"service_models/falcon-7b/f779652/artifact\"}]}",
"MODEL_DEPLOY_ENABLE_STREAMING": "true"
}
}
```
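The `MULTI_MODEL_CONFIG` environment variable in the output above is a JSON string listing each served model's vLLM parameters and artifact path. A minimal sketch of reading it (the parsing helper below is illustrative, not an AQUA or ADS API):

```python
import json

# Example value as emitted in the CLI output above
multi_model_config = (
    '{"models": [{"params": "--served-model-name mistralai/Mistral-7B-v0.1 '
    '--seed 42 --tensor-parallel-size 1 --max-model-len 4096", '
    '"model_path": "service_models/Mistral-7B-v0.1/78814a9/artifact"}, '
    '{"params": "--served-model-name tiiuae/falcon-7b --seed 42 '
    '--tensor-parallel-size 1 --trust-remote-code", '
    '"model_path": "service_models/falcon-7b/f779652/artifact"}]}'
)

def served_model_names(config_json: str) -> list:
    """Extract the --served-model-name value from each model's params string."""
    names = []
    for model in json.loads(config_json)["models"]:
        params = model["params"].split()
        names.append(params[params.index("--served-model-name") + 1])
    return names

print(served_model_names(multi_model_config))
# ['mistralai/Mistral-7B-v0.1', 'tiiuae/falcon-7b']
```

These served model names are the values passed as `"model"` when inferencing against the deployment.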

#### Create Multi-Model deployment with `/v1/chat/completions`
##### Create Multi-Model deployment with `/v1/chat/completions`

```bash
ads aqua deployment create \
--container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
--models '[{"model_id":"ocid1.log.oc1.iad.<ocid>", "gpu_count":1}, {"model_id":"ocid1.log.oc1.iad.<ocid>", "gpu_count":1}]' \
--models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}, {"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "gpu_count":1}]' \
--env-var '{"MODEL_DEPLOY_PREDICT_ENDPOINT":"/v1/chat/completions"}' \
--instance_shape "VM.GPU.A10.2" \
--display_name "modelDeployment_multmodel_model1_model2" \
@@ -684,9 +698,9 @@ ads aqua deployment create \
```json
{
"id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
"display_name": "Multi model deployment of Mistral-7B-v0.1 and falcon-7b on A10.2",
"display_name": "modelDeployment_multmodel_model1_model2",
"aqua_service_model": false,
"model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
"model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
"models": [
{
"model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
@@ -717,14 +731,15 @@ ads aqua deployment create \
"memory_in_gbs": null
},
"tags": {
"aqua_model_id": "ocid1.datasciencemodel.oc1.<ocid>",
"aqua_model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
"aqua_multimodel": "true",
"OCI_AQUA": "active"
},
"environment_variables": {
"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
"MULTI_MODEL_CONFIG": "{\"models\": [{\"params\": \"--served-model-name mistralai/Mistral-7B-v0.1 --seed 42 --tensor-parallel-size 1 --max-model-len 4096\", \"model_path\": \"service_models/Mistral-7B-v0.1/78814a9/artifact\"}, {\"params\": \"--served-model-name tiiuae/falcon-7b --seed 42 --tensor-parallel-size 1 --trust-remote-code\", \"model_path\": \"service_models/falcon-7b/f779652/artifact\"}]}",
"MODEL_DEPLOY_ENABLE_STREAMING": "true"
}
}
```
#### Create Multi-Model (1 Embedding Model, 1 LLM) deployment with `/v1/completions`

@@ -741,7 +756,6 @@ ads aqua deployment create \
--instance_shape "VM.GPU.A10.2" \
--display_name "modelDeployment_multmodel_model1_model2" \
--env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions"}'

```

## Manage Multi-Model Deployments
@@ -750,10 +764,143 @@ To list all AQUA deployments (both Multi-Model and single-model) within a specif

Note: Multi-Model deployments are identified by the tag `"aqua_multimodel": "true"` associated with them.
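When working with listed deployments programmatically, that tag can be used to separate Multi-Model deployments from single-model ones. A small sketch, assuming deployments have already been fetched as dicts shaped like the CLI output in this document (the helper itself is hypothetical):

```python
def filter_multimodel(deployments: list) -> list:
    """Keep only deployments carrying the aqua_multimodel marker tag."""
    return [
        d for d in deployments
        if d.get("tags", {}).get("aqua_multimodel") == "true"
    ]

deployments = [
    {"display_name": "single-llm", "tags": {"OCI_AQUA": "active"}},
    {"display_name": "multi", "tags": {"aqua_multimodel": "true", "OCI_AQUA": "active"}},
]
print([d["display_name"] for d in filter_multimodel(deployments)])
# ['multi']
```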

### Edit Multi-Model Deployments
> **Reviewer comment (Contributor):** Let's add a bit more information here to describe what can be edited and which strategy will be used for updating the deployment.


#### Usage

```bash
ads aqua deployment update [OPTIONS]
```

#### Required Parameters

`--model_deployment_id [str]`

The model deployment OCID to be updated.

#### Optional Parameters

`--models [str]`

The string representation of a JSON array, where each object defines a model's OCID and the number of GPUs assigned to it. The GPU count should always be a **power of two (e.g., 1, 2, 4, 8)**. <br>
Example: `'[{"model_id":"<model_ocid>", "gpu_count":1},{"model_id":"<model_ocid>", "gpu_count":1}]'` for `VM.GPU.A10.2` shape. <br>
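Because GPU counts that are not powers of two are rejected, it can help to validate the `--models` payload before invoking the CLI. A hedged sketch (the `build_models_arg` helper is illustrative and not part of ADS):

```python
import json

def build_models_arg(models: list) -> str:
    """Build the JSON string for --models from (ocid, gpu_count) pairs,
    checking that each gpu_count is a power of two."""
    for ocid, gpu_count in models:
        # A positive integer n is a power of two iff n & (n - 1) == 0
        if gpu_count < 1 or gpu_count & (gpu_count - 1) != 0:
            raise ValueError(f"gpu_count {gpu_count} for {ocid} is not a power of two")
    return json.dumps(
        [{"model_id": ocid, "gpu_count": n} for ocid, n in models]
    )

# Produces a value suitable for the --models option
arg = build_models_arg([
    ("ocid1.datasciencemodel.oc1.iad.<ocid>", 1),
    ("ocid1.datasciencemodel.oc1.iad.<ocid>", 1),
])
```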

`--display_name [str]`

The name of the model deployment.

`--description [str]`

The description of the model deployment. Defaults to None.

`--instance_count [int]`

The number of instances used for the model deployment. Defaults to 1.

`--log_group_id [str]`

The OCI logging group OCID. The access log and predict log share the same log group.

`--access_log_id [str]`

The access log OCID for the access logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.

`--predict_log_id [str]`

The predict log OCID for the predict logs. Check [model deployment logging](https://docs.oracle.com/en-us/iaas/data-science/using/model_dep_using_logging.htm) for more details.

`--web_concurrency [int]`

The number of worker processes/threads to handle incoming requests.

`--bandwidth_mbps [int]`

The bandwidth limit on the load balancer in Mbps.

`--memory_in_gbs [float]`

Memory (in GB) for the selected shape.

`--ocpus [float]`

OCPU count for the selected shape.

`--freeform_tags [dict]`

Freeform tags for model deployment.

`--defined_tags [dict]`

Defined tags for model deployment.

#### Example

##### Edit Multi-Model deployment with `/v1/completions`

```bash
ads aqua deployment update \
--model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>" \
--models '[{"model_id":"ocid1.datasciencemodel.oc1.iad.<ocid>", "model_name":"test_updated_model_name", "gpu_count":2}]' \
--display_name "updated_modelDeployment_multmodel_model1_model2"

```

##### CLI Output

```json
{
"id": "ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
"display_name": "updated_modelDeployment_multmodel_model1_model2",
"aqua_service_model": false,
"model_id": "ocid1.datasciencemodelgroup.oc1.iad.<ocid>",
"models": [
{
"model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
"model_name": "mistralai/Mistral-7B-v0.1",
"gpu_count": 1,
"env_var": {}
},
{
"model_id": "ocid1.datasciencemodel.oc1.iad.<ocid>",
"model_name": "tiiuae/falcon-7b",
"gpu_count": 1,
"env_var": {}
}
],
"aqua_model_name": "",
"state": "UPDATING",
"description": null,
"created_on": "2025-03-10 19:09:40.793000+00:00",
"created_by": "ocid1.user.oc1..<ocid>",
"endpoint": "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
"private_endpoint_id": null,
"console_link": "https://cloud.oracle.com/data-science/model-deployments/ocid1.datasciencemodeldeployment.oc1.iad.<ocid>",
"lifecycle_details": null,
"shape_info": {
"instance_shape": "VM.GPU.A10.2",
"instance_count": 1,
"ocpus": null,
"memory_in_gbs": null
},
"tags": {
"aqua_model_id": "ocid1.datasciencemodelgroup.oc1.<ocid>",
"aqua_multimodel": "true",
"OCI_AQUA": "active"
},
"environment_variables": {
"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
"MODEL_DEPLOY_ENABLE_STREAMING": "true"
}
}
```

# Multi-Model Inferencing

The only change required to run inference against a specific model in a Multi-Model deployment is to update the value of the `"model"` parameter in the request payload. The value for this parameter can be found in the Model Deployment details, under the field name `"model_name"`. This parameter segregates the request flow, ensuring that the inference request is directed to the correct model within the MultiModel deployment.
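Concretely, the request body differs only in the `model` field. A minimal sketch of building such a payload (field names follow the OpenAI-style `/v1/completions` schema used elsewhere in this document; authentication and the actual HTTP call are omitted):

```python
import json

def completion_payload(model_name: str, prompt: str, max_tokens: int = 100) -> str:
    """Request body for /v1/completions; `model` routes the request
    to one model within the Multi-Model deployment."""
    return json.dumps({
        "model": model_name,  # must match "model_name" in the deployment details
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

# Route this request to the falcon-7b model from the examples above
body = completion_payload("tiiuae/falcon-7b", "What is OCI Data Science?")
```

Sending two requests that differ only in `model_name` exercises two different models behind the same deployment endpoint.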

## Using AQUA UI

![Inferencing](web_assets/try-multi-model.png)

## Using oci-cli

```bash