elastic · markjhoy · Jul 12, 2024 · Jul 8, 2024 · Jul 8, 2024 · Jul 8, 2024
diff --git a/docs/changelog/110248.yaml b/docs/changelog/110248.yaml
@@ -0,0 +1,5 @@
+pr: 110248
+summary: "[Inference API] Add Amazon Bedrock Support to Inference API"
+area: Machine Learning
+type: enhancement
+issues: [ ]
@@ -25,6 +25,7 @@ include::delete-inference.asciidoc[]
 include::get-inference.asciidoc[]
 include::post-inference.asciidoc[]
 include::put-inference.asciidoc[]
+include::service-amazon-bedrock.asciidoc[]
 include::service-azure-ai-studio.asciidoc[]
 include::service-azure-openai.asciidoc[]
 include::service-cohere.asciidoc[]

@@ -34,6 +34,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a
 
 The following services are available through the {infer} API, click the links to review the configuration details of the services:
 
+* <<infer-service-amazon-bedrock,Amazon Bedrock>>
 * <<infer-service-azure-ai-studio,Azure AI Studio>>
 * <<infer-service-azure-openai,Azure OpenAI>>
 * <<infer-service-cohere,Cohere>>

@@ -0,0 +1,173 @@
+[[infer-service-amazon-bedrock]]
+=== Amazon Bedrock {infer} service
+
+Creates an {infer} endpoint to perform an {infer} task with the `amazonbedrock` service.
+
+[discrete]
+[[infer-service-amazon-bedrock-api-request]]
+==== {api-request-title}
+
+`PUT /_inference/<task_type>/<inference_id>`
+
+[discrete]
+[[infer-service-amazon-bedrock-api-path-params]]
+==== {api-path-parms-title}
+
+`<inference_id>`::
+(Required, string)
+include::inference-shared.asciidoc[tag=inference-id]
+
+`<task_type>`::
+(Required, string)
+include::inference-shared.asciidoc[tag=task-type]
++
+--
+Available task types:
+
+* `completion`,
+* `text_embedding`.
+--
+
+[discrete]
+[[infer-service-amazon-bedrock-api-request-body]]
+==== {api-request-body-title}
+
+`service`::
+(Required, string) The type of service supported for the specified task type.
+In this case,
+`amazonbedrock`.
+
+`service_settings`::
+(Required, object)
+include::inference-shared.asciidoc[tag=service-settings]
++
+--
+These settings are specific to the `amazonbedrock` service.
+--
+
+`access_key`:::
+(Required, string)
+A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.
+
+`secret_key`:::
+(Required, string)
+A valid AWS secret key that is paired with the `access_key`.
+To create or manage access and secret keys, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html[Managing access keys for IAM users] in the AWS documentation.
+
+IMPORTANT: You need to provide the access and secret keys only once, during the {infer} model creation.
+The <<get-inference-api>> does not retrieve your access or secret keys.
+After creating the {infer} model, you cannot change the associated key pairs.
+If you want to use a different access and secret key pair, delete the {infer} model and recreate it with the same name and the updated keys.
+
+`provider`:::
+(Required, string)
+The model provider for your deployment.
+Note that some providers may support only certain task types.
+Supported providers include:
+
+* `amazontitan` - available for `text_embedding` and `completion` task types
+* `anthropic` - available for `completion` task type only
+* `ai21labs` - available for `completion` task type only
+* `cohere` - available for `text_embedding` and `completion` task types
+* `meta` - available for `completion` task type only
+* `mistral` - available for `completion` task type only
+
+`model`:::
+(Required, string)
+The base model ID or an ARN to a custom model based on a foundational model.
+The base model IDs can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock model IDs] documentation.
+Note that the model ID must be available for the provider chosen, and your IAM user must have access to the model.
+
+`region`:::
+(Required, string)
+The region that your model or ARN is deployed in.
+The list of available regions per model can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html[Model support by AWS region] documentation.
+
+`rate_limit`:::
+(Optional, object)
+By default, the `amazonbedrock` service sets the number of requests allowed per minute to `240`.
+This helps to minimize the number of rate limit errors returned from Amazon Bedrock.
+To modify this, set the `requests_per_minute` setting of this object in your service settings:
++
+--
+include::inference-shared.asciidoc[tag=request-per-minute-example]
+--
+
+`task_settings`::
+(Optional, object)
+include::inference-shared.asciidoc[tag=task-settings]
++
+.`task_settings` for the `completion` task type
+[%collapsible%closed]
+=====
+
+`max_new_tokens`:::
+(Optional, integer)
+Sets the maximum number of output tokens to be generated.
+Defaults to 64.
+
+`temperature`:::
+(Optional, float)
+A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic; at temperature 1.0 it is most random.
+Should not be used if `top_p` or `top_k` is specified.
+
+`top_p`:::
+(Optional, float)
+Alternative to `temperature`. A number in the range of 0.0 to 1.0 that eliminates low-probability tokens. Top-p uses nucleus sampling to select the top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence.
+Should not be used if `temperature` or `top_k` is specified.
+
+`top_k`:::
+(Optional, float)
+Only available for `anthropic`, `cohere`, and `mistral` providers.
+Alternative to `temperature`. Limits samples to the top-K most likely words, balancing coherence and variability. A number in the range of 0.0 to 1.0.
+Should not be used if `temperature` or `top_p` is specified.
+
+=====
++
+.`task_settings` for the `text_embedding` task type
+[%collapsible%closed]
+=====
+
+There are no `task_settings` available for the `text_embedding` task type.
+
+[discrete]
+[[inference-example-amazonbedrock]]
+==== Amazon Bedrock service example
+
+The following example shows how to create an {infer} endpoint called `amazon_bedrock_embeddings` to perform a `text_embedding` task type.
+
+Choose chat completion and embeddings models you have access to from the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base models].
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/text_embedding/amazon_bedrock_embeddings
+{
+    "service": "amazonbedrock",
+    "service_settings": {
+        "access_key": "<aws_access_key>",
+        "secret_key": "<aws_secret_key>",
+        "region": "us-east-1",
+        "provider": "amazontitan",
+        "model": "amazon.titan-embed-text-v2:0"
+    }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
+
+The next example shows how to create an {infer} endpoint called `amazon_bedrock_completion` to perform a `completion` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/completion/amazon_bedrock_completion
+{
+    "service": "amazonbedrock",
+    "service_settings": {
+        "access_key": "<aws_access_key>",
+        "secret_key": "<aws_secret_key>",
+        "region": "us-east-1",
+        "provider": "amazontitan",
+        "model": "amazon.titan-text-premier-v1:0"
+    }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
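
Illustrative note, not part of the patch: the provider and task-type list and the "should not be used together" guidance for `temperature`, `top_p`, and `top_k` in the file above can be sketched as a client-side check that builds the request body. This is a minimal Python sketch. The helper name `build_endpoint_body` is hypothetical, and treating the "should not" guidance as a hard error is an assumption of the sketch, not documented server behavior.

```python
# Provider -> supported task types, copied from the `provider` list above.
PROVIDER_TASKS = {
    "amazontitan": {"text_embedding", "completion"},
    "anthropic": {"completion"},
    "ai21labs": {"completion"},
    "cohere": {"text_embedding", "completion"},
    "meta": {"completion"},
    "mistral": {"completion"},
}
# `top_k` is documented as available only for these providers.
TOP_K_PROVIDERS = {"anthropic", "cohere", "mistral"}
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def build_endpoint_body(task_type, provider, model, region,
                        access_key, secret_key,
                        task_settings=None, requests_per_minute=None):
    """Hypothetical helper: build the JSON body for
    PUT _inference/<task_type>/<inference_id> with the amazonbedrock service."""
    if provider not in PROVIDER_TASKS:
        raise ValueError(f"unknown provider: {provider}")
    if task_type not in PROVIDER_TASKS[provider]:
        raise ValueError(f"provider {provider} does not support {task_type}")

    task_settings = dict(task_settings or {})
    if task_settings and task_type != "completion":
        raise ValueError("task_settings are only documented for completion")

    # Assumption: enforce the "should not be used if ..." guidance strictly.
    chosen = [key for key in SAMPLING_KEYS if key in task_settings]
    if len(chosen) > 1:
        raise ValueError(f"use only one of {SAMPLING_KEYS}, got {chosen}")
    if "top_k" in task_settings and provider not in TOP_K_PROVIDERS:
        raise ValueError(f"top_k is not available for provider {provider}")

    service_settings = {
        "access_key": access_key,
        "secret_key": secret_key,
        "region": region,
        "provider": provider,
        "model": model,
    }
    if requests_per_minute is not None:
        # Overrides the documented default of 240 requests per minute.
        service_settings["rate_limit"] = {
            "requests_per_minute": requests_per_minute
        }

    body = {"service": "amazonbedrock", "service_settings": service_settings}
    if task_settings:
        body["task_settings"] = task_settings
    return body
```

Serializing the returned dictionary with `json.dumps` gives a body in the same shape as the two `PUT _inference/...` examples above.
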
@@ -17,6 +17,7 @@ For a list of supported models available on HuggingFace, refer to
 Azure based examples use models available through https://ai.azure.com/explore/models?selectedTask=embeddings[Azure AI Studio]
 or https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models[Azure OpenAI].
 Mistral examples use the `mistral-embed` model from https://docs.mistral.ai/getting-started/models/[the Mistral API].
+Amazon Bedrock examples use the `amazon.titan-embed-text-v2:0` model from https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[the Amazon Bedrock base models].
 
 Click the name of the service you want to use on any of the widgets below to review the corresponding instructions.
 

@@ -37,6 +37,12 @@
             id="infer-api-ingest-mistral">
       Mistral
     </button>
+    <button role="tab"
+            aria-selected="false"
+            aria-controls="infer-api-ingest-amazon-bedrock-tab"
+            id="infer-api-ingest-amazon-bedrock">
+      Amazon Bedrock
+    </button>
   </div>
   <div tabindex="0"
        role="tabpanel"
@@ -101,6 +107,17 @@ include::infer-api-ingest-pipeline.asciidoc[tag=azure-ai-studio]
 
 include::infer-api-ingest-pipeline.asciidoc[tag=mistral]
 
+++++
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="infer-api-ingest-amazon-bedrock-tab"
+       aria-labelledby="infer-api-ingest-amazon-bedrock"
+       hidden="">
+++++
+
+include::infer-api-ingest-pipeline.asciidoc[tag=amazon-bedrock]
+
 ++++
   </div>
 </div>

@@ -164,3 +164,29 @@ PUT _ingest/pipeline/mistral_embeddings
 and the `output_field` that will contain the {infer} results.
 
 // end::mistral[]
+
+// tag::amazon-bedrock[]
+
+[source,console]
+--------------------------------------------------
+PUT _ingest/pipeline/amazon_bedrock_embeddings
+{
+  "processors": [
+    {
+      "inference": {
+        "model_id": "amazon_bedrock_embeddings", <1>
+        "input_output": { <2>
+          "input_field": "content",
+          "output_field": "content_embedding"
+        }
+      }
+    }
+  ]
+}
+--------------------------------------------------
+<1> The name of the inference endpoint you created by using the
+<<put-inference-api>>; it is referred to as `inference_id` in that step.
+<2> Configuration object that defines the `input_field` for the {infer} process
+and the `output_field` that will contain the {infer} results.
+
+// end::amazon-bedrock[]
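
Illustrative note, not part of the patch: the `input_output` object above names the field to read (`input_field`) and the field that receives the {infer} result (`output_field`). The Python sketch below mimics only that field mapping on a single document. `apply_input_output` and `stand_in_embed` are hypothetical names, and the stand-in function does not produce real embeddings; it is a placeholder for the call to the `amazon_bedrock_embeddings` endpoint.

```python
def apply_input_output(doc, input_field, output_field, embed):
    """Return a copy of `doc` with embed(doc[input_field]) stored
    under `output_field`, leaving the original document untouched."""
    if input_field not in doc:
        raise KeyError(f"document has no field {input_field!r}")
    enriched = dict(doc)
    enriched[output_field] = embed(doc[input_field])
    return enriched


def stand_in_embed(text):
    # Placeholder only: a real pipeline would call the inference endpoint.
    return [float(len(text)), float(text.count(" "))]


doc = {"content": "hello bedrock"}
result = apply_input_output(doc, "content", "content_embedding", stand_in_embed)
print(result)
```
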
@@ -37,6 +37,12 @@
             id="infer-api-mapping-mistral">
       Mistral
     </button>
+    <button role="tab"
+            aria-selected="false"
+            aria-controls="infer-api-mapping-amazon-bedrock-tab"
+            id="infer-api-mapping-amazon-bedrock">
+      Amazon Bedrock
+    </button>
   </div>
   <div tabindex="0"
        role="tabpanel"
@@ -101,6 +107,17 @@ include::infer-api-mapping.asciidoc[tag=azure-ai-studio]
 
 include::infer-api-mapping.asciidoc[tag=mistral]
 
+++++
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="infer-api-mapping-amazon-bedrock-tab"
+       aria-labelledby="infer-api-mapping-amazon-bedrock"
+       hidden="">
+++++
+
+include::infer-api-mapping.asciidoc[tag=amazon-bedrock]
+
 ++++
   </div>
 </div>

@@ -207,3 +207,38 @@ the {infer} pipeline configuration in the next step.
 <6> The field type which is text in this example.
 
 // end::mistral[]
+
+// tag::amazon-bedrock[]
+
+[source,console]
+--------------------------------------------------
+PUT amazon-bedrock-embeddings
+{
+  "mappings": {
+    "properties": {
+      "content_embedding": { <1>
+        "type": "dense_vector", <2>
+        "dims": 1024, <3>
+        "element_type": "float",
+        "similarity": "dot_product" <4>
+      },
+      "content": { <5>
+        "type": "text" <6>
+      }
+    }
+  }
+}
+--------------------------------------------------
+<1> The name of the field to contain the generated embeddings. It must be referenced
+in the {infer} pipeline configuration in the next step.
+<2> The field to contain the embeddings is a `dense_vector` field.
+<3> The output dimensions of the model. This value may be different depending on the underlying model used.
+See the https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html[Amazon Titan model] or the https://docs.cohere.com/reference/embed[Cohere Embeddings model] documentation.
+<4> For Amazon Bedrock embeddings, use the `dot_product` function to
+calculate similarity for Amazon Titan models, or `cosine` for Cohere models.
+<5> The name of the field from which to create the dense vector representation.
+In this example, the name of the field is `content`. It must be referenced in
+the {infer} pipeline configuration in the next step.
+<6> The field type, which is `text` in this example.
+
+// end::amazon-bedrock[]
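
Illustrative note, not part of the patch: callout 4 above chooses between `dot_product` and `cosine`. For `float` vectors, Elasticsearch's `dot_product` similarity expects unit-length vectors, and on unit-length vectors the dot product equals the cosine similarity. The Python sketch below shows that relationship with plain lists. The vectors are made-up values, and whether a given Bedrock model returns normalized embeddings is something to verify against the model's own documentation.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [1.0, 2.0]
unit_a, unit_b = normalize(a), normalize(b)

# On unit-length vectors the two measures agree.
print(math.isclose(dot(unit_a, unit_b), cosine(a, b)))
```
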
@@ -37,6 +37,12 @@
             id="infer-api-reindex-mistral">
       Mistral
     </button>
+    <button role="tab"
+            aria-selected="false"
+            aria-controls="infer-api-reindex-amazon-bedrock-tab"
+            id="infer-api-reindex-amazon-bedrock">
+      Amazon Bedrock
+    </button>
   </div>
   <div tabindex="0"
        role="tabpanel"
@@ -101,6 +107,17 @@ include::infer-api-reindex.asciidoc[tag=azure-ai-studio]
 
 include::infer-api-reindex.asciidoc[tag=mistral]
 
+++++
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="infer-api-reindex-amazon-bedrock-tab"
+       aria-labelledby="infer-api-reindex-amazon-bedrock"
+       hidden="">
+++++
+
+include::infer-api-reindex.asciidoc[tag=amazon-bedrock]
+
 ++++
   </div>
 </div>

@@ -154,3 +154,26 @@ number makes the update of the reindexing process quicker which enables you to
 follow the progress closely and detect errors early.
 
 // end::mistral[]
+
+// tag::amazon-bedrock[]
+
+[source,console]
+----
+POST _reindex?wait_for_completion=false
+{
+  "source": {
+    "index": "test-data",
+    "size": 50 <1>
+  },
+  "dest": {
+    "index": "amazon-bedrock-embeddings",
+    "pipeline": "amazon_bedrock_embeddings"
+  }
+}
+----
+// TEST[skip:TBD]
+<1> The default batch size for reindexing is 1000. Reducing `size` to a smaller
+number makes each update of the reindexing process quicker, which enables you to
+follow the progress closely and detect errors early.
+
+// end::amazon-bedrock[]
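
Illustrative note, not part of the patch: the effect of the `size` setting in callout 1 can be made concrete with a little arithmetic. The Python sketch below counts how many batches a reindex needs for a given number of documents. `batch_count` is a hypothetical helper, and the document total is an example value, not a property of `test-data`.

```python
import math


def batch_count(total_docs, batch_size=1000):
    """Number of batches needed to process total_docs documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_docs / batch_size)


total = 10_000  # example document count, chosen for illustration
print(batch_count(total))      # default batch size of 1000
print(batch_count(total, 50))  # batch size used in the example above
```

With more, smaller batches there are more points at which progress can be observed and at which a failure can surface.
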