Merged
5 changes: 5 additions & 0 deletions docs/changelog/110248.yaml
@@ -0,0 +1,5 @@
pr: 110248
summary: "[Inference API] Add Amazon Bedrock Support to Inference API"
area: Machine Learning
type: enhancement
issues: [ ]
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
@@ -25,6 +25,7 @@ include::delete-inference.asciidoc[]
include::get-inference.asciidoc[]
include::post-inference.asciidoc[]
include::put-inference.asciidoc[]
include::service-amazon-bedrock.asciidoc[]
include::service-azure-ai-studio.asciidoc[]
include::service-azure-openai.asciidoc[]
include::service-cohere.asciidoc[]
1 change: 1 addition & 0 deletions docs/reference/inference/put-inference.asciidoc
@@ -34,6 +34,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a

The following services are available through the {infer} API, click the links to review the configuration details of the services:

* <<infer-service-amazon-bedrock,Amazon Bedrock>>
* <<infer-service-azure-ai-studio,Azure AI Studio>>
* <<infer-service-azure-openai,Azure OpenAI>>
* <<infer-service-cohere,Cohere>>
173 changes: 173 additions & 0 deletions docs/reference/inference/service-amazon-bedrock.asciidoc
@@ -0,0 +1,173 @@
[[infer-service-amazon-bedrock]]
=== Amazon Bedrock {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `amazonbedrock` service.

[discrete]
[[infer-service-amazon-bedrock-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-amazon-bedrock-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `completion`,
* `text_embedding`.
--

[discrete]
[[infer-service-amazon-bedrock-api-request-body]]
==== {api-request-body-title}

`service`::
(Required, string) The type of service supported for the specified task type.
In this case,
`amazonbedrock`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `amazonbedrock` service.
--

`access_key`:::
(Required, string)
A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.

`secret_key`:::
(Required, string)
A valid AWS secret key that is paired with the `access_key`.
To create or manage access and secret keys, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html[Managing access keys for IAM users] in the AWS documentation.

IMPORTANT: You need to provide the access and secret keys only once, during the {infer} model creation.
The <<get-inference-api>> does not retrieve your access or secret keys.
After creating the {infer} model, you cannot change the associated key pairs.
If you want to use a different access and secret key pair, delete the {infer} model and recreate it with the same name and the updated keys.

`provider`:::
(Required, string)
The model provider for your deployment.
Note that some providers may support only certain task types.
Supported providers include:

* `amazontitan` - available for `text_embedding` and `completion` task types
* `anthropic` - available for `completion` task type only
* `ai21labs` - available for `completion` task type only
* `cohere` - available for `text_embedding` and `completion` task types
* `meta` - available for `completion` task type only
* `mistral` - available for `completion` task type only
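The provider list above can be turned into a small client-side check before a request is sent. The following is a minimal Python sketch, not part of the Elasticsearch API: the names `SUPPORTED_TASK_TYPES` and `check_provider` are hypothetical, and the table is only a transcription of the list above.

```python
# Provider-to-task-type support, transcribed from the list above.
# The dictionary and function names are illustrative, not part of any API.
SUPPORTED_TASK_TYPES = {
    "amazontitan": {"text_embedding", "completion"},
    "anthropic": {"completion"},
    "ai21labs": {"completion"},
    "cohere": {"text_embedding", "completion"},
    "meta": {"completion"},
    "mistral": {"completion"},
}


def check_provider(provider: str, task_type: str) -> None:
    """Raise ValueError if the provider/task type pair is not in the list above."""
    allowed = SUPPORTED_TASK_TYPES.get(provider)
    if allowed is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if task_type not in allowed:
        raise ValueError(
            f"provider {provider!r} does not support task type {task_type!r}"
        )
```

A call such as `check_provider("anthropic", "text_embedding")` raises `ValueError`, while `check_provider("cohere", "text_embedding")` returns without error.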

`model`:::
(Required, string)
The base model ID or an ARN to a custom model based on a foundational model.
The base model IDs can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock model IDs] documentation.
Note that the model ID must be available for the provider chosen, and your IAM user must have access to the model.

`region`:::
(Required, string)
The region that your model or ARN is deployed in.
The list of available regions per model can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html[Model support by AWS region] documentation.

`rate_limit`:::
(Optional, object)
By default, the `amazonbedrock` service sets the number of requests allowed per minute to `240`.
This helps to minimize the number of rate limit errors returned from Amazon Bedrock.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====

`max_new_tokens`:::
(Optional, integer)
Provides a hint for the maximum number of output tokens to be generated.

Member

Suggested change
- Provides a hint for the maximum number of output tokens to be generated.
+ Sets a maximum number for the output tokens to be generated.

Not sure what "hint" means here, rewording tries to clarify

Defaults to 64.

`temperature`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.

Member

Suggested change
- A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.
+ A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic, at temperature 1.0 most random.

Should not be used if `top_p` or `top_k` is specified.

`top_p`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.

Member

Suggested change
- A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
+ Alternative to `temperature`. A number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence.

Should not be used if `temperature` or `top_k` is specified.

Member

Reading around it looks like top-p and top-k can be used in combination?

Author

FYI - you're correct here... theoretically, you can use all three, but you shouldn't use temperature and top_p at the same time. For reference, see the parameters in Amazon's Anthropic docs


`top_p`:::

Member

Suggested change
- `top_p`:::
+ `top_k`:::

Assuming the first top_p is the correct one 😉

(Optional, float)
Only available for `anthropic`, `cohere`, and `mistral` providers.
A number in the range of 0.0 to 1.0 that is an alternative value to temperature or top_p that causes the model to consider the results of the tokens with nucleus sampling probability.

Member

Suggested change
- A number in the range of 0.0 to 1.0 that is an alternative value to temperature or top_p that causes the model to consider the results of the tokens with nucleus sampling probability.
+ Alternative to `temperature`. Limits samples to the top-K most likely words, balancing coherence and variability.
+ A number in the range of 0.0 to 1.0.

Should not be used if `temperature` or `top_p` is specified.

=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====

Member

@markjhoy I think this unclosed ==== block might be breaking your build :)

Author

Ah thanks! I could not figure out for the life of me where that error was coming from!


There are no `task_settings` available for the `text_embedding` task type.

[discrete]
[[inference-example-amazonbedrock]]
==== Amazon Bedrock service example

The following example shows how to create an {infer} endpoint called `amazon_bedrock_embeddings` to perform a `text_embedding` task type.

The list of chat completion and embeddings models that you can choose from should be a https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base model] you have access to.

@leemthompo leemthompo Jul 11, 2024

Member

Suggested change
- The list of chat completion and embeddings models that you can choose from should be a https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base model] you have access to.
+ Choose chat completion and embeddings models you have access to from the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base models].

nit: keep sentence short


[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-embed-text-v2:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]

The next example shows how to create an {infer} endpoint called `amazon_bedrock_completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/amazon_bedrock_completion
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-text-premier-v1:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
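The completion example above leaves out `task_settings`. As a rough sketch, the request body with the optional `task_settings` described earlier could be assembled as follows. The helper name `completion_endpoint_body` and the sample values (`max_new_tokens` of 128, `temperature` of 0.2) are assumptions for illustration. The guard against combining `temperature` and `top_p` follows the author's reply in the review thread and is a client-side convention here, not confirmed service behavior.

```python
import json


def completion_endpoint_body(access_key, secret_key, region, provider, model,
                             task_settings=None):
    """Build the body for PUT _inference/completion/<inference_id>.

    Hypothetical helper: it only assembles the fields documented above.
    """
    task_settings = dict(task_settings or {})
    # Per the review discussion, temperature and top_p should not be combined.
    if "temperature" in task_settings and "top_p" in task_settings:
        raise ValueError("set either temperature or top_p, not both")
    body = {
        "service": "amazonbedrock",
        "service_settings": {
            "access_key": access_key,
            "secret_key": secret_key,
            "region": region,
            "provider": provider,
            "model": model,
        },
    }
    # task_settings is optional, so it is included only when non-empty.
    if task_settings:
        body["task_settings"] = task_settings
    return body


body = completion_endpoint_body(
    "<aws_access_key>", "<aws_secret_key>", "us-east-1",
    "amazontitan", "amazon.titan-text-premier-v1:0",
    task_settings={"max_new_tokens": 128, "temperature": 0.2},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after the `PUT _inference/completion/amazon_bedrock_completion` line in the console.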
@@ -17,6 +17,7 @@ For a list of supported models available on HuggingFace, refer to
Azure based examples use models available through https://ai.azure.com/explore/models?selectedTask=embeddings[Azure AI Studio]
or https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models[Azure OpenAI].
Mistral examples use the `mistral-embed` model from https://docs.mistral.ai/getting-started/models/[the Mistral API].
Amazon Bedrock examples use the `amazon.titan-embed-text-v1` model from https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[the Amazon Bedrock base models].

Click the name of the service you want to use on any of the widgets below to review the corresponding instructions.

@@ -37,6 +37,12 @@
id="infer-api-ingest-mistral">
Mistral
</button>
<button role="tab"
aria-selected="false"
aria-controls="infer-api-ingest-amazon-bedrock-tab"
id="infer-api-ingest-amazon-bedrock">
Amazon Bedrock
</button>
</div>
<div tabindex="0"
role="tabpanel"
@@ -101,6 +107,17 @@ include::infer-api-ingest-pipeline.asciidoc[tag=azure-ai-studio]

include::infer-api-ingest-pipeline.asciidoc[tag=mistral]

++++
</div>
<div tabindex="0"
role="tabpanel"
id="infer-api-ingest-amazon-bedrock-tab"
aria-labelledby="infer-api-ingest-amazon-bedrock"
hidden="">
++++

include::infer-api-ingest-pipeline.asciidoc[tag=amazon-bedrock]

++++
</div>
</div>
@@ -164,3 +164,29 @@ PUT _ingest/pipeline/mistral_embeddings
and the `output_field` that will contain the {infer} results.

// end::mistral[]

// tag::amazon-bedrock[]

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/amazon_bedrock_embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "amazon_bedrock_embeddings", <1>
        "input_output": { <2>
          "input_field": "content",
          "output_field": "content_embedding"
        }
      }
    }
  ]
}
--------------------------------------------------
<1> The name of the inference endpoint you created by using the
<<put-inference-api>>, it's referred to as `inference_id` in that step.
<2> Configuration object that defines the `input_field` for the {infer} process
and the `output_field` that will contain the {infer} results.

// end::amazon-bedrock[]
@@ -37,6 +37,12 @@
id="infer-api-mapping-mistral">
Mistral
</button>
<button role="tab"
aria-selected="false"
aria-controls="infer-api-mapping-amazon-bedrock-tab"
id="infer-api-mapping-amazon-bedrock">
Amazon Bedrock
</button>
</div>
<div tabindex="0"
role="tabpanel"
@@ -101,6 +107,17 @@ include::infer-api-mapping.asciidoc[tag=azure-ai-studio]

include::infer-api-mapping.asciidoc[tag=mistral]

++++
</div>
<div tabindex="0"
role="tabpanel"
id="infer-api-mapping-amazon-bedrock-tab"
aria-labelledby="infer-api-mapping-amazon-bedrock"
hidden="">
++++

include::infer-api-mapping.asciidoc[tag=amazon-bedrock]

++++
</div>
</div>
@@ -207,3 +207,38 @@ the {infer} pipeline configuration in the next step.
<6> The field type which is text in this example.

// end::mistral[]

// tag::amazon-bedrock[]

[source,console]
--------------------------------------------------
PUT amazon-bedrock-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "dense_vector", <2>
        "dims": 1024, <3>
        "element_type": "float",
        "similarity": "dot_product" <4>
      },
      "content": { <5>
        "type": "text" <6>
      }
    }
  }
}
--------------------------------------------------
<1> The name of the field to contain the generated tokens. It must be referenced
in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `dense_vector` field.
<3> The output dimensions of the model. This value may be different depending on the underlying model used.
See the https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html[Amazon Titan model] or the https://docs.cohere.com/reference/embed[Cohere Embeddings model] documentation.
<4> For Amazon Bedrock embeddings, the `dot_product` function should be used to
calculate similarity for Amazon Titan models, or `cosine` for Cohere models.
<5> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<6> The field type which is text in this example.

// end::amazon-bedrock[]
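Callouts 3 and 4 above mean the mapping depends on the provider: `dims` varies by model, and `similarity` is `dot_product` for Amazon Titan or `cosine` for Cohere. A minimal Python sketch of that choice follows. The names `SIMILARITY_BY_PROVIDER` and `embedding_mapping` are hypothetical, and `dims` must be supplied by the caller from the model documentation because it is not derived here.

```python
# Similarity function per provider, as stated in callout 4 above.
# Names are illustrative; only the two embedding providers are listed.
SIMILARITY_BY_PROVIDER = {
    "amazontitan": "dot_product",
    "cohere": "cosine",
}


def embedding_mapping(provider: str, dims: int) -> dict:
    """Return an index mapping body for the given embedding provider.

    `dims` is the model's output dimension, taken from the model docs.
    """
    if provider not in SIMILARITY_BY_PROVIDER:
        raise ValueError(f"no text_embedding support listed for {provider!r}")
    return {
        "mappings": {
            "properties": {
                "content_embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "element_type": "float",
                    "similarity": SIMILARITY_BY_PROVIDER[provider],
                },
                "content": {"type": "text"},
            }
        }
    }
```

Calling `embedding_mapping("amazontitan", 1024)` reproduces the body of the mapping example above without the callout markers.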
@@ -37,6 +37,12 @@
id="infer-api-reindex-mistral">
Mistral
</button>
<button role="tab"
aria-selected="false"
aria-controls="infer-api-reindex-amazon-bedrock-tab"
id="infer-api-reindex-amazon-bedrock">
Amazon Bedrock
</button>
</div>
<div tabindex="0"
role="tabpanel"
@@ -101,6 +107,17 @@ include::infer-api-reindex.asciidoc[tag=azure-ai-studio]

include::infer-api-reindex.asciidoc[tag=mistral]

++++
</div>
<div tabindex="0"
role="tabpanel"
id="infer-api-reindex-amazon-bedrock-tab"
aria-labelledby="infer-api-reindex-amazon-bedrock"
hidden="">
++++

include::infer-api-reindex.asciidoc[tag=amazon-bedrock]

++++
</div>
</div>
@@ -154,3 +154,26 @@ number makes the update of the reindexing process quicker which enables you to
follow the progress closely and detect errors early.

// end::mistral[]

// tag::amazon-bedrock[]

[source,console]
----
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50 <1>
  },
  "dest": {
    "index": "amazon-bedrock-embeddings",
    "pipeline": "amazon_bedrock_embeddings"
  }
}
----
// TEST[skip:TBD]
<1> The default batch size for reindexing is 1000. Reducing `size` to a smaller
number makes the update of the reindexing process quicker which enables you to
follow the progress closely and detect errors early.

// end::amazon-bedrock[]