[Inference API] Add Docs for Amazon Bedrock Support for the Inference API #110594
markjhoy merged 5 commits into elastic:main from markjhoy:markjhoy/add_docs_amazon_bedrock_inference_api
Conversation
Documentation preview:
timgrein left a comment:

Just highlighting some Azure AI Studio references. (Sorry, I just saw that this is a draft, but leaving the comments here so we don't forget them :) )
Creates an {infer} endpoint to perform an {infer} task with the `amazonbedrock` service.

[discrete]
[[infer-service-azure-ai-studio-api-request]]

Suggested change:
- [[infer-service-azure-ai-studio-api-request]]
+ [[infer-service-amazon-bedrock-api-request]]
`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-azure-ai-studio-api-path-params]]

Suggested change:
- [[infer-service-azure-ai-studio-api-path-params]]
+ [[infer-service-amazon-bedrock-path-params]]
`rate_limit`:::
(Optional, object)
By default, the `azureaistudio` service sets the number of requests allowed per minute to `240`.
This helps to minimize the number of rate limit errors returned from Azure AI Studio.

Suggested changes:
- By default, the `azureaistudio` service sets the number of requests allowed per minute to `240`.
+ By default, the `amazonbedrock` service sets the number of requests allowed per minute to `240`.
- This helps to minimize the number of rate limit errors returned from Azure AI Studio.
+ This helps to minimize the number of rate limit errors returned from Amazon Bedrock.
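For context, the `rate_limit` object under discussion nests inside the endpoint's `service_settings`. A minimal sketch of overriding the default, assuming the credential field names for the `amazonbedrock` service (they are not part of this diff):

```json
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>",
    "region": "us-east-1",
    "rate_limit": {
      "requests_per_minute": 120
    }
  }
}
```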
Argh - great catches ;) That's what I get for copy/pasting.
@elasticmachine run docs build
Pinging @elastic/es-docs (Team:Docs)
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
@markjhoy I think this unclosed ==== block might be breaking your build :)
Ah thanks! I could not figure out for the life of me where that error was coming from!
leemthompo left a comment:
This is looking good. Found a few minor errors and suggested some rephrasings. Also hopefully identified the formatting issue that's failing the docs build. Once these updates are made and we can preview the formatting for tabs, it will be ready for final review!
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
Should not be used if `temperature` or `top_k` is specified.

`top_p`:::

Suggested change:
- `top_p`:::
+ `top_k`:::
Assuming the first top_p is the correct one 😉
`max_new_tokens`:::
(Optional, integer)
Provides a hint for the maximum number of output tokens to be generated.

Suggested change:
- Provides a hint for the maximum number of output tokens to be generated.
+ Sets a maximum number for the output tokens to be generated.
Not sure what "hint" means here; the rewording tries to clarify.
`temperature`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.

Suggested change:
- A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.
+ A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic, at temperature 1.0 most random.
`top_p`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.

Suggested change:
- A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
+ Alternative to `temperature`. A number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence.
`top_p`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
Should not be used if `temperature` or `top_k` is specified.
Reading around it looks like top-p and top-k can be used in combination?
FYI - you're correct here... theoretically, you can use all three, but you shouldn't use `temperature` and `top_p` at the same time. For reference, see the parameters in Amazon's Anthropic docs.
`top_p`:::
(Optional, float)
Only available for `anthropic`, `cohere`, and `mistral` providers.
A number in the range of 0.0 to 1.0 that is an alternative value to temperature or top_p that causes the model to consider the results of the tokens with nucleus sampling probability.

Suggested change:
- A number in the range of 0.0 to 1.0 that is an alternative value to temperature or top_p that causes the model to consider the results of the tokens with nucleus sampling probability.
+ Alternative to `temperature`. Limits samples to the top-K most likely words, balancing coherence and variability.
+ A number in the range of 0.0 to 1.0.
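Taken together, the chat-completion `task_settings` being reviewed above would look roughly like this. The values are illustrative only; per the thread, `temperature` and `top_p` should not be combined, so this sketch pairs `temperature` with `top_k` instead:

```json
{
  "task_settings": {
    "temperature": 0.7,
    "top_k": 40,
    "max_new_tokens": 512
  }
}
```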
The following example shows how to create an {infer} endpoint called `amazon_bedrock_embeddings` to perform a `text_embedding` task type.

The list of chat completion and embeddings models that you can choose from should be a https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base model] you have access to.

Suggested change:
- The list of chat completion and embeddings models that you can choose from should be a https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base model] you have access to.
+ Choose chat completion and embeddings models you have access to from the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base models].
nit: keep sentence short
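For reference, a request along the lines of the `amazon_bedrock_embeddings` example the docs describe. The `service_settings` field names here are assumptions based on the discussion above, and the Titan model ID is only illustrative:

```
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model": "amazon.titan-embed-text-v1"
  }
}
```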
💚 Backport successful
… API (#110594)
* Add Amazon Bedrock Inference API to docs
* fix example errors
* update semantic search tutorial; add changelog
* fix typo
* fix error; accept suggestions
Adds docs in support of Amazon Bedrock support in the Inference API: #110248