feat: add sagemaker_nova provider for Amazon Nova models on SageMaker #21542
Changes from all commits: aceb6c4, 57efcb9, cd4248b, 868f9d0
Register the new provider in the providers list:

```diff
@@ -505,6 +505,7 @@
     "azure_ai",
     "sagemaker",
     "sagemaker_chat",
+    "sagemaker_nova",
     "bedrock",
     "vllm",
     "nlp_cloud",
```
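For context, LiteLLM routes requests based on the `provider/endpoint` prefix of the model string, which is why `"sagemaker_nova"` must appear in the providers list. A simplified sketch of that split (not LiteLLM's actual routing code):

```python
def split_provider(model: str) -> tuple:
    """Split a 'provider/endpoint' model string into its two parts.

    Simplified illustration of prefix-based routing; models without a
    prefix are returned with an empty provider.
    """
    provider, sep, endpoint = model.partition("/")
    if not sep:
        return "", model
    return provider, endpoint
```

With the provider registered, `model="sagemaker_nova/<endpoint-name>"` resolves to the `sagemaker_nova` code path.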
New `__init__.py` for the provider package:

```diff
@@ -0,0 +1 @@
+from .transformation import SagemakerNovaConfig  # noqa: F401
```
New transformation module (70 added lines):

```python
"""
Translate from OpenAI's `/v1/chat/completions` to SageMaker Nova Inference endpoints.

Nova models on SageMaker use OpenAI-compatible request/response format with
additional Nova-specific parameters (top_k, reasoning_effort, etc.).

Docs: https://docs.aws.amazon.com/nova/latest/nova2-userguide/nova-sagemaker-inference-api-reference.html
"""

from typing import List

from litellm.types.llms.openai import AllMessageValues

from ..chat.transformation import SagemakerChatConfig


class SagemakerNovaConfig(SagemakerChatConfig):
    """
    Config for Amazon Nova models deployed on SageMaker Inference endpoints.

    Nova uses OpenAI-compatible format (same as sagemaker_chat / HF Messages API)
    but with additional Nova-specific parameters and requires `stream: true` in
    the request body for streaming.

    Usage:
        model="sagemaker_nova/<endpoint-name>"
    """

    @property
    def supports_stream_param_in_request_body(self) -> bool:
        """Nova expects `stream: true` in the request body for streaming."""
        return True

    def get_supported_openai_params(self, model: str) -> List:
        """Extend parent params with Nova-specific parameters."""
        params = super().get_supported_openai_params(model)
        nova_params = [
            "top_k",
            "reasoning_effort",
            "allowed_token_ids",
            "truncate_prompt_tokens",
        ]
        for p in nova_params:
            if p not in params:
                params.append(p)
        return params

    def transform_request(
        self,
        model: str,
        messages: List[AllMessageValues],
        optional_params: dict,
        litellm_params: dict,
        headers: dict,
    ) -> dict:
        """
        Nova SageMaker endpoints do not accept 'model' in the request body.
        Only supported fields: messages, max_tokens, max_completion_tokens,
        temperature, top_p, top_k, stream, stream_options, logprobs,
        top_logprobs, reasoning_effort, allowed_token_ids, truncate_prompt_tokens.
        """
        request_body = super().transform_request(
            model=model,
            messages=messages,
            optional_params=optional_params,
            litellm_params=litellm_params,
            headers=headers,
        )
        request_body.pop("model", None)
        return request_body
```
Dispatch change in `completion()`:

```diff
@@ -3712,8 +3712,10 @@ def completion(  # type: ignore  # noqa: PLR0915
         ):
             return _model_response
         response = _model_response
-    elif custom_llm_provider == "sagemaker_chat":
+    elif custom_llm_provider in ("sagemaker_chat", "sagemaker_nova"):
         # boto3 reads keys from .env
+        # sagemaker_chat: HF Messages API endpoints
+        # sagemaker_nova: Nova models on SageMaker (OpenAI-compatible)
         model_response = base_llm_http_handler.completion(
             model=model,
             stream=stream,
```

**Contributor comment** on this hunk, suggesting that the multi-line condition form be extended instead:

```python
elif (
    custom_llm_provider == "sagemaker"
    or custom_llm_provider == "sagemaker_chat"
):
```

becomes

```python
elif (
    custom_llm_provider == "sagemaker"
    or custom_llm_provider == "sagemaker_chat"
    or custom_llm_provider == "sagemaker_nova"
):
```
```diff
@@ -3723,7 +3725,7 @@ def completion(  # type: ignore  # noqa: PLR0915
             model_response=model_response,
             optional_params=optional_params,
             litellm_params=litellm_params,
-            custom_llm_provider="sagemaker_chat",
+            custom_llm_provider=custom_llm_provider,
             timeout=timeout,
             headers=headers,
             encoding=_get_encoding(),
```
**Review comment:** `reasoning_effort` hardcoded as supported for all Nova endpoints

Per the PR description and AWS docs, `reasoning_effort` is only supported by Nova 2 Lite custom models, not Nova Micro or Nova Lite. However, it is unconditionally added to `get_supported_openai_params` for every `sagemaker_nova` endpoint.

This violates the project's custom instruction (rule 2605a1b1): model-specific capability flags should live in `model_prices_and_context_window.json` and be read via `get_model_info`, so that support can be updated without a LiteLLM code release.

Because SageMaker endpoint names are opaque (just a string like `"my-nova-endpoint"`), there is no static entry in the pricing file for these models. The recommended pattern is to rely on a capability flag (`supports_reasoning`) checked via `get_model_info`, or to document that passing `reasoning_effort` to a non-Nova-2-Lite endpoint will result in an API error, rather than advertising it as universally supported.

At minimum, a docstring clarifying the model restriction would prevent confusion, but the preferred fix per project policy is to make this capability opt-in or gate it on a runtime check.

Rule used: "Do not hardcode model-specific flags in the ..." (source)
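The gating pattern the reviewer recommends can be sketched as a stand-alone helper (the `supports_reasoning` flag name comes from the review comment; the helper itself and its inputs are hypothetical, not LiteLLM's actual implementation):

```python
def supported_params(base_params: list, model_info: dict) -> list:
    """Return supported OpenAI params, gating Nova-2-Lite-only features
    on a capability flag read from model metadata."""
    params = list(base_params)
    params.append("top_k")  # available on all Nova endpoints
    # Only advertise reasoning_effort when metadata opts the model in.
    if model_info.get("supports_reasoning"):
        params.append("reasoning_effort")
    return params
```

With this shape, enabling `reasoning_effort` for a new endpoint is a metadata change rather than a code release, which is what the project rule asks for.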