Description
Currently, to generate a response that conforms to a specific JSON schema using TensorRT-LLM's OpenAI-compatible API, I have to set the type to structural_tag in response_format, as handled here:
TensorRT-LLM/tensorrt_llm/serve/openai_protocol.py
Lines 138 to 152 in b01d1c2
def _response_format_to_guided_decoding_params(
        response_format: Optional[ResponseFormat]
) -> Optional[GuidedDecodingParams]:
    if response_format is None:
        return None
    elif response_format.type == "text":
        return None
    elif response_format.type == "json_object":
        return GuidedDecodingParams(json_object=True)
    elif response_format.type == "structural_tag":
        return GuidedDecodingParams(
            structural_tag=response_format.model_dump_json(by_alias=True,
                                                           exclude_none=True))
    else:
        raise ValueError(f"Unsupported response format: {response_format.type}")
While this method works as intended, it has several inconveniences, such as having to instruct the model in the prompt to emit the begin and end tags itself.
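For illustration, a request using the current structural_tag approach might look like the sketch below. The field names (structures, begin, schema, end, triggers) follow xgrammar's structural-tag convention and are assumptions here; the exact shape accepted by openai_protocol.py may differ.

```python
# Hypothetical request body using the current structural_tag response format.
# Note the extra burden: the prompt must tell the model to wrap its answer
# in the begin/end tags, since only the tagged region is schema-constrained.
request_body = {
    "model": "your_model",
    "messages": [
        {
            "role": "user",
            "content": "Describe a person. Wrap the JSON in <person>...</person>.",
        }
    ],
    "response_format": {
        "type": "structural_tag",
        "structures": [
            {
                "begin": "<person>",
                "schema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                    },
                    "required": ["name", "age"],
                },
                "end": "</person>",
            }
        ],
        "triggers": ["<person>"],
    },
}
```

The response then contains the tags themselves, so the caller also has to strip them before parsing the JSON.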
Maybe it would be better to expose the existing JSON schema decoding capabilities of TensorRT-LLM's backend in the OpenAI-compatible API layer.
The core of this request revolves around a discrepancy between the backend's capabilities and the API's interface. The internal GuidedDecodingParams class already supports a json field, which allows for direct and powerful enforcement of a specific JSON schema.
TensorRT-LLM/tensorrt_llm/sampling_params.py
Lines 14 to 36 in 37293e4
class GuidedDecodingParams:
    """Guided decoding parameters for text generation. Only one of the fields could be effective.

    Args:
        json (str, pydantic.main.BaseModel, dict, optional): The generated text is amenable to json format with additional user-specified restrictions, namely schema. Defaults to None.
        regex (str, optional): The generated text is amenable to the user-specified regular expression. Defaults to None.
        grammar (str, optional): The generated text is amenable to the user-specified extended Backus-Naur form (EBNF) grammar. Defaults to None.
        json_object (bool): If True, the generated text is amenable to json format. Defaults to False.
        structural_tag (str, optional): The generated text is amenable to the user-specified structural tag. Structural tag is supported by xgrammar in PyTorch backend only. Defaults to None.
    """  # noqa: E501

    json: Optional[Union[str, BaseModel, dict]] = None
    regex: Optional[str] = None
    grammar: Optional[str] = None
    json_object: bool = False
    structural_tag: Optional[str] = None

    def _validate(self):
        num_guides = 0
        for _field in fields(self):
            num_guides += bool(getattr(self, _field.name))
        if num_guides > 1:
            raise ValueError(
                f"Only one guide can be used for a request, but got {num_guides}.")
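The one-guide-per-request rule above can be demonstrated with a simplified, self-contained stand-in for the real tensorrt_llm.sampling_params.GuidedDecodingParams (this sketch omits the BaseModel variant and exists only to illustrate the validation behavior):

```python
from dataclasses import dataclass, fields
from typing import Optional, Union


@dataclass
class GuidedDecodingParamsSketch:
    """Simplified stand-in mirroring the mutual-exclusion rule quoted above."""
    json: Optional[Union[str, dict]] = None
    regex: Optional[str] = None
    grammar: Optional[str] = None
    json_object: bool = False
    structural_tag: Optional[str] = None

    def _validate(self):
        # Count truthy guide fields; at most one may be active per request.
        num_guides = sum(bool(getattr(self, f.name)) for f in fields(self))
        if num_guides > 1:
            raise ValueError(
                f"Only one guide can be used for a request, but got {num_guides}.")
```

So a params object carrying only a json schema validates cleanly, while setting json and regex together raises ValueError — which is exactly why a response_format "json" type can map one-to-one onto the existing json field without interfering with the other guides.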
Proposed Solution
To bridge the gap between the backend's capabilities and the API's interface, I propose enhancing the response_format object to directly leverage GuidedDecodingParams.
Ideal API Request:
{
"model": "your_model",
"messages": [...],
"response_format": {
"type": "json",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
}
}
}

This would provide a clean, intuitive API that directly maps to the GuidedDecodingParams functionality. It would allow users to get a pure, schema-compliant JSON response without the overhead of wrapper tags, extra prompt engineering, or post-processing.
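A minimal sketch of how the mapping in openai_protocol.py could be extended to support the proposed type (the function name, the dict-based response_format, and the dataclass below are illustrative stand-ins, not the actual TensorRT-LLM code):

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class GuidedDecodingParamsSketch:
    # Simplified stand-in for tensorrt_llm.sampling_params.GuidedDecodingParams.
    json: Optional[Union[str, dict]] = None
    json_object: bool = False
    structural_tag: Optional[str] = None


def response_format_to_params(
        response_format: Optional[dict]) -> Optional[GuidedDecodingParamsSketch]:
    """Hypothetical extension: map a 'json' response_format to schema-guided decoding."""
    if response_format is None or response_format.get("type") == "text":
        return None
    rf_type = response_format["type"]
    if rf_type == "json_object":
        return GuidedDecodingParamsSketch(json_object=True)
    if rf_type == "json":
        # New branch: pass the user-supplied schema straight through to the
        # backend's existing json field -- no wrapper tags, no post-processing.
        return GuidedDecodingParamsSketch(json=response_format["schema"])
    raise ValueError(f"Unsupported response format: {rf_type}")
```

With this mapping, the "Ideal API Request" above would resolve to params whose json field holds the schema, letting the backend enforce it directly.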
Thank you for considering this improvement.