[Feature Request] Direct JSON Schema Support in OpenAI-Compatible API #5952

@noiji

Description

Currently, to generate a response that conforms to a specific JSON schema using TensorRT-LLM's OpenAI-compatible API, I have to set the type to structural_tag in response_format, because that is the only schema-aware branch in the conversion function:

def _response_format_to_guided_decoding_params(
    response_format: Optional[ResponseFormat]
) -> Optional[GuidedDecodingParams]:
    if response_format is None:
        return None
    elif response_format.type == "text":
        return None
    elif response_format.type == "json_object":
        return GuidedDecodingParams(json_object=True)
    elif response_format.type == "structural_tag":
        return GuidedDecodingParams(
            structural_tag=response_format.model_dump_json(by_alias=True,
                                                           exclude_none=True))
    else:
        raise ValueError(f"Unsupported response format: {response_format.type}")

While this method works as intended, it is inconvenient in several ways. For example, the prompt has to instruct the model to write the begin and end tags around the JSON.
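To illustrate the extra handling that wrapper tags imply, here is a minimal sketch of the post-processing a client might need. The tag names `<person>` and `</person>`, the helper `extract_tagged_json`, and the sample output are all hypothetical and not taken from TensorRT-LLM.

```python
import json
from typing import Any

# Hypothetical begin/end tags; the real tags are whatever the request defines.
BEGIN_TAG = "<person>"
END_TAG = "</person>"


def extract_tagged_json(text: str, begin: str = BEGIN_TAG, end: str = END_TAG) -> Any:
    """Return the JSON payload found between the first begin/end tag pair."""
    start = text.find(begin)
    if start == -1:
        raise ValueError(f"begin tag {begin!r} not found")
    start += len(begin)
    stop = text.find(end, start)
    if stop == -1:
        raise ValueError(f"end tag {end!r} not found")
    return json.loads(text[start:stop])


# Made-up model output, for illustration only.
raw = 'Sure. <person>{"name": "Ada", "age": 36}</person>'
person = extract_tagged_json(raw)
```

With direct schema support, the response body would already be the JSON document and this step would not be needed.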

It would be better to expose TensorRT-LLM's existing JSON schema decoding capability through the OpenAI-compatible API layer.

The core of this request revolves around a discrepancy between the backend's capabilities and the API's interface. The internal GuidedDecodingParams class already supports a json field, which allows for direct and powerful enforcement of a specific JSON schema.

class GuidedDecodingParams:
    """Guided decoding parameters for text generation. Only one of the fields could be effective.
    Args:
        json (str, pydantic.main.BaseModel, dict, optional): The generated text is amenable to json format with additional user-specified restrictions, namely schema. Defaults to None.
        regex (str, optional): The generated text is amenable to the user-specified regular expression. Defaults to None.
        grammar (str, optional): The generated text is amenable to the user-specified extended Backus-Naur form (EBNF) grammar. Defaults to None.
        json_object (bool): If True, the generated text is amenable to json format. Defaults to False.
        structural_tag (str, optional): The generated text is amenable to the user-specified structural tag. Structural tag is supported by xgrammar in PyTorch backend only. Defaults to None.
    """  # noqa: E501
    json: Optional[Union[str, BaseModel, dict]] = None
    regex: Optional[str] = None
    grammar: Optional[str] = None
    json_object: bool = False
    structural_tag: Optional[str] = None

    def _validate(self):
        num_guides = 0
        for _field in fields(self):
            num_guides += bool(getattr(self, _field.name))
        if num_guides > 1:
            raise ValueError(f"Only one guide can be used for a request, but got {num_guides}.")
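The one-guide rule above can be tried out with a small stand-in. This is only a sketch: `GuidedDecodingParamsSketch` is a hypothetical dataclass that mirrors the fields quoted above (minus the pydantic type), not the TensorRT-LLM class itself.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union


@dataclass
class GuidedDecodingParamsSketch:
    """Stand-in for illustration only; not the TensorRT-LLM class."""
    json: Optional[Union[str, dict]] = None
    regex: Optional[str] = None
    grammar: Optional[str] = None
    json_object: bool = False
    structural_tag: Optional[str] = None

    def validate(self) -> None:
        # Count how many guide fields are set to a truthy value.
        num_guides = sum(bool(getattr(self, f.name)) for f in fields(self))
        if num_guides > 1:
            raise ValueError(
                f"Only one guide can be used for a request, but got {num_guides}.")


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# A schema passed through the json field is a single guide, so this passes.
GuidedDecodingParamsSketch(json=schema).validate()

# Combining two guides is rejected.
try:
    GuidedDecodingParamsSketch(json=schema, regex=r"\d+").validate()
except ValueError as err:
    print(err)
```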


Proposed Solution

To bridge the gap between the backend's capabilities and the API's interface, I propose enhancing the response_format object to directly leverage GuidedDecodingParams.

Ideal API Request:

{
  "model": "your_model",
  "messages": [...],
  "response_format": {
    "type": "json",
    "schema": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
      },
      "required": ["name", "age"]
    }
  }
}

This would provide a clean, intuitive API that directly maps to the GuidedDecodingParams functionality. It would allow users to get a pure, schema-compliant JSON response without the overhead of wrapper tags, extra prompt engineering, or post-processing.
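One way the conversion function could grow a branch for the proposed format is sketched below. Everything here is an assumption made for illustration: `ResponseFormatSketch` and `GuidedParamsSketch` are hypothetical stand-ins for the real request and parameter classes, and the `"json"` type with a `schema` field follows the request shape proposed above rather than any existing TensorRT-LLM API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResponseFormatSketch:
    """Hypothetical stand-in for the API's response_format object."""
    type: str
    schema: Optional[Dict[str, Any]] = None


@dataclass
class GuidedParamsSketch:
    """Hypothetical stand-in for GuidedDecodingParams (two fields only)."""
    json: Optional[Dict[str, Any]] = None
    json_object: bool = False


def response_format_to_guided_params(
    response_format: Optional[ResponseFormatSketch],
) -> Optional[GuidedParamsSketch]:
    if response_format is None or response_format.type == "text":
        return None
    if response_format.type == "json_object":
        return GuidedParamsSketch(json_object=True)
    if response_format.type == "json":
        # Proposed branch: hand the schema straight to the json guide.
        if response_format.schema is None:
            raise ValueError("response_format type 'json' requires a schema")
        return GuidedParamsSketch(json=response_format.schema)
    raise ValueError(f"Unsupported response format: {response_format.type}")


person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
params = response_format_to_guided_params(
    ResponseFormatSketch(type="json", schema=person_schema))
```

The existing structural_tag branch is left out of the sketch for brevity; the new branch would sit alongside it rather than replace it.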

Thank you for considering this improvement.
