This repository was archived by the owner on Mar 21, 2026. It is now read-only.

DRAFT: Custom logits processors#1274

Closed
noamgat wants to merge 2 commits into huggingface:main from noamgat:feature/logits-processors

Conversation


@noamgat noamgat commented Nov 21, 2023

What does this PR do?

This PR is meant for discussion purposes, around the idea to add custom logits processors, discussed here:
#1269
It is not fully implemented yet, but the interfaces are there, and the design discussion can be held.

The core question is: how can we add custom logits-processing code that individual generation requests can control, without injecting new dependencies into the text-generation-inference server?

Example use case:
The lm-format-enforcer library supports JSON Schema and Regex decoding using logits processing. Can we use the text-generation-inference server to decode different JSON schemas and regexes for different requests?
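To make the mechanism concrete, here is a minimal illustration of logits processing as used by libraries like lm-format-enforcer: disallowed tokens get a score of -inf so they can never be sampled. The `allowed_token_ids` oracle is hypothetical; real implementations compute the allowed set from the schema or regex state and operate on torch tensors rather than Python lists.

```python
import math

def mask_disallowed(logits, allowed_token_ids):
    """Set logits of disallowed tokens to -inf so they can never be sampled."""
    return [
        score if token_id in allowed_token_ids else -math.inf
        for token_id, score in enumerate(logits)
    ]

logits = [1.2, 0.3, -0.5, 2.1]
masked = mask_disallowed(logits, allowed_token_ids={0, 3})
# masked == [1.2, -inf, -inf, 2.1]
```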

This PR proposes the following solution:

  • The NextTokenChooserParameters message will contain optional LogitsProcessorParameters objects, each containing the name of the processor and a list of strings as its parameters (staying as generic as possible to keep the gRPC layer simple).
  • The server will have a central repository of named "logits processor factories" that can receive these parameters and create a request-specific logits warper.
  • The server will have a command-line module-loader option to enable loading of these modules, which will register themselves in the central repository when they are loaded.

This will allow external libraries to register logits processors that are applied whenever a request references them by name.
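The registry described above could look roughly like the following. The names CustomLogitsProcessorsManager and register_factory come from the PR text, but every signature here is an assumption, not the actual implementation:

```python
from typing import Dict, List

# Hypothetical sketch of the central factory repository.
class LogitsProcessorFactory:
    name: str = "base"

    def create(self, params: List[str]):
        """Build a request-specific logits warper from string parameters."""
        raise NotImplementedError

class CustomLogitsProcessorsManager:
    _factories: Dict[str, LogitsProcessorFactory] = {}

    @classmethod
    def register_factory(cls, factory: LogitsProcessorFactory) -> None:
        cls._factories[factory.name] = factory

    @classmethod
    def create_processor(cls, name: str, params: List[str]):
        # Called per request, e.g. from the NextTokenChooser constructor.
        return cls._factories[name].create(params)

# Toy factory standing in for something like a JSON-schema factory.
class EchoFactory(LogitsProcessorFactory):
    name = "echo"

    def create(self, params: List[str]):
        return params[0]

CustomLogitsProcessorsManager.register_factory(EchoFactory())
processor = CustomLogitsProcessorsManager.create_processor("echo", ["hello"])
```

Keeping the parameters as a plain list of strings, as the PR proposes, means the gRPC schema never needs to change when a new processor type is added.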

Example flow:

  • The server is started with the --custom_modules=lmformatenforcer.integrations.text_generation_inference flag.
  • The module is loaded, and its __init__.py code calls CustomLogitsProcessorsManager.register_factory(JsonSchemaFactory).
  • A request arrives with name=jsonschema, parameters=["<str of json schema>"] in the logits parameters.
  • The NextTokenChooser constructor finds the factory in the central repository and creates a logits warper for decoding the specific JSON schema.
  • After every timestep, the warper is called, putting -inf in the logits of tokens that would violate the JSON schema.
  • The result is a valid JSON string that conforms to the schema.
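The first step of the flow above, the module loader behind the proposed --custom_modules flag, could be as simple as importing each named module: importing runs its top-level code, which is expected to register its factories. This is a hedged sketch of the mechanism, not the PR's actual code; a stdlib module is imported here purely for demonstration.

```python
import importlib
import sys

def load_custom_modules(module_names):
    """Import each module so its top-level code can register factories."""
    for name in module_names:
        importlib.import_module(name)

# Demonstration with a stdlib module; a real invocation would pass e.g.
# ["lmformatenforcer.integrations.text_generation_inference"].
load_custom_modules(["json"])
```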

What do you think of this design? You can look at the branch comparison for most of the key ideas. Can I go ahead and continue the full implementation?

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@mjsteele12

This will change my life. If I can help with testing in any way, let me know.

@noamgat
Author

noamgat commented Nov 28, 2023

I am still waiting for feedback from the team before I continue with the implementation. @Narsil @OlivierDehaene can you have a look and tell me what you think?

@mjsteele12

> I am still waiting for feedback from the team before I continue with the implementation. @Narsil @OlivierDehaene can you have a look and tell me what you think?

Thanks, still very interested in this. In the meantime I will switch to vLLM.

