-
Notifications
You must be signed in to change notification settings - Fork 195
remove Model field from LLMRequest #782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @kfswain |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
|
I am working on the Flow Controller POC. We currently enforce the dispatch policy (for fairness) by model, not resolved target model. I will either need to use a different type when enqueuing a request than LLMRequest or will need model to remain as a field on this struct. The Flow Controller is not tightly coupled to the Scheduler, so either approach is fine. |
@LukeAVanDrie thanks for bringing this up. Ideally, each layer should get the data it needs for its mission, and only that. |
To my understanding, @LukeAVanDrie needs the
|
A subset (model, criticality) and some new fields: a reference to the request prompt size in bytes and a reference to the request context.
This is a good point. I have removed my dependency on scheduling.LLMRequest and am defining my own type for the FlowController input. My POC's fate is no longer tied to this PR. |
|
ready for review after #781 was merged |
| if reqCtx.ResolvedTargetModel == "" { | ||
| return reqCtx, errutil.Error{Code: errutil.BadConfiguration, Msg: fmt.Sprintf("error getting target model name for model %v", modelObj.Name)} | ||
| } | ||
| reqCtx.Request.Body["model"] = reqCtx.ResolvedTargetModel // Update target model in the body. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was moved from PostDispatch to here.
in this point the resolution of the target model is done, it seems to belong here more naturally.
|
@nirrozenbaum needs a rebase. |
danehans
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, nirrozenbaum The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Nir Rozenbaum <[email protected]>
|
/lgtm |
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <[email protected]> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates to tests Signed-off-by: Shmuel Kallner <[email protected]> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <[email protected]> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to rebase issues * merge functions in env utils (#819) Signed-off-by: Nir Rozenbaum <[email protected]> * generalize scheduling cycle state concept (#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <[email protected]> * typo Signed-off-by: Nir Rozenbaum <[email protected]> * make linter happy Signed-off-by: Nir Rozenbaum <[email protected]> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * remove Model field from LLMRequest (#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Nir Rozenbaum <[email protected]>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <[email protected]> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates to tests Signed-off-by: Shmuel Kallner <[email protected]> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <[email protected]> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <[email protected]> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <[email protected]> * typo Signed-off-by: Nir Rozenbaum <[email protected]> * make linter happy Signed-off-by: Nir Rozenbaum <[email protected]> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Nir Rozenbaum <[email protected]>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <[email protected]> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates to tests Signed-off-by: Shmuel Kallner <[email protected]> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <[email protected]> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <[email protected]> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <[email protected]> * typo Signed-off-by: Nir Rozenbaum <[email protected]> * make linter happy Signed-off-by: Nir Rozenbaum <[email protected]> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Nir Rozenbaum <[email protected]>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <[email protected]> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Updates to tests Signed-off-by: Shmuel Kallner <[email protected]> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <[email protected]> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <[email protected]> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <[email protected]> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <[email protected]> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <[email protected]> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <[email protected]> * typo Signed-off-by: Nir Rozenbaum <[email protected]> * make linter happy Signed-off-by: Nir Rozenbaum <[email protected]> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <[email protected]> * rebase handling Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <[email protected]> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <[email protected]> --------- Signed-off-by: Shmuel Kallner <[email protected]> Signed-off-by: Nir Rozenbaum <[email protected]> Co-authored-by: Nir Rozenbaum <[email protected]>
This PR removes the
Modelfield fromScheduling.LLMRequeststruct.from the scheduler point of view, it doesn't care about the original requested model name, only about the resolved model after traffic splitting that was done in a higher level.
we can see that
Modelfield was used in unit-tests only and that it's always set to be identical to ResolvedTargetModel. Scheduler doesn't use this field.In addition to removing
Modelfield fromLLMRequeststruct, this PR renamesResolvedTargetModeltoTargetModelinLLMRequestfrom the same reasons. scheduler plugins don't care about traffic splitting and "resolved" model, only about what is target model.unit-tests and other usage places were updated accordingly.