-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
[Metrics] Refactor LoRA state tracking #26801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request provides a solid refactoring of the LoRA state tracking logic. The changes effectively address several issues, including memory leaks when log_stats is disabled, ambiguity around method idempotency, and overly complex state management. The new implementation is much clearer, more robust, and easier to maintain. The introduction of a comprehensive unit test is a significant improvement that validates the correctness of the new logic and will help prevent future regressions. Overall, this is an excellent contribution that improves the quality and reliability of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
This comment was marked as resolved.
This comment was marked as resolved.
af4c449 to
3f1ba30
Compare
|
@njhill Could you please look at this PR? |
Signed-off-by: Mark McLoughlin <[email protected]>
LoRARequestStates serves a simple purpose - keep track on a per-lora basis which requests are running, and which are waiting. The current implementation is somewhat sloppy and over-complicated: - LoRAStats is never freed if a LoRA becomes unused or unloaded - We leak if log_stats is disabled and LoRA is enabled - It's not clear whether finish_request() should be idempotent - The static methods are a bit weird - We're adding each request to the waiting set twice - ... Hopefully this implementation is more clear and easier to verify that e.g. there are no leaks. Signed-off-by: Mark McLoughlin <[email protected]>
3f1ba30 to
cae669c
Compare
SchedulerStats is the right place for this really, just like the regular running/waiting counts. Make sure to call LoRARequestStates.update_scheduler_stats() even where there was no engine core outputs. Signed-off-by: Mark McLoughlin <[email protected]>
cae669c to
e6aa440
Compare
jeejeelee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's land this PR first
Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
LoRARequestStates serves a simple purpose - keep track on a per-lora basis which requests are running, and which are waiting.
The current implementation is somewhat sloppy and over-complicated:
Hopefully this implementation is more clear and easier to verify that e.g. there are no leaks. And now with a unit test!