fix: abort over-length embedding and grammar errors without crash #21224
yang1002378395-cmyk wants to merge 1 commit into sgl-project:main
Conversation
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the robustness of the system by addressing two critical error scenarios: over-length embedding requests and grammar processing failures. It introduces mechanisms to gracefully abort problematic requests, preventing server crashes and dead loops, thereby improving system stability and reliability.
Code Review
This pull request introduces two important fixes to prevent server crashes. For over-length embedding requests, it now correctly marks the request for abortion before adding it to the queue, preventing illegal memory access. This is a good fix for consistency with how generate requests are handled. For grammar errors during output processing, it replaces abort_request with a safer mechanism of setting req.to_finish. This avoids potential dead loops when modifying scheduler state from within the processing loop. The changes are well-targeted and correctly address the described issues, improving the robustness of the server. The implementation looks solid.
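The "set req.to_finish instead of calling abort_request" pattern described in the review can be sketched as follows. This is a minimal illustration with simplified stand-ins for `Req`, `FINISH_ABORT`, and the output-processing loop, not sglang's actual classes:

```python
# Minimal sketch of the "mark and continue" abort pattern.
# Req, FINISH_ABORT, and apply_grammar are simplified stand-ins,
# not sglang's real implementations.

class FINISH_ABORT:
    def __init__(self, message: str):
        self.message = message

class Req:
    def __init__(self, rid: str):
        self.rid = rid
        self.to_finish = None  # set instead of mutating scheduler state mid-loop

def apply_grammar(req):
    # Stand-in for grammar processing; fails for one request.
    if req.rid == "bad":
        raise ValueError("grammar compile failed")

def process_batch_outputs(reqs):
    for req in reqs:
        try:
            apply_grammar(req)
        except ValueError as e:
            # Calling abort_request() here would mutate scheduler state
            # while it is being iterated, risking a dead loop. Instead,
            # mark the request and let the scheduler finish it later.
            req.to_finish = FINISH_ABORT(message=str(e))
            continue
        # ... normal output processing for healthy requests ...

reqs = [Req("ok"), Req("bad")]
process_batch_outputs(reqs)
marked = [r.rid for r in reqs if r.to_finish is not None]
print(marked)  # ['bad']
```

The key design point is that the loop only flags the failing request and `continue`s; the scheduler's queues are left untouched until a safe point.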
Force-pushed from f989b7f to fccdfee
Force-pushed from 20721b0 to 6698169
…ound This allows proper fallback to the diffusers backend when a native config is not available for a model. Fixes sgl-project#21311
Force-pushed from 6698169 to d69c39b
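The commit message above describes falling back to the diffusers backend when a native config is unavailable (by returning None rather than raising). A minimal sketch of that pattern, with hypothetical function and backend names rather than sglang's actual API:

```python
# Hypothetical sketch: returning None instead of raising RuntimeError
# lets the caller choose a fallback backend. Names are illustrative only.
from typing import Optional

NATIVE_CONFIGS = {"model-a": {"backend": "native"}}

def load_native_config(model: str) -> Optional[dict]:
    # Old behavior: raise RuntimeError when no native config exists.
    # New behavior: return None so the caller can fall back gracefully.
    return NATIVE_CONFIGS.get(model)

def select_backend(model: str) -> str:
    cfg = load_native_config(model)
    if cfg is None:
        return "diffusers"  # graceful fallback
    return cfg["backend"]

print(select_backend("model-a"))  # native
print(select_backend("model-b"))  # diffusers
```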
Seems like the modification is unrelated to the description.
Closing this PR as the actual changes don't match the description. The commit pushed here (fix: return None instead of raising RuntimeError) is duplicated in PR #21319. This PR was supposed to fix issues #21136, #21168, #21171, and #21173, but the wrong commit was pushed. Will create a new PR with the correct fixes.
Summary
Fixes #21136, Fixes #21168, Fixes #21171, Fixes #21173
Issue #21136: Auto-truncate leaves zero generation tokens

- With --allow-auto-truncate, inputs longer than context_len were truncated to exactly context_len, leaving max_new_tokens = 0

Issue #21168: Embedding request crash

- Call set_finish_with_abort(error_msg) before adding the request to the queue

Issue #21171: Grammar error crash

- Calling abort_request() during output processing caused a dead loop
- Now sets req.to_finish = FINISH_ABORT(message=str(e)) and continues to safely abort

Issue #21173: Race condition in _handle_abort_req

- self.rid_to_state[recv_obj.rid] threw a KeyError when the request had already been cleaned up
- Now uses .get() with a None check, consistent with the _handle_batch_output pattern

Changes

- tokenizer_manager.py: Fix auto-truncate to reserve generation space + safe dict access in _handle_abort_req
- scheduler.py: Add error handling for over-length embedding requests
- scheduler_output_processor_mixin.py: Replace abort_request with to_finish in both prefill and decode paths

Testing
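The PR body gives no test details. A regression check for the #21173 race could look roughly like the sketch below; `rid_to_state` follows the description above, while the surrounding handler class is a hypothetical stand-in for the real tokenizer manager:

```python
# Hypothetical regression sketch for the _handle_abort_req race:
# use .get() with a None check so aborting an already-cleaned-up rid
# is a no-op instead of a KeyError. Not sglang's actual class.

class AbortHandler:
    def __init__(self):
        self.rid_to_state = {}

    def handle_abort(self, rid: str) -> bool:
        # Old code: state = self.rid_to_state[rid]  -> KeyError if the
        # request finished and was cleaned up before the abort arrived.
        state = self.rid_to_state.get(rid)
        if state is None:
            return False  # request already finished; nothing to do
        state["aborted"] = True
        return True

h = AbortHandler()
h.rid_to_state["r1"] = {"aborted": False}
assert h.handle_abort("r1") is True       # live request gets aborted
assert h.handle_abort("gone") is False    # cleaned-up rid: no KeyError
```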
@zhaochenyang2021 @merrymercy @hnyls2002