[Enhancement] Patch OmniStage.try_collect() with ray alive checks #1561
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5051e2d34
lishunyang12
left a comment
Left a couple of comments.
lishunyang12
left a comment
Traceback surfacing looks good now. One more issue in \ — see inline.
Traceback surfacing with
@lishunyang12 Does this look alright to you now?
lishunyang12
left a comment
yep, elif structure looks correct now. LGTM
            pass
        except Exception as e:
            logger.error("Unexpected error when collecting OmniStage output queue:", exc_info=e)
            self.stop_stage_worker()
@wtomin I had this in both PRs because I didn't know which one would be merged first. I will resolve any merge conflicts that appear during the rebase.
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com>
Force-pushed from 7cc5ebc to b5b72ad
…lm-project#1561) Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> Signed-off-by: Megha Agarwal <agarwalmegha1308@gmail.com>
…lm-project#1561) Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
…lm-project#1561) Signed-off-by: Daniel Huang <daniel1.huang@intel.com> Signed-off-by: Daniel Huang <pilotflyer824@gmail.com> Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>
Purpose
Waiting for an OmniStage involves polling its output queue for results. However, try_collect() does not check whether the Ray process has died, so the caller can hang indefinitely on a queue that will never be filled. This PR fixes the issue by explicitly checking that the process is alive before attempting to read the output queue. Component of #1557, relating to issue #1346. Complementary to #1560.
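The idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `StageSketch`, the `is_alive` callable, and the choice of `RuntimeError` are all hypothetical, and the liveness probe is injected as a plain callable so the example runs without Ray.

```python
import queue
from typing import Any, Callable, Optional


class StageSketch:
    """Hypothetical stand-in for a stage; not the real OmniStage class."""

    def __init__(self, out_q: "queue.Queue[Any]", is_alive: Callable[[], bool]) -> None:
        self._out_q = out_q
        # Assumption: in a real deployment this would wrap some worker health probe.
        self._is_alive = is_alive

    def try_collect(self) -> Optional[Any]:
        """Return one result if available, or None if the queue is empty.

        Raises RuntimeError if the worker is no longer alive, so a caller
        that polls in a loop fails fast instead of waiting forever.
        """
        # Check liveness first, before touching the output queue.
        if not self._is_alive():
            raise RuntimeError("stage worker is not alive; no output will arrive")
        try:
            # Non-blocking read: never waits on an empty queue.
            return self._out_q.get_nowait()
        except queue.Empty:
            return None
```

A caller that polls `try_collect()` in a loop would then see an exception once the worker dies, rather than receiving `None` forever. One trade-off of checking liveness before the read is that results already queued by a worker that has since died are not returned; whether that matters depends on the caller.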
Test Plan
Test Result