[CI] Fail subprocess tests with root-cause error#23795
[CI] Fail subprocess tests with root-cause error#23795simon-mo merged 22 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request is a great quality-of-life improvement for debugging tests that run in forked processes. By serializing and re-raising exceptions from the child process, it ensures that the root-cause error is not lost, which will significantly speed up debugging.
I've found one critical bug related to operator precedence that prevents the exception from being correctly re-raised, and one high-severity issue concerning a resource leak where temporary files are not being cleaned up. Addressing these issues will make this a solid and robust enhancement.
Signed-off-by: Nick Hill <nhill@redhat.com>
5bc6669 to
00f167a
Compare
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
|
Retrying async tests to see if the timeout is just flaky or caused by this PR. |
Thanks @DarkLight1337. Looks like it hung again at the same place after ~10min. I'll cancel it so that it doesn't timeout after 3hrs again, will investigate tomorrow. |
Signed-off-by: Nick Hill <nhill@redhat.com>
# Conflicts: # tests/conftest.py
11214c2 to
1510ff0
Compare
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
1510ff0 to
a5b79e2
Compare
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
QOL improvement
We use a
create_new_process_for_each_testdecorator on many tests to avoid cleanup issues. However when such tests fail the pytest error is from the process failure not the actual test failure. This change propagates the actual test exception of interest so that it's reported by pytest as if the test was running inline.Before:

After:
