-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[serve][tests] Add a timeout for resnet app image request #51569
[serve][tests] Add a timeout for resnet app image request #51569
Conversation
Signed-off-by: akyang-anyscale <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there anything else we might want to potentially guard against like the torch ml APIs to be completely sure the bug isn't within serve
Signed-off-by: akyang-anyscale <[email protected]>
yeah it's not the prettiest but it could be worth to surface any potential serve issues faster. wrapped them in a thread w/ timeout |
Signed-off-by: akyang-anyscale <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have a feeling that this is a symptom of a larger issue and this change is masking some underlying problem.
Okay to ship it as a stopgap, but let's file a follow up and link that ticket in the diff.
Also, please include tracebacks that were observed in the test failure.
Yeah there may be a larger problem we don't know yet. Hopefully this PR will tell us if the flakiness is caused by |
Why are these changes needed?
The request for the image in the resnet50 application could hang indefinitely. This could block the event loop and make tests flaky. This PR adds a 5s timeout to the
get
call.Replica is stuck for some reason. As a result, requests are not making progress and the client is disconnecting due to timeout.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.