@lapp0 lapp0 commented Jun 3, 2023

In short

  • surface the server error in the raised exception's message
  • on a 429, wait longer between requests

/{endpoint_id}/status/{job_id} is not guaranteed to have an "output" key:

curl -X POST https://api.runpod.ai/v1/9<redacted>/status/54<redacted> \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer T9<redacted>'

{"delayTime":9862,"error":"name 'app' is not defined","executionTime":97,"id":"54a54b10-d43e-43cb-b29e-c4cac8712ea6","input":{"prompts":["a cute magical flying dog, fantasy art drawn by disney concept artists"]},"status":"FAILED"}

In runpod-python this results in an uninformative KeyError, which this PR fixes:

File "/usr/src/app/myapp/embeddings/text_embeddings.py", line 60, in send_run_request
    return run_request.output()

File "/usr/local/lib/python3.11/site-packages/runpod/endpoint/runner.py", line 96, in output
    return output_request.json()["output"]
           ~~~~~^^^^^^^^^^
KeyError: 'output'
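
For illustration, here is a minimal sketch of the kind of check that avoids the bare KeyError. It is not the code from this PR: `JobFailedError` and `extract_output` are hypothetical names, and the payload is a trimmed copy of the FAILED response shown above.

```python
class JobFailedError(RuntimeError):
    """Hypothetical exception for this sketch; not part of runpod-python."""


def extract_output(payload: dict):
    """Return payload["output"], or raise with the server-side error if it is absent."""
    if "output" in payload:
        return payload["output"]
    # A FAILED job carries "error" instead of "output", so report that instead.
    status = payload.get("status", "UNKNOWN")
    error = payload.get("error", "no error message in response")
    raise JobFailedError(
        f"Job {payload.get('id')} ended with status {status}: {error}"
    )


# Trimmed copy of the FAILED response from the curl call above.
failed = {
    "error": "name 'app' is not defined",
    "id": "54a54b10-d43e-43cb-b29e-c4cac8712ea6",
    "status": "FAILED",
}

try:
    extract_output(failed)
except JobFailedError as exc:
    print(exc)
```

With a check like this, the caller sees the server's own error string rather than a KeyError raised from inside the client.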

Additionally, sometimes no JSON is returned at all; I was getting a 429 in this case:

File "/usr/local/lib/python3.11/site-packages/runpod/endpoint/runner.py", line 85, in output
    while self.status() not in ["COMPLETED", "FAILED"]:
          ^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/runpod/endpoint/runner.py", line 76, in status
    return status_request.json()["status"]
           ^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

This PR results in a more informative traceback:

File "/usr/local/lib/python3.11/site-packages/runpod/endpoint/runner.py", line 105, in output
    while self.status() not in ["COMPLETED", "FAILED"]:

File "/usr/local/lib/python3.11/site-packages/runpod/endpoint/runner.py", line 97, in status
    return self._status_json()["status"]
           ^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/runpod/endpoint/runner.py", line 80, in _status_json
    raise ValueError(
ValueError: Error decoding response json. Status Code: 429, Raw Response: ''
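
A rough sketch of how a decode failure can be turned into a message like the one above. This is an assumption about the shape of the logic, not the merged code: `decode_status_body` is a hypothetical helper that takes the status code and raw body directly, so it runs without a live request.

```python
import json


def decode_status_body(status_code: int, text: str):
    """Parse a response body as JSON, reporting status code and raw body on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Include what the server actually sent so the traceback is actionable.
        raise ValueError(
            f"Error decoding response json. Status Code: {status_code}, "
            f"Raw Response: {text!r}"
        ) from err


try:
    decode_status_body(429, "")
except ValueError as exc:
    print(exc)
```

Chaining with `from err` keeps the original decode error available in the traceback.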

I further improved this by raising TooManyRequestsError in the case of 429 and applying backoff. The above traceback will only be shown for invalid json responses which are not 429.
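
The backoff idea can be sketched roughly as follows. Only the name `TooManyRequestsError` comes from this PR; the stand-in class, the `call_with_backoff` helper, and the exponential schedule with its default values are assumptions made for illustration.

```python
import time


class TooManyRequestsError(Exception):
    """Stand-in for the exception named in this PR, defined here so the sketch runs alone."""


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on TooManyRequestsError, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TooManyRequestsError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the 429
            sleep(base_delay * 2 ** attempt)


# Usage: a fake status call that is rate limited twice, then succeeds.
responses = iter([TooManyRequestsError(), TooManyRequestsError(), "COMPLETED"])


def fake_status():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


delays = []
result = call_with_backoff(fake_status, sleep=delays.append)
```

Passing `sleep` as a parameter is only a convenience for testing without real waits; a production version would likely also cap the delay and add jitter.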

@lapp0
Author

lapp0 commented Jun 3, 2023

As a further improvement, I suggest RunPod make server-side changes to raise the threshold for 429 error responses, so users who are enqueueing many jobs in parallel are less likely to receive an error.

However, this is a separate suggestion. This PR alone will resolve much of the issue.

@justinmerrell
Contributor

Thank you for the PR and detailed comments. I have revised your code; please test it to make sure it still addresses the issues you originally identified.

@justinmerrell justinmerrell merged commit 66899d6 into runpod:main Jun 9, 2023