429 (TooManyRequests) Responses from ClearML #492

Open
Enkidu93 opened this issue Sep 16, 2024 · 9 comments

@Enkidu93
Collaborator

In recent weeks, we've seen exceptions thrown from the ClearMLMonitorService about failures to parse responses from ClearML. John added improved logging, and the issue has since recurred, so I checked the logs: the unexpected responses are 429s. Do we need to poll less frequently and/or change our HTTP client retry strategy? Unfortunately, the response we're getting from ClearML does not seem to specify a Retry-After. I haven't yet found any information about ClearML's rate limiting; maybe it's worth reaching out to them.
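For illustration only, one common retry strategy when a 429 carries no Retry-After is exponential backoff with full jitter. The sketch below is written in Python rather than the project's own language, and every name in it (`backoff_delay`, `call_with_backoff`, the `send` callable, the default base and cap) is an assumption made for the example, not part of Serval or the ClearML API.

```python
import random
import time
from typing import Callable, Optional, Tuple

TOO_MANY_REQUESTS = 429


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng or random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      base: float = 1.0,
                      cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical stand-in for one HTTP request; it returns
    (status_code, body). The last response is returned either way, so the
    caller still decides how to treat a final 429.
    """
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status != TOO_MANY_REQUESTS:
            break
        # Wait a randomized, exponentially growing interval before retrying.
        sleep(backoff_delay(attempt, base, cap, rng))
        status, body = send()
    return status, body
```

The jitter spreads retries out so that several clients hitting the same limit do not all retry at the same moment. Whether this would be enough here depends on ClearML's actual limits, which the thread has not established.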

@johnml1135
Collaborator

@Enkidu93
Collaborator Author

@johnml1135 johnml1135 self-assigned this Sep 16, 2024
@ddaspit
Contributor

ddaspit commented Sep 16, 2024

It looks like we are making extra calls to GetTasksByIdAsync. We are currently calling it for every engine type when we only need to call it once.
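As a rough sketch of the "call it once" idea, the polling step could collect the task IDs for all engine types, make a single request, and then split the result per engine type. This is Python for illustration only; `poll_once`, `fetch_tasks`, and the engine names are hypothetical, and the real signature of GetTasksByIdAsync is not shown in this thread.

```python
from typing import Callable, Dict, List

Task = Dict[str, str]


def poll_once(build_ids_by_engine: Dict[str, List[str]],
              fetch_tasks: Callable[[List[str]], List[Task]]) -> Dict[str, List[Task]]:
    """Fetch every task in one request, then split the result per engine type.

    `fetch_tasks` is a hypothetical stand-in for a single batched lookup
    by task ID; it is called at most once per polling cycle.
    """
    # De-duplicate IDs across engine types so each task is requested once.
    all_ids = sorted({i for ids in build_ids_by_engine.values() for i in ids})
    tasks_by_id = {t["id"]: t for t in fetch_tasks(all_ids)} if all_ids else {}
    return {
        engine: [tasks_by_id[i] for i in ids if i in tasks_by_id]
        for engine, ids in build_ids_by_engine.items()
    }
```

With N engine types this turns N requests per polling cycle into one, assuming the lookup accepts a combined list of IDs.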

@Enkidu93
Collaborator Author

We could try spitting out the header as well: https://stackoverflow.com/questions/75302783/how-do-i-catch-the-http-error-429-retry-after-time-in-a-try-except-scenario

It's happened again since you made that change. Here's the log:

{"log":"warn: Serval.Machine.Shared.Services.ClearMLService[0]
      Failed to parse ClearML response with code TooManyRequests from request path `queues.get_all_ex`: Date: Wed, 25 Sep 2024 01:44:15 GMT
      Connection: keep-alive
      x-robots-tag: noindex,nofollow
      x-envoy-ratelimited: true
      x-clearml-accept-encoding: gzip
      Server: clearml
      
","stream":"stdout","time":"2024-09-25T01:44:16.007998856Z"}

So yes, no Retry-After 🫤.
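Since the header may or may not be present on any given response, a client could honor Retry-After when it appears and fall back to its own delay otherwise. A minimal Python sketch, assuming the headers are available as a simple name-to-value mapping; `retry_delay` and the `fallback` parameter are names invented for this example:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], fallback: float,
                now: Optional[datetime] = None) -> float:
    """Return seconds to wait: Retry-After if usable, otherwise `fallback`."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    value = lowered.get("retry-after")
    if not value:
        return fallback
    # Retry-After is either a number of seconds or an HTTP date.
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

For the headers in the log above, this returns the fallback, because only `x-envoy-ratelimited: true` signals the rate limit and no wait time is given.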

> It looks like we are making extra calls to GetTasksByIdAsync. We are currently calling it for every engine type when we only need to call it once.

That's true, and an easy fix. Do you think that alone is enough to cause these errors, though?

@ddaspit
Contributor

ddaspit commented Sep 27, 2024

We should remove the extra calls and increase the polling timeout by a bit. Hopefully that will be enough to fix this issue.

@johnml1135
Collaborator

I wonder: did you see the messages from IDX on the ClearML channel? It could be that they are the ones causing the issue. Our request may arrive right when they are in the middle of DoS'ing the server, and so we get the "TooManyRequests" error. It shouldn't happen, because we use different user creds than IDX, but it could still be the case. Either way, we should handle it gracefully.

@Enkidu93
Collaborator Author

Enkidu93 commented Oct 3, 2024

> I wonder: did you see the messages from IDX on the ClearML channel? It could be that they are the ones causing the issue. Our request may arrive right when they are in the middle of DoS'ing the server, and so we get the "TooManyRequests" error. It shouldn't happen, because we use different user creds than IDX, but it could still be the case. Either way, we should handle it gracefully.

I don't think I'm on that channel. Could you add me or point me there?

Also, @johnml1135, unless you're already working on this one actively, I think I'll go ahead and do this while I'm waiting for review. We've been getting a lot of these errors.

@johnml1135 johnml1135 assigned Enkidu93 and unassigned johnml1135 Oct 3, 2024
@johnml1135
Collaborator

Go for it.

@Enkidu93
Collaborator Author

Waiting to see whether the errors stop appearing once PR #504 is on QA.
