429 (TooManyRequests) Responses from ClearML #492

Enkidu93 · 2024-09-16T14:03:52Z

In recent weeks, we've seen exceptions thrown from the ClearMLMonitorService because it fails to parse responses from ClearML. John added improved logging and the issue has recurred, so I checked the logs: the unexpected responses are 429s. Do we need to poll less frequently and/or change our HTTP client's retry strategy? Unfortunately, the response from ClearML does not seem to include a Retry-After header. I haven't found any documentation on ClearML's rate limiting yet; maybe it's worth reaching out to them?
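A minimal sketch of a retry-delay policy for this situation, written in Python for brevity (Serval itself is C#; `retry_delay_seconds`, `base`, and `cap` are hypothetical names and defaults). It honors `Retry-After` when a server sends it, either as whole seconds or as an HTTP-date, and otherwise falls back to exponential backoff with full jitter so that retries from several pollers do not line up.

```python
import email.utils
import random
import time
from datetime import timezone
from typing import Optional


def retry_delay_seconds(
    attempt: int,
    retry_after: Optional[str],
    base: float = 1.0,
    cap: float = 60.0,
    now: Optional[float] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        value = retry_after.strip()
        # Retry-After may be a delay in whole seconds...
        if value.isdigit():
            return min(float(value), cap)
        # ...or an HTTP-date such as "Wed, 25 Sep 2024 01:44:15 GMT".
        try:
            when = email.utils.parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = time.time() if now is None else now
            return min(max(0.0, when.timestamp() - current), cap)
    # No usable Retry-After (ClearML sends none): exponential backoff, full jitter.
    ceiling = min(cap, base * (2 ** min(attempt, 30)))
    return random.uniform(0.0, ceiling)
```

With no `Retry-After`, attempt 3 waits a random time between 0 and 8 seconds under the default `base` of 1 second; the `cap` keeps a long outage from pushing waits past a minute.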


johnml1135 · 2024-09-16T16:05:08Z

We could try spitting out the header as well:
https://stackoverflow.com/questions/75302783/how-do-i-catch-the-http-error-429-retry-after-time-in-a-try-except-scenario
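The linked question is about Python, so here is a sketch of the same idea with the standard `urllib` module (the helper names `describe_http_error` and `fetch` are hypothetical, and this is not Serval's C# logging code): catch the `HTTPError`, then log the status, the `Retry-After` value if present, and the whole header set.

```python
import email.message
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("clearml")


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Status, reason, Retry-After (if any) and every response header, for logging."""
    header_lines = "\n".join(f"{name}: {value}" for name, value in err.headers.items())
    retry_after = err.headers.get("Retry-After")
    return f"HTTP {err.code} ({err.reason}); Retry-After={retry_after!r}\n{header_lines}"


def fetch(url: str) -> bytes:
    """Plain GET that logs the full header set when the server answers 429."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 429:
            logger.warning(describe_http_error(err))
        raise


if __name__ == "__main__":
    # Offline demo: build a 429 resembling ClearML's, without any network call.
    hdrs = email.message.Message()
    hdrs["x-envoy-ratelimited"] = "true"
    hdrs["Server"] = "clearml"
    demo = urllib.error.HTTPError(
        "https://example.invalid/queues.get_all_ex", 429, "Too Many Requests", hdrs, None
    )
    print(describe_http_error(demo))
```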

Enkidu93 · 2024-09-16T16:50:16Z

> We could try spitting out the header as well: https://stackoverflow.com/questions/75302783/how-do-i-catch-the-http-error-429-retry-after-time-in-a-try-except-scenario

Oh, good point!

ddaspit · 2024-09-16T18:44:16Z

It looks like we are making extra calls to GetTasksByIdAsync. We are currently calling it for every engine type when we only need to call it once.
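A sketch of the fix described above, assuming a hypothetical batched `get_tasks_by_id` callable that accepts many task ids: gather the ids across all engine types first, call ClearML once per cycle, then split the result back out per engine type. Python for brevity; this is not Serval's actual `DoWork` code.

```python
from typing import Callable, Dict, Iterable, List


def do_work(
    engine_types: Iterable[str],
    task_ids_by_engine: Dict[str, List[str]],
    get_tasks_by_id: Callable[[List[str]], Dict[str, dict]],
) -> Dict[str, Dict[str, dict]]:
    """One monitor cycle: a single batched lookup shared by all engine types."""
    engine_types = list(engine_types)
    # Collect every task id up front so the ClearML call happens exactly once.
    all_ids = sorted(
        {tid for et in engine_types for tid in task_ids_by_engine.get(et, [])}
    )
    tasks = get_tasks_by_id(all_ids) if all_ids else {}
    # Partition the shared result back out per engine type.
    return {
        et: {tid: tasks[tid] for tid in task_ids_by_engine.get(et, []) if tid in tasks}
        for et in engine_types
    }
```

The number of requests per cycle drops from one per engine type to one in total, regardless of how many engine types are configured.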

Enkidu93 · 2024-09-25T14:33:49Z

> We could try spitting out the header as well: https://stackoverflow.com/questions/75302783/how-do-i-catch-the-http-error-429-retry-after-time-in-a-try-except-scenario

It's happened again since you made that change. Here's the log:

```
{"log":"warn: Serval.Machine.Shared.Services.ClearMLService[0]
      Failed to parse ClearML response with code TooManyRequests from request path `queues.get_all_ex`: Date: Wed, 25 Sep 2024 01:44:15 GMT
      Connection: keep-alive
      x-robots-tag: noindex,nofollow
      x-envoy-ratelimited: true
      x-clearml-accept-encoding: gzip
      Server: clearml
      
","stream":"stdout","time":"2024-09-25T01:44:16.007998856Z"}
```

So yes, no Retry-After 🫤.
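The response carries `x-envoy-ratelimited: true`, a header an Envoy proxy typically adds when its rate limiting rejects a request, which suggests the throttling happens in a proxy in front of ClearML. Since there is no `Retry-After`, a client could classify the response from both signals. A hypothetical Python sketch (`RateLimitInfo` and `classify_response` are illustrative names):

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limited: bool
    retry_after: Optional[str]  # None when the server sends no Retry-After


def classify_response(status: int, headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    envoy_limited = lowered.get("x-envoy-ratelimited", "").strip().lower() == "true"
    return RateLimitInfo(
        limited=status == 429 or envoy_limited,
        retry_after=lowered.get("retry-after"),
    )
```

For the logged response, this yields `limited=True` with `retry_after=None`, so the caller would fall back to its own backoff.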

> It looks like we are making extra calls to GetTasksByIdAsync. We are currently calling it for every engine type when we only need to call it once.

That's true, and an easy fix. Do you think that alone is enough to cause these errors, though?

ddaspit · 2024-09-27T16:08:11Z

We should remove the extra calls and increase the polling timeout a bit. Hopefully that will be enough to fix this issue.
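As an alternative to a fixed bump in the polling timeout, the interval could adapt: double on each 429 up to a ceiling, then drop back to the base after a successful poll. A hypothetical Python sketch (`PollInterval` and its parameters are illustrative, not existing Serval settings):

```python
class PollInterval:
    """Adaptive poll interval: grows on rate limiting, resets on success."""

    def __init__(self, base: float, maximum: float, factor: float = 2.0) -> None:
        self.base = base
        self.maximum = maximum
        self.factor = factor
        self.current = base

    def on_rate_limited(self) -> float:
        # Multiplicative increase, capped so polling never stops entirely.
        self.current = min(self.current * self.factor, self.maximum)
        return self.current

    def on_success(self) -> float:
        # Return to the normal cadence once ClearML answers again.
        self.current = self.base
        return self.current
```

Resetting straight to the base keeps status updates timely once the limit clears; a gentler decrease is also possible if 429s tend to come in bursts.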

johnml1135 · 2024-09-30T19:35:07Z

I wonder: did you see the messages from IDX on the ClearML channel? They could be the ones causing the issue. Our request may arrive right when they are in the middle of DoS'ing the server, and we get the "TooManyRequests" error. That shouldn't happen, because we use different user credentials from IDX, but it could still be the case. Either way, we should handle it gracefully.
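One way to handle it gracefully is to treat a 429 as a skipped cycle rather than an exception that escapes the monitor. A minimal sketch, assuming a hypothetical `RateLimitedError` raised by the client wrapper on HTTP 429 (Python for brevity; any other exception still propagates):

```python
import logging
from typing import Callable

logger = logging.getLogger("clearml_monitor")


class RateLimitedError(Exception):
    """Hypothetical error raised by the ClearML client wrapper on HTTP 429."""


def run_monitor_cycle(poll: Callable[[], None]) -> bool:
    """Run one poll; return False (skip this cycle) instead of raising on a 429."""
    try:
        poll()
    except RateLimitedError as err:
        # Transient: log a warning and let the next scheduled cycle try again.
        logger.warning("ClearML rate-limited this poll; retrying next cycle: %s", err)
        return False
    return True
```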

Enkidu93 · 2024-10-03T18:01:29Z

> I wonder: did you see the messages from IDX on the ClearML channel? They could be the ones causing the issue. Our request may arrive right when they are in the middle of DoS'ing the server, and we get the "TooManyRequests" error. That shouldn't happen, because we use different user credentials from IDX, but it could still be the case. Either way, we should handle it gracefully.

I don't think I'm on that channel. Could you add me or point me there?

Also, @johnml1135, unless you're already actively working on this one, I think I'll go ahead and take it while I'm waiting for review. We've been getting a lot of these errors.

johnml1135 · 2024-10-03T19:00:55Z

Go for it.

Enkidu93 · 2024-10-10T00:29:28Z

Waiting to see whether the errors stop appearing once PR #504 is on QA.

johnml1135 mentioned this issue Sep 16, 2024

Log the headers #493

Merged

johnml1135 self-assigned this Sep 16, 2024

johnml1135 assigned Enkidu93 and unassigned johnml1135 Oct 3, 2024

Enkidu93 mentioned this issue Oct 3, 2024

Call 'GetAsksById' once per DoWork #504

Merged
