Retry instead of waiting when connecting to the instance on submitting a solution #484
…this time there is still no available instance, the worker status will be reset to "retry"
Codecov Report
@@            Coverage Diff             @@
##           master     #484      +/-   ##
==========================================
+ Coverage   93.56%   93.58%   +0.01%
==========================================
  Files          99       99
  Lines        8496     8506      +10
==========================================
+ Hits         7949     7960      +11
+ Misses        547      546       -1
Good to go from your end?
I think so, but it's always better if someone reviews it...
Only giving it a quick look. Sorry, I couldn't find the part where it is added to the queue. Can you point me in the right direction? Thanks!
@@ -57,7 +57,7 @@

 # how long to wait for connections
 WAIT_MINUTES = 2
-MAX_TRIES_TO_CONNECT = 5
+MAX_TRIES_TO_CONNECT = 1
If we change this to 1, do we still need the if n_try < max_tries_to_connect: check? I guess leaving it gives us the option to increase MAX_TRIES_TO_CONNECT in the future?
Yes. I prefer to leave it as a parameter for two reasons:
- it is easier to test, because in the tests we can set this and WAIT_MINUTES to low values so that the tests finish within a reasonable time
- we can easily update it in the future
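To illustrate why keeping the parameter helps, here is a minimal sketch of a connection loop that keeps the n_try < max_tries_to_connect check and ends with the "retry" status. It is not the RAMP code: acquire_instance and try_connect are hypothetical names, and we assume that each attempt blocks for at most WAIT_MINUTES while waiting for an instance.

```python
import logging

logger = logging.getLogger(__name__)

# Defaults mirroring the values in the diff above; tests can override them.
WAIT_MINUTES = 2
MAX_TRIES_TO_CONNECT = 1


def acquire_instance(try_connect, max_tries_to_connect=MAX_TRIES_TO_CONNECT,
                     wait_minutes=WAIT_MINUTES):
    """Return 'running' if an instance was obtained, else 'retry'.

    try_connect(timeout_s) is a hypothetical callable (an assumption, not a
    RAMP API) that blocks for at most timeout_s seconds and returns True
    once an instance is usable.
    """
    for n_try in range(1, max_tries_to_connect + 1):
        if try_connect(wait_minutes * 60):
            return "running"
        if n_try < max_tries_to_connect:
            # Only reached when MAX_TRIES_TO_CONNECT > 1.
            logger.info("No instance yet (try %d/%d), trying again",
                        n_try, max_tries_to_connect)
    # Stop blocking the dispatcher: ask it to requeue this worker instead.
    logger.info("No instance available, setting worker status to 'retry'")
    return "retry"


# Test-style usage: a fake that never finds an instance and zero wait time.
calls = []
status = acquire_instance(lambda timeout_s: calls.append(timeout_s) or False,
                          max_tries_to_connect=3, wait_minutes=0)
```

With the default MAX_TRIES_TO_CONNECT = 1 the loop body runs once, so the n_try check never fires; raising the value re-enables extra attempts without touching the loop, and a test can pass tiny values so it runs in milliseconds.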
One idea... could we deploy RAMP on an AWS instance so that we can easily test the master branch with a non-empty database?
When the worker status is set to 'retry', the dispatcher sets the submission back to
@agramfort
@agramfort @lucyleeow if you are happy, could you please merge? (Then it could already be included in the next release.)
Ah, I was looking for an explicit addition to the queue, but I had forgotten I added the 'retry' function! LGTM
Thanks @maikia!
For staging, it's up to you whether it helps or not.
…aris-saclay-cds#484)
* updated the wait for the available instance to only 2 mins. If after this time there is still no available instance, the worker status will be reset to "retry"
* added test to make sure that retry status is set
* updated the test to make sure the worker status and the log are correct
* updates so that the API can pass to the worker a request for retrying later
Sometimes the worker cannot connect to an instance because not enough instances are available. This can happen for multiple reasons, one of them being that an instance which was set to terminate is not yet available for use.
For that reason, we previously added the possibility to wait and try again a few times before raising an error.
This was not very efficient, because it forced the whole dispatcher to wait during that time instead of collecting the results from other workers. Collecting those results could free an instance, which would solve the shortage of instances whenever its cause is different from the one above.
This PR shortens the waiting time and, if instances are still not available, sets the worker status to "retry" so that the dispatcher puts the worker back into the queue to try again later.
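The non-blocking behaviour described above can be sketched as a toy dispatcher loop. This is a simplified model under stated assumptions, not RAMP's dispatcher: FakeWorker, ready_at_cycle and dispatch are hypothetical names, "collecting" is reduced to marking a worker as done one cycle after launch, and no submission state is tracked.

```python
from collections import deque


class FakeWorker:
    """Hypothetical stand-in for a worker (not the RAMP class)."""

    def __init__(self, name, ready_at_cycle):
        self.name = name
        self.ready_at_cycle = ready_at_cycle  # cycle at which an instance frees up
        self.status = "initialized"

    def setup(self, cycle):
        # As in the PR: do not block; report 'retry' if no instance yet.
        self.status = "running" if cycle >= self.ready_at_cycle else "retry"


def dispatch(workers, n_cycles):
    """Toy dispatcher loop; returns (cycle, worker name, event) tuples."""
    awaiting = deque(workers)
    running = []
    events = []
    for cycle in range(n_cycles):
        # 1. Collect results from workers launched earlier; the old
        #    blocking wait prevented this step from running.
        for worker in running:
            worker.status = "collected"
            events.append((cycle, worker.name, "collected"))
        running = []
        # 2. Try each waiting worker once per cycle.
        for _ in range(len(awaiting)):
            worker = awaiting.popleft()
            worker.setup(cycle)
            if worker.status == "retry":
                # Put it back at the end of the queue instead of waiting.
                awaiting.append(worker)
                events.append((cycle, worker.name, "retry"))
            else:
                running.append(worker)
                events.append((cycle, worker.name, "running"))
    return events


events = dispatch([FakeWorker("a", 0), FakeWorker("b", 2)], n_cycles=4)
```

In this run, worker "a" is collected at cycle 1 while "b" is still being retried, which is the behaviour the PR aims for: a worker without an instance no longer holds up result collection for the others.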