Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add timeout/retry logic to docker cache download #13573

Merged
merged 6 commits into from
Dec 21, 2018

Conversation

jlcontreras
Copy link
Contributor

Dockerhub sometimes has unstable behavior, causing downloads to take much longer than they should. Average download time is below 2min, but in some rare cases it can take more than 1h (see http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/2033/pipeline/ for example)

This PR wraps the cache download call with timeout/retry logic as a fix

@@ -131,14 +134,15 @@ def _login_dockerhub():
logging.error(p.stderr)

# Linear backoff
time.sleep(1000 * DOCKERHUB_RETRY_SECONDS * (i + 1))
time.sleep(DOCKERHUB_RETRY_SECONDS * (i + 1))
Copy link
Contributor

@larroy larroy Dec 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use @Retry decorator, it's in util under CI I think. just so we don't reimplement retry logic everytime. Or maybe you can abstract this in another retry_with_timeout deco.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the suggestion :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, why not also above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to keep this PR specific to this method, but I guess it makes more sense to add it also, done

@nswamy nswamy added CI pr-awaiting-response PR is reviewed and waiting for contributor to respond and removed pr-awaiting-review PR is waiting for code review labels Dec 8, 2018
Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, LGTM

@jlcontreras
Copy link
Contributor Author

@mxnet-label-bot update[CI, pr-awaiting-merge]

@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-response PR is reviewed and waiting for contributor to respond labels Dec 13, 2018
@Roshrini
Copy link
Member

@marcoabreu @KellenSunderland Can someone merge this PR?

Copy link
Contributor

@KellenSunderland KellenSunderland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for working on this @jlcontreras

@KellenSunderland KellenSunderland merged commit 63fe849 into apache:master Dec 21, 2018
rondogency pushed a commit to rondogency/incubator-mxnet that referenced this pull request Jan 9, 2019
* Added timeout/retry (linear backoff) to docker cache download

* Units changed, as time.sleep takes seconds as argument

* Improved error handling

* Using retry decorator

* Added retry decorator to _login_dockerhub method

* Fixed wrong import
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Added timeout/retry (linear backoff) to docker cache download

* Units changed, as time.sleep takes seconds as argument

* Improved error handling

* Using retry decorator

* Added retry decorator to _login_dockerhub method

* Fixed wrong import
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CI pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants