-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Separate refactoring from #12276 in a prior PR #12296
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change
ci/build.py
Outdated
each retry | ||
:type backoff: int | ||
""" | ||
import time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to do the imports at the top of the file in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it better? it's perfectly legal to do this, and some consider best practice to scope imports to the minimum scope necessary, like for example function scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But would it be imported every time the retry function gets called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about this in either direction, for some functions like this is better so it's self contained. I can also move this function to util.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see. Thanks!
ci/build.py
Outdated
if args.build_only: | ||
continue | ||
build_platform = "build_{}".format(platform) | ||
cmd = ["/work/mxnet/ci/docker/runtime_functions.sh", build_platform] | ||
git_cleanup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, but maybe we should issue a warning here with confirmation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, how this would work when all platforms are build one after another? Only the latest artifacts survive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if for example I want to test a new file locally that hasn't yet been added to git?
ci/build.py
Outdated
build_platform = "build_{}".format(platform) | ||
plat_buildir = os.path.abspath(os.path.join(get_mxnet_root(), '..', | ||
"mxnet_{}".format(build_platform))) | ||
if os.path.exists(plat_buildir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the path exist if git_cleanup was called before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is addressed, I removed git cleanup.
ci/build.py
Outdated
@@ -383,7 +472,8 @@ def use_cache(): | |||
|
|||
./build.py -a | |||
|
|||
Builds for all platforms and leaves artifacts in build_<platform> | |||
Builds for all platforms and leaves artifacts in build_<platform>. **WARNING** it performs git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a -y for quiet confirmation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed that git cleanup.
ci/build.py
Outdated
@@ -284,7 +364,7 @@ def script_name() -> str: | |||
default=1, | |||
type=int) | |||
|
|||
parser.add_argument("-c", "--cache", action="store_true", | |||
parser.add_argument("--no-cache", action="store_true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the background for this change? I'm a little concerned that changing script args could have downstream impact so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that this is changing behavior of the script and does not fall under "minor refactorings". Even if it's a oneliner - it needs a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will split it again then. But this doesn't affect CI given the command line used in the Jenkinsfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's too much trouble to split, as agreed in 1:1 I improved documentation about this flag.
ci/build.py
Outdated
if args.build_only: | ||
continue | ||
build_platform = "build_{}".format(platform) | ||
cmd = ["/work/mxnet/ci/docker/runtime_functions.sh", build_platform] | ||
git_cleanup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if for example I want to test a new file locally that hasn't yet been added to git?
ci/build.py
Outdated
|
||
command = list(chain(*args.command)) | ||
docker_binary = get_docker_binary(args.nvidiadocker) | ||
shared_memory_size = args.shared_memory_size | ||
num_docker_build_retires = args.docker_build_retries | ||
|
||
if args.list: | ||
list_platforms() | ||
print(list_platforms()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You used logger before, maybe use it here as well instead of print?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is different concerns, this prints unconditionally to stdout which is what you want. Logging can be supressed. This is intentional
ci/build.py
Outdated
@@ -284,7 +364,7 @@ def script_name() -> str: | |||
default=1, | |||
type=int) | |||
|
|||
parser.add_argument("-c", "--cache", action="store_true", | |||
parser.add_argument("--no-cache", action="store_true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that this is changing behavior of the script and does not fall under "minor refactorings". Even if it's a oneliner - it needs a separate PR.
c513740
to
bc5d32b
Compare
dc6b876
to
2cea66c
Compare
ci/build.py
Outdated
@@ -284,8 +288,9 @@ def script_name() -> str: | |||
default=1, | |||
type=int) | |||
|
|||
parser.add_argument("-c", "--cache", action="store_true", | |||
help="Enable docker registry cache") | |||
parser.add_argument("--no-cache", action="store_true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name a bit confusing, maybe --dockerhub--cache
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️
3b6e8aa
to
04b8cd8
Compare
@@ -284,8 +288,10 @@ def script_name() -> str: | |||
default=1, | |||
type=int) | |||
|
|||
parser.add_argument("-c", "--cache", action="store_true", | |||
help="Enable docker registry cache") | |||
parser.add_argument("-c", "--no-dockerhub-cache", action="store_true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as not providing --docker-registry
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather have it explicit. but we would have to code it if we want it the way you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that for "default" behaviour, which I assume is use local cache (as for normal docker builds) we need to provide --no-dockerhub-cache explicitly. I think it should be the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-c
is not a good shortening anymore for the option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default behavior remains unchanged if I'm not mistaken. Which is to use the dockerhub cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kellen. One doesn't need to specify the registry. Hence this option to disable cache which is enabled by default. I'm not sure what's the cause of confusion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a work around for broken behaviour to me. Would you mind if we leave it out of this PR, and try to think about how to fix the root cause of the problem? To me it makes sense to not use remote caching if a user doesn't specify '--docker-registry'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s bad because we want to use cache by default. This was discussed at length with Marco before and we came to agree on this behavior. I wouldn’t want to change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s bad because we want to use cache by default. This was discussed at length with Marco before and we came to agree on this behavior. I wouldn’t want to change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's the information I was missing here. In this case it makes sense to me.
if instance_id: | ||
logging.info("EC2 Instance id: %s", instance_id) | ||
pp = pprint.PrettyPrinter(indent=4) | ||
logging.debug("Build environment: %s", pp.pformat(dict(os.environ))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be info? The global level is at that level. Or should we have separate levels for ci and local?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's too verbose. In general I don't think we need to view this environment. I had it to diagnose. I wouldn't bother right now.
logging.getLogger().setLevel(logging.INFO) | ||
def log_environment(): | ||
instance_id = ec2_instance_id_hostname() | ||
if instance_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always true? The function returns '?' if something goes wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty string is false. In that case is not printed. For example outside of ci or locally.
try: | ||
return f(*args, **kwargs) | ||
except target_exception as e: | ||
logging.warning("Exception: %s, Retrying in %d seconds...", str(e), mdelay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we propagate the exception if we out of retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last call to f is propagating the exception as you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo the new caching arguments, which I'm not still 100% understanding.
"-f", get_dockerfile(platform), | ||
"--build-arg", "USER_ID={}".format(os.getuid()), | ||
"--build-arg", "GROUP_ID={}".format(os.getgid())] | ||
if use_cache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I find this a bit of a misnomer. It sounds like we're disabling docker-cache, which is possible, but it's not what's done here. Reading the code I would expect it to pass --no-cache to the docker cmd being run. Maybe a more specific name could help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open. Any suggestions?
cmd.extend(["--cache-from", tag]) | ||
cmd.extend(["-t", tag, get_dockerfiles_path()]) | ||
|
||
@retry(subprocess.CalledProcessError, tries=num_retries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like this.
if not dry_run and interactive: | ||
into_cmd = deepcopy(runlist) | ||
# -ti can't be after the tag, as is interpreted as a command so hook it up after the -u argument | ||
idx = into_cmd.index('-u') + 2 | ||
into_cmd[idx:idx] = ['-ti'] | ||
cmd = '\\\n\t'.join(into_cmd) | ||
cmd = ' \\\n\t'.join(into_cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this.
@lebeg - This PR is ready to go, if your concerns are addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, lgtm
@marcoabreu - can you take a look at this? |
LGTM, thanks for iterating on this one! |
* Separate minor refactoring from apache#12276 in a prior PR * Address CR comments
Description
This is a prior refactor separated from #12276 at request from the reviewer.
Fixes --no-dockerhub-cache options and docker's --cache-from so local cache can be used on request by providing --no-dockerhub-cache.
Checked with mypy and pycharm code analyze.
@KellenSunderland
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Comments