Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-769] set MXNET_HOME as base for downloaded models through base.data_dir() #11636

Merged
merged 2 commits into from
Aug 2, 2018

Conversation

larroy
Copy link
Contributor

@larroy larroy commented Jul 10, 2018

Description

This allows setting the data dir for models as an environment variable and with better per platform defaults.

Would also solve the following issue:
#11616

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@@ -126,6 +126,11 @@ When USE_PROFILER is enabled in Makefile or CMake, the following environments ca
- Values: String ```(default='https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/'```
- The repository url to be used for Gluon datasets and pre-trained models.

* MXNET_DATA_DIR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are models with such default path too. Rename to MXNET_HOME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate? I is this path hardcoded somewhere else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the confusion. What I meant is we can use a variable name such as MXNET_HOME so that other assets such as models and datasets can all be saved there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I don't see what is the different semantic with dada dir and home, might be my poor english skills, so fine to take the suggestion.

if 'XDG_DATA_HOME'in os.environ:
return os.path.join(os.environ['XDG_DATA_HOME'], 'mxnet')
else:
return os.path.join(os.path.expanduser("~"), '.local', 'share', 'mxnet')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't change the default value from $HOME/.mxnet/ or otherwise we'd be forcing everyone to re-download.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a big deal? we can also migrate the directory...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a big deal.

if system == 'Windows':
return os.environ.get('APPDATA')
elif system == 'Linux' or system == 'Darwin':
if 'XDG_DATA_HOME'in os.environ:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's XDG? why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we ever followed this spec anywhere? It doesn't seem necessary.

@larroy larroy force-pushed the data_dir branch 2 times, most recently from a4273ce to aff84a1 Compare July 10, 2018 23:10
Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate how this solves the issue? The problem is that the access to the same directory and file is not synchronized. Since we want people to use MXNet in multi tenancy mode (without having to redownload the models for every single instance), we should add synchronisation.
This solution would only fix the problem if people set a unique path for every process.

@larroy
Copy link
Contributor Author

larroy commented Jul 11, 2018

@marcoabreu this is the first step in the right direction. It solves the problem if you have it by specifying different directories. Also allow to run unit tests in parallel in ci. This should be done before any other complexities such as portable cross system wide locks which are not trivial. Windows is not fully POSIX

* MXNET_DATA_DIR
- Data directory in the filesystem for storage, for example when downloading gluon models.
- Default in *nix is .local/share/mxnet APPDATA/mxnet in windows, XDG_DATA_HOME/mxnet if
XDG_DATA_HOME is set.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove XDG_DATA_HOME

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove it?

@larroy larroy force-pushed the data_dir branch 3 times, most recently from 0d8e575 to 0077502 Compare July 18, 2018 19:11
@larroy larroy force-pushed the data_dir branch 7 times, most recently from 4cebf87 to eb718d5 Compare August 1, 2018 12:11
@larroy larroy changed the title [WIP] set MXNET_DATA_DIR as base for downloaded models through base.data_dir() set MXNET_DATA_DIR as base for downloaded models through base.data_dir() Aug 1, 2018
@larroy
Copy link
Contributor Author

larroy commented Aug 1, 2018

ready to merge

* MXNET_DATA_DIR
- Data directory in the filesystem for storage, for example when downloading gluon models.
- Default in *nix is .local/share/mxnet APPDATA/mxnet in windows, XDG_DATA_HOME/mxnet if
XDG_DATA_HOME is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove it?


:return: data directory in the filesystem for storage, for example when downloading models
"""
return os.getenv('MXNET_DATA_DIR', data_dir_default())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rename as requested by Sheng?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, scala package already using MXNET_DATA_DIR, so seems it's the convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szha @nswamy could you help us out here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What help do we need?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MXNET_DATA_DIR is the place where we download models/training/test/other data.
MXNET_HOME sounds like a place where you have library, for example if you want set it to PYTHONPATH

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala should follow the previous convention instead of making up new ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if MXNET_DATA_DIR being used on our side, but we do have this:
link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable already existed in some examples when I started working
https://github.com/apache/incubator-mxnet/blob/5627bc3ab71449532aa42297827993353d9ea580/scala-package/examples/src/main/scala/org/apache/mxnetexamples/imclassification/TrainMnist.scala#L115

I have no preference one way or another as long as we consistently follow the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the library we have MXNET_LIBRARY_PATH as can be seen in libinfo.py
@szha where is the MXNET_HOME convention? can you point out files?

@@ -484,7 +484,7 @@ def assert_almost_equal(a, b, rtol=None, atol=None, names=('a', 'b'), equal_nan=
return

index, rel = find_max_violation(a, b, rtol, atol)
np.set_printoptions(threshold=4, suppress=True)
np.set_printoptions(threshold=np.nan, suppress=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert this please as it is outside the scope of this PR?

@@ -0,0 +1,30 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check, but don't we already have a place for utility functions?

Copy link
Contributor Author

@larroy larroy Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked for it, but I think we don't.

available in Python2"""
if sys.version_info[0] < 3:
from distutils.dir_util import mkpath
mkpath(d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I assuming right that this function is a NO-OP if the dir exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -1282,7 +1282,7 @@ void GraphExecutor::InitDataEntryMemory(std::vector<NDArray>* shared_pool) {
for (size_t i = 0; i < pool_info.size(); i++) {
sorted_pool_index.push_back(i);
}
auto pool_comparator = [&pool_info](int lhs, int rhs){
auto pool_comparator = [&pool_info](size_t lhs, size_t rhs){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you revert this please as it is outside the scope of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes a warning, minor change, why revert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep concerns separate

@@ -0,0 +1,50 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@larroy
Copy link
Contributor Author

larroy commented Aug 1, 2018

Addressed comments.

@larroy larroy changed the title set MXNET_DATA_DIR as base for downloaded models through base.data_dir() [MXNET-769] set MXNET_DATA_DIR as base for downloaded models through base.data_dir() Aug 1, 2018
Copy link
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't use MXNET_DATA_DIR

@larroy
Copy link
Contributor Author

larroy commented Aug 1, 2018

@szha grep for MXNET_HOME, is used for where the root folder of mxnet source is located. This seems ambiguous and confusing for users. Can you elaborate why MXNET_HOME is better than MXNET_DATA_DIR?

@larroy larroy requested a review from yzhliu as a code owner August 1, 2018 19:42
@larroy larroy changed the title [MXNET-769] set MXNET_DATA_DIR as base for downloaded models through base.data_dir() [MXNET-769] set MXNET_HOME as base for downloaded models through base.data_dir() Aug 1, 2018
@larroy
Copy link
Contributor Author

larroy commented Aug 1, 2018

@szha merge?

@marcoabreu
Copy link
Contributor

If you think mxnet home is already occupied for a different purpose, feel free to propose an alternative.

@larroy
Copy link
Contributor Author

larroy commented Aug 1, 2018

@marcoabreu did you see the changes?

@marcoabreu
Copy link
Contributor

marcoabreu commented Aug 1, 2018

Yes. I am concerned that we are breaking backwards compatibility if we merge it as-is.

I personally don't think that mxnet home is a good variable if our convention is that it points to the source (I didn't check, just judging from the conversation here). We dont want the models to be stored in the git managed repository or pollute the workspace.

@szha
Copy link
Member

szha commented Aug 2, 2018

Other usages of MXNET_HOME are not user facing as they are only used internally in scripts such as installation.

@marcoabreu marcoabreu merged commit 564e01a into apache:master Aug 2, 2018
aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request Aug 6, 2018
….data_dir() (apache#11636)

* set MXNET_DATA_DIR as base for downloaded models through base.data_dir()
push joblib to save containers so is not required when running

* MXNET_DATA_DIR -> MXNET_HOME
aaronmarkham added a commit to aaronmarkham/incubator-mxnet that referenced this pull request Aug 7, 2018
[MXNET-750] fix nested call on CachedOp. (apache#11951)

* fix nested call on cachedop.

* fix.

extend reshape op to allow reverse shape inference (apache#11956)

Improve sparse embedding index out of bound error message; (apache#11940)

[MXNET-770] Remove fixed seed in flaky test (apache#11958)

* Remove fixed seed in flaky test

* Remove fixed seed in flaky test

Update ONNX docs with the latest supported ONNX version (apache#11936)

Reduced test to 3 epochs and made gpu only (apache#11863)

* Reduced test to 3 epochs and made GPU only

* Moved logger variable so that it's accessible

Fix flaky tests for test_laop_4 (apache#11972)

Updating R client docs (apache#11954)

* Updating R client docs

* Forcing build

Fix install instructions for MXNET-R (apache#11976)

* fix install instructions for MXNET-R

* fix install instructions for MXNET-R

* fix default cuda version for MXNet-R

[MXNET-751] fix ce_loss flaky (apache#11971)

* add xavier initializer

* remove comment line

[MXNET-769] set MXNET_HOME as base for downloaded models through base.data_dir() (apache#11636)

* set MXNET_DATA_DIR as base for downloaded models through base.data_dir()
push joblib to save containers so is not required when running

* MXNET_DATA_DIR -> MXNET_HOME

[MXNET-748] linker fixed on Scala issues (apache#11989)

* put force load back as a temporary solution

* use project.basedir as relative path for OSX linker

[MXNET-772] Re-enable test_module.py:test_module_set_params (apache#11979)

[MXNET-771] Fix Flaky Test test_executor.py:test_dot (apache#11978)

* use assert_almost_equal, increase rtol, reduce matrix size

* remove seed in test_bind

* add seed 0 to test_bind, it is still flaky

* add comments for tracking

remove mod from arity 2 version of load-checkpoint in clojure-package (apache#11808)

* remove mod from arity 2 version of load-checkpoint

* load-checkpoint arity 2 test

Add unit test stage for mxnet cpu in debug mode (apache#11974)

Website broken link fixes (apache#12014)

* fix broken link

* fix broken link

* switch to .md links

* fix broken link

removed seed from flaky test (apache#11975)

Disable ccache log print due to threadunsafety (apache#11997)

Added default tolerance levels for regression checks for MBCC (apache#12006)

* Added tolerance level for assert_almost_equal for MBCC

* Nudge to CI

Disable flaky mkldnn test_requantize_int32_to_int8 (apache#11748)

[MXNET-769] Usability improvements to windows builds (apache#11947)

* Windows scripted build
Adjust Jenkins builds to use ci/build_windows.py

Issues:

    apache#8714
    apache#11100
    apache#10166
    apache#10049

* Fix bug

* Fix non-portable ut

* add xunit

Fix import statement (apache#12005)

array and multiply are undefined. Importing them from
ndarray

Disable flaky test test_random.test_gamma_generator (apache#12022)

[MXNET-770] Fix flaky test: test_factorization_machine_module (apache#12023)

* Remove fixed seed in flaky test

* Remove fixed seed in flaky test

* Update random seed to reproduce the issue

* Fix Flaky unit test and add a training test

* Remove fixed seed in flaky test

* Update random seed to reproduce the issue

* Fix Flaky unit test and add a training test

* Increase accuracy check

disable opencv threading for forked process (apache#12025)

Bug fixes in control flow operators (apache#11942)

Fix data narrowing warning on graph_executor.cc (apache#11969)

Fix flaky tests for test_squared_hinge_loss (apache#12017)

Fix flaky tests for test_hinge_loss (apache#12020)

remove fixed seed for test_sparse_ndarray/test_operator_gpu.test_sparse_nd_pickle (apache#12012)

Removed fixed seed from , test_loss:test_ctc_loss_train (apache#11985)

Removed fixed seed from , test_loss:test_sample_weight_loss (apache#11986)

Fix reduce_kernel_M1 (apache#12026)

* Fix reduce_kernel_M1

* Improve test_norm

Update test_loss.py to remove fixed seed (apache#11995)

[MXNET-23] Adding support to profile kvstore server during distributed training  (apache#11215)

* server profiling

merge with master

cleanup old code

added a check and better info message

add functions for C compatibility

fix doc

lint fixes

fix compile issues

lint fix

build error

update function signatures to preserve compatibility

fix comments

lint

* add part1 of test

* add integration test

Re-enabling test_ndarray/test_cached (apache#11950)

Test passes on CPU and GPU (10000 runs)

make gluon rnn layers hybrid blocks (apache#11482)

* make Gluon RNN layer hybrid block

* separate gluon gpu tests

* remove excess assert_raises_cudnn_disabled usage

* add comments and refactor

* add bidirectional test

* temporarily remove hybridize in test_gluon_rnn.test_layer_fill_shape

[MXNET-751] fix bce_loss flaky (apache#11955)

* add fix to bce_loss

* add comments

* remove unecessary comments

Doc fix for a few optimizers (apache#12034)

* Update optimizer.py

* Update optimizer.py
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
….data_dir() (apache#11636)

* set MXNET_DATA_DIR as base for downloaded models through base.data_dir()
push joblib to save containers so is not required when running

* MXNET_DATA_DIR -> MXNET_HOME
@larroy larroy deleted the data_dir branch November 15, 2018 18:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants