Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-854] SVRG Optimization in Python Module API #12376

Merged
merged 1 commit into from
Sep 10, 2018

Conversation

StephanieYuan
Copy link
Contributor

@StephanieYuan StephanieYuan commented Aug 27, 2018

Description

SVRG stands for Stochastic Variance Reduced Gradient, which is an optimization technique that complements SGD. It is introduced in the paper Accelerating Stochastic Gradient Descent using Predicative Variance Reduction in 2013.
Key Characteristics of SVRG:

  • Explicit Variance Reduction
  • Converge much faster compared to SGD on strongly convex smooth functions

SVRG Module should:

  • seamlessly supports both dense and sparse data, run on CPU and GPU instances on single machine and in distributed setting.
  • minimize the API differences with Python Module API.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-854], where MXNET-854 refers to the relevant JIRA issue created (except PRs with tiny changes)
  • All changes have test coverage:
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • SVRGModule inherits from Python Module and minimizes the API changes of Module API. Both intermediate level and high level APIs are supported
  • Functional tests and unittests have been tested for SVRGModule and SVRG Optimizer.

Comments

  • Design decisions of why using Python Module instead of Optimizer can be found in the design doc here
  • This version of SVRGModule supports single machine run with single cpu, single gpu and multiple-gpus. Distributed SVRGModule is soon to be updated.
  • Benchmarking has been performed on this single machine version SVRGModule on single cpu on linear regression model. Distributed SVRGModule to be updated soon. Benchmarking results can be found in contrib/svrg_optimization_python/benchmarks.

@sandeep-krishnamurthy sandeep-krishnamurthy added Python pr-awaiting-review PR is waiting for code review labels Aug 28, 2018
@sandeep-krishnamurthy
Copy link
Contributor

Thanks @StephanieYuan - Welcome to MXNet community!

@piiswrong @eric-haibin-lin @anirudhacharya @Roshrini @vandanavk - You will be interested to have a look at this.

Copy link
Contributor

@sandeep-krishnamurthy sandeep-krishnamurthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few high level comments for now. Will dive deep by building this change and playing around with MNIST.

Thanks for doing the benchmarks and adding the results. Looks promising and very useful.

It will be very helpful, if we can benchmark on MNIST as depicted in the paper, and see if we can achieve the results in paper, to verify correctness of the implementation.

Examples
--------
>>> # An example of declaring and using SVRGModule.
>>> mod = mod = SVRGModule(symbol=lro, data_names=['data'], label_names=['lin_reg_label'], update_freq=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct this example.

if isinstance(update_freq, int):
self.update_freq = update_freq
else:
raise TypeError("update_freq must be an integer")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make error more useful for user to understand the issue. Example: (update_freq in SVRGModule must be an integer. Example: 2. Given update_freq = ', update_freq)

"""Internal function to reset binded state."""
super(SVRGModule, self)._reset_bind()
self._mod_aux._reset_bind()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra spaces?

self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
toc = time.time()
self.logger.info('Epoch[%d] Time cost=%.3f', epoch, (toc - tic))
print('Epoch[%d] Time cost=%.3f', epoch, eval_metric.get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use logger instead of print every where. example self.logger.info


@mx.optimizer.register
class SVRGOptimizer(mx.optimizer.Optimizer):
"""SVRGOptimizer is a wrapper class for two optimizers: one for accumulating full gradients and the other
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly useful with SGD right? Why allow any optimizer to be passed in here?

Copy link
Contributor Author

@StephanieYuan StephanieYuan Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the previous benchmarks SVRG is used with SGD. Meanwhile I did some exploration of using NAG + SVRG and Adam + SVRG. I think it will be valuable to benchmark those optimizers with SVRG too.

@@ -0,0 +1,84 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests should go tests. May be like below?

  1. tests/python/train/test_contrib_svrg.py : Full end to end train and testing. See test_mlp.py as an example. Please, note, it should be faster (< 2 min at max?). If it is longer than that, we need to move it to nightly test.
  2. tests/python/unittest/test_contrib_svrgmodule.py / ..._svrgoptimizer.py etc..

mod.forward_backward(data_batch=batch)
mod.update()
mod.update_metric(metrics, batch.label)
print('Epoch[%d] Time cost=%.3f', e, metrics.get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use logging.

:return: an instance of mx.io.NDArrayIter
:return: an instance of mx.mod.svrgmodule for performing SVRG optimization
"""
mx.random.seed(42)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please do not use fixed seed. You can see other tests under tests/
  2. Also, what are we asserting here?

return di, mod

# run as a script
if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be? Please see other tests.

if __name__ == '__main__':
    import nose
    nose.runmodule()

@StephanieYuan
Copy link
Contributor Author

StephanieYuan commented Aug 29, 2018

Moved svrg_optimization to python/contrib package.

@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 3 times, most recently from 495be3b to a107e3b Compare August 29, 2018 05:42
@xcgoner
Copy link

xcgoner commented Aug 29, 2018

@StephanieYuan The original version of SVRG could be time-consuming due to the computation of the full gradient at the beginning of each epoch. Could you also include the cheap version in your implementation:
https://arxiv.org/abs/1511.01942

Copy link
Contributor

@vandanavk vandanavk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, please add documentation for the input params for functions and a description of return variables wherever missing.

Please execute pylint on these files using ci/other/pylintrc in incubator-mxnet folder.
Something like this
pylint --rcfile=ci/other/pylintrc --ignore-patterns="..so$$,..dll$$,..dylib$$" python/mxnet/contrib/svrg_optimization/.py example/svrg_module/.py tests/python/unittest/test_contrib_svrg

python/mxnet/contrib/svrg_optimization/svrg_optimizer.py Outdated Show resolved Hide resolved
python/mxnet/contrib/svrg_optimization/svrg_optimizer.py Outdated Show resolved Hide resolved
tests/python/unittest/test_contrib_svrg_optimizer.py Outdated Show resolved Hide resolved
tests/python/unittest/test_contrib_svrg_optimizer.py Outdated Show resolved Hide resolved
tests/python/unittest/test_contrib_svrg_optimizer.py Outdated Show resolved Hide resolved
example/svrg_module/train.py Outdated Show resolved Hide resolved
example/svrg_module/train.py Outdated Show resolved Hide resolved
parser.add_argument('-b', dest="batch_size", help="define the batch size for training", type=int,
default=100, required=False)
parser.add_argument('-m', dest='metrics', help="create eval metric", type=str, required=False)
parser.add_argument('--gpus', type=str, help='list of gpus to run, e.g. 0 or 0,2,5. empty means using cpu')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default is mx.cpu() as defined in line 35.

tests/python/unittest/test_contrib_svrg_module.py Outdated Show resolved Hide resolved
@StephanieYuan
Copy link
Contributor Author

@xcgoner Thank you for the pointer! I'd like to try out the improvements in the later iterations of this project.

Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the API docs here - module.md and optimization.md with references to the SVRG module.

example/svrg_module/api_usage_example/example_inference.py Outdated Show resolved Hide resolved

@mx.optimizer.register
class SVRGOptimizer(mx.optimizer.Optimizer):
"""SVRGOptimizer is a wrapper class for two optimizers: self.aux_opt for accumulating full gradients and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write the doc string in the following format to maintain uniformity - https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L445-L488

tests/python/unittest/test_contrib_svrg_module.py Outdated Show resolved Hide resolved
import numpy as np


def set_up():
Copy link
Member

@anirudhacharya anirudhacharya Aug 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SVRGModule lacks test coverage. Can you add tests for module.save_checkpoint, module.load, module.forward, module.predict.

## SVRG Optimization in Python Module

### Problem Description
SVRG stands for Stochastic Variance Reduced Gradient, which was first introduced in the paper _Accelerating Stochastic
Copy link
Member

@anirudhacharya anirudhacharya Aug 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This README should be removed from here and the SVRG description and characteristics should go into the documentation of the SVRG optimizer.

* test_svrg_module.py: unittests for SVRGModule API
* test_svrg_optimizer.py: unittests for SVRGOptimizer API

### Examples:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add these in the README of the examples folder.

* examples/svrg_module/example_api_train.py: a demo script for training a SVRGModule using intermediate level api and high level api
* examples/svrg_module/example_inference.py: a demo script for SVRGModule inference

#### Benchmarks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the benchmark graphs are present in the "examples" folder, move these lines to the README in the examples folder.


#### Testing:

Unit Tests:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these lines about the Unit Tests.

data_reader.py to download the data.

This example trains a linear regression model using SVRGModule. Logs of the training results can be found in
experiments.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is experiments.log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be generated when runs the train script.

@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 3 times, most recently from 2f7b40d to 4a2b644 Compare September 4, 2018 21:47
Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run the following command to get the coverage results nosetests -v --with-coverage tests/python/unittest/test_contrib_svrg_*

mxnet/contrib/svrg_optimization/__init__.py                  3      0   100%
mxnet/contrib/svrg_optimization/svrg_module.py             175     92    47%
mxnet/contrib/svrg_optimization/svrg_optimizer.py           38      1    97%

If you notice svrg_module.py coverage is deficient. Please increase the coverage for that file, for example update_full_grads method is not tested and you have overriden the fit method of the base class Module, please add tests for them.

Also I am still not sure if the organization of the documentation for this new module is right. Marking @aaronmarkham to review the doc files.

@@ -168,6 +168,7 @@ def test_module_layout():
assert mod.get_outputs()[0].shape == dshape

hdshape = (3, 4, 7)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary blank line

Copy link
Contributor

@aaronmarkham aaronmarkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without trying it out, I just have a couple of suggestions for updates.
To be clear, at a minimum, you need to update api/python/index.md and add this module in the contrib section.

@@ -0,0 +1,26 @@
## SVRG Module Example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce what it is as well. From your other document:
"SVRGModule is an extension to the Module API that implements SVRG (Stochastic Variance Reduced Gradients) optimization ..."

## SVRG Module Example

#### API Usage Example
SVRGModule provides both high-level and intermediate-level APIs while minimizing the changes with Module API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link back to the docs. This should link back to this spot - where you should add an entry for this module.
https://github.com/apache/incubator-mxnet/blob/master/docs/api/python/index.md#contrib-package
Since this comes from the example folder and outside of the docs, use a full static link.

@@ -1,188 +0,0 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this file and legacy_ndarray.v0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was done by mistake, I'll add them back when I push for more svrg_module test coverage. Thank you for the heads up!

@anirudhacharya
Copy link
Member

anirudhacharya commented Sep 5, 2018

@StephanieYuan Also the way svrg_optimizer.py is written it leaves a few questions open.

From your design document I understand that SVRGOptimizer cannot be used with Module API which is why you wrote the SVRGModule API which will use SVRGOptimizer under the hood instead of the usual SGD or other optimizers. Can you explain why in greater detail?

Also if this is indeed the case, then the SVRGOptimizer class needs to be declared private using a leading underscore in its name, else it will be very misleading to the user who might try to use SVRGOptimizer with the plain vanilla Module API. Also it should be made clear in the doc string for the SRVGOptimizer that this optimizer is to be used with the SVRGModule alone.

Also you might want to consider moving the SVRGOptimizer to the optimizer.py file and keep SVRGModule alone in the contrib folder. But I am not sure of this, you might want to get a second opinion on this. @piiswrong @eric-haibin-lin

@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 2 times, most recently from fa1753c to bb31363 Compare September 5, 2018 04:33
@sandeep-krishnamurthy
Copy link
Contributor

Great progress here. Thanks everyone.
@StephanieYuan

  1. Please rebase and resolve conflicts.
  2. Please comment about tests added and coverage.
  3. We had discussed in above comment, to verify this implementation with MNIST to confirm correctness per the claims in paper. Please comment.

@anirudhacharya - I would still prefer keeping SVRG optimizer in contrib and allow it to evolve. As mentioned in above comments and design doc, Other combination like Adam+SVRG could also be used but not benchmarked. I would set the default optimizer for SVRGModule to be SVRGOptimizer but not remove the option altogether and make SVRGOptimizer internal. Thoughts?

@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 2 times, most recently from 4c48130 to c446dc0 Compare September 5, 2018 13:51
@StephanieYuan
Copy link
Contributor Author

Thanks everyone for the feedback and input!

I've rebased and resolved the conflict with master. I have also added some more unittests for SVRGModule especially on verifying SVRG gradients calculations and the test coverage is now:
mxnet/contrib/svrg_optimization/svrg_module.py 175 21 88%
mxnet/contrib/svrg_optimization/svrg_optimizer.py 38 1 97%

@sandeep-krishnamurthy I'm working on creating the MNIST example but I'm expecting it will take a few more days as I have other tasks in queue. Maybe I can include the example in my next PR on SVRG in distributed mode?

@anirudhacharya SVRGOptimizer is designed to be used with SVRGModule alone. The actual SVRG optimization logic is implemented in the SVRGModule and the SVRGOptimizer is designed to assist the full gradients accumulation in the KVStore. I've made SVRGOptimizer a private class and added the reasonings behind the design in greater detail in the docs/api/python/contrib/svrg_optimization. Hopefully it clarifies things a bit.

@aaronmarkham I've added pointer about the svrg_optimization in index.md and refined the docs in the example/svrg_module/README.

Copy link
Contributor

@sandeep-krishnamurthy sandeep-krishnamurthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left 2 minor comments, please fix. Thank you for your contributions.

This PR is ready to go after approval from @aaronmarkham for documentation and @anirudhacharya for his code review comments.

mod.init_params()
mod.init_optimizer(optimizer='sgd', optimizer_params={'learning_rate': 0.1})
mod.update()
mod.save_checkpoint('test', 0, save_optimizer_states=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended to use tempfile. This avoid race condition in file access from other tests and also clean up is easy. Something like:

import tempfile
import os
....
 tmp = tempfile.mkdtemp()
 tmpfile = os.path.join(tmp, 'svrg_test')
mod.save_checkpoint(tmpfile, .....)

@@ -815,4 +815,4 @@ def test_forward_types():

if __name__ == '__main__':
import nose
nose.runmodule()
nose.runmodule()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a new line (pep)

@anirudhacharya
Copy link
Member

@sandeep-krishnamurthy it now looks better after SVRGOptimizer was made private. Including SVRGOptimizer in optimizer.py file instead of the contrib folder was just a thought. I am okay either ways. Will take a look at @StephanieYuan 's changes and approve it accordingly.

Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how the documentation for this module renders right now - http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-12376/17/api/python/contrib/svrg_optimization.html
Private members like _SVRGOptimizer should not show up in the documentation

http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-12376/17/api/python/module/module.html#module-api

The description for the SVRGModule is not rendering. cc: @aaronmarkham

Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please address the comments on svrg_optimization.md file

@StephanieYuan
Copy link
Contributor Author

@anirudhacharya I'll fix it soon. Is there a way that I can see the rendered view before it gets pushed?

@anirudhacharya
Copy link
Member

@StephanieYuan some of the issues are with the content of the docs and few are with how the content renders. @aaronmarkham might be able to help with how the docs renders. You will probably have to build the docs and have to deploy it locally.

@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 3 times, most recently from c6842a0 to d6e33f1 Compare September 5, 2018 20:57
@aaronmarkham
Copy link
Contributor

I just updated the how-to guide for building the docs.
https://cwiki.apache.org/confluence/display/MXNET/How+to+Build+the+Website
Let me know if you have any questions.

@aaronmarkham
Copy link
Contributor

I'm not sure why those private members are showing up. Sphinx should recognize them as such if they're defined that way.
Look at http://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html and maybe you can use :members or similar to limit what's displayed.
If you get your paths to be specific and accurate (without shortcuts) you should see the description populate assuming you have one. I noticed with ONNX that while mxnet.onnx.import_model might work in Python, it didn't work with Sphinx. The full path was required for the docs to generate properly mxnet.contrib.onnx.onnx2mx.import_model. I'm not sure why though.

Be sure to adjust settings.ini when testing the docs build, otherwise you'll be waiting forever each time while scala builds.

@StephanieYuan
Copy link
Contributor Author

@aaronmarkham Thanks for the pointer!
I'm thinking to exclude SVRGModule from module.md since SVRGModule will be a contrib package at current stage.

@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 2 times, most recently from dfddae1 to 3887fe9 Compare September 6, 2018 00:50
@StephanieYuan StephanieYuan force-pushed the SVRGOptimization branch 2 times, most recently from e39bf63 to 49220e0 Compare September 7, 2018 15:41
Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the examples folder could you please add the code you used to generate the benchmarking graphs for the linear regression example. Since you are including the graphs in the examples folder, you should also have the code that can reproduce those results.

@StephanieYuan
Copy link
Contributor Author

@anirudhacharya Sounds good! It's a Jupyter Notebook. I'll add it to the example folder.

…c. This version supports single machine SVRG with single cpu, gpu and multi-gpus.
Copy link
Member

@anirudhacharya anirudhacharya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@sandeep-krishnamurthy sandeep-krishnamurthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for awesome work.
Having docs, example jupyter notebook, good test coverage is super useful.

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit 3406845 into apache:master Sep 10, 2018
aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request Sep 11, 2018
…he#12376)

Implemented a python SVRGModule for performing SVRG Optimization Logic. This version supports single machine SVRG with single cpu, gpu and multi-gpus. (apache#12376)
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
…he#12376)

Implemented a python SVRGModule for performing SVRG Optimization Logic. This version supports single machine SVRG with single cpu, gpu and multi-gpus. (apache#12376)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants