
Fix memory leak #307

Merged · 9 commits · Feb 10, 2022

Conversation

@kzhang2 (Contributor) commented Feb 10, 2022

Description

Fixes #284

Fixes a memory leak in maml.py and meta-sgd.py, and adds tests to maml_test.py and metasgd_test.py to guard against future memory leaks. A test involving cloning parameters fails, but my changes are unrelated to it.
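Memory-leak tests of this kind usually verify that objects created during each adaptation step (such as cloned modules) are actually released afterwards. Below is a minimal, torch-free sketch of that idea using only the standard library; `FakeModule` and `adapt_and_discard` are hypothetical stand-ins and do not come from the actual maml_test.py.

```python
import gc
import weakref

class FakeModule:
    """Hypothetical stand-in for a torch module; not learn2learn code."""
    def __init__(self):
        self.params = [0.0] * 1000

def adapt_and_discard(module):
    # Simulate one adaptation step: clone the module, use it, drop it.
    clone = FakeModule()
    clone.params = list(module.params)
    # Return only a weak reference, so the clone can be collected.
    return weakref.ref(clone)

module = FakeModule()
refs = [adapt_and_discard(module) for _ in range(10)]
gc.collect()

# If clones leaked (e.g. were kept alive by a reference cycle in the
# computation graph), some weak references would still resolve.
leaked = [r for r in refs if r() is not None]
assert leaked == [], "clones were not garbage collected"
```

In the real tests a CUDA-based variant of this idea is natural: record `torch.cuda.memory_allocated()` before and after repeated adaptation loops and assert it does not grow.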


Contribution Checklist

If your contribution modifies code in the core library (not docs, tests, or examples), please complete the following checklist.

  • My contribution is listed in CHANGELOG.md with attribution.
  • My contribution modifies code in the main library.
  • My modifications are tested.
  • My modifications are documented.

Optional

If you make major changes to the core library, please run make alltests and copy-paste the content of alltests.txt below.

make[1]: Entering directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'
OMP_NUM_THREADS=1 \
MKL_NUM_THREADS=1 \
python -W ignore -m unittest discover -s 'tests' -p '*_test.py' -v
9464832it [00:01, 4735385.84it/s]                             otIntegrationTests) ... 
6463488it [00:01, 4715230.93it/s]                             
ok
test_adaptation (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_allow_nograd (unit.algorithms.gbml_test.TestGBMLgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/optim/parameter_update.py", line 119, in forward
    gradients = torch.autograd.grad(
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_clone_module (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_graph_connection (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_allow_nograd (unit.algorithms.maml_test.TestMAMLAlgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/algorithms/maml.py", line 159, in adapt
    gradients = grad(loss,
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_clone_module (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_first_order_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_graph_connection (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_memory_consumption (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_module_shared_params (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_adaptation (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_clone_module (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_graph_connection (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_memory_consumption (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_meta_lr (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
9464832it [00:02, 3925553.32it/s]                             
6463488it [00:01, 4160425.13it/s]                             
test_data_labels_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_labels_values (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_fails_with_non_torch_dataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_filtered_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_get_item (unit.data.metadataset_test.TestMetaDataset) ... ok
test_labels_to_indices (unit.data.metadataset_test.TestMetaDataset) ... ok
test_union_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_dataloader (unit.data.task_dataset_test.TestTaskDataset) ... Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to ./data/omniglot-py/images_background.zip
Extracting ./data/omniglot-py/images_background.zip to ./data/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to ./data/omniglot-py/images_evaluation.zip
Extracting ./data/omniglot-py/images_evaluation.zip to ./data/omniglot-py
0 Meta Train Accuracy 0.42500000912696123
1 Meta Train Accuracy 0.5062500112690032
2 Meta Train Accuracy 0.537500012665987
3 Meta Train Accuracy 0.43125001015141606
4 Meta Train Accuracy 0.5187500142492354
learn2learn: Maybe try with allow_nograd=True and/orallow_unused=True ?
learn2learn: Maybe try with allow_nograd=True and/or allow_unused=True ?
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to /tmp/datasets/omniglot-py/images_background.zip
Extracting /tmp/datasets/omniglot-py/images_background.zip to /tmp/datasets/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to /tmp/datasets/omniglot-py/images_evaluation.zip
Extracting /tmp/datasets/omniglot-py/images_evaluation.zip to /tmp/datasets/omniglot-py
Downloading FC100. (160Mb)
Downloading CIFARFS to  /home/kevin/data
Creating CIFARFS splits
ok
test_infinite_tasks (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_instanciation (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_caching (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_transforms (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_filter_labels (unit.data.transforms_test.TestTransforms) ... ok
test_k_shots (unit.data.transforms_test.TestTransforms) ... ok
test_load_data (unit.data.transforms_test.TestTransforms) ... ok
test_n_ways (unit.data.transforms_test.TestTransforms) ... ok
test_remap_labels (unit.data.transforms_test.TestTransforms) ... ok
test_infinite_iterator (unit.data.utils_test.DataUtilsTests) ... ok
test_partition_task (unit.data.utils_test.DataUtilsTests) ... ok
test_illegal_dimensions (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_illegal_dimensions_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_cosine_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_euclidean_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_simple (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_clone_module_basics (unit.utils_test.UtilTests) ... ok
test_clone_module_models (unit.utils_test.UtilTests) ... ok
test_clone_module_nomodule (unit.utils_test.UtilTests) ... ok
test_distribution_clone (unit.utils_test.UtilTests) ... ok
test_distribution_detach (unit.utils_test.UtilTests) ... ok
test_module_clone_shared_params (unit.utils_test.UtilTests) ... ok
test_module_detach (unit.utils_test.UtilTests) ... ok
test_module_detach_keep_requires_grad (unit.utils_test.UtilTests) ... ok
test_module_update_shared_params (unit.utils_test.UtilTests) ... FAIL
test_rnn_clone (unit.utils_test.UtilTests) ... ok

======================================================================
FAIL: test_module_update_shared_params (unit.utils_test.UtilTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/tests/unit/utils_test.py", line 268, in test_module_update_shared_params
    self.assertTrue(
AssertionError: False is not true : clone and original do not have same number of parameters.

----------------------------------------------------------------------
Ran 62 tests in 128.143s

FAILED (failures=1)
make[1]: *** [Makefile:31: tests] Error 1
make[1]: Leaving directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'

@seba-1511 (Member)

Thanks a lot @kzhang2 -- this looks great (incl. Meta-SGD!). I'll merge and cut a new release as soon as it passes the tests.

N_STEPS = 5
N_EVAL = 2

device = torch.device('cuda:0')
@seba-1511 (Member)
GitHub actions don't have cuda. Can we shield this test with: if torch.cuda.is_available(): so it doesn't run if the machine doesn't have cuda?
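The guard the reviewer asks for can also be expressed with `unittest.skipUnless`, which is equivalent to an `if torch.cuda.is_available():` early return but reports the test as skipped. The sketch below uses a hardcoded stand-in for `torch.cuda.is_available()` so it runs on any machine; in the real test the condition would call into torch.

```python
import io
import unittest

def cuda_is_available():
    # Stand-in for torch.cuda.is_available(); hardcoded so this sketch
    # runs without torch or a GPU.
    return False

class MemoryTests(unittest.TestCase):

    @unittest.skipUnless(cuda_is_available(), "requires CUDA")
    def test_memory_consumption(self):
        # Would allocate tensors on cuda:0 and measure memory here.
        self.fail("should never run without CUDA")

# Run the suite; the CUDA-only test is skipped, not failed.
result = unittest.TextTestRunner(
    stream=io.StringIO(), verbosity=0
).run(unittest.defaultTestLoader.loadTestsFromTestCase(MemoryTests))
```

With this idiom the GitHub Actions runners (which have no CUDA) see one skipped test instead of an error.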

@kzhang2 (Contributor, Author) Feb 10, 2022
fixed

@@ -80,6 +80,34 @@ def test_adaptation(self):
self.assertTrue(hasattr(p, 'grad'))
self.assertTrue(p.grad.norm(p=2).item() > 0.0)

def test_memory_consumption(self):
@seba-1511 (Member)
@kzhang2 and shield this test too.

@kzhang2 (Contributor, Author)
fixed

@seba-1511 (Member)

OK, it took a bit of elbow grease but it seems to work now (I also took the opportunity to rewrite some flaky tests). Thanks for contributing this.

@seba-1511 seba-1511 merged commit 883d36a into learnables:master Feb 10, 2022
Development

Successfully merging this pull request may close these issues.

Potential Memory Leak Error
2 participants