
Fix memory leak #307

Merged · 9 commits · Feb 10, 2022

Conversation

@kzhang2 (Contributor) commented Feb 10, 2022

Description

Fixes #284

Fixes a memory leak in maml.py and meta-sgd.py, and adds tests to maml_test.py and metasgd_test.py to guard against future memory leaks. A test involving cloning parameters fails, but my changes are unrelated to it.
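Memory-leak tests of this kind usually verify that objects created during each adaptation step (such as cloned modules) are actually released afterwards. Below is a minimal, torch-free sketch of that idea using only the standard library; `FakeModule` and `adapt_and_discard` are hypothetical stand-ins and do not come from the actual maml_test.py.

```python
import gc
import weakref

class FakeModule:
    """Hypothetical stand-in for a torch module; not learn2learn code."""
    def __init__(self):
        self.params = [0.0] * 1000

def adapt_and_discard(module):
    # Simulate one adaptation step: clone the module, use it, drop it.
    clone = FakeModule()
    clone.params = list(module.params)
    # Return only a weak reference, so the clone can be collected.
    return weakref.ref(clone)

module = FakeModule()
refs = [adapt_and_discard(module) for _ in range(10)]
gc.collect()

# If clones leaked (e.g. were kept alive by a reference cycle in the
# computation graph), some weak references would still resolve.
leaked = [r for r in refs if r() is not None]
assert leaked == [], "clones were not garbage collected"
```

In the real tests a CUDA-based variant of this idea is natural: record `torch.cuda.memory_allocated()` before and after repeated adaptation loops and assert it does not grow.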


Contribution Checklist

If your contribution modifies code in the core library (not docs, tests, or examples), please complete the following checklist.

  • My contribution is listed in CHANGELOG.md with attribution.
  • My contribution modifies code in the main library.
  • My modifications are tested.
  • My modifications are documented.

Optional

If you make major changes to the core library, please run make alltests and copy-paste the content of alltests.txt below.

make[1]: Entering directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'
OMP_NUM_THREADS=1 \
MKL_NUM_THREADS=1 \
python -W ignore -m unittest discover -s 'tests' -p '*_test.py' -v
9464832it [00:01, 4735385.84it/s]                             otIntegrationTests) ... 
6463488it [00:01, 4715230.93it/s]                             
ok
test_adaptation (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_allow_nograd (unit.algorithms.gbml_test.TestGBMLgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/optim/parameter_update.py", line 119, in forward
    gradients = torch.autograd.grad(
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_clone_module (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_graph_connection (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_allow_nograd (unit.algorithms.maml_test.TestMAMLAlgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/algorithms/maml.py", line 159, in adapt
    gradients = grad(loss,
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_clone_module (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_first_order_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_graph_connection (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_memory_consumption (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_module_shared_params (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_adaptation (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_clone_module (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_graph_connection (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_memory_consumption (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_meta_lr (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
9464832it [00:02, 3925553.32it/s]                             
6463488it [00:01, 4160425.13it/s]                             
test_data_labels_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_labels_values (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_fails_with_non_torch_dataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_filtered_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_get_item (unit.data.metadataset_test.TestMetaDataset) ... ok
test_labels_to_indices (unit.data.metadataset_test.TestMetaDataset) ... ok
test_union_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_dataloader (unit.data.task_dataset_test.TestTaskDataset) ... Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to ./data/omniglot-py/images_background.zip
Extracting ./data/omniglot-py/images_background.zip to ./data/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to ./data/omniglot-py/images_evaluation.zip
Extracting ./data/omniglot-py/images_evaluation.zip to ./data/omniglot-py
0 Meta Train Accuracy 0.42500000912696123
1 Meta Train Accuracy 0.5062500112690032
2 Meta Train Accuracy 0.537500012665987
3 Meta Train Accuracy 0.43125001015141606
4 Meta Train Accuracy 0.5187500142492354
learn2learn: Maybe try with allow_nograd=True and/orallow_unused=True ?
learn2learn: Maybe try with allow_nograd=True and/or allow_unused=True ?
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to /tmp/datasets/omniglot-py/images_background.zip
Extracting /tmp/datasets/omniglot-py/images_background.zip to /tmp/datasets/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to /tmp/datasets/omniglot-py/images_evaluation.zip
Extracting /tmp/datasets/omniglot-py/images_evaluation.zip to /tmp/datasets/omniglot-py
Downloading FC100. (160Mb)
Downloading CIFARFS to  /home/kevin/data
Creating CIFARFS splits
ok
test_infinite_tasks (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_instanciation (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_caching (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_transforms (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_filter_labels (unit.data.transforms_test.TestTransforms) ... ok
test_k_shots (unit.data.transforms_test.TestTransforms) ... ok
test_load_data (unit.data.transforms_test.TestTransforms) ... ok
test_n_ways (unit.data.transforms_test.TestTransforms) ... ok
test_remap_labels (unit.data.transforms_test.TestTransforms) ... ok
test_infinite_iterator (unit.data.utils_test.DataUtilsTests) ... ok
test_partition_task (unit.data.utils_test.DataUtilsTests) ... ok
test_illegal_dimensions (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_illegal_dimensions_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_cosine_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_euclidean_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_simple (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_clone_module_basics (unit.utils_test.UtilTests) ... ok
test_clone_module_models (unit.utils_test.UtilTests) ... ok
test_clone_module_nomodule (unit.utils_test.UtilTests) ... ok
test_distribution_clone (unit.utils_test.UtilTests) ... ok
test_distribution_detach (unit.utils_test.UtilTests) ... ok
test_module_clone_shared_params (unit.utils_test.UtilTests) ... ok
test_module_detach (unit.utils_test.UtilTests) ... ok
test_module_detach_keep_requires_grad (unit.utils_test.UtilTests) ... ok
test_module_update_shared_params (unit.utils_test.UtilTests) ... FAIL
test_rnn_clone (unit.utils_test.UtilTests) ... ok

======================================================================
FAIL: test_module_update_shared_params (unit.utils_test.UtilTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/tests/unit/utils_test.py", line 268, in test_module_update_shared_params
    self.assertTrue(
AssertionError: False is not true : clone and original do not have same number of parameters.

----------------------------------------------------------------------
Ran 62 tests in 128.143s

FAILED (failures=1)
make[1]: *** [Makefile:31: tests] Error 1
make[1]: Leaving directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'

@seba-1511 (Member)

Thanks a lot @kzhang2 -- this looks great (incl. Meta-SGD!). I'll merge and cut a new release as soon as it passes the tests.

N_STEPS = 5
N_EVAL = 2

device = torch.device('cuda:0')
@seba-1511 (Member)
GitHub actions don't have cuda. Can we shield this test with: if torch.cuda.is_available(): so it doesn't run if the machine doesn't have cuda?
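The guard the reviewer asks for can also be expressed with `unittest.skipUnless`, which is equivalent to an `if torch.cuda.is_available():` early return but reports the test as skipped. The sketch below uses a hardcoded stand-in for `torch.cuda.is_available()` so it runs on any machine; in the real test the condition would call into torch.

```python
import io
import unittest

def cuda_is_available():
    # Stand-in for torch.cuda.is_available(); hardcoded so this sketch
    # runs without torch or a GPU.
    return False

class MemoryTests(unittest.TestCase):

    @unittest.skipUnless(cuda_is_available(), "requires CUDA")
    def test_memory_consumption(self):
        # Would allocate tensors on cuda:0 and measure memory here.
        self.fail("should never run without CUDA")

# Run the suite; the CUDA-only test is skipped, not failed.
result = unittest.TextTestRunner(
    stream=io.StringIO(), verbosity=0
).run(unittest.defaultTestLoader.loadTestsFromTestCase(MemoryTests))
```

With this idiom the GitHub Actions runners (which have no CUDA) see one skipped test instead of an error.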

@kzhang2 (Contributor, Author) Feb 10, 2022
fixed

@@ -80,6 +80,34 @@ def test_adaptation(self):
self.assertTrue(hasattr(p, 'grad'))
self.assertTrue(p.grad.norm(p=2).item() > 0.0)

def test_memory_consumption(self):
@seba-1511 (Member)
@kzhang2 and shield this test too.

@kzhang2 (Contributor, Author)
fixed

@seba-1511 (Member)

OK, it took a bit of elbow grease but it seems to work now (I also took the opportunity to rewrite some flaky tests). Thanks for contributing this.

@seba-1511 seba-1511 merged commit 883d36a into learnables:master Feb 10, 2022
Development

Successfully merging this pull request may close these issues.

Potential Memory Leak Error
2 participants