diff --git a/.gitignore b/.gitignore
index cd0ba22453512..c007140257188 100644
--- a/.gitignore
+++ b/.gitignore
@@ -157,3 +157,4 @@ tags
 data
 MNIST
 runs
+*traces*
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6004a28dd0829..5f005f583c5ed 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,8 +14,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 - Added a way to print to terminal without breaking up the progress bar ([#5470](https://github.com/PyTorchLightning/pytorch-lightning/pull/5470))

+
 - Added support to checkpoint after training steps in `ModelCheckpoint` callback ([#6146](https://github.com/PyTorchLightning/pytorch-lightning/pull/6146))

+
 - Added `checkpoint` parameter to callback's `on_save_checkpoint` hook ([#6072](https://github.com/PyTorchLightning/pytorch-lightning/pull/6072))


@@ -37,6 +39,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders ([#6274](https://github.com/PyTorchLightning/pytorch-lightning/pull/6274))


+- Added `teardown` method to `BaseProfiler` to enable subclasses defining post-profiling steps outside of `__del__` ([#6370](https://github.com/PyTorchLightning/pytorch-lightning/pull/6370))
+
+
 - Added no return warning to predict ([#6139](https://github.com/PyTorchLightning/pytorch-lightning/pull/6139))


@@ -120,6 +125,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 - Added Autocast in validation, test and predict modes for Native AMP ([#6565](https://github.com/PyTorchLightning/pytorch-lightning/pull/6565))

+
 - Made the `Plugin.reduce` method more consistent across all Plugins to reflect a mean-reduction by default ([#6011](https://github.com/PyTorchLightning/pytorch-lightning/pull/6011))


@@ -147,6 +153,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed LightningModule `all_gather` on cpu tensors ([#6416](https://github.com/PyTorchLightning/pytorch-lightning/pull/6416))


+- Fixed a bug where `all_gather` would not work correctly with `tpu_cores=8` ([#6587](https://github.com/PyTorchLightning/pytorch-lightning/pull/6587))
+
+
+- Update Gradient Clipping for the TPU Accelerator ([#6576](https://github.com/PyTorchLightning/pytorch-lightning/pull/6576))
+
+
 - Fixed torch distributed not available in setup hook for DDP ([#6506](https://github.com/PyTorchLightning/pytorch-lightning/pull/6506))


@@ -170,12 +182,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed when Train loop config was run during `Trainer.predict` ([#6541](https://github.com/PyTorchLightning/pytorch-lightning/pull/6541))


-- Fixed a bug where `all_gather` would not work correctly with `tpu_cores=8` ([#6587](https://github.com/PyTorchLightning/pytorch-lightning/pull/6587))
-
-
-- Update Gradient Clipping for the TPU Accelerator ([#6576](https://github.com/PyTorchLightning/pytorch-lightning/pull/6576))
-
-
 ## [1.2.3] - 2021-03-09

 ### Fixed
diff --git a/pytorch_lightning/profiler/profilers.py b/pytorch_lightning/profiler/profilers.py
index d704ba83236c1..55898dc2ee4e1 100644
--- a/pytorch_lightning/profiler/profilers.py
+++ b/pytorch_lightning/profiler/profilers.py
@@ -55,6 +55,10 @@ def start(self, action_name: str) -> None:
     def stop(self, action_name: str) -> None:
         """Defines how to record the duration once an action is complete."""

+    def teardown(self) -> None:
+        """Execute arbitrary post-profiling tear-down steps as defined by subclass."""
+        pass
+
     @contextmanager
     def profile(self, action_name: str) -> None:
         """
@@ -211,14 +215,16 @@ def log_row(action, mean, total):
     def describe(self):
         """Logs a profile report after the conclusion of the training run."""
         super().describe()
-        if self.output_file:
-            self.output_file.flush()
+        self.teardown()

-    def __del__(self):
+    def teardown(self) -> None:
         """Close profiler's stream."""
         if self.output_file:
             self.output_file.close()

+    def __del__(self):
+        self.teardown()
+

 class AdvancedProfiler(BaseProfiler):
     """
@@ -283,10 +289,12 @@ def summary(self) -> str:
     def describe(self):
         """Logs a profile report after the conclusion of the training run."""
         super().describe()
-        if self.output_file:
-            self.output_file.flush()
+        self.teardown()

-    def __del__(self):
+    def teardown(self) -> None:
         """Close profiler's stream."""
         if self.output_file:
             self.output_file.close()
+
+    def __del__(self):
+        self.teardown()
diff --git a/pytorch_lightning/profiler/pytorch.py b/pytorch_lightning/profiler/pytorch.py
index 88a33a3d367f8..fdde80589acf3 100644
--- a/pytorch_lightning/profiler/pytorch.py
+++ b/pytorch_lightning/profiler/pytorch.py
@@ -294,10 +294,12 @@ def summary(self) -> str:
     def describe(self):
         """Logs a profile report after the conclusion of the training run."""
         super().describe()
-        if self.output_file:
-            self.output_file.flush()
+        self.teardown()

-    def __del__(self):
+    def teardown(self) -> None:
         """Close profiler's stream."""
         if self.output_file:
             self.output_file.close()
+
+    def __del__(self):
+        self.teardown()
diff --git a/pytorch_lightning/trainer/trainer.py b/pytorch_lightning/trainer/trainer.py
index 0e9e28c9996f2..a5b99871d55f9 100644
--- a/pytorch_lightning/trainer/trainer.py
+++ b/pytorch_lightning/trainer/trainer.py
@@ -1077,6 +1077,7 @@ def call_teardown_hook(self, model: LightningModule) -> None:
         else:
             state = None

+        self.profiler.teardown()
         self.teardown(stage=state)
         model.teardown(stage=state)

diff --git a/pytorch_lightning/trainer/training_loop.py b/pytorch_lightning/trainer/training_loop.py
index 7e737c424ff26..a77d91a7402b4 100644
--- a/pytorch_lightning/trainer/training_loop.py
+++ b/pytorch_lightning/trainer/training_loop.py
@@ -140,6 +140,7 @@ def on_train_end(self):
         self.trainer.logger.finalize("success")

         # summarize profile results
+        # todo (tchaton) All ranks should call describe.
         if self.trainer.global_rank == 0:
             self.trainer.profiler.describe()

diff --git a/tests/test_profiler.py b/tests/test_profiler.py
index 5221c0cbf7bf6..ccdd8a569c9a8 100644
--- a/tests/test_profiler.py
+++ b/tests/test_profiler.py
@@ -252,8 +252,8 @@ def test_pytorch_profiler_trainer_ddp(tmpdir, use_output_filename):
     assert profiler.summary() is None
     assert set(profiler.profiled_actions.keys()) == set()

-    if use_output_filename:
-        profiler.describe()
+    # todo (tchaton) add support for all ranks
+    if use_output_filename and os.getenv("LOCAL_RANK") == "0":
         data = Path(profiler.output_fname).read_text()
         assert len(data) > 0

@@ -316,3 +316,21 @@ def test_pytorch_profiler_nested_emit_nvtx(tmpdir):
         gpus=1,
     )
     trainer.fit(model)
+
+
+@pytest.mark.parametrize("cls", (SimpleProfiler, AdvancedProfiler, PyTorchProfiler))
+def test_profiler_teardown(tmpdir, cls):
+    """
+    This test checks if profiler teardown method is called when trainer is exiting.
+    """
+    profiler = cls(output_filename=os.path.join(tmpdir, "profiler.txt"))
+
+    model = BoringModel()
+    trainer = Trainer(
+        default_root_dir=tmpdir,
+        fast_dev_run=True,
+        profiler=profiler,
+    )
+    trainer.fit(model)
+
+    assert profiler.output_file.closed
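To illustrate how the new hook is intended to be used, here is a minimal sketch (not part of this patch) of a custom profiler that releases its resources in `teardown()` instead of relying on `__del__`. The `TraceFileProfiler` name, its timing logic, and the `output_streams` argument passed to `BaseProfiler.__init__` are illustrative assumptions; only `start`, `stop`, `summary`, and the new `teardown` hook come from the interface shown in the diff.

```python
import time

from pytorch_lightning.profiler.profilers import BaseProfiler


class TraceFileProfiler(BaseProfiler):
    """Hypothetical example profiler: records action durations and closes its file in ``teardown``."""

    def __init__(self, output_filename: str):
        # Hypothetical: this subclass owns its own output stream.
        self.output_file = open(output_filename, "w")
        self.start_times = {}
        self.durations = {}
        # Assumption: BaseProfiler.__init__ accepts output streams, as the built-in profilers do.
        super().__init__(output_streams=[self.output_file.write])

    def start(self, action_name: str) -> None:
        self.start_times[action_name] = time.monotonic()

    def stop(self, action_name: str) -> None:
        start = self.start_times.pop(action_name, None)
        if start is not None:
            self.durations.setdefault(action_name, []).append(time.monotonic() - start)

    def summary(self) -> str:
        return "\n".join(f"{name}: {sum(times):.4f}s" for name, times in self.durations.items())

    def teardown(self) -> None:
        # Called via Trainer.call_teardown_hook (see the trainer.py hunk above), so the stream
        # is closed deterministically instead of waiting for garbage collection.
        if self.output_file and not self.output_file.closed:
            self.output_file.flush()
            self.output_file.close()

    def __del__(self):
        # Keep __del__ as a safety net, mirroring the built-in profilers after this change.
        self.teardown()
```

With this change, a run such as `Trainer(profiler=TraceFileProfiler("traces.txt"), fast_dev_run=True).fit(model)` would have the profiler's file closed by the trainer's teardown hook at the end of the run, which is what the added `test_profiler_teardown` asserts for the built-in profilers.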