[bugfix] Resolve PyTorch Profiling for Manual Optimization #9316
Merged
tchaton
requested review from
awaelchli,
Borda,
carmocca,
justusschock,
kaushikb11,
SeanNaren and
williamFalcon
as code owners
September 3, 2021 18:10
tchaton
changed the title
progress
[bugfix] Resolve PyTorch Profiling for Manual Optimization
Sep 3, 2021
carmocca
approved these changes
Sep 4, 2021
Co-authored-by: Carlos Mocholí <[email protected]>
awaelchli
approved these changes
Sep 4, 2021
awaelchli
reviewed
Sep 4, 2021
I copied the tests to master and they both pass. Can you check the fix / repro again?
Borda
approved these changes
Sep 6, 2021
Thanks @awaelchli. My test wasn't good enough. On master:

class ManualOptimBoringModel(BoringModel):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        output = self(batch)
        loss = self.loss(batch, output)
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        return loss


@pytest.mark.parametrize("fast_dev_run", [1, 2, 3, 4, 5])
@pytest.mark.parametrize("boring_model_cls", [BoringModel, ManualOptimBoringModel])
def test_pytorch_profiler_trainer_fit(fast_dev_run, boring_model_cls, tmpdir):
    """Ensure that the profiler can be given to the trainer and test step are properly recorded."""
    pytorch_profiler = PyTorchProfiler(dirpath=tmpdir, filename="profile")
    model = boring_model_cls()
    trainer = Trainer(default_root_dir=tmpdir, max_epochs=1, fast_dev_run=fast_dev_run, profiler=pytorch_profiler)
    trainer.fit(model)

    assert sum(e.name == "validation_step" for e in pytorch_profiler.function_events)

    path = pytorch_profiler.dirpath / f"fit-{pytorch_profiler.filename}.txt"
    assert path.read_text("utf-8")

    if _KINETO_AVAILABLE:
        files = sorted(file for file in os.listdir(tmpdir) if file.endswith(".json"))
        assert any(f"fit-{pytorch_profiler.filename}" in f for f in files)

    path = pytorch_profiler.dirpath / f"fit-{pytorch_profiler.filename}.txt"
    assert path.read_text("utf-8")

➜ pytorch-lightning git:(master) ✗ pytest tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit --capture=no -v
=============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /Users/thomas/.pyenv/versions/3.8.5/bin/python3.8
cachedir: .pytest_cache
rootdir: /Users/thomas/Documents/GitHub/pytorch-lightning, configfile: setup.cfg
collected 10 items
tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit[BoringModel-1] INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.distributed:Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
INFO:pytorch_lightning.utilities.model_summary:
| Name | Type | Params
---------------------------------
0 | layer | Linear | 66
---------------------------------
66 Trainable params
0 Non-trainable params
66 Total params
0.000 Total estimated model params size (MB)
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 187.85it/s, loss=2.42, v_num=]
FAILED
tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit[BoringModel-2] INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.distributed:Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 2 batch(es).
INFO:pytorch_lightning.utilities.model_summary:
| Name | Type | Params
---------------------------------
0 | layer | Linear | 66
---------------------------------
66 Trainable params
0 Non-trainable params
66 Total params
0.000 Total estimated model params size (MB)
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 274.74it/s, loss=1.47, v_num=]
Fatal Python error: Segmentation fault
Thread 0x000070000f681000 (most recent call first):
<no Python frame>
Current thread 0x000000010505cdc0 (most recent call first):
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/autograd/profiler.py", line 498 in __exit__
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 435 in _stop_trace
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 408 in _exit_actions
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 253 in __exit__
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 461 in _delete_profilers
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 421 in summary
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/base.py", line 138 in describe
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1282 in _call_teardown_hook
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1028 in _run
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 589 in _fit_impl
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 511 in _call_and_handle_interrupt
File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 559 in fit
File "/Users/thomas/Documents/GitHub/pytorch-lightning/tests/profiler/test_profiler.py", line 326 in test_pytorch_profiler_trainer_fit
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
File "/Users/thomas/.pyenv/versions/3.8.5/bin/pytest", line 8 in <module>
[1] 84479 segmentation fault pytest tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit -v
/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d
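For readers matching the log to the test, here is a small standard-library sketch of how the two stacked `parametrize` decorators expand into the 10 collected items. It assumes pytest's documented ordering for stacked decorators (the decorator closest to the function contributes the first part of the ID, and the top decorator varies fastest). The names `expected_test_ids`, `fast_dev_run_values` and `model_names` are hypothetical and only used for illustration; pytest itself is not imported.

```python
from itertools import product

# Values taken from the two parametrize decorators in the test above.
fast_dev_run_values = [1, 2, 3, 4, 5]
model_names = ["BoringModel", "ManualOptimBoringModel"]


def expected_test_ids():
    """Enumerate the parameter IDs in the order the log suggests.

    Assumption: the decorator closest to the function (boring_model_cls)
    is the outer loop, and the top decorator (fast_dev_run) is the inner loop.
    """
    return [f"{model}-{n}" for model, n in product(model_names, fast_dev_run_values)]


ids = expected_test_ids()
print(len(ids))   # number of collected items
print(ids[:2])    # the first two cases that appear in the log
```

Under that assumption, the first two IDs are `BoringModel-1` and `BoringModel-2`, which matches the order of the two cases shown in the output above.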
Codecov Report
@@           Coverage Diff           @@
##           master   #9316    +/-   ##
=======================================
- Coverage      92%     88%     -4%
=======================================
  Files         178     178
  Lines       14860   14875     +15
=======================================
- Hits        13712   13117    -595
- Misses       1148    1758    +610
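The numbers in the coverage diff can be cross-checked with a few lines of arithmetic. The sketch below assumes coverage is simply hits divided by total lines (hits plus misses), rounded to a whole percent; the helper `coverage_percent` and the `base`/`head` dictionaries are hypothetical names, and Codecov's exact rounding rules are not confirmed by this thread.

```python
def coverage_percent(hits, misses):
    """Return line coverage as a percentage, assuming lines = hits + misses."""
    return 100.0 * hits / (hits + misses)


# Figures copied from the report: master (base) versus this pull request (head).
base = {"hits": 13712, "misses": 1148}
head = {"hits": 13117, "misses": 1758}

base_lines = base["hits"] + base["misses"]
head_lines = head["hits"] + head["misses"]
base_pct = coverage_percent(**base)
head_pct = coverage_percent(**head)

print(base_lines, head_lines, head_lines - base_lines)
print(round(base_pct), round(head_pct), round(head_pct - base_pct))
print(head["hits"] - base["hits"], head["misses"] - base["misses"])
```

The totals reproduce the `Lines`, `Hits` and `Misses` rows of the report, and the rounded percentages agree with the 92% and 88% shown.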
justusschock
added a commit
that referenced
this pull request
Sep 7, 2021
Co-authored-by: Carlos Mocholí <[email protected]>
justusschock
added a commit
that referenced
this pull request
Sep 7, 2021
Co-authored-by: Carlos Mocholí <[email protected]>
awaelchli
pushed a commit
that referenced
this pull request
Sep 7, 2021
Co-authored-by: Carlos Mocholí <[email protected]>
lexierule
pushed a commit
that referenced
this pull request
Sep 10, 2021
Co-authored-by: Carlos Mocholí <[email protected]>
What does this PR do?
Fixes #9177
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃