
[bugfix] Resolve PyTorch Profiling for Manual Optimization #9316

Merged
tchaton merged 15 commits into master from profiler_manual
Sep 6, 2021


tchaton
Contributor

@tchaton tchaton commented Sep 3, 2021

What does this PR do?

Fixes #9177 (automatic_optimization = False breaks pytorch profiler)

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@tchaton tchaton self-assigned this Sep 3, 2021
@tchaton tchaton added the bug Something isn't working label Sep 3, 2021
@tchaton tchaton marked this pull request as ready for review September 3, 2021 18:10
@tchaton tchaton changed the title from "progress" to "[bugfix] Resolve PyTorch Profiling for Manual Optimization" Sep 3, 2021
@mergify mergify bot added the has conflicts label Sep 3, 2021
@mergify mergify bot removed the has conflicts label Sep 3, 2021
@tchaton tchaton enabled auto-merge (squash) September 3, 2021 19:01
Review comments on pl_examples/basic_examples/profiler_example.py (outdated, resolved)
Review comments on pytorch_lightning/profiler/pytorch.py (outdated, resolved)
@mergify mergify bot added the ready PRs ready to be merged label Sep 4, 2021
@awaelchli
Contributor

I copied the tests to master and they both pass. Can you check the fix / repro again?

@carmocca carmocca disabled auto-merge September 5, 2021 23:42
@tchaton
Contributor Author

tchaton commented Sep 6, 2021

> I copied the tests to master and they both pass. Can you check the fix / repro again?

Thanks @awaelchli. My test wasn't good enough to reproduce the bug.

Here is the updated test and its output on master:

```python
# Imports assumed from the PyTorch Lightning test suite of this period
import os

import pytest

from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler
from pytorch_lightning.utilities.imports import _KINETO_AVAILABLE
from tests.helpers import BoringModel


class ManualOptimBoringModel(BoringModel):
    def __init__(self):
        super().__init__()
        # Opt out of Lightning's automatic optimization loop
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        output = self(batch)
        loss = self.loss(batch, output)
        # Manual optimization: the step itself calls zero_grad / backward / step
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        return loss


@pytest.mark.parametrize("fast_dev_run", [1, 2, 3, 4, 5])
@pytest.mark.parametrize("boring_model_cls", [BoringModel, ManualOptimBoringModel])
def test_pytorch_profiler_trainer_fit(fast_dev_run, boring_model_cls, tmpdir):
    """Ensure that the profiler can be passed to the Trainer and that validation steps are properly recorded."""
    pytorch_profiler = PyTorchProfiler(dirpath=tmpdir, filename="profile")
    model = boring_model_cls()
    trainer = Trainer(default_root_dir=tmpdir, max_epochs=1, fast_dev_run=fast_dev_run, profiler=pytorch_profiler)
    trainer.fit(model)

    # At least one "validation_step" event must have been recorded
    assert sum(e.name == "validation_step" for e in pytorch_profiler.function_events)

    path = pytorch_profiler.dirpath / f"fit-{pytorch_profiler.filename}.txt"
    assert path.read_text("utf-8")

    if _KINETO_AVAILABLE:
        # With Kineto available, trace files (.json) are written as well
        files = sorted(file for file in os.listdir(tmpdir) if file.endswith(".json"))
        assert any(f"fit-{pytorch_profiler.filename}" in f for f in files)
        path = pytorch_profiler.dirpath / f"fit-{pytorch_profiler.filename}.txt"
        assert path.read_text("utf-8")
```
```text
pytorch-lightning git:(master) ✗ pytest tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit --capture=no -v
=============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /Users/thomas/.pyenv/versions/3.8.5/bin/python3.8
cachedir: .pytest_cache
rootdir: /Users/thomas/Documents/GitHub/pytorch-lightning, configfile: setup.cfg
collected 10 items

tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit[BoringModel-1] INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.distributed:Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
INFO:pytorch_lightning.utilities.model_summary:
  | Name  | Type   | Params
---------------------------------
0 | layer | Linear | 66
---------------------------------
66        Trainable params
0         Non-trainable params
66        Total params
0.000     Total estimated model params size (MB)
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 187.85it/s, loss=2.42, v_num=]
FAILED
tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit[BoringModel-2] INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.distributed:Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 2 batch(es).
INFO:pytorch_lightning.utilities.model_summary:
  | Name  | Type   | Params
---------------------------------
0 | layer | Linear | 66
---------------------------------
66        Trainable params
0         Non-trainable params
66        Total params
0.000     Total estimated model params size (MB)
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 274.74it/s, loss=1.47, v_num=]
Fatal Python error: Segmentation fault

Thread 0x000070000f681000 (most recent call first):
<no Python frame>

Current thread 0x000000010505cdc0 (most recent call first):
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/autograd/profiler.py", line 498 in __exit__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 435 in _stop_trace
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 408 in _exit_actions
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 253 in __exit__
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 461 in _delete_profilers
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 421 in summary
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/base.py", line 138 in describe
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1282 in _call_teardown_hook
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1028 in _run
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 589 in _fit_impl
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 511 in _call_and_handle_interrupt
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 559 in fit
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/tests/profiler/test_profiler.py", line 326 in test_pytorch_profiler_trainer_fit
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/Users/thomas/.pyenv/versions/3.8.5/bin/pytest", line 8 in <module>
[1]    84479 segmentation fault  pytest tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit  -v
/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d 
```
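The log reports "collected 10 items" because the two stacked `parametrize` decorators multiply: 5 `fast_dev_run` values times 2 model classes. The stdlib-only sketch below reproduces that count and the ID order visible in the log (`BoringModel-1`, then `BoringModel-2`). The model classes are represented by name strings, an assumption made only so the snippet runs without Lightning installed.

```python
import itertools

# Values copied from the two @pytest.mark.parametrize decorators in the test.
# Model classes are stand-in name strings (assumption: no Lightning needed here).
fast_dev_run_values = [1, 2, 3, 4, 5]
model_classes = ["BoringModel", "ManualOptimBoringModel"]

# Stacked parametrize decorators yield every combination of their values.
# Iterating fast_dev_run fastest mirrors the order seen in the log above.
cases = list(itertools.product(model_classes, fast_dev_run_values))

# Build IDs in the "<model>-<fast_dev_run>" shape shown in the log.
case_ids = [f"{cls}-{n}" for cls, n in cases]

print(len(cases))    # 10
print(case_ids[:2])  # ['BoringModel-1', 'BoringModel-2']
```

So the first failing case (`BoringModel-1`) and the segfault (`BoringModel-2`) occur before any `ManualOptimBoringModel` case has even run.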

@mergify mergify bot added the has conflicts label Sep 6, 2021
@codecov

codecov bot commented Sep 6, 2021

Codecov Report

Merging #9316 (02517fd) into master (904dde7) will decrease coverage by 4%.
The diff coverage is 100%.

```diff
@@           Coverage Diff           @@
##           master   #9316    +/-   ##
=======================================
- Coverage      92%     88%    -4%
=======================================
  Files         178     178
  Lines       14860   14875    +15
=======================================
- Hits        13712   13117   -595
- Misses       1148    1758   +610
```
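The percentages in this report follow from the hit and miss counts, with line coverage computed as hits / (hits + misses). A quick check using the figures copied from the table:

```python
# Figures copied from the Codecov report for master (base) and #9316 (head).
base = {"lines": 14860, "hits": 13712, "misses": 1148}
head = {"lines": 14875, "hits": 13117, "misses": 1758}


def coverage_pct(report):
    """Line coverage as a percentage: hits / (hits + misses)."""
    return 100 * report["hits"] / (report["hits"] + report["misses"])


# Sanity check: every tracked line is either a hit or a miss.
for report in (base, head):
    assert report["hits"] + report["misses"] == report["lines"]

print(f"master: {coverage_pct(base):.2f}%")  # master: 92.27%
print(f"#9316:  {coverage_pct(head):.2f}%")  # #9316:  88.18%
```

Hits fell by 595 while only 15 lines were added, so the drop cannot come from the new lines alone; it sits in lines outside the diff, which matches the reported 100% diff coverage.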

@mergify mergify bot removed the has conflicts label Sep 6, 2021
@tchaton tchaton enabled auto-merge (squash) September 6, 2021 10:18
@tchaton tchaton merged commit 9149b64 into master Sep 6, 2021
@tchaton tchaton deleted the profiler_manual branch September 6, 2021 10:45
justusschock added a commit that referenced this pull request Sep 7, 2021
justusschock added a commit that referenced this pull request Sep 7, 2021
awaelchli pushed a commit that referenced this pull request Sep 7, 2021
lexierule pushed a commit that referenced this pull request Sep 10, 2021
Labels: bug (Something isn't working), ready (PRs ready to be merged)
Projects: None yet
Development: successfully merging this pull request may close the issue "automatic_optimization = False breaks pytorch profiler"
Participants: 4