[bugfix] Resolve PyTorch Profiling for Manual Optimization #9316

tchaton · 2021-09-03T17:57:13Z

What does this PR do?

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

awaelchli · 2021-09-04T23:05:00Z

I copied the tests to master and they both pass. Can you check the fix / repro again?

tchaton · 2021-09-06T09:33:40Z

> I copied the tests to master and they both pass. Can you check the fix / repro again?

Thanks @awaelchli. My test wasn't good enough.

On master:

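# Imports assumed from tests/profiler/test_profiler.py (not shown in the original comment).
import os

import pytest

from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler
from pytorch_lightning.utilities.imports import _KINETO_AVAILABLE
from tests.helpers import BoringModel


class ManualOptimBoringModel(BoringModel):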
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        output = self(batch)
        loss = self.loss(batch, output)
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        return loss

@pytest.mark.parametrize("fast_dev_run", [1, 2, 3, 4, 5])
@pytest.mark.parametrize("boring_model_cls", [BoringModel, ManualOptimBoringModel])
def test_pytorch_profiler_trainer_fit(fast_dev_run, boring_model_cls, tmpdir):
    """Ensure that the profiler can be given to the trainer and that validation steps are properly recorded."""
    pytorch_profiler = PyTorchProfiler(dirpath=tmpdir, filename="profile")
    model = boring_model_cls()
    trainer = Trainer(default_root_dir=tmpdir, max_epochs=1, fast_dev_run=fast_dev_run, profiler=pytorch_profiler)
    trainer.fit(model)

    assert sum(e.name == "validation_step" for e in pytorch_profiler.function_events)

    path = pytorch_profiler.dirpath / f"fit-{pytorch_profiler.filename}.txt"
    assert path.read_text("utf-8")

    if _KINETO_AVAILABLE:
        files = sorted(file for file in os.listdir(tmpdir) if file.endswith(".json"))
        assert any(f"fit-{pytorch_profiler.filename}" in f for f in files)
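
For reference, the two stacked `parametrize` decorators expand to the cross product of their values, which is why pytest reports 10 collected items. The sketch below only reproduces that expansion with the standard library; the helper name `expected_test_ids` is hypothetical, and the ordering past the first two IDs visible in the log is an assumption based on pytest's documented behaviour for stacked decorators.

```python
from itertools import product

# Values copied from the two parametrize decorators in the test.
fast_dev_run_values = [1, 2, 3, 4, 5]
model_names = ["BoringModel", "ManualOptimBoringModel"]


def expected_test_ids(model_names, fast_dev_run_values):
    """Build IDs of the form '<model>-<fast_dev_run>' (hypothetical helper).

    product() varies its rightmost argument fastest, so every fast_dev_run
    value is listed for one model before moving on to the next model.
    """
    return [f"{name}-{n}" for name, n in product(model_names, fast_dev_run_values)]


test_ids = expected_test_ids(model_names, fast_dev_run_values)
print(len(test_ids))
print(test_ids)
```

The first two entries match the `[BoringModel-1]` and `[BoringModel-2]` cases that appear in the run.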

➜  pytorch-lightning git:(master) ✗ pytest tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit --capture=no -v
=============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /Users/thomas/.pyenv/versions/3.8.5/bin/python3.8
cachedir: .pytest_cache
rootdir: /Users/thomas/Documents/GitHub/pytorch-lightning, configfile: setup.cfg
collected 10 items                                                                                                                                                                

tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit[BoringModel-1] INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.distributed:Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
INFO:pytorch_lightning.utilities.model_summary:
  | Name  | Type   | Params
---------------------------------
0 | layer | Linear | 66    
---------------------------------
66        Trainable params
0         Non-trainable params
66        Total params
0.000     Total estimated model params size (MB)
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 187.85it/s, loss=2.42, v_num=]
FAILED                                                                                                                                                                             
tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit[BoringModel-2] INFO:pytorch_lightning.utilities.distributed:GPU available: False, used: False
INFO:pytorch_lightning.utilities.distributed:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.distributed:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.distributed:Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 2 batch(es).
INFO:pytorch_lightning.utilities.model_summary:
  | Name  | Type   | Params
---------------------------------
0 | layer | Linear | 66    
---------------------------------
66        Trainable params
0         Non-trainable params
66        Total params
0.000     Total estimated model params size (MB)
Epoch 0: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 274.74it/s, loss=1.47, v_num=]
Fatal Python error: Segmentation fault                                                                                                                                             

Thread 0x000070000f681000 (most recent call first):
<no Python frame>

Current thread 0x000000010505cdc0 (most recent call first):
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/autograd/profiler.py", line 498 in __exit__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 435 in _stop_trace
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 408 in _exit_actions
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 253 in __exit__
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 461 in _delete_profilers
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 421 in summary
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/base.py", line 138 in describe
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1282 in _call_teardown_hook
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1028 in _run
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 589 in _fit_impl
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 511 in _call_and_handle_interrupt
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 559 in fit
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/tests/profiler/test_profiler.py", line 326 in test_pytorch_profiler_trainer_fit
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/python.py", line 183 in pytest_pyfunc_call
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/python.py", line 1641 in runtest
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 162 in pytest_runtest_call
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 255 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 311 in from_call
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 254 in call_runtest_hook
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 215 in call_and_report
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 126 in runtestprotocol
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/runner.py", line 109 in pytest_runtest_protocol
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 348 in pytest_runtestloop
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 323 in _main
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 269 in wrap_session
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/main.py", line 316 in pytest_cmdline_main
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/config/__init__.py", line 162 in main
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/_pytest/config/__init__.py", line 185 in console_main
  File "/Users/thomas/.pyenv/versions/3.8.5/bin/pytest", line 8 in <module>
[1]    84479 segmentation fault  pytest tests/profiler/test_profiler.py::test_pytorch_profiler_trainer_fit  -v
/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d
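
The dump above lists frames with the most recent call first, so the first frame outside `site-packages` is the innermost Lightning call before the crash. A minimal sketch of picking that frame out of such a dump follows; the regex and the helper names `parse_frames` and `first_project_frame` are illustrative assumptions, and the sample text is a shortened copy of the frames shown above.

```python
import re

# Matches lines such as:  File "/path/to/file.py", line 461 in _delete_profilers
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+) in (?P<func>\S+)')

# Shortened copy of the dump above (most recent call first).
SAMPLE = '''\
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/autograd/profiler.py", line 498 in __exit__
  File "/Users/thomas/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torch/profiler/profiler.py", line 253 in __exit__
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 461 in _delete_profilers
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/pytorch.py", line 421 in summary
  File "/Users/thomas/Documents/GitHub/pytorch-lightning/pytorch_lightning/profiler/base.py", line 138 in describe
'''


def parse_frames(dump):
    """Return (path, line, function) tuples in the order they appear in the dump."""
    return [(m["path"], int(m["line"]), m["func"]) for m in FRAME_RE.finditer(dump)]


def first_project_frame(frames, marker="/pytorch_lightning/"):
    """Return the first frame that belongs to the project rather than site-packages."""
    for path, line, func in frames:
        if marker in path and "site-packages" not in path:
            return path, line, func
    return None


frames = parse_frames(SAMPLE)
innermost = first_project_frame(frames)
print(innermost)
```

On the sample this selects `_delete_profilers` in `pytorch_lightning/profiler/pytorch.py`, line 461, which is where the profiler's `__exit__` is invoked during teardown.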

codecov · 2021-09-06T09:58:50Z

Codecov Report

Merging #9316 (02517fd) into master (904dde7) will decrease coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #9316    +/-   ##
=======================================
- Coverage      92%     88%    -4%     
=======================================
  Files         178     178            
  Lines       14860   14875    +15     
=======================================
- Hits        13712   13117   -595     
- Misses       1148    1758   +610
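
The rounded percentages in the table can be cross-checked from the raw counts. The snippet below is just that arithmetic, using only the figures reported above; the dictionary layout and the helper `coverage_pct` are illustrative.

```python
# Figures copied from the Codecov table above.
master = {"lines": 14860, "hits": 13712, "misses": 1148}
pr = {"lines": 14875, "hits": 13117, "misses": 1758}


def coverage_pct(stats):
    """Line coverage as a percentage of total lines."""
    return 100.0 * stats["hits"] / stats["lines"]


for name, stats in (("master", master), ("#9316", pr)):
    # Hits and misses should add up to the total number of lines.
    assert stats["hits"] + stats["misses"] == stats["lines"]
    print(f"{name}: {coverage_pct(stats):.2f}%")

delta = coverage_pct(pr) - coverage_pct(master)
print(f"delta: {delta:+.2f} points")
```

This gives roughly 92.3% for master and 88.2% for the PR, consistent with the reported 4% decrease.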

tchaton and others added 3 commits September 3, 2021 13:55

progress

f55bd0d

[pre-commit.ci] auto fixes from pre-commit.com hooks

5b3b2d0

for more information, see https://pre-commit.ci

update

27b7bd7

tchaton self-assigned this Sep 3, 2021

tchaton added the bug Something isn't working label Sep 3, 2021

tchaton marked this pull request as ready for review September 3, 2021 18:10

tchaton requested review from awaelchli, Borda, carmocca, justusschock, kaushikb11, SeanNaren and williamFalcon as code owners September 3, 2021 18:10

tchaton changed the title from ~~progress~~ to [bugfix] Resolve PyTorch Profiling for Manual Optimization Sep 3, 2021

tchaton added 4 commits September 3, 2021 19:12

Merge branch 'profiler_manual' of https://github.com/PyTorchLightning…

0444fd5

…/pytorch-lightning into profiler_manual

update

af2cc4c

update

730ce5a

add manual optimizaiton

7204b7f

mergify bot added the has conflicts label Sep 3, 2021

Merge branch 'master' into profiler_manual

d42bc02

mergify bot removed the has conflicts label Sep 3, 2021

tchaton enabled auto-merge (squash) September 3, 2021 19:01

carmocca approved these changes Sep 4, 2021

pl_examples/basic_examples/profiler_example.py (outdated, resolved)

pytorch_lightning/profiler/pytorch.py (outdated, resolved)

Update pl_examples/basic_examples/profiler_example.py

0ab7ac7

Co-authored-by: Carlos Mocholí <[email protected]>

awaelchli approved these changes Sep 4, 2021

mergify bot added the ready PRs ready to be merged label Sep 4, 2021

awaelchli reviewed Sep 4, 2021

tests/profiler/test_profiler.py (outdated, resolved)

carmocca disabled auto-merge September 5, 2021 23:42

Borda approved these changes Sep 6, 2021

update

bc0aaac

mergify bot added the has conflicts label Sep 6, 2021

tchaton added 2 commits September 6, 2021 11:04

update

25d50d2

Merge branch 'master' into profiler_manual

69a6ebf

mergify bot removed the has conflicts label Sep 6, 2021

tchaton added 3 commits September 6, 2021 11:06

remove -1

3aba0b7

Merge branch 'profiler_manual' of https://github.com/PyTorchLightning…

e464663

…/pytorch-lightning into profiler_manual

update

02517fd

tchaton enabled auto-merge (squash) September 6, 2021 10:18

tchaton merged commit 9149b64 into master Sep 6, 2021

tchaton deleted the profiler_manual branch September 6, 2021 10:45

justusschock added a commit that referenced this pull request Sep 7, 2021

[bugfix] Resolve PyTorch Profiling for Manual Optimization (#9316)

6c27347

Co-authored-by: Carlos Mocholí <[email protected]>

justusschock added a commit that referenced this pull request Sep 7, 2021

[bugfix] Resolve PyTorch Profiling for Manual Optimization (#9316)

6ed21cb

Co-authored-by: Carlos Mocholí <[email protected]>

awaelchli pushed a commit that referenced this pull request Sep 7, 2021

[bugfix] Resolve PyTorch Profiling for Manual Optimization (#9316)

130bc06

Co-authored-by: Carlos Mocholí <[email protected]>

lexierule pushed a commit that referenced this pull request Sep 10, 2021

[bugfix] Resolve PyTorch Profiling for Manual Optimization (#9316)

a5ad966

Co-authored-by: Carlos Mocholí <[email protected]>
