-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only track dev debugger events if enabled #7875
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7875 +/- ##
======================================
- Coverage 93% 92% -0%
======================================
Files 202 202
Lines 13123 13125 +2
======================================
- Hits 12156 12120 -36
- Misses 967 1005 +38 |
@@ -51,6 +51,7 @@ def __init__(self, trainer): | |||
self.test_dataloader_calls = [] | |||
self.dataloader_sequence_calls = [] | |||
|
|||
@enabled_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this debugger used only for tests? should we look at removing this dependency and rebuilding event tracking later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not even. It was used in the past for debugging, but we found out it was risky to rely on it for testing.
I am not sure of its utility right now and could be deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all the usages of the dev debugger in tests.
tests/callbacks/test_early_stopping.py:111: assert len(trainer.dev_debugger.early_stopping_history) == expected_count
tests/callbacks/test_stochastic_weight_avg.py:100: assert trainer.dev_debugger.count_events(
tests/plugins/test_rpc_sequential_plugin.py:46: assert len(trainer.dev_debugger.pbar_added_metrics) > 0
tests/plugins/test_rpc_sequential_plugin.py:91: assert len(trainer.dev_debugger.pbar_added_metrics) > 0
tests/models/test_amp.py:216: assert trainer.dev_debugger.count_events('AMP') == 0
tests/models/test_amp.py:249: assert trainer.dev_debugger.count_events('AMP') == 10
tests/checkpointing/test_model_checkpoint.py:143: lr_scheduler_debug = trainer.dev_debugger.saved_lr_scheduler_updates
tests/checkpointing/test_model_checkpoint.py:249: lr_scheduler_debug = trainer.dev_debugger.saved_lr_scheduler_updates
tests/checkpointing/test_model_checkpoint.py:867: assert len(trainer.dev_debugger.checkpoint_callback_history) == 3
tests/checkpointing/test_model_checkpoint.py:954: assert trainer.dev_debugger.checkpoint_callback_history[-1]['epoch'] == len(monitor) - 1
tests/checkpointing/test_checkpoint_callback_frequency.py:38: assert len(trainer.dev_debugger.checkpoint_callback_history) == 0
tests/checkpointing/test_checkpoint_callback_frequency.py:47: assert len(trainer.dev_debugger.checkpoint_callback_history) == 0
tests/trainer/test_dataloaders.py:527: assert len(trainer.dev_debugger.num_seen_val_check_batches) == num_val_dataloaders
tests/trainer/test_dataloaders.py:528: for dataloader_idx, num_batches in trainer.dev_debugger.num_seen_val_check_batches.items():
tests/trainer/test_dataloaders.py:532: assert len(trainer.dev_debugger.num_seen_test_check_batches) == num_test_dataloaders
tests/trainer/test_dataloaders.py:533: for dataloader_idx, num_batches in trainer.dev_debugger.num_seen_test_check_batches.items():
tests/trainer/test_dataloaders.py:594: assert trainer.dev_debugger.num_seen_sanity_check_batches == trainer.num_sanity_val_steps * num_val_dataloaders
tests/trainer/test_dataloaders.py:1282: assert len(trainer.dev_debugger.val_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1283: assert len(trainer.dev_debugger.test_dataloader_calls) == 0
tests/trainer/test_dataloaders.py:1284: assert len(trainer.dev_debugger.train_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1287: calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1315: assert len(trainer.dev_debugger.val_dataloader_calls) == 10
tests/trainer/test_dataloaders.py:1316: assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1317: assert len(trainer.dev_debugger.train_dataloader_calls) == 3
tests/trainer/test_dataloaders.py:1320: calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1357: assert len(trainer.dev_debugger.val_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1358: assert len(trainer.dev_debugger.test_dataloader_calls) == 0
tests/trainer/test_dataloaders.py:1359: assert len(trainer.dev_debugger.train_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1362: calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1389: assert len(trainer.dev_debugger.val_dataloader_calls) == 4
tests/trainer/test_dataloaders.py:1390: assert len(trainer.dev_debugger.train_dataloader_calls) == 3
tests/trainer/test_dataloaders.py:1391: assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1394: calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1438: assert len(trainer.dev_debugger.val_dataloader_calls) == 4
tests/trainer/test_dataloaders.py:1439: assert len(trainer.dev_debugger.train_dataloader_calls) == 3
tests/trainer/test_dataloaders.py:1440: assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1443: calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/test_dataloaders.py:1490: assert len(trainer.dev_debugger.val_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1491: assert len(trainer.dev_debugger.test_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1492: assert len(trainer.dev_debugger.train_dataloader_calls) == 1
tests/trainer/test_dataloaders.py:1495: calls = trainer.dev_debugger.dataloader_sequence_calls
tests/trainer/flags/test_fast_dev_run.py:102: assert len(trainer.dev_debugger.checkpoint_callback_history) == 0
tests/trainer/flags/test_fast_dev_run.py:106: assert len(trainer.dev_debugger.early_stopping_history) == 0
tests/trainer/optimization/test_manual_optimization.py:98: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:130: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:161: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:187: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:225: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:502: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * num_manual_backward_calls
tests/trainer/optimization/test_manual_optimization.py:577: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * 2
tests/trainer/optimization/test_manual_optimization.py:636: assert trainer.dev_debugger.count_events('backward_call') == limit_train_batches * 2
Most could be replaced with progress tracking. Others could have the test changed to track manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, dev debugger should not exist at all. we want to rely on the existing built in, well tested unit testing framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
What does this PR do?
Noticed memory was building up from this while benchmarking the logger connector changes
Before submitting
PR review