
report [and proposed fix] for autoreload error caused by explicit module.super(ChildClass, self).__init__() #1670

Closed
mkarikom opened this issue Sep 13, 2022 · 3 comments

mkarikom (Contributor) commented Sep 13, 2022

I noticed that autoreload fails when updating class methods while running api_overview.ipynb, and that this can be fixed by removing the explicit child-class and instance arguments from the super().__init__() call. In my case, I was debugging an updated TrainingPlan, but I suspect the same fix will work for other classes as well.
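
For illustration, here is a condensed before/after sketch of the change in TrainingPlan.__init__ (the import, parent class, and trimmed-down signature are my placeholders; only the two super() lines reflect the actual code, which appears in the tracebacks below):

    import pytorch_lightning as pl  # assumption: TrainingPlan subclasses pl.LightningModule

    class TrainingPlan(pl.LightningModule):
        def __init__(self, module, **loss_kwargs):
            # before (breaks once %autoreload 2 has swapped in edited code):
            # super(TrainingPlan, self).__init__()
            # after (the proposed fix):
            super().__init__()
            self.module = module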

I can submit a PR for this but wanted to make sure it wasn't already being addressed somehow.

1. add a cell in api_overview.ipynb with:
%load_ext autoreload
%autoreload 2
2. instantiate and train the model, for example (a minimal sketch follows):
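
A minimal sketch of what this step looks like, assuming the SCVI model used in the tutorial (the dataset and setup call here are placeholders chosen to keep the example self-contained; only model.train(max_epochs=4) is taken from the tracebacks below):

    import scvi

    adata = scvi.data.synthetic_iid()     # placeholder dataset; the tutorial loads its own AnnData
    scvi.model.SCVI.setup_anndata(adata)  # register the AnnData with scvi-tools
    model = scvi.model.SCVI(adata)
    model.train(max_epochs=4)             # the call that appears in the tracebacks below
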
3. introduce a bug (e.g. in train.TrainingPlan, which the already-instantiated model uses) like:
    def training_step(self, batch, batch_idx, optimizer_idx=0):
        if "kl_weight" in self.loss_kwargs:
            self.loss_kwargs.update({"kl_weight": self.kl_weight})
        _, _, scvi_loss = self.forward(batch, loss_kwargs=self.loss_kwargs)
        self.log("train_loss", scvi_loss.loss_not_defined, on_epoch=True)  # this will cause an error
        self.compute_and_log_metrics(scvi_loss, self.train_metrics, "train")
        return scvi_loss.loss

which tries to log an attribute that does not exist on the returned LossRecorder and gives the following error:

/workspaces/scvi_work_pip/scvi-tools/scvi/model/base/_training_mixin.py:67: UserWarning: max_epochs=4 is less than n_epochs_kl_warmup=400. The max_kl_weight will not be reached during training.
  warnings.warn(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Epoch 1/4:   0%|          | 0/4 [00:00<?, ?it/s]/workspaces/scvi_work_pip/scvi-tools/scvi/distributions/_negative_binomial.py:64: UserWarning: Specified kernel cache directory could not be created! This disables kernel caching. Specified directory is /home/vscode/.cache/torch/kernels. This warning will appear only once per process. (Triggered internally at  ../aten/src/ATen/native/cuda/jit_utils.cpp:860.)
  + torch.lgamma(x + theta)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/workspaces/scvi_work_pip/scvi-tools/docs/tutorials/notebooks/api_overview.ipynb Cell 22 in <cell line: 1>()
----> 1 model.train(max_epochs=4)

File /workspaces/scvi_work_pip/scvi-tools/scvi/model/base/_training_mixin.py:142, in UnsupervisedTrainingMixin.train(self, max_epochs, use_gpu, train_size, validation_size, batch_size, early_stopping, plan_kwargs, **trainer_kwargs)
    131 trainer_kwargs[es] = (
    132     early_stopping if es not in trainer_kwargs.keys() else trainer_kwargs[es]
    133 )
    134 runner = TrainRunner(
    135     self,
    136     training_plan=training_plan,
   (...)
    140     **trainer_kwargs,
    141 )
--> 142 return runner()

File /workspaces/scvi_work_pip/scvi-tools/scvi/train/_trainrunner.py:74, in TrainRunner.__call__(self)
     71 if hasattr(self.data_splitter, "n_val"):
     72     self.training_plan.n_obs_validation = self.data_splitter.n_val
---> 74 self.trainer.fit(self.training_plan, self.data_splitter)
     75 self._update_history()
     77 # data splitter only gets these attrs after fit

File /workspaces/scvi_work_pip/scvi-tools/scvi/train/_trainer.py:188, in Trainer.fit(self, *args, **kwargs)
...
--> 327 self.log("train_loss", scvi_loss.loss_not_defined, on_epoch=True)
    328 self.compute_and_log_metrics(scvi_loss, self.train_metrics, "train")
    329 return scvi_loss.loss

AttributeError: 'LossRecorder' object has no attribute 'loss_not_defined'
4. now fix the bug, which would normally reload the class method but instead causes the following error:
/workspaces/scvi_work_pip/scvi-tools/scvi/model/base/_training_mixin.py:67: UserWarning: max_epochs=4 is less than n_epochs_kl_warmup=400. The max_kl_weight will not be reached during training.
  warnings.warn(
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/workspaces/scvi_work_pip/scvi-tools/docs/tutorials/notebooks/api_overview.ipynb Cell 22 in <cell line: 1>()
----> 1 model.train(max_epochs=4)

File /workspaces/scvi_work_pip/scvi-tools/scvi/model/base/_training_mixin.py:128, in UnsupervisedTrainingMixin.train(self, max_epochs, use_gpu, train_size, validation_size, batch_size, early_stopping, plan_kwargs, **trainer_kwargs)
    119 _check_warmup(plan_kwargs, max_epochs, n_cells, batch_size)
    121 data_splitter = DataSplitter(
    122     self.adata_manager,
    123     train_size=train_size,
   (...)
    126     use_gpu=use_gpu,
    127 )
--> 128 training_plan = TrainingPlan(self.module, **plan_kwargs)
    130 es = "early_stopping"
    131 trainer_kwargs[es] = (
    132     early_stopping if es not in trainer_kwargs.keys() else trainer_kwargs[es]
    133 )

File /workspaces/scvi_work_pip/scvi-tools/scvi/train/_trainingplans.py:154, in TrainingPlan.__init__(self, module, lr, weight_decay, eps, optimizer, n_steps_kl_warmup, n_epochs_kl_warmup, reduce_lr_on_plateau, lr_factor, lr_patience, lr_threshold, lr_scheduler_metric, lr_min, max_kl_weight, min_kl_weight, **loss_kwargs)
    133 def __init__(
    134     self,
    135     module: BaseModuleClass,
   (...)
    152     **loss_kwargs,
...
--> 154     super(TrainingPlan, self).__init__()
    155     self.module = module
    156     self.lr = lr

TypeError: super(type, obj): obj must be an instance or subtype of type
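
For context on why the explicit two-argument form breaks under autoreload (my best understanding, illustrated with a standalone reproduction rather than the real autoreload machinery): after %autoreload 2 swaps in the edited code, the class object that the name TrainingPlan resolves to inside __init__ and the class of the instance being constructed can end up as two different objects, and super(type, obj) requires obj to be an instance or subtype of type. The class names below are hypothetical stand-ins:

    class TrainingPlanBeforeReload:   # stand-in for the class object before the reload
        pass

    class TrainingPlanAfterReload:    # stand-in for the rebound class object after the reload
        pass

    obj = TrainingPlanBeforeReload()
    try:
        super(TrainingPlanAfterReload, obj)
    except TypeError as err:
        print(err)  # super(type, obj): obj must be an instance or subtype of type

The zero-argument super() resolves the parent class from the __class__ cell created at class-definition time rather than by looking the name up in the module namespace, which is presumably why the implicit form survives the reload.
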
5. update train/_trainingplans.py:154 so that train.TrainingPlan.__init__() calls the parent class's __init__() implicitly, like:
        super().__init__()
6. without re-instantiating the model, re-run model.train(max_epochs=4), and notice that the fix from step 5 is reloaded successfully:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Epoch 4/4: 100%|██████████| 4/4 [00:14<00:00,  3.64s/it, loss=297, v_num=1]

Versions:

'0.17.2'

mkarikom added the bug label on Sep 13, 2022
mkarikom changed the title from "bug and possible fix for autoreload error caused by explicit module.super(ChildClass, self).__init__()" to "report [and proposed fix] for autoreload error caused by explicit module.super(ChildClass, self).__init__()" on Sep 13, 2022
adamgayoso (Member) commented:
I'm not sure I understand why this is the fix, but I do agree we should use super().__init__() everywhere. A PR replacing all super calls codebase-wide with just super() would be appreciated.
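
One throwaway way to locate the explicit two-argument calls (a sketch, not part of scvi-tools; assumes the repository root as the working directory and uses an approximate pattern):

    import pathlib
    import re

    # Match calls of the form super(SomeClass, self)
    pattern = re.compile(r"super\(\s*\w+\s*,\s*self\s*\)")

    for path in pathlib.Path("scvi").rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if pattern.search(line):
                print(f"{path}:{lineno}: {line.strip()}")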

mkarikom (Contributor, Author) commented Sep 13, 2022

done, please see #1671 and #76

adamgayoso (Member) commented:
This was closed by #1671
