This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Fix 'NoneType' Error on jupyter notebooks #3337

Merged 2 commits on Feb 2, 2021

tczhangzhi
Contributor

inspect.getmodule() returns None for some objects, e.g., code defined in a Jupyter notebook cell.
When I use register_trainer in a Jupyter notebook, the code raises an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-720eba80f94f> in <module>()
     87 
     88 @register_trainer
---> 89 class Trainer(BaseTrainer):
     90     def __init__(self, model,
     91                  train_dataloader, val_dataloader, loss_fn, optimizer, device=torch.device('cpu'), trainer_kwargs={'max_epochs': 10}):

~/anaconda3/lib/python3.7/site-packages/nni/retiarii/utils.py in register_trainer(cls)
    117     """
    118     frm = inspect.stack()[1]
--> 119     module_name = inspect.getmodule(frm[0]).__name__
    120     return _blackbox_cls(cls, module_name, 'full')
    121 

AttributeError: 'NoneType' object has no attribute '__name__'

In this case, we can use getsourcefile to get the name of the current cell in the Jupyter notebook. For example, <ipython-input-14-f75f6173b0d5> refers to the 14th cell, and the subsequent code can then run normally.
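The fallback described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `caller_module_name` and `register` are hypothetical names, and a notebook cell is simulated by compiling a string under a cell-style file name. The extra fallback to `co_filename` is an assumption added so the sketch also works outside IPython, where `inspect.getsourcefile` may return None for such a name.

```python
import inspect


def caller_module_name(frame):
    """Hypothetical helper: name the place where the code in `frame` lives."""
    module = inspect.getmodule(frame)
    if module is not None:
        return module.__name__
    # No module object (as in a notebook cell): fall back to the source file
    # name; co_filename is a last resort if inspect cannot confirm the source.
    return inspect.getsourcefile(frame) or frame.f_code.co_filename


def register(cls):
    """Toy stand-in for a decorator such as register_trainer."""
    frame = inspect.stack()[1].frame  # frame of the code applying the decorator
    cls.defined_in = caller_module_name(frame)
    return cls


# Simulate a notebook cell: code compiled under a cell-style file name.
cell_name = "<ipython-input-14-f75f6173b0d5>"
cell_source = "@register\nclass Trainer:\n    pass\n"
cell_globals = {"__name__": "__main__", "register": register}
exec(compile(cell_source, cell_name, "exec"), cell_globals)

print(cell_globals["Trainer"].defined_in)
```

Here the decorator no longer fails on `.__name__`; it records the cell name instead, which is the behavior the description above relies on.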

@tczhangzhi
Contributor Author

I found that this problem is related to the framework design. The core of the problem is that we have to generate different code according to the user's settings and manage its execution. To solve the problem of "how to find the location of custom modules (including custom blackbox_module, Trainer, etc.) in the generated code", I made a lot of changes, but I am not sure whether they cover enough scenarios, so I have not submitted them and will respect the choices of the existing developers.

Currently, NNI's code relies on two main ideas: hard-coded definitions and dynamic inspection. They are not robust, but the approach is simple and clear. In this situation, users have to create a new module whenever they need to customize a blackbox_module, Trainer, etc. (the definition cannot live in the same file as the rest of the script). For example, if we try to define a custom layer as follows:

import random

import nni.retiarii.nn.pytorch as nn
import torch.nn.functional as F
from nni.retiarii.utils import blackbox_module
from nni.retiarii.experiment import RetiariiExeConfig, RetiariiExperiment
from nni.retiarii.strategies import RandomStrategy
from nni.retiarii.trainer import PyTorchImageClassificationTrainer

# Defined inline here instead of: from layer.sequential import LayerChoiceSequential

@blackbox_module
class LayerChoiceSequential(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.m = nn.Sequential(m)

    def forward(self, x):
        return self.m(x)
        
class Net(nn.Module):
    def __init__(self, hidden_size):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.LayerChoice([
            LayerChoiceSequential(nn.Linear(4*4*50, hidden_size)),
            nn.Linear(4*4*50, hidden_size, bias=False)
        ])
        self.fc2 = nn.Linear(hidden_size, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


if __name__ == '__main__':
    base_model = Net(128)
    trainer = PyTorchImageClassificationTrainer(base_model, dataset_cls="MNIST",
                                                dataset_kwargs={"root": "data/mnist", "download": True},
                                                dataloader_kwargs={"batch_size": 32},
                                                optimizer_kwargs={"lr": 1e-3},
                                                trainer_kwargs={"max_epochs": 1})

    simple_strategy = RandomStrategy()

    exp = RetiariiExperiment(base_model, trainer, [], simple_strategy)

    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'
    exp_config.trial_concurrency = 2
    exp_config.max_trial_number = 10
    exp_config.training_service.use_active_gpu = False

    exp.run(exp_config, 7081 + random.randint(0, 100))

the generated code would be:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# === look here ===
import __main__
# === look here ===
import nni


class _model(nn.Module):
    def __init__(self):
        super().__init__()
        self.__conv1 = nni.retiarii.nn.pytorch.nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5, stride=1)
        self.__conv2 = nni.retiarii.nn.pytorch.nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1)
        # === look here ===
        self.__fc1 = __main__.LayerChoiceSequential()
        # === look here ===
        self.__fc2 = nni.retiarii.nn.pytorch.nn.Linear(in_features=128, out_features=10)
...

This does not meet our expectations: the generated code imports __main__ and instantiates __main__.LayerChoiceSequential() without any arguments. However, if we create a new module and folder for the custom layer, the code is generated normally. For example, in main.py (a Jupyter notebook) we can do:

import random

import nni.retiarii.nn.pytorch as nn
import torch.nn.functional as F
from nni.retiarii.utils import blackbox_module
from nni.retiarii.experiment import RetiariiExeConfig, RetiariiExperiment
from nni.retiarii.strategies import RandomStrategy
from nni.retiarii.trainer import PyTorchImageClassificationTrainer

from layer.sequential import LayerChoiceSequential

class Net(nn.Module):
    def __init__(self, hidden_size):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.LayerChoice([
            LayerChoiceSequential(nn.Linear(4*4*50, hidden_size)),
            nn.Linear(4*4*50, hidden_size, bias=False)
        ])
        self.fc2 = nn.Linear(hidden_size, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


if __name__ == '__main__':
    base_model = Net(128)
    trainer = PyTorchImageClassificationTrainer(base_model, dataset_cls="MNIST",
                                                dataset_kwargs={"root": "data/mnist", "download": True},
                                                dataloader_kwargs={"batch_size": 32},
                                                optimizer_kwargs={"lr": 1e-3},
                                                trainer_kwargs={"max_epochs": 1})

    simple_strategy = RandomStrategy()

    exp = RetiariiExperiment(base_model, trainer, [], simple_strategy)

    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'
    exp_config.trial_concurrency = 2
    exp_config.max_trial_number = 10
    exp_config.training_service.use_active_gpu = False

    exp.run(exp_config, 7081 + random.randint(0, 100))

and finish the definition in layer/sequential.py:

import nni.retiarii.nn.pytorch as nn
from nni.retiarii.utils import blackbox_module

@blackbox_module
class LayerChoiceSequential(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.m = nn.Sequential(m)

    def forward(self, x):
        return self.m(x)

This way, everything works as expected.
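To see why moving the class into its own module helps, here is a small sketch. It assumes the generator refers to a class by the module name recorded at definition time, which is what the `import __main__` line in the generated code above suggests. `emitted_reference`, `resolve`, and the demo module name are hypothetical, and the current process stands in for the one that later runs the generated code, where `__main__` is a different script.

```python
import importlib
import sys
import types


def emitted_reference(cls):
    # Hypothetical: the dotted name a code generator could write into a file.
    return f"{cls.__module__}.{cls.__qualname__}"


def resolve(reference):
    # What generated code effectively does: import the module, look up the class.
    module_name, _, class_name = reference.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def make_class(module_name):
    # Build a class whose __module__ records where it was "defined".
    return type("DemoLayerChoiceSequential", (), {"__module__": module_name})


def resolves(cls):
    try:
        return resolve(emitted_reference(cls)) is cls
    except (ImportError, AttributeError):
        return False


# Case 1: the class lives in an importable module (stand-in for layer/sequential.py).
demo_module = types.ModuleType("demo_layer_sequential")
in_module = make_class("demo_layer_sequential")
demo_module.DemoLayerChoiceSequential = in_module
sys.modules["demo_layer_sequential"] = demo_module

# Case 2: the class was defined in the entry script, so only "__main__" is
# recorded, and the process resolving it has a __main__ without that class.
in_main = make_class("__main__")

print(emitted_reference(in_module), resolves(in_module))
print(emitted_reference(in_main), resolves(in_main))
```

The module-backed class can be found again by name, while the `__main__` reference only works in the process whose entry script defined the class.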

@tczhangzhi
Contributor Author

In conclusion, I believe that the existing design makes some sense, but it is not very robust, and users need to follow a template project architecture. I will close this PR. If you have other questions, feel free to reopen it.

@tczhangzhi tczhangzhi closed this Jan 27, 2021
@QuanluZhang QuanluZhang reopened this Jan 28, 2021
@QuanluZhang
Contributor

@tczhangzhi thanks a lot for reporting this issue and kindly providing your solution :).

First, by design we allow a blackbox_module class to be placed in the "main" Python file, so this is a bug. Thanks for reporting it.
Second, in v2.0, non-serializable input arguments to a blackbox_module class are not supported. More input argument types will be supported in v2.1.
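As an illustration of the second point, the sketch below shows one way a decorator could reject constructor arguments that cannot be written back into generated code. It is not NNI's implementation: `record_init_args` is a hypothetical name, JSON is only an assumed serialization format, and `Linear` is a plain stand-in class rather than the real `nn.Linear`.

```python
import json


def record_init_args(cls):
    """Hypothetical decorator: remember constructor arguments for later replay."""
    original_init = cls.__init__

    def wrapped_init(self, *args, **kwargs):
        try:
            # Only arguments that survive serialization can be replayed later.
            json.dumps({"args": args, "kwargs": kwargs})
        except TypeError as err:
            raise TypeError(
                f"{cls.__name__} got a non-serializable argument: {err}"
            ) from err
        self.init_args = {"args": list(args), "kwargs": kwargs}
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls


class Linear:
    """Plain stand-in for a layer object; not the real torch class."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


@record_init_args
class LayerChoiceSequential:
    def __init__(self, *args):
        self.args = args


ok = LayerChoiceSequential(4 * 4 * 50, 128)  # plain integers are accepted
print(ok.init_args)

try:
    LayerChoiceSequential(Linear(4 * 4 * 50, 128))  # an object instance is rejected
except TypeError as err:
    print(err)
```

Under these assumptions, passing sizes and building the layer inside `__init__` works, while passing an already constructed layer does not.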

For your example, you can make the following two changes:
(1) Update "utils.py" as follows:

module_name = inspect.getmodule(frm[0]).__name__
if module_name == '__main__':
    main_file_path = Path(inspect.getsourcefile(frm[0]))
    if main_file_path.parents[0] != Path('.'):
        raise RuntimeError(f'you are using "{main_file_path}" to launch your experiment, '
                           f'please launch the experiment under the directory where "{main_file_path.name}" is located.')
    module_name = main_file_path.stem

We should handle a blackbox_module in the "main" Python file specially.
(2) Modify your script as follows:

# definition
@blackbox_module
class LayerChoiceSequential(nn.Module):
    def __init__(self, in_proj, out_proj):
        super().__init__()
        self.m = nn.Sequential(nn.Linear(in_proj, out_proj))

    def forward(self, x):
        return self.m(x)

# instantiation
self.fc1 = nn.LayerChoice([
            LayerChoiceSequential(4*4*50, hidden_size),
            nn.Linear(4*4*50, hidden_size, bias=False)
])

We encourage you to update your PR accordingly. If you have a better solution, feel free to tell us :)
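The `__main__` handling from change (1) can be tried on its own. The sketch below wraps the same path check in a hypothetical function, `main_module_name`, that takes the source file path as a string instead of reading it from a frame.

```python
from pathlib import Path


def main_module_name(source_file):
    """Hypothetical standalone version of the "__main__" check."""
    main_file_path = Path(source_file)
    # A bare file name such as "main.py" has Path('.') as its parent; any
    # other parent means the script was launched from a different directory.
    if main_file_path.parents[0] != Path('.'):
        raise RuntimeError(
            f'you are using "{main_file_path}" to launch your experiment, '
            f'please launch the experiment under the directory where '
            f'"{main_file_path.name}" is located.'
        )
    return main_file_path.stem


print(main_module_name("main.py"))

try:
    main_module_name("experiments/main.py")
except RuntimeError as err:
    print(err)
```

The file stem then replaces `__main__` as the module name, which is why the check insists that the script sits in the launch directory.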

@tczhangzhi
Contributor Author

Thanks for your fast and detailed reply. Producing a warning for users is a wise choice; I totally agree with it and look forward to a better NNI.

@tczhangzhi
Contributor Author

Many thanks to @QuanluZhang. I believe his idea may be the best practice for solving the problem of locating custom modules.
There are two types of problems:

  1. module_name is set to __main__ when a blackbox_module is defined in the entry Python file. @QuanluZhang has worked this one out.
  2. inspect.getmodule(frm[0]) returns None when a blackbox_module or trainer is defined in an .ipynb file. This is easy to understand, because inspect cannot map code that the IPython environment creates on the fly to an exact file name. The same problem occurs when users define modules from strings and instantiate them with ModuleType and exec. However, as @QuanluZhang said, producing a warning for users is a wise choice, so we follow his suggestion and raise a user-friendly error that tells users what they should do.
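The second case can be reproduced without a notebook. The sketch below defines a class from a string with ModuleType and exec, and raises a readable error when no module can be found. `require_module_name`, `register`, `define_from_string`, and the error wording are hypothetical and are not the text merged in this PR.

```python
import inspect
import types


def require_module_name(frame):
    """Hypothetical check: return the module name or explain what to do."""
    module = inspect.getmodule(frame)
    if module is None:
        raise RuntimeError(
            "Cannot determine the module of the calling code. Notebook cells "
            "and code run through exec have no module; please define the "
            "class in a .py file and import it."
        )
    return module.__name__


def register(cls):
    """Toy stand-in for a registering decorator."""
    cls.module_name = require_module_name(inspect.stack()[1].frame)
    return cls


def define_from_string(source):
    # Build a module on the fly, the ModuleType-plus-exec pattern noted above.
    module = types.ModuleType("dynamic_demo")
    module.register = register
    exec(compile(source, "<string>", "exec"), module.__dict__)
    return module


try:
    define_from_string("@register\nclass Trainer:\n    pass\n")
except RuntimeError as err:
    print(err)
```

Instead of an AttributeError on None, the user gets a message that names the cause and the workaround.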

4 participants