Not detecting the custom object #36

Closed

prasad01dalavi opened this issue May 13, 2020 · 9 comments
@prasad01dalavi
I have trained the model on about 126 images; roughly 30% of the images contain two objects.

from detecto import core, utils, visualize


dataset = core.Dataset('../custom_dataset/')
model = core.Model(['stamp'])

model.fit(dataset)
model.save('model_weights_3.pth')
print(f'[INFO] Model Saved successfully!')
image = utils.read_image('../custom_dataset/page-15.jpg')
predictions = model.predict(image)
print(predictions)

Output:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
[INFO] Model Saved successfully!
([], tensor([], size=(0, 4)), tensor([]))
@alankbi (Owner) commented May 14, 2020

It looks like the model was unable to detect any stamps in that image. It could be that the image you used was very difficult for it to predict on, or possibly your model has poor accuracy even after training. You can check how well your model is doing by passing it a validation dataset and setting verbose=True when calling the fit method.
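A minimal sketch of that suggestion, assuming the same '../custom_dataset/' layout used in this thread plus a hypothetical held-out folder '../custom_val/' of labeled images:

from detecto import core

# Training data, as in the snippet above.
dataset = core.Dataset('../custom_dataset/')
# Hypothetical validation folder: labeled images the model never trains on.
val_dataset = core.Dataset('../custom_val/')

model = core.Model(['stamp'])

# With a validation dataset and verbose=True, fit() reports progress each
# epoch and returns the per-epoch validation losses.
losses = model.fit(dataset, val_dataset, verbose=True)
print(losses)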

@prasad01dalavi (Author) commented May 14, 2020

Hi, thanks for the reply.

The image used for prediction is taken from the training samples themselves.

I have updated the code as below:

from detecto import core, utils, visualize
from torchvision import transforms


augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])


dataset = core.Dataset('../custom_dataset/', transform=augmentations)
model = core.Model(['stamp'])

# dataset = core.Dataset('images/', transform=augmentations)
# loader = core.DataLoader(dataset, batch_size=2, shuffle=True)

model.fit(dataset, verbose=True)
model.save('model_weights_5.pth')
print(f'[INFO] Model Saved successfully!')
image = utils.read_image('../custom_dataset/page-15.jpg')
predictions = model.predict(image)
print(predictions)

I ran into the following issue, which I guess is already a known open one:

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%
160M/160M [01:46<00:00, 1.57MB/s]

Epoch 1 of 10
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
/pytorch/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
	nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
	nonzero(Tensor input, *, bool as_tuple)
Epoch 2 of 10
Epoch 3 of 10
Epoch 4 of 10
Epoch 5 of 10
Epoch 6 of 10
Epoch 7 of 10
Epoch 8 of 10
Epoch 9 of 10
Epoch 10 of 10
[INFO] Model Saved successfully!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-918512dfe2ef> in <module>()
     23 print(f'[INFO] Model Saved successfully!')
     24 image = utils.read_image('../custom_dataset/page-15.jpg')
---> 25 predictions = model.predict(image)
     26 print(predictions)

7 frames
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/_utils.py in decode(self, rel_codes, boxes)
    183             box_sum += val
    184         pred_boxes = self.decode_single(
--> 185             rel_codes.reshape(box_sum, -1), concat_boxes
    186         )
    187         return pred_boxes.reshape(box_sum, -1, 4)

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

Input image shape for prediction is (4001, 2517, 3)

As per your suggestion to test it on a validation set, I have the following code:

from detecto import core, utils, visualize
from torchvision import transforms
import matplotlib.pyplot as plt


augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

dataset = core.Dataset('../custom_dataset/', transform=augmentations)

loader = core.DataLoader(dataset, batch_size=2, shuffle=True)

val_dataset = core.Dataset('../custom_dataset/')


stamp_model = core.Model.load('model_weights_5.pth', ['stamp'])

losses = stamp_model.fit(loader, val_dataset, epochs=10, learning_rate=0.001, 
                   lr_step_size=5, verbose=True)
                   
plt.plot(losses)
plt.show()

but I could not plot the graph, since the losses are NaN.

Output:

 Epoch 1 of 10
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
Loss: nan
Epoch 2 of 10
Loss: nan
Epoch 3 of 10
Loss: nan
Epoch 4 of 10
Loss: nan
Epoch 5 of 10
Loss: nan
Epoch 6 of 10
Loss: nan
Epoch 7 of 10
Loss: nan
Epoch 8 of 10
Loss: nan
Epoch 9 of 10
Loss: nan
Epoch 10 of 10

@alankbi (Owner) commented May 16, 2020

For the first error, see if the suggestions in #33 help. As for the nan losses, could you share what the outputs of len(dataset) and dataset[0] are?
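For reference, a quick sketch that produces those diagnostics, assuming the dataset from the earlier snippets:

from detecto import core

dataset = core.Dataset('../custom_dataset/')

# The two values requested above.
print(f'[INFO] Length of dataset = {len(dataset)}')
print(f'[INFO] dataset[0]: {dataset[0]}')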

@prasad01dalavi (Author) commented May 16, 2020

Thanks for the reply, alankbi!

Yes, I had seen #33, but it did not help me either.

dataset[0]:

[INFO] dataset[0]: (tensor([[[2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         ...,
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489]],

        [[2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         ...,
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286]],

        [[2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         ...,
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400]]]), {'boxes': tensor([[ 380, 1371,  736, 1595]]), 'labels': 'stamp'})

Length of Dataset:

[INFO] Length of dataset = 171

@alankbi (Owner) commented May 17, 2020

I'm trying to reproduce this error, as it seems a lot of people are having it. Could you provide me with as many details as you can about your environment: Python version, PyTorch/torchvision versions, code environment, etc.?

In addition, if you're able to, it would be helpful if you could send me your trained model file and some images through email. This will help me see whether the error is specific to certain use cases, as right now no error occurs when I run the code on my end.
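A small sketch for collecting those environment details (pkg_resources ships with setuptools, so it also works on Python 3.6):

import platform

import pkg_resources

print('Python', platform.python_version())
# The package names below are the ones discussed in this thread.
for pkg in ('torch', 'torchvision', 'detecto', 'tensorflow'):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, 'not installed')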

@jagilley

Since I'm having the same issue, I'll share the dataset + trained model that's giving this error with you via Google Drive as well!

@prasad01dalavi (Author) commented May 18, 2020

I have come to one conclusion.

When we install dependencies with

pip install detecto

we have

torchvision==0.6.0+cu101
detecto==1.1.3
tensorflow==2.2.0

I did not get any error, but no prediction happened either; I got a blank tensor.

When we install dependencies with

pip install -r requirements.txt

we get this error:

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

and we have

torchvision              0.6.0+cu101
tensorflow               2.2.0

In both cases, the Python version is 3.6.9.

@alankbi (Owner) commented May 18, 2020

@prasad01dalavi so you're saying the error goes away when you run it with pip install detecto? If so, can you try the suggestions in my first comment again regarding setting verbose=True to see what the model loss is while training? Blank tensors are a sign that your model might not be trained well enough.

@jagilley I spent some time going through your Drive folder to see if I could get it to work (I created a Colab file which you can browse through, but it's very messy). It seems like there's some issue with your dataset that is causing nan losses to come up every time during training:

{'loss_classifier': tensor(nan, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_objectness': tensor(nan, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(inf, device='cuda:0', grad_fn=<DivBackward0>)}

I'm not sure why this is the case, but I can try looking deeper into it over the next few days. In the meantime though, hopefully this helps a bit in clearing up what exactly is happening.

@prasad01dalavi (Author) commented May 18, 2020

You were absolutely right! There was a problem in the dataset: the filename recorded in one of the XML files did not match its image name. I removed those files, then rechecked the dataset and the file names.

I continued with

pip install detecto

and boom, it predicted my object with 0.96 confidence. Thanks a lot!
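A hedged sketch (not from this thread) of a check that would have caught that mismatch: verify that the <filename> entry in each Pascal VOC XML annotation points at an image that actually exists in the dataset folder.

import glob
import os
import xml.etree.ElementTree as ET

dataset_dir = '../custom_dataset/'  # the folder used throughout this thread
for xml_path in glob.glob(os.path.join(dataset_dir, '*.xml')):
    filename_tag = ET.parse(xml_path).find('filename')
    # Flag annotations whose referenced image is missing or misnamed.
    if filename_tag is None or not os.path.exists(
            os.path.join(dataset_dir, filename_tag.text)):
        print(f'[WARN] {xml_path} does not reference an existing image')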
