Not detecting the custom object #36

Closed

prasad01dalavi opened this issue May 13, 2020 · 9 comments
@prasad01dalavi
I have trained the model on about 126 images; roughly 30% of the images contain two objects.

from detecto import core, utils, visualize


dataset = core.Dataset('../custom_dataset/')
model = core.Model(['stamp'])

model.fit(dataset)
model.save('model_weights_3.pth')
print(f'[INFO] Model Saved successfully!')
image = utils.read_image('../custom_dataset/page-15.jpg')
predictions = model.predict(image)
print(predictions)

Output:

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
[INFO] Model Saved successfully!
([], tensor([], size=(0, 4)), tensor([]))
@alankbi (Owner) commented May 14, 2020

It looks like the model was unable to detect any stamps in that image. It could be that the image you used was very difficult for it to predict on, or possibly your model has poor accuracy even after training. You can check how well your model is doing by passing it a validation dataset and setting verbose=True when calling the fit method.
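A minimal sketch of that suggestion, assuming the same '../custom_dataset/' layout used in this thread plus a hypothetical held-out folder '../custom_val/' of labeled images:

from detecto import core

# Training data, as in the snippet above.
dataset = core.Dataset('../custom_dataset/')
# Hypothetical validation folder: labeled images the model never trains on.
val_dataset = core.Dataset('../custom_val/')

model = core.Model(['stamp'])

# With a validation dataset and verbose=True, fit() reports progress each
# epoch and returns the per-epoch validation losses.
losses = model.fit(dataset, val_dataset, verbose=True)
print(losses)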

@prasad01dalavi (Author) commented May 14, 2020

Hi, thanks for the reply.

The image used for prediction is taken from the training samples themselves.

I have updated the code as below:

from detecto import core, utils, visualize
from torchvision import transforms


augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])


dataset = core.Dataset('../custom_dataset/', transform=augmentations)
model = core.Model(['stamp'])

# dataset = core.Dataset('images/', transform=augmentations)
# loader = core.DataLoader(dataset, batch_size=2, shuffle=True)

model.fit(dataset, verbose=True)
model.save('model_weights_5.pth')
print(f'[INFO] Model Saved successfully!')
image = utils.read_image('../custom_dataset/page-15.jpg')
predictions = model.predict(image)
print(predictions)

I ran into the following issue, which I guess is already a known open one:

Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%
160M/160M [01:46<00:00, 1.57MB/s]

Epoch 1 of 10
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
/pytorch/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
	nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
	nonzero(Tensor input, *, bool as_tuple)
Epoch 2 of 10
Epoch 3 of 10
Epoch 4 of 10
Epoch 5 of 10
Epoch 6 of 10
Epoch 7 of 10
Epoch 8 of 10
Epoch 9 of 10
Epoch 10 of 10
[INFO] Model Saved successfully!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-918512dfe2ef> in <module>()
     23 print(f'[INFO] Model Saved successfully!')
     24 image = utils.read_image('../custom_dataset/page-15.jpg')
---> 25 predictions = model.predict(image)
     26 print(predictions)

7 frames
/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/_utils.py in decode(self, rel_codes, boxes)
    183             box_sum += val
    184         pred_boxes = self.decode_single(
--> 185             rel_codes.reshape(box_sum, -1), concat_boxes
    186         )
    187         return pred_boxes.reshape(box_sum, -1, 4)

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

Input image shape for prediction is (4001, 2517, 3)

As per your suggestion to test it on a validation set, I have the following code:

from detecto import core, utils, visualize
from torchvision import transforms
import matplotlib.pyplot as plt


augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

dataset = core.Dataset('../custom_dataset/', transform=augmentations)

loader = core.DataLoader(dataset, batch_size=2, shuffle=True)

val_dataset = core.Dataset('../custom_dataset/')


stamp_model = core.Model.load('model_weights_5.pth', ['stamp'])

losses = stamp_model.fit(loader, val_dataset, epochs=10, learning_rate=0.001, 
                   lr_step_size=5, verbose=True)
                   
plt.plot(losses)
plt.show()

but I could not plot the graph, since the losses are NaN.

Output:

 Epoch 1 of 10
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
Loss: nan
Epoch 2 of 10
Loss: nan
Epoch 3 of 10
Loss: nan
Epoch 4 of 10
Loss: nan
Epoch 5 of 10
Loss: nan
Epoch 6 of 10
Loss: nan
Epoch 7 of 10
Loss: nan
Epoch 8 of 10
Loss: nan
Epoch 9 of 10
Loss: nan
Epoch 10 of 10

@alankbi (Owner) commented May 16, 2020

For the first error, see if the suggestions in #33 help. As for the nan losses, could you share what the outputs of len(dataset) and dataset[0] are?
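For reference, a quick sketch that produces those diagnostics, assuming the dataset from the earlier snippets:

from detecto import core

dataset = core.Dataset('../custom_dataset/')

# The two values requested above.
print(f'[INFO] Length of dataset = {len(dataset)}')
print(f'[INFO] dataset[0]: {dataset[0]}')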

@prasad01dalavi (Author) commented May 16, 2020

Thanks for the reply, alankbi!

Yes, I had seen #33, but it did not help me either.

dataset[0]:

[INFO] dataset[0]: (tensor([[[2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         ...,
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489],
         [2.2489, 2.2489, 2.2489,  ..., 2.2489, 2.2489, 2.2489]],

        [[2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         ...,
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286],
         [2.4286, 2.4286, 2.4286,  ..., 2.4286, 2.4286, 2.4286]],

        [[2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         ...,
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400],
         [2.6400, 2.6400, 2.6400,  ..., 2.6400, 2.6400, 2.6400]]]), {'boxes': tensor([[ 380, 1371,  736, 1595]]), 'labels': 'stamp'})

Length of Dataset:

[INFO] Length of dataset = 171

@alankbi (Owner) commented May 17, 2020

I'm trying to reproduce this error, as it seems a lot of people are having it. Could you provide me with as many details as you can about your environment: Python version, PyTorch/torchvision versions, code environment, etc.?

In addition, if you're able to, it would be helpful if you could send me your trained model file and some images through email. This will help me see whether the error is specific to certain use cases, as right now no error occurs when I run the code on my end.
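A small sketch for collecting those environment details (pkg_resources ships with setuptools, so it also works on Python 3.6):

import platform

import pkg_resources

print('Python', platform.python_version())
# The package names below are the ones discussed in this thread.
for pkg in ('torch', 'torchvision', 'detecto', 'tensorflow'):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, 'not installed')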

@jagilley

Since I'm having the same issue, I'll share the dataset + trained model that's giving this error with you via Google Drive as well!

@prasad01dalavi (Author) commented May 18, 2020

I have come to one conclusion.

When we install dependencies with

pip install detecto

we have

torchvision==0.6.0+cu101
detecto==1.1.3
tensorflow==2.2.0

I did not get any error, but no prediction happened either; I got a blank tensor.

When we install dependencies with

pip install -r requirements.txt

we get this error:

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

and we have

torchvision              0.6.0+cu101
tensorflow               2.2.0

In both cases, the Python version is 3.6.9.

@alankbi (Owner) commented May 18, 2020

@prasad01dalavi so you're saying the error goes away when you run it with pip install detecto? If so, can you try the suggestions in my first comment again regarding setting verbose=True to see what the model loss is while training? Blank tensors are a sign that your model might not be trained well enough.

@jagilley I spent some time going through your Drive folder to see if I could get it to work (I created a Colab file which you can browse through, but it's very messy). It seems like there's some issue with your dataset that is causing nan losses to come up every time during training:

{'loss_classifier': tensor(nan, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_objectness': tensor(nan, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(inf, device='cuda:0', grad_fn=<DivBackward0>)}

I'm not sure why this is the case, but I can try looking deeper into it over the next few days. In the meantime though, hopefully this helps a bit in clearing up what exactly is happening.

@prasad01dalavi (Author) commented May 18, 2020

You were absolutely right! There was a problem in the dataset: the filename recorded in one of the XML files did not match its image name. I removed those files, then rechecked the dataset and the file names.

I continued with

pip install detecto

and boom, it predicted my object with 0.96 confidence. Thanks a lot!
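A hedged sketch (not from this thread) of a check that would have caught that mismatch: verify that the <filename> entry in each Pascal VOC XML annotation points at an image that actually exists in the dataset folder.

import glob
import os
import xml.etree.ElementTree as ET

dataset_dir = '../custom_dataset/'  # the folder used throughout this thread
for xml_path in glob.glob(os.path.join(dataset_dir, '*.xml')):
    filename_tag = ET.parse(xml_path).find('filename')
    # Flag annotations whose referenced image is missing or misnamed.
    if filename_tag is None or not os.path.exists(
            os.path.join(dataset_dir, filename_tag.text)):
        print(f'[WARN] {xml_path} does not reference an existing image')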
