Can Yolo3 take different width-height-ratio images as training input? #800
Comments
Thank you, @AlexeyAB.
Question 1: I also notice that there is a configuration of "dim" in darknet/src/detector.c#L87. If Yolo v1/v2/v3 can take different width/height/ratio of images as training/validation/test input, then what is the point of configuring something like "dim*2"? Does it mean that I should just keep it as the original when I am combining images of different widths/heights/ratios as my training data?
Question 2: I am also confused by another thing. The instructions from Google Groups mention that "detector.c" should be in the src folder (https://github.com/AlexeyAB/darknet/tree/5bc62b14e06a3fcfda4e3a19fba77589920eddee/src), however I can only find "detector.c" in the examples folder (https://github.com/pjreddie/darknet/tree/master/examples). Should I just leave detector.c in the examples folder if I am using pjreddie's yolo3 repo (https://github.com/pjreddie/darknet)?
@AlexeyAB I am planning to train on images captured by drones using YOLOv3, and I would like to ask if resizing the images would help the detector become more accurate. If so, what size would be recommended? Appreciate your help, thank you!
@danieltwx Hi, you shouldn't resize images.
@AlexeyAB
Appreciate your help, thank you!
@AlexeyAB I'm training YOLOv3 on a dataset with just 1 object to be detected and classified per image (classes=4). The object is a rectangle that almost always takes 80-95% of the image space (it is a business card). The ratio of the images is approximately 1:1.5. Given that the borders of the object are very close to the limits of the image (sometimes even touching them), I've set width=640, height=416 in my .cfg file for the moment. Is it safe to set both width and height to 416 as recommended? Or am I risking losing valuable information due to the closeness of the object to the image limits? Thanks for your great contribution and support to the community!
Hello, Thanks
@MurreyCode you don't need to adjust height and width differently in your config or resize your dataset images. The YOLO architecture does it by itself, keeping the aspect ratio intact (no information will be ignored) according to the resolution in the .cfg file. For example, if you have an image of size 1248 x 936, YOLO will resize it to 416 x 312 and then pad the extra space with black bars to fit into a 416 x 416 network.
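For concreteness, here is a minimal sketch of the letterbox-style resize described above (illustrative Python, not darknet's actual code; whether darknet letterboxes or simply stretches depends on the version and settings such as the `letter_box` option in AlexeyAB's fork):

```python
def letterbox_dims(img_w, img_h, net_w=416, net_h=416):
    """Scale an image to fit inside net_w x net_h while keeping its aspect ratio,
    and report how much padding (black/gray bars) is needed to fill the rest."""
    scale = min(net_w / img_w, net_h / img_h)
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    pad_x, pad_y = net_w - new_w, net_h - new_h  # total padding to distribute on each axis
    return new_w, new_h, pad_x, pad_y

# The example from the comment above: a 1248 x 936 image into a 416 x 416 network
print(letterbox_dims(1248, 936))  # -> (416, 312, 0, 104): 52 px of padding top and bottom
```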
For which version?
Hi Alexey, Looking forward to your answer & help on this.
@maheshmechengg you can increase the resolution as much as you like as long as it's divisible by 32, but you will need to decrease your batch size. As you increase your training resolution, the images take up more memory on your GPU, so you need to decrease the mini-batch size to allow them to fit in GPU memory. Decreasing the batch size does slowly decrease accuracy, but in my experience a higher resolution (to an extent) with a decreased batch size results in better accuracy. You adjust the effective batch size by increasing subdivisions in your config, as per the instructions for out-of-memory issues, if or when these arise as you increase your resolution. The number of steps or iterations does not need to increase along with a resolution increase.
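As a rough illustration of the advice above (a sketch with made-up helper names, not darknet code): the network resolution must be a multiple of 32, and the number of images processed per GPU pass is batch / subdivisions, which is why higher resolutions are usually paired with more subdivisions.

```python
def round_to_stride(x, stride=32):
    """Round a requested network width/height down to the nearest multiple of 32."""
    return (x // stride) * stride

def images_per_gpu_pass(batch, subdivisions):
    """darknet splits each batch into `subdivisions` mini-batches run one at a time."""
    return batch // subdivisions

print(round_to_stride(1000))        # -> 992, a valid width/height value
print(images_per_gpu_pass(64, 16))  # -> 4 images held on the GPU at once
print(images_per_gpu_pass(64, 32))  # -> 2 images, roughly halving activation memory
```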
Yes, thanks, I did it the same way as you said.
Hi Alexey, I have a question about the .cfg file of YOLOv3. How does changing the width or height affect the model? Isn't it taking fixed-shape images as input? When I increase the height and width while testing the model's performance, it increases the detection score and decreases the FPS, and frankly I couldn't find the reason. Thanks in advance.
@sekomer increasing the height and width increases the number of pixels the model can use to detect objects. More pixels equate to better accuracy because there is more detail in the image for the model to utilise. An image sized 100x100 px has far less detail than an image of 1000x1000 px. It runs slower because the model needs to scan across more pixels (more work through the residual blocks). Suggest reading this: https://www.section.io/engineering-education/introduction-to-yolo-algorithm-for-object-detection/
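One way to see why a larger input helps (a hedged sketch, assuming the standard YOLOv3 output strides of 32/16/8): the three detection grids grow with the input resolution, so each grid cell covers fewer source pixels and small objects span more cells.

```python
def yolov3_grid_sizes(width, height, strides=(32, 16, 8)):
    """Return the (w, h) size of each YOLOv3 detection grid for a given input resolution."""
    return [(width // s, height // s) for s in strides]

print(yolov3_grid_sizes(416, 416))  # -> [(13, 13), (26, 26), (52, 52)]
print(yolov3_grid_sizes(832, 832))  # -> [(26, 26), (52, 52), (104, 104)]
```

The same fully convolutional filters simply slide over a larger feature map, which is also why inference gets slower as the resolution grows.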
First, thanks for your answer. We're on the same page about what you said, but what I don't understand is what changes when I double the h and w values in the cfg file during testing. Is it splitting the image into 4 sub-images and iterating over them, or doing some other black magic? I want to learn this.
Hi people. I'm using YOLOv4 to train on 5K images of 3180 x 2160 for object detection, 1 class. The training seems to complete successfully (mAP@0.5 = 98%), and the training charts also look OK, but the problem comes when I run inference on some images. When I conduct inference, the predicted bounding box is shifted from the real object, by around 100 pixels in X and Y, respectively. The object seems to be recognized, but the BB is not located exactly in the right position. I have in my cyolov4-custom.cfg: Do you think this shift could be because my training images are non-square (3180 x 2160, a 1:1.7 proportion), while the width and height values in the .cfg are a 1:1 proportion (416x416)? Could this mismatch be responsible for such a shift in the predicted bounding box? Please, any light or hints to clarify this would be extremely helpful, thanks.
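The hunch in this question can be made concrete with a small sketch (an illustration of one common cause of constant offsets, not a diagnosis of this specific case): if detections made in padded 416 x 416 network space are mapped back to a non-square image without removing the padding offset, every box shifts by a fixed amount along the padded axis.

```python
def to_image_coords(x_net, y_net, img_w, img_h, net=416):
    """Map a point from padded net x net network space back to original image pixels."""
    scale = min(net / img_w, net / img_h)
    pad_x = (net - img_w * scale) / 2.0  # padding added on the left/right
    pad_y = (net - img_h * scale) / 2.0  # padding added on the top/bottom
    return (x_net - pad_x) / scale, (y_net - pad_y) / scale

img_w, img_h = 3180, 2160                        # non-square image size from the question
print(to_image_coords(208, 208, img_w, img_h))   # correct: roughly the image centre (~1590, ~1080)

scale = min(416 / img_w, 416 / img_h)
print(208 / scale, 208 / scale)                  # naive mapping without the pad: y lands near 1590 instead
```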
Hi @vongracia Few things:
If the above doesn't solve your issue (it should) you can do the below to increase bounding box tightness:
Can you publish the cfg files or explain how to train on 360-degree camera images of 3180 x 2160 pixels? Thanks in advance.
@saktheeswaranswan Are you using this repo? You should be using: https://github.com/AlexeyAB/darknet |
@vongracia
Images from VOC and some other datasets do not share exactly the same width-height ratio. For example, in VOC2012 some images are 334x500, some are 500x332, and some are 486x500. In the KITTI dataset, the width is always roughly 3 times the height (1200x300).
I don't see any fully connected layers in yolo3. Does it mean that yolo3 can take images with different width-height ratios as training input?
Or do I need to crop images to the same size, or apply the SPP-Net technique to yolo3 before training? If SPP-Net is needed, before which yolo3 layer should I apply it?