Pruning/Sparsity Tutorial #304

Open
Tracked by #22
glenn-jocher opened this issue Jul 5, 2020 · 55 comments

@glenn-jocher
Member

glenn-jocher commented Jul 5, 2020

📚 This guide explains how to apply pruning to YOLOv5 🚀 models. UPDATED 25 September 2022.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Test Normally

Before pruning we want to establish a baseline performance to compare to. This command tests YOLOv5x on COCO val2017 at image size 640 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training a custom dataset, ./weights/best.pt. For details on all available models please see our README table.

$ python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half

Output:

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:12<00:00,  2.16it/s]
                 all       5000      36335      0.732      0.628      0.683      0.496
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- base speed

Evaluating pycocotools mAP... saving runs/val/exp2/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.507  # <--- base mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.689
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.552
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.345
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.630
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.682
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.829
Results saved to runs/val/exp

Test YOLOv5x on COCO (0.30 sparsity)

We repeat the above test with a pruned model by using the torch_utils.prune() command. We update val.py to prune YOLOv5x to 0.3 sparsity:

Screenshot 2022-02-02 at 22 54 18

30% pruned output:

val: data=/content/yolov5/data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-224-g4c40933 torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
Model Summary: 444 layers, 86705005 parameters, 0 gradients
Pruning model...  0.3 global sparsity
val: Scanning '/content/datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupt: 100% 5000/5000 [00:00<?, ?it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [01:11<00:00,  2.19it/s]
                 all       5000      36335      0.724      0.614      0.671      0.478
Speed: 0.1ms pre-process, 5.2ms inference, 1.7ms NMS per image at shape (32, 3, 640, 640)  # <--- prune speed

Evaluating pycocotools mAP... saving runs/val/exp3/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.489  # <--- prune mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.677
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.537
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.635
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.370
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.664
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.803
Results saved to runs/val/exp3

In the results we can observe that we have achieved a sparsity of 30% in our model after pruning, which means that 30% of the model's weight parameters in nn.Conv2d layers are equal to 0. Inference time is essentially unchanged (5.2ms in both runs), while the model's AP and AR scores are slightly reduced (mAP@.5:.95 drops from 0.507 to 0.489).
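To make the meaning of "30% sparsity" concrete, here is a minimal sketch on a toy two-layer network rather than YOLOv5 itself. The `conv_sparsity` helper and the toy model are illustrative assumptions, not code from the repository; only the `torch.nn.utils.prune` calls are the same ones the tutorial relies on.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def conv_sparsity(model):
    """Fraction of nn.Conv2d weight elements that are exactly zero (illustrative helper)."""
    total, zeros = 0, 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            total += m.weight.numel()
            zeros += int((m.weight == 0).sum())
    return zeros / total


torch.manual_seed(0)
# Toy stand-in for a detection model (assumption: any stack of Conv2d layers will do)
toy = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
n_params_before = sum(p.numel() for p in toy.parameters())

for m in toy.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)  # zero the 30% smallest |w| per layer
        prune.remove(m, "weight")  # fold the mask into the weight tensor

n_params_after = sum(p.numel() for p in toy.parameters())
sparsity = conv_sparsity(toy)
print(f"conv sparsity: {sparsity:.3f}")
print(f"parameters before/after: {n_params_before} / {n_params_after}")
```

The parameter count is identical before and after: the zeros are still stored and still multiplied, which is consistent with the unchanged inference time reported above.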

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher glenn-jocher added enhancement New feature or request documentation Improvements or additions to documentation labels Jul 5, 2020
@glenn-jocher glenn-jocher self-assigned this Jul 5, 2020
@lucasjinreal

@glenn-jocher why doesn't the speed change at all after pruning? Does it only zero out the conv weights without actually changing the structure? How can I save the pruned model and its architecture for retraining?

@NanoCode012
Contributor

Is there a guideline on how much we should prune by? What are the benefits to doing this?

@glenn-jocher
Member Author

@jinfagang yes, structure is not changed at all, and parameter count is the same, it's just that some of the weights are 0 instead of near zero as they were before.

I suppose this would allow for effective kmeans quantization to lower bits (for smaller filesizes), but I'm not sure about any possible speed improvement. I think as long as the parameter count remains the same, the speed will remain the same.

@NanoCode012 no guidelines really, it's just an experiment to see how many of the weights you can remove and what effect that has on performance. Honestly I don't really see any great applications at the moment based on my results above, but it's there in case anyone would like to explore it further.

@lucasjinreal

lucasjinreal commented Jul 7, 2020

@glenn-jocher Looks like prune has a remove method which can remove weights:

prune.remove(module, 'weight')

and all weights and params are saved in module.state_dict(), which can be used for a new pruned model.

@glenn-jocher
Member Author

@jinfagang yes, this .remove() method deletes the original weights, as there is also a pruned copy in the model. So before applying remove, the model/module will have 2X the normal parameters; after using it, it is back to its normal parameter count.
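A small sketch of what `prune.remove` changes, using a single standalone `nn.Conv2d` as an assumed example layer. It lists the module's parameters and buffers while the pruning reparametrization is active and again after it has been removed.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(8, 16, 3)  # assumed example layer, not taken from YOLOv5

prune.l1_unstructured(conv, name="weight", amount=0.3)
# While pruning is active, the original tensor and the mask are both stored;
# conv.weight is recomputed as weight_orig * weight_mask.
params_during = sorted(name for name, _ in conv.named_parameters())
buffers_during = sorted(name for name, _ in conv.named_buffers())

prune.remove(conv, "weight")
# After remove, only a single (now pruned) weight tensor is left.
params_after = sorted(name for name, _ in conv.named_parameters())
buffers_after = sorted(name for name, _ in conv.named_buffers())

print("during:", params_during, buffers_during)
print("after: ", params_after, buffers_after)
```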

You have to consider the shapes of the operations in the forward pass. For a convolution from say shape(1,128,20,20) to shape(1,256,20,20) you must have a weight matrix of shape 128x256. It's not possible to remove elements from a normal matrix or tensor, as it will always need 128*256 weights inside it.

There are special cases of sparse matrices in some packages/languages, it may be possible pytorch is converting the original tensor to a sparse tensor with the same shape, though I'm not sure if this is the case. Even if it were, any exported models (i.e. onnx, coreml, tensorrt) using these sparse matrices would need special support for them, or they would be handled as normal matrices.

@glenn-jocher
Member Author

The current pruning method incorporates the line of code you mention already as well:

def prune(model, amount=0.3):
    # Prune model to requested global sparsity
    import torch.nn.utils.prune as prune
    print('Pruning model... ', end='')
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            prune.l1_unstructured(m, name='weight', amount=amount)  # prune
            prune.remove(m, 'weight')  # make permanent
    print(' %.3g global sparsity' % sparsity(model))

@lucasjinreal

@glenn-jocher Nice. Did you figure out how to obtain the pruned model architecture?

@glenn-jocher
Member Author

@jinfagang well that's what I was saying, the architecture does not change. In my example above, the 128x256 convolution weights are still 128x256 weights, it's just that some of their values that were previously near-zero have been set equal to zero during the pruning. The 128x256 matrix may or may not then be stored as a sparse matrix, which is a special type of matrix intended for use with data that contains mostly zeros, and saves memory (and maybe or maybe not also saves processing time).

TLDR the architecture is exactly the same when pruning, no layers are removed as far as I know, and the input and output shapes (and shapes of all intermediate layers) remain the same.
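The 128-to-256 channel example above can be checked directly. This is a sketch that assumes a 1x1 convolution (the kernel size is not stated in the thread); PyTorch stores that weight as a tensor of shape (256, 128, 1, 1), i.e. 128*256 elements.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(128, 256, kernel_size=1)  # assumption: 1x1 kernel
x = torch.randn(1, 128, 20, 20)

weight_shape_before = tuple(conv.weight.shape)
out_shape_before = tuple(conv(x).shape)

prune.l1_unstructured(conv, name="weight", amount=0.3)
prune.remove(conv, "weight")

weight_shape_after = tuple(conv.weight.shape)
out_shape_after = tuple(conv(x).shape)
zero_fraction = float((conv.weight == 0).sum()) / conv.weight.numel()

print("weight shape:", weight_shape_before, "->", weight_shape_after)
print("output shape:", out_shape_before, "->", out_shape_after)
print(f"zero fraction: {zero_fraction:.3f}")
```

Both shapes are unchanged; only the values inside the weight tensor differ.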

@lucasjinreal

@glenn-jocher so the simplified model cannot get its new channel count and shape automatically. Is there any way to make that happen?

@Lornatang
Contributor

@glenn-jocher First, thank you for your work! Let me ask you, which paper or project is your pruning based on?

@glenn-jocher
Member Author

@Lornatang I based this pruning implementation off of the original pytorch pruning tutorial at the link below, but the idea to apply pruning here originally came from @jinfagang. I don't actually have any experience pruning models.
https://pytorch.org/tutorials/intermediate/pruning_tutorial.html

@jinfagang I modified detect.py to prune and save, and print updated model info:

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    torch_utils.model_info(model)
    torch.save({'model': model}, 'model_normal.pt')

    torch_utils.prune(model, 0.3)
    torch_utils.model_info(model)
    torch.save({'model': model}, 'model_pruned.pt')

Output:

Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients, 17.5 GFLOPS
Pruning model...  0.299 global sparsity
Model Summary: 140 layers, 7.45958e+06 parameters, 7.45958e+06 gradients, 17.5 GFLOPS

Model sizes are here (for both yolov5s in FP32):
Screen Shot 2020-07-07 at 9 58 11 PM

@HenryWang628

So maybe layer pruning or channel-level sparsity works better since it changes the architecture of the network?
I have seen a project like this:
https://github.com/tanluren/yolov3-channel-and-layer-pruning

@glenn-jocher
Member Author

glenn-jocher commented Jul 12, 2020

@HenryWang628 I see, thanks for the link. The tensorboard histograms are very nice. So it seems a more useful method would be: channel prune (mAP drops) > finetune for x epochs > recover some of the lost mAP.

This all raises the question though, if you are going to go through all of this effort on a large model like YOLOv5x, why not just train a smaller model like YOLOv5s? The training time will be much faster, and you don't need the extra pruning and finetuning steps.

@glenn-jocher
Member Author

For anyone interested, there is a detailed discussion on this here pytorch/tutorials#1054 (comment)

The author there says this:

I'm not familiar with your architecture, so you'll have to decide which parameters it makes sense to pool together and compare via global magnitude-based pruning; but let's assume, just for the sake of this simple example, that you only want to consider the convolutional layers identified by the logic of my if-statement below [if those aren't the weights you care about, please feel free to modify that logic as you wish].

Now, those layers happen to come with two parameters: "weight" and "bias". Let's say you are interested in the weights [if you care about the biases too, feel free to add them in as well in the parameters_to_prune]. Alright, how do we tell global_unstructured to prune those weights in a global manner? We do so by constructing parameters_to_prune as requested by that function [again, see docs and tutorial linked above].

parameter_to_prune = [
    (v, "weight") 
    for k, v in dict(model.named_modules()).items()
    if ((len(list(v.children())) == 0) and (k.endswith('conv')))
]

# now you can use global_unstructured pruning
prune.global_unstructured(parameter_to_prune, pruning_method=prune.L1Unstructured, amount=0.3)

To check that that succeeded, you can now look at the global sparsity across those layers, which should be 30%, as well as the individual per-layer sparsity:

# global sparsity
nparams = 0
pruned = 0
for k, v in dict(model.named_modules()).items():
    if ((len(list(v.children())) == 0) and (k.endswith('conv'))):
        nparams += v.weight.nelement()
        pruned += torch.sum(v.weight == 0)
print('Global sparsity across the pruned layers: {:.2f}%'.format( 100. * pruned / float(nparams)))
# ^^ should be 30%

# local sparsity
for k, v in dict(model.named_modules()).items():
    if ((len(list(v.children())) == 0) and (k.endswith('conv'))):
        print(
            "Sparsity in {}: {:.2f}%".format(
                k,
                100. * float(torch.sum(v.weight == 0))
                / float(v.weight.nelement())
            )
        )
# ^^ will be different for each layer

Originally posted by @mickypaganini in pytorch/tutorials#1054 (comment)
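The quoted snippets need a real model to run. A self-contained sketch of the same idea on a toy network follows; the toy model and the deliberate down-scaling of the first layer are assumptions made only so that the difference between global and per-layer sparsity is visible.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
toy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))
with torch.no_grad():
    toy[0].weight.mul_(0.1)  # make the first layer's weights small on purpose

# Pool the conv weights of all layers and prune the 30% smallest overall
parameters_to_prune = [(m, "weight") for m in toy.modules() if isinstance(m, nn.Conv2d)]
prune.global_unstructured(
    parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.3
)


def layer_sparsity(module):
    """Fraction of exactly-zero entries in a module's (masked) weight."""
    return float((module.weight == 0).sum()) / module.weight.numel()


total = sum(m.weight.numel() for m, _ in parameters_to_prune)
zeros = sum(int((m.weight == 0).sum()) for m, _ in parameters_to_prune)
global_sparsity = zeros / total
first_layer, second_layer = layer_sparsity(toy[0]), layer_sparsity(toy[2])

print(f"global: {global_sparsity:.3f}")
print(f"layer 0: {first_layer:.3f}, layer 2: {second_layer:.3f}")
```

Unlike the per-layer `l1_unstructured` loop shown earlier in the thread, the global variant lets layers with small weights absorb most of the pruning, so per-layer sparsity is no longer uniform.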

@glenn-jocher
Member Author

glenn-jocher commented Jul 14, 2020

More info from pytorch/tutorials#605 (comment)

Hi @cranmer,
Hopefully this tutorial will be included soon (cc: @soumith).

As is, this module is not intended (by itself) to help you with memory savings. All that pruning does is to replace some entries with zeroes. This itself doesn't buy you anything, unless you represent the sparse tensor in a smarter way (which this module itself doesn't handle for you). You can, however, rely on torch.sparse and other functionalities there to help you with that. To give you a concrete example:

import torch
import torch.nn.utils.prune as prune

t = torch.randn(100, 100)
torch.save(t, 'full.pth')

p = prune.L1Unstructured(amount=0.9)
pruned = p.prune(t)
torch.save(pruned, 'pruned.pth')

sparsified = pruned.to_sparse()
torch.save(sparsified, 'sparsified.pth')

When I ls, these are the sizes on disk:

21K sparsified.pth
40K pruned.pth
40K full.pth

By the way, before calling prune.remove, you can expect your memory footprint to be a lot higher than what you started out with, because for each pruned parameter you now have: the original parameter, the mask, and the pruned version of the tensor. Calling prune.remove brings you back to only having a single (now pruned) tensor per pruned parameter. Still, if you don't represent these pruned parameters smartly, the memory footprint at this point won't be any lower than what you started out with.

Originally posted by @mickypaganini in pytorch/tutorials#605 (comment)

@Lornatang
Contributor

@glenn-jocher I think you can refer to https://github.com/vainf/torch-pruning, he has implemented this function in detail.

@github-actions
Contributor

github-actions bot commented Sep 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Sep 4, 2020
@github-actions github-actions bot closed this as completed Sep 9, 2020
@shoebNTU

Hi, thank you everyone for the informative comments. Thanks Glen for this super-cool library. Is there a way to implement a line like "sparsified = pruned.to_sparse()" (pytorch/tutorials#605 (comment)) for nn.Conv2d?

I am trying to reduce the overall model weights. Eventually, I want to port this to a Jetson Nano. My understanding is that a smaller model yields --> faster speeds. Please correct me if my understanding is wrong. Thanks.

@glenn-jocher
Member Author

@shoebNTU any speed benefits would depend on the capability of your hardware and drivers to exploit sparse matrices, so there is no single answer to your question.

@glenn-jocher glenn-jocher removed the Stale Stale and schedule for closing soon label Oct 8, 2020
@glenn-jocher glenn-jocher reopened this Oct 8, 2020
@jayer95

jayer95 commented Nov 1, 2022

@glenn-jocher
Hi,
Could I just ask you a question regarding why the pruning taught here is not sparsely trained?

I refer to the following projects:
https://github.com/midasklr/yolov5prune/tree/v5.0
https://github.com/midasklr/yolov5prune/tree/v6.0
https://github.com/tanluren/yolov3-channel-and-layer-pruning

As sparse training progresses over the epochs, the TensorBoard BN histograms show more and more gamma values approaching 0.

bn_weights_hist

After training, pruning can be performed. A basic principle is that the pruning threshold cannot be greater than the maximum gamma of any BN layer; otherwise that layer would lose all of its channels. Then prune according to the percentage.
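The threshold rule can be sketched as follows. This is not code from the linked repositories: `bn_channel_masks` is a hypothetical helper, and the toy model with hand-set gamma values is an assumption chosen so the cap visibly takes effect.

```python
import torch
import torch.nn as nn


def bn_channel_masks(model, percent):
    """Pick a global |gamma| threshold and return per-BN-layer keep masks (hypothetical helper)."""
    gammas = {
        name: m.weight.detach().abs()
        for name, m in model.named_modules()
        if isinstance(m, nn.BatchNorm2d)
    }
    all_sorted = torch.sort(torch.cat(list(gammas.values()))).values
    # Percentile threshold: roughly `percent` of all channels fall below it
    idx = min(int(len(all_sorted) * percent), len(all_sorted) - 1)
    # Cap at the smallest per-layer maximum so every layer keeps at least one channel
    cap = torch.stack([g.max() for g in gammas.values()]).min()
    threshold = torch.minimum(all_sorted[idx], cap)
    masks = {name: g >= threshold for name, g in gammas.items()}
    return threshold, masks


toy = nn.Sequential(
    nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.Conv2d(4, 4, 3), nn.BatchNorm2d(4)
)
with torch.no_grad():  # hand-set gammas, standing in for the result of sparse training
    toy[1].weight.copy_(torch.tensor([0.1, 0.2, 0.3, 0.4]))
    toy[3].weight.copy_(torch.tensor([0.01, 0.02, 0.03, 0.05]))

threshold, masks = bn_channel_masks(toy, percent=0.5)
kept = {name: int(mask.sum()) for name, mask in masks.items()}
print("threshold:", float(threshold))
print("channels kept per BN layer:", kept)
```

In this toy case the 50% percentile alone would remove every channel of the second BN layer, so the cap lowers the threshold and fewer channels than requested are pruned. The masks only mark channels; physically removing them also requires slicing the neighbouring convolutions.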

@glenn-jocher
Member Author

@jayer95 our tutorial is in need of updating! I wrote it myself a while ago. If you'd like to propose updates/fixes that would be awesome to help everyone :)

@jayer95

jayer95 commented Nov 2, 2022

@glenn-jocher Sure, I got it :)

@DaphnaNanovel

Hello, is it possible to retrain a pruned model? We have trained yolov5 on our custom data, then pruned the model, and would like to retrain it on the same custom data. The naive attempt to perform normal training on the pruned model was not successful and the following error was raised:
model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
TypeError: 'DetectMultiBackend' object is not subscriptable

@Abanoub-G

Abanoub-G commented Mar 2, 2023

Hi,

Thanks a lot for the tutorial and the very insightful conversation. I have successfully managed to prune and save yolov5s. However, when I come to run val.py on the saved model I get the following error:

File "models/yolov5/val.py", line 420, in <module>
    main(opt)
  File "models/yolov5/val.py", line 391, in main
    run(**vars(opt))
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "models/yolov5/val.py", line 142, in run
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
  File "/home/NetZIP/models/yolov5/models/common.py", line 345, in __init__
    model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)
  File "/home/NetZIP/models/yolov5/models/experimental.py", line 88, in attempt_load
    model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())  # model in eval mode
TypeError: 'bool' object is not callable

Note, the val.py works fine when I run it using the yolov5s.pt model, but throws out the error above when running the pruned saved model. I used the code provided earlier in this conversation to save the model (https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445).

I think the issue might be in how the model gets saved rather than the pruning, because I also tried just simply saving the yolov5s.pt model without the pruning using the save code provided here https://docs.ultralytics.com/yolov5/tutorials/model_pruning_and_sparsity#issuecomment-655284445 and it resulted in the same error when running val.py on it.

I have been looking at this for a while and can not seem to find what is causing this error or what is the issue with the saving method. The only thing I was able to spot is that the files inside the yolov5s.pt/data/ and yolov5s_fp_32_pruned.pt/data/ have different numerals. See attached screenshots below. Could this be the issue? if yes, any idea what is causing it and how to correct it please?

Thanks

yolov5_data
yolov5_prunded_data

@relaxtheo

I have the same problem. In yolov5, the .pt file is a ckpt (checkpoint), not just the model part. My ugly solution is to create a new ckpt, copy all entries except the model from the original ckpt to the new ckpt, and set the pruned model in the new ckpt.
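The workaround described here might look roughly like the sketch below. It uses a toy model and a made-up checkpoint dict: the key names ("epoch", "ema", "optimizer") are assumptions for illustration, and a real YOLOv5 checkpoint should be inspected to see which entries it actually contains.

```python
import copy
import io

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
toy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))
# Stand-in for a loaded checkpoint (assumption: a plain dict holding the model object)
original_ckpt = {"epoch": -1, "model": toy, "ema": None, "optimizer": None}

# Prune a copy of the bare model, leaving the original checkpoint untouched
pruned = copy.deepcopy(original_ckpt["model"])
for m in pruned.modules():
    if isinstance(m, nn.Conv2d):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")

# New checkpoint: every entry except the model is copied, then the pruned model is set
new_ckpt = {k: v for k, v in original_ckpt.items() if k != "model"}
new_ckpt["model"] = pruned

buffer = io.BytesIO()  # in-memory stand-in for a file such as 'model_pruned.pt'
torch.save(new_ckpt, buffer)
print("saved keys:", sorted(new_ckpt), "bytes:", buffer.getbuffer().nbytes)
```

If the loader prefers another entry over "model" when one is present (an "ema" entry, for example), copying that entry unchanged would bring back unpruned weights, so it is worth checking which entry the loading code reads.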

@glenn-jocher
Member Author

@relaxtheo hi,

The error may be caused by how the model saves in the detect.py file. In YOLOv5, the .pt file is a checkpoint that contains the whole model, not just the model part. Therefore, when you save a pruned model, you're saving a checkpoint file that still contains the original unpruned parameters, which can cause issues with loading the pruned model.

One solution could be to create a new checkpoint file and manually copy all options except the model from the original checkpoint to the new checkpoint. Then, you can set the pruned model to the new checkpoint. This could help ensure that the pruned model is loaded correctly in val.py.

Alternatively, you could try using the latest version of YOLOv5, which may have some updates related to model pruning and loading. You can also check the saved model and make sure that it only contains the pruned weights and not the original unpruned weights.

I hope this helps! Let me know if you have any further questions.

@relaxtheo

Thank you very much for the reply.

I am currently using v6.2. Compared to the latest code, the prune method has not changed, and there seems to be a small change in the attempt_load function.

But what confuses me is what I can gain from this pruning. The model file size does not change, the parameter count stays the same, and the inference speed does not change, so it seems I only get a less accurate model without any gain.

@glenn-jocher
Member Author

@relaxtheo thank you for your response.

Model pruning can help reduce the computation required for inference by removing redundant and unnecessary parameters from the model. Although the file size and number of parameters may not change significantly, the inference speed can be improved if the pruning is performed correctly.

However, the effectiveness of pruning may depend on the specific model architecture and the amount of pruning applied. It's possible that in your case, the pruning method didn't achieve significant improvements in speed or performance.

If you're looking to improve the performance of your model, you may want to try other optimization techniques such as quantization or knowledge distillation. These methods can help reduce the size and computation required for inference, resulting in faster and more efficient models.

I hope this helps! If you have any further questions or concerns, please let me know.

@relaxtheo

Thank you very much!

@glenn-jocher
Member Author

@relaxtheo hi there,

Thanks for sharing your experience with model pruning in YOLOv5. While model pruning aims to reduce the computation required for inference by removing redundant and unnecessary parameters, the effectiveness of pruning may depend on various factors, including the specific model architecture and the amount of pruning applied. Therefore, it's possible that in your case, the pruning method you used didn't achieve significant improvements in speed or performance.

If you're looking to further optimize your model, you may want to consider other approaches such as quantization or knowledge distillation. These optimization techniques can help reduce the size and computation required for inference, resulting in faster and more efficient models.

Please let us know if you have any further questions or concerns. We're here to help!

@bryanbocao

bryanbocao commented May 7, 2023

@relaxtheo I think the current pruning method is specifically "unstructured pruning" (correct me if I am wrong), where individual weights with small magnitudes are set to 0. They are still stored in the model weight file (i.e. <model>.pth); the zero values are not actually removed, so they still take up space on disk. That's why the model file size does not change. During inference, unless the code has an explicit way to accelerate, such as skipping those zeros, it will still do the same amount of computation on the zero-valued parameters. The advantage, as I see it, is that it is an efficient way to estimate how well model performance is preserved and how much potential there is for acceleration, so that I know when to actually prune the model in the next step.

The thing you are looking for might be "structured pruning" (https://github.com/VainF/Torch-Pruning), which actually removes whole channels after pruning to save both space and time, but it is not easy to implement due to the dependencies among layers in various network architectures.
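A minimal sketch of the structured case on one isolated layer, using PyTorch's built-in `prune.ln_structured` rather than the Torch-Pruning library linked above. The single-layer setup is an assumption; it sidesteps the inter-layer dependency problem mentioned in the comment.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(4, 8, 3, padding=1)

# Structured pruning: zero the 50% of output filters (dim=0) with the smallest L1 norm
prune.ln_structured(conv, name="weight", amount=0.5, n=1, dim=0)
prune.remove(conv, "weight")

# Filters that are entirely zero can be dropped by building a physically smaller layer
keep = conv.weight.detach().abs().sum(dim=(1, 2, 3)) != 0
small = nn.Conv2d(4, int(keep.sum()), 3, padding=1)
with torch.no_grad():
    small.weight.copy_(conv.weight[keep])
    small.bias.copy_(conv.bias[keep])

x = torch.randn(1, 4, 16, 16)
full_out = conv(x)[:, keep]  # surviving channels of the original layer
small_out = small(x)

n_full = sum(p.numel() for p in conv.parameters())
n_small = sum(p.numel() for p in small.parameters())
print(f"out channels: {conv.out_channels} -> {small.out_channels}")
print(f"parameters:   {n_full} -> {n_small}")
```

The smaller layer reproduces the surviving channels while holding half the parameters. In a full network, the input channels of every layer that consumes this output would have to be sliced to match, which is the dependency problem the comment refers to.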

@glenn-jocher
Member Author

@bryanbocao hi there,

Thank you for reaching out. You are correct that the current pruning method in YOLOv5 uses unstructured pruning, where individual weights with small magnitude are set to 0 while still being stored in the weight file. As a result, the model file size may not change significantly, and inference speed may not improve unless the code has an explicit way to accelerate, such as skipping those zeros.

Structured pruning, on the other hand, removes whole channels after pruning to save both space and time. However, implementing structured pruning may not be easy due to the dependencies among layers in various network architectures.

We appreciate your feedback on this issue, and we'll keep it in mind as we continue to improve YOLOv5. If you have any further questions or concerns, don't hesitate to let us know.

@relaxtheo

@bryanbocao @glenn-jocher Thank you all very much, I will try your recommendations

@glenn-jocher
Member Author

@relaxtheo Thank you for reaching out, and we're glad to hear that our recommendations were helpful. Don't hesitate to let us know if you have any further questions or concerns. We're here to help!

@Mary14-design

yolo

It may be a bit unrelated, but I am getting a similar error while trying to train. I am still new to the YOLO models. Can you please help me solve it?

@glenn-jocher
Member Author

@Mary14-design it seems like there might be an issue with the image link you've provided; it's not displaying correctly. However, I'm here to help you with your training issue. Could you please provide more details about the error message you're encountering during training with YOLOv5? This will help me understand the problem better and assist you accordingly. If you can copy and paste the error message or describe the issue in more detail, that would be great.
