[Bug]: EfficientAd is slower than other models in anomalib #2150

Open
haimat opened this issue Jun 24, 2024 · 5 comments

haimat commented Jun 24, 2024

Describe the bug

I was under the impression that the EfficientAd model would be among the fastest in anomalib in terms of prediction time. To verify this, I trained three models, Padim, Fastflow, and EfficientAd, all on the same training data at an image size of 512x512 pixels. Then I wrote a small script that loads these models, warms up the GPU, and runs prediction on 100 images. I measure only the model forward time, with no image loading or any pre- or post-processing.

With the models exported to ONNX I get these results (avg. model forward time over 100 images):

  • EfficientAd: 0.0116 sec.
  • Fastflow: 0.0053 sec.
  • Padim: 0.0036 sec.

So in other words: the EfficientAd model is the slowest of the three, and Padim the fastest. I thought it would be the other way around. Am I missing something, or is this a bug in anomalib?
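For reference, the forward-only timing loop can be sketched with a generic helper like the one below (the helper itself is not anomalib code; the ONNX Runtime usage in the comment is illustrative only):

```python
import time

def average_forward_time(run_fn, n_warmup=10, n_runs=100):
    """Warm up, then return the average wall-clock time of run_fn over n_runs."""
    for _ in range(n_warmup):  # warm-up so lazy initialization isn't timed
        run_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_fn()
    return (time.perf_counter() - start) / n_runs

# With ONNX Runtime, the timed callable would look roughly like this
# (illustrative sketch, assuming a single 1x3x512x512 float input):
#   session = onnxruntime.InferenceSession("model.onnx",
#                                          providers=["CUDAExecutionProvider"])
#   name = session.get_inputs()[0].name
#   x = np.random.rand(1, 3, 512, 512).astype(np.float32)
#   avg = average_forward_time(lambda: session.run(None, {name: x}))
```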

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

I trained three models on the same dataset, then predicted 100 images with each of them and measured the avg. model forward / inference time, without pre- or post-processing.

OS information

OS information:

  • OS: Ubuntu 22.04
  • Python version: Python 3.10.12
  • Anomalib version: 1.1.0
  • PyTorch version: 2.2
  • CUDA/cuDNN version: 12.2
  • GPU models and configuration: 4x Nvidia A6000
  • Any other relevant information: I am using a custom dataset

Expected behavior

I would expect the EfficientAd net to be considerably faster than the other models.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

-

Logs

-

alexriedel1 (Contributor) commented Jun 25, 2024

Hi,
can you show how you measured the timing?
In plain PyTorch on 256x256 images I get the following speed measurements on a GTX 1660 Ti:

  • Padim: 6.4 ms
  • EfficientAD S: 64.1 ms
  • Fastflow: 33 ms

When measuring this implementation of EfficientAD, which claims to reach the paper's timing results, I get the same speed of 64 ms per image on my GPU. This makes me think that the EfficientAD implementation in anomalib isn't slower than it should be.

The authors of EfficientAD state that
For each method, we remove unnecessary parts for the timing, such as the computation of losses during inference, and use float16 precision for all networks. Switching from float32 to float16 for the inference of EfficientAD does not change the anomaly detection results for the 32 anomaly detection scenarios evaluated in this paper. In latency-critical applications, padding in the PDN architecture of EfficientAD can be disabled. This speeds up the forward pass of the PDN architecture by 80 µs without impairing the detection of anomalies. We time EfficientAD without padding and therefore report the anomaly detection results for this setting in the experimental results of this paper

So you should be sure to set padding=False and use half precision. Half precision especially matters on some kinds of GPUs.
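To illustrate the float16 part in plain PyTorch: the switch is just a cast of the module. The conv stack below is only an illustrative stand-in for a small PDN-style network, not the actual anomalib model; with anomalib you would cast the trained EfficientAd module the same way.

```python
import torch
from torch import nn

# Illustrative stand-in for a small PDN-style conv stack; the real model
# would come from anomalib, this only shows the casting mechanics.
net = nn.Sequential(
    nn.Conv2d(3, 32, 4, padding=0),  # padding disabled, as in the paper's timing setup
    nn.ReLU(),
    nn.Conv2d(32, 32, 4, padding=0),
).eval()

net = net.half()  # cast all parameters and buffers to float16
print(next(net.parameters()).dtype)  # torch.float16

# Half-precision conv kernels pay off mainly on GPUs with fast FP16 paths:
if torch.cuda.is_available():
    net = net.cuda()
    x = torch.rand(1, 3, 256, 256, device="cuda", dtype=torch.float16)
    with torch.inference_mode():
        _ = net(x)
```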

alexriedel1 (Contributor) commented Jun 26, 2024

I was curious and ran some more experiments. Half precision really matters, for example on a T4 GPU.
"Anomalib EfficientAD" refers to the anomalib implementation; "nelson" refers to this implementation.


256 x 256 image size

  • Anomalib EfficientAD S, full precision: 24 ms
  • Anomalib EfficientAD S, half precision: 8.9 ms
  • nelson EfficientAD S, full precision: 21.5 ms
  • nelson EfficientAD S, half precision: 7.4 ms
  • Anomalib Fastflow ResNet18, half precision: 25.23 ms
  • Anomalib Fastflow ResNet18, full precision: 23.37 ms

512 x 512 image size

  • Anomalib EfficientAD S, full precision: 161 ms
  • Anomalib EfficientAD S, half precision: 30 ms
  • nelson EfficientAD S, full precision: 153 ms
  • nelson EfficientAD S, half precision: 27 ms
  • Anomalib Fastflow ResNet18, half precision: 26.1 ms
  • Anomalib Fastflow ResNet18, full precision: 24.9 ms
  • Anomalib Fastflow ResNet50, full precision: 108 ms
  • Anomalib Fastflow ResNet50, half precision: 52 ms

What I take from these results (and it isn't big news): half precision matters especially for convolution-heavy models, image size matters, and the choice of GPU matters. The EfficientAD authors might not have made a fair comparison between the models, because I have the feeling they didn't use half precision for all the other models they compare their inference speed with.

haimat (Author) commented Jun 26, 2024

@alexriedel1 Thanks for your response.
I will try to reproduce your experiments and get back here to you soon!

haimat (Author) commented Jul 4, 2024

@alexriedel1 Looking at your test results, at the larger image size the EfficientAd model, even with half precision, is still slower than the full-precision Fastflow (ResNet18) model. That is quite a surprise ...

I have exported my models to ONNX and then converted them to TensorRT on Nvidia. For the latter I enabled half precision.
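The TensorRT conversion step was, roughly, a trtexec call like the following (paths are placeholders; --fp16 is the flag that enables half precision):

```shell
# Build a TensorRT engine from the exported ONNX model with FP16 enabled.
# Paths are placeholders; trtexec ships with the TensorRT distribution.
trtexec --onnx=model.onnx --fp16 --saveEngine=model_fp16.engine
```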

How did you turn half-precision mode on or off?
Also, how do you set the ResNet backbone for the Fastflow model?

blaz-r (Contributor) commented Jul 18, 2024

I think half precision makes quite a big difference because of tensor cores that operate on FP16. I'm not sure that keeping the data in fp32 even guarantees it isn't converted to fp16 behind the scenes (by PyTorch or CUDA) for the tensor cores. However, speed really seems to depend heavily on image size, and with compute-heavy models the GPU plays quite a big role as well (an H100, for example, has significantly faster tensor cores that work with FP16).

The EfficientAD authors might not have made a fair comparison between the models because I have the feeling they didn't use half precision for all the others they compare their inference speed with.

I'm not sure how exactly they did it, but I think every model they used can be set to FP16, BUT some really don't benefit much from it (probably due to the tensor cores mentioned above, which mostly do MMA operations).

To answer the other two questions:
I think you can put the model into fp16 simply by calling model.to(torch.float16). Most models need only that; some, I think, also require manually casting certain variables inside the model to float16.
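A quick sketch of that caveat (the Demo module is purely illustrative): .to(torch.float16) casts registered parameters and buffers, but not plain tensor attributes, which is why some models need manual casting.

```python
import torch
from torch import nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)
        self.register_buffer("mean", torch.zeros(4))  # cast by .to()
        self.raw = torch.zeros(4)                     # NOT cast by .to()

m = Demo().to(torch.float16)
print(next(m.parameters()).dtype)  # torch.float16
print(m.mean.dtype)                # torch.float16
print(m.raw.dtype)                 # torch.float32 -> needs a manual cast
```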

For the Fastflow model, you can specify the ResNet backbone either in the config file or by passing the backbone name to the constructor:

```python
def __init__(
    self,
    backbone: str = "resnet18",
```