
Question about padFilterWeights op. #3740

Closed
theNefelibata opened this issue Mar 26, 2024 · 12 comments

Labels
triaged Issue has been triaged by maintainers

Comments
@theNefelibata

There are many nvinfer1::rt::cuda::padFilterWeights calls in my model, and I found that this runs before the conv2d op. I want to know what this function does and whether there is any way to avoid it. Thank you.

@zerollzeng
Collaborator

Where did you observe the call: build phase or inference phase? From the name, it looks like it just pads the weights so that they fit the format required by a performant kernel, which should be necessary.

@zerollzeng zerollzeng self-assigned this Mar 28, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Mar 28, 2024
@theNefelibata
Author

I observed this call during the inference phase. I think it is because I use the weights and bias of conv2d as inputs to the model, so this call appears before each conv2d layer. Is there a way to do this operation in advance?
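
(For reference, a minimal sketch of this kind of setup using the TensorRT Python API; the tensor names and shapes are made up for illustration. Because the kernel and bias are network inputs, TensorRT only sees their values at inference time.)

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Activations AND conv weights/bias all arrive as network inputs,
# so the weights cannot be reformatted/padded at build time.
data = network.add_input("data", trt.float32, (1, 3, 224, 224))
kernel = network.add_input("conv_weight", trt.float32, (64, 3, 3, 3))
bias = network.add_input("conv_bias", trt.float32, (64,))

# Create the conv with empty weights, then wire the weight tensors in.
conv = network.add_convolution_nd(
    data, num_output_maps=64, kernel_shape=(3, 3),
    kernel=trt.Weights(), bias=trt.Weights())
conv.set_input(1, kernel)  # input index 1: kernel weights
conv.set_input(2, bias)    # input index 2: bias
```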

@zerollzeng
Collaborator

@nvpohanh Is this expected?

@nvpohanh
Collaborator

nvpohanh commented Apr 7, 2024

@theNefelibata Could you make the weights/bias constants? Or do they have to be network inputs?

@theNefelibata
Author

> @theNefelibata Could you make the weights/bias constants? Or do they have to be network inputs?

they have to be inputs.

@nvpohanh
Collaborator

nvpohanh commented Apr 8, 2024

Then the padFilterWeights kernels are expected because we need to pad the weights for the Conv kernels to run. If the weights were constants, that could have been done offline.
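
(As a sketch of the contrast, again assuming the TensorRT Python API with made-up shapes: when the same weights are supplied as constants, the builder can fold the padding into the engine.)

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

data = network.add_input("data", trt.float32, (1, 3, 224, 224))

# Constant weights: TensorRT can pad/reformat them once at build time,
# so no padFilterWeights kernel runs before the conv at inference.
w = np.ones((64, 3, 3, 3), dtype=np.float32)
b = np.zeros((64,), dtype=np.float32)
conv = network.add_convolution_nd(
    data, num_output_maps=64, kernel_shape=(3, 3),
    kernel=trt.Weights(w), bias=trt.Weights(b))
```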

@theNefelibata
Author

> Then the padFilterWeights kernels are expected because we need to pad the weights for the Conv kernels to run. If the weights were constants, that could have been done offline.

Can I do this operation manually?

@nvpohanh
Collaborator

nvpohanh commented Apr 8, 2024

> they have to be inputs.

If the weights have to be network inputs, is it because you need to change the weights for each inference? Or do you only need to change the weights once and then run multiple inferences with the same set of weights?

If the use case is the latter, then I would recommend using the Refit feature instead of marking weights as network inputs: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#refitting-engine-c

This would allow you to refit the weights once at runtime and then run multiple inferences with the refitted weights, without needing padFilterWeights for every inference.
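
(A rough sketch of that flow in the Python API, assuming an engine built with trt.BuilderFlag.REFIT; the weights name "conv1.weight" is a placeholder for whatever your network actually uses.)

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
# `engine` is a deserialized ICudaEngine built with:
#   config.set_flag(trt.BuilderFlag.REFIT)
refitter = trt.Refitter(engine, logger)

# Update the weights once; "conv1.weight" is a placeholder name.
new_w = np.ones((64, 3, 3, 3), dtype=np.float32)
assert refitter.set_named_weights("conv1.weight", trt.Weights(new_w))

# Applies all pending updates (including any reformatting/padding) once;
# subsequent inferences then reuse the refitted weights.
assert refitter.refit_cuda_engine()
```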

@theNefelibata
Author

I have tried Refit, but it can affect the inference speed.

@zerollzeng
Collaborator

I'm interested in what use case requires the weights to be changed for each inference. @theNefelibata, could you please share your use case? Thanks!

@theNefelibata
Author

> I'm interested in what use case requires the weights to be changed for each inference. @theNefelibata, could you please share your use case? Thanks!

I am trying alternatives to Refit.

@zerollzeng
Collaborator

Got it, thanks! Can we close this issue?
