
Increasing batch size does not improve efficiency #65

Open
mrpositron opened this issue Aug 2, 2022 · 3 comments
mrpositron commented Aug 2, 2022

When I increase the batch size, the inference efficiency on TensorRT does not improve: if inference on a batch of size 8 takes 20 ms, inference on a batch of size 16 takes 40 ms. I am not sure why this is happening ...

I have converted an EfficientNet backbone from TensorFlow to ONNX, and then to TensorRT. In TensorFlow I specified the batch size as follows:

# save backbone model w/ full signature!

@tf.function()
def my_predict(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

my_signatures = my_predict.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([batch_size, 256, 256, 3], dtype=tf.float32, name="image")
)

tf.saved_model.save(mod, bbone_name, signatures=my_signatures)

Converting the TensorFlow model to ONNX:

$ python -m tf2onnx.convert --saved-model mods/effnet-l/bbone --output mods/effnet-l/bbone.onnx

Converting the ONNX model to TensorRT and saving it:

import engine as eng
from onnx import ModelProto
import tensorrt as trt

base_dir = "mods/effnet-l"
# base_dir = "mods/resnet152"
onnx_path = base_dir + "/bbone.onnx"
engine_name = base_dir + "/bbone.plan"

batch_size = 8

# Parse the exported ONNX graph.
model = ModelProto()
with open(onnx_path, "rb") as f:
    model.ParseFromString(f.read())

# Build and serialize the TensorRT engine with a fixed input shape.
shape = [batch_size, 256, 256, 3]
engine = eng.build_engine(onnx_path, shape=shape)
eng.save_engine(engine, engine_name)
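
Since the ModelProto is parsed in the script above, the input shape that tf2onnx actually wrote into the ONNX graph can be printed by appending a few lines (a quick check, assuming the image tensor is the first graph input):

# Appending to the build script above: a fixed first dim (e.g. 8) means the
# batch size is baked into the ONNX graph itself.
inp = model.graph.input[0]
dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
print(inp.name, dims)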

Here is the inference code for TensorRT (using the Trt wrapper).

Everything works correctly; the problem is the speed. Doubling the batch size simply doubles the inference time, so the time per image does not improve at all.

// Flatten an HxWxC float cv::Mat into a contiguous std::vector<float>.
std::vector<float> EffnetBBone::convert_mat_to_fvec(cv::Mat mat)
{
    std::vector<float> array;
    if (mat.isContinuous())
    {
        array.assign((float *)mat.data, (float *)mat.data + mat.total() * mat.channels());
    }
    else
    {
        for (int i = 0; i < mat.rows; ++i)
        {
            array.insert(array.end(), mat.ptr<float>(i), mat.ptr<float>(i) + mat.cols * mat.channels());
        }
    }
    return array;
}



EffnetBBone::EffnetBBone(std::string base_dir, bool half_precision)
{
    onnx_net = new Trt();
    if (half_precision)
    {
        onnx_net->EnableFP16();
    }
    // Build (or load) the serialized engine from the ONNX model.
    onnx_net->BuildEngine(base_dir + "/bbone.onnx", base_dir + "/bbone.plan");
    onnx_net->SetLogLevel((int)Severity::kINTERNAL_ERROR);
}

std::vector<float> EffnetBBone::run_batch(std::vector<cv::Mat> batch_img, bool normalized)
{
    cv::Mat crop;
    std::vector<float> batch_fvec;
    // 327680 / 4 = 81920 output floats per image.
    int size = batch_img.size() * (327680 / 4);
    std::vector<float> output(size);
    for (size_t i = 0; i < batch_img.size(); i++)
    {
        std::vector<float> fvec;
        crop = batch_img[i];
        cv::Mat img_f32;

        crop.convertTo(img_f32, CV_32F);
        if (!normalized)
        {
            img_f32 = img_f32 / 256.f;
        }
        fvec = convert_mat_to_fvec(img_f32);
        batch_fvec.insert(batch_fvec.end(), fvec.begin(), fvec.end());
    }

    // Copy the whole batch to the GPU, run the engine, and copy the result back.
    onnx_net->CopyFromHostToDevice(batch_fvec, inputBindIndex);
    bool state = onnx_net->Forward();
    assert(state == true);
    onnx_net->CopyFromDeviceToHost(output, outputBindIndex);
    return output;
}
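
For reference, the batch dimension baked into the generated .plan can be inspected with the TensorRT Python API (a quick sketch; a fixed first dimension such as 8 means the engine only ever runs that batch size, while -1 would mean it is dynamic):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("mods/effnet-l/bbone.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print every binding name with its shape, including the batch dimension.
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), engine.get_binding_shape(i))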


System environment:

  • Device: GeForce RTX 3090
  • OS: Ubuntu 20.04
  • Driver version: 470.103.01
  • CUDA version: 11.2
  • TensorRT version: 8.4.0
zerollzeng (Owner) commented:

When I increase batch size, the inference time on TensorRT does not change.
-> With ONNX, TensorRT uses explicit batch. If you want to use a dynamic batch size, the batch dimension in your ONNX model must be unknown (dynamic), and you need to set an optimization profile for the inputs when building the engine. Before calling Forward(), you also need to select that profile / set the input shape for the enqueue.
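
A minimal sketch of that with the raw TensorRT Python API (not the Trt wrapper used above; the input name "image" and the min/opt/max shapes below are assumptions):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network: the batch dimension is part of every tensor shape.
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("mods/effnet-l/bbone.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

# The profile only takes effect if the ONNX input has a dynamic (-1) batch dimension.
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("image", (1, 256, 256, 3), (8, 256, 256, 3), (32, 256, 256, 3))
config.add_optimization_profile(profile)

serialized = builder.build_serialized_network(network, config)
with open("mods/effnet-l/bbone_dynamic.plan", "wb") as f:
    f.write(serialized)

# At inference time, select the actual batch size before enqueueing.
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(serialized)
context = engine.create_execution_context()
context.set_binding_shape(0, (16, 256, 256, 3))  # then copy inputs and run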

mrpositron (Author) commented:


Thanks for your reply!

But I am not using a dynamic batch size. I specify the batch size when I convert the model.

zerollzeng (Owner) commented:


Setting the batch size won't work for an ONNX model; it only applies to the Caffe and UFF parsers.
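
That means the batch dimension has to be left dynamic before tf2onnx runs, i.e. in the SavedModel signature from the first post. A sketch (mod and bbone_name are the variables from that snippet):

import tensorflow as tf

@tf.function()
def my_predict(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

# None keeps the batch dimension symbolic, so the exported ONNX input gets a
# dynamic batch axis that a TensorRT optimization profile can then cover.
my_signatures = my_predict.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([None, 256, 256, 3], dtype=tf.float32, name="image")
)
tf.saved_model.save(mod, bbone_name, signatures=my_signatures)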
