[Feature Request] Support fp16 for C Predict API #14159
Comments
Hi, are you planning to train in C++? Currently, Python does support float16 if you initialize the dtype to that.
Interested as well.
Pulling in someone who may know about the C++ side: @leleamol.
@lanking520 I use Python to train an MXNet model (such as ResNet-50) with fp16. Then I use example/image-classification/predict-cpp/image-classification-predict.cc to run inference with this trained fp16 model, but it does not work and reports an error.
Have you solved it?
@IvyGongoogle
@leleamol so @IvyGongoogle did use an example which uses the C Predict API.
Indeed, I concur, there is a need for fp16 computation, as shown in this forum thread: https://discuss.mxnet.io/t/network-in-float16/3710
Hi, once the conversion pass is added in the backend, this should be easy to do. I am currently working on it: #14584. Stay tuned.
@anirudh2290 thanks a lot for the effort. Very interested as well in float16 inference through the C++ API.
I modified c_predict_api.cc. Then I can successfully run inference with this trained fp16 model using the C++ API, and it is twice as fast as fp32 when I run a ResNetV1-50 CNN model. But when I run an OCR recognition model, the speed is the same as fp32. What causes this?
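For context, here is a minimal sketch of the stock C Predict API flow; the file names, shapes, and device choice are placeholders, not taken from this thread. Note that MXPredSetInput and MXPredGetOutput are typed as mx_float (float32), which is why fp16 input/output needs either a change along the lines described above or cast layers inside the graph.

```cpp
// Sketch only: stock C Predict API usage with placeholder file names and a
// single 1x3x224x224 float32 input called "data".
#include <mxnet/c_predict_api.h>

#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

int main() {
  // Load the symbol JSON and parameter blob into memory.
  std::ifstream sym_file("model-symbol.json");
  std::string symbol_json((std::istreambuf_iterator<char>(sym_file)),
                          std::istreambuf_iterator<char>());
  std::ifstream param_file("model-0000.params", std::ios::binary);
  std::vector<char> param_bytes((std::istreambuf_iterator<char>(param_file)),
                                std::istreambuf_iterator<char>());

  const char* input_keys[1] = {"data"};
  const mx_uint input_shape_indptr[2] = {0, 4};
  const mx_uint input_shape_data[4] = {1, 3, 224, 224};  // NCHW

  PredictorHandle pred = nullptr;
  MXPredCreate(symbol_json.c_str(), param_bytes.data(),
               static_cast<int>(param_bytes.size()),
               2 /* dev_type: 1 = CPU, 2 = GPU */, 0 /* dev_id */,
               1, input_keys, input_shape_indptr, input_shape_data, &pred);

  // The input buffer is mx_float (float32); there is no fp16 entry point here.
  std::vector<mx_float> image(1 * 3 * 224 * 224, 0.0f);
  MXPredSetInput(pred, "data", image.data(),
                 static_cast<mx_uint>(image.size()));
  MXPredForward(pred);

  mx_uint* out_shape = nullptr;
  mx_uint out_dim = 0;
  MXPredGetOutputShape(pred, 0, &out_shape, &out_dim);
  mx_uint out_size = 1;
  for (mx_uint i = 0; i < out_dim; ++i) out_size *= out_shape[i];

  // Outputs also come back as float32.
  std::vector<mx_float> output(out_size);
  MXPredGetOutput(pred, 0, output.data(), out_size);
  MXPredFree(pred);

  std::printf("first score: %f\n", output.empty() ? 0.0f : output[0]);
  return 0;
}
```

For batch inference, the leading dimension of input_shape_data is the batch size, and the input and output buffers scale accordingly.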
@IvyGongoogle Are your inputs and weights in fp16? Your change should work for running fp16 inference. What batch size are you using, and what is the model? For smaller batch sizes you may not see a big speedup. Also, what hardware are you running it on?
@IvyGongoogle @anirudh2290 Just to make sure: are we talking about the real C++ API as documented in https://github.com/apache/incubator-mxnet/tree/master/cpp-package, or about https://github.com/apache/incubator-mxnet/tree/master/src/c_api? Could you maybe elaborate a bit on the backend process? I'm missing the point regarding why this API does not support fp16, since the whole backend of MXNet (which Python is bound to) is based on C/C++. Much appreciated.
The backend of MXNet is exposed via C APIs for different modules like ndarray, executor, symbol, the dependency engine, etc. The frontend bindings implement wrappers which internally call the C API to support the different frontends. CPP-Package is a frontend binding just like Python. Thus, once the support is added in the backend, the C API is there for frontends to call, but the frontend interface (CPP-Package, for example) still has to expose it.
@anirudh2290 Why does float16 inference still work with Python if that is exactly a wrapper over the C API? As seen on https://mxnet.incubator.apache.org/versions/master/faq/float16.html
If you look at the mixed precision doc, it asks you to modify the symbol code and add cast layers at the start and before softmax. This can probably be done in other frontends too if you are writing the model yourself. But in most cases you are loading a pre-trained model, and then there is no easy way to do this. It also gets more complicated when you want to introduce cast layers not just at the start and before softmax but in other places in the computation graph, and want to customize that to test accuracy. (See the AMP PR here: #14173; it has specific, customizable lists of ops to run in fp16 or fp32: https://github.com/apache/incubator-mxnet/pull/14173/files#diff-b79bfa3e02355c43ca5b195ef67172a5)
I can give a quick summary of my experience with fp16 and C++ so far: I believe the best way to do this is the same as in other frontends such as Python. As @anirudh2290 mentions, add casts around numerically sensitive and insensitive sections (batchnorms and softmaxes, for example, should be fp32; convs and FC should be fp16) and then make sure your inputs are in fp16. You should then be able to run inference as normal (and you should see that fp16 operations are actually running). One caveat is that, depending on your graph, the time spent casting inputs may be more than the time you save using fp16. That's where AMP and TensorRT integration can help. They fuse many operators that are numerically sensitive, removing them from the computation graph, which means you get larger sections of the graph that can run in fp16 mode. They also fuse casting operations into numerical operations, which saves you from doing two full memory copies on your tensors when casting. These methods should be a much more practical way of running fp16 inference with C++.
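To make that cast pattern concrete, here is a small sketch built with the cpp-package Symbol API; the tiny FC body and all layer names are illustrative only, not taken from this thread.

```cpp
#include "mxnet-cpp/MxNetCpp.h"

using namespace mxnet::cpp;

// Sketch only: "cast at the start, cast back before softmax".
Symbol BuildMixedPrecisionNet(int num_classes) {
  Symbol data  = Symbol::Variable("data");
  Symbol label = Symbol::Variable("softmax_label");

  // Cast the float32 input to float16 so the heavy ops run in half precision.
  Symbol data16 = Operator("Cast")
                      .SetParam("dtype", "float16")
                      .SetInput("data", data)
                      .CreateSymbol("data_fp16");

  // Compute-heavy, numerically tolerant part (convs / FC) stays in fp16;
  // dtype inference propagates fp16 to the weights at bind time.
  Symbol fc = Operator("FullyConnected")
                  .SetParam("num_hidden", num_classes)
                  .SetInput("data", data16)
                  .SetInput("weight", Symbol::Variable("fc_weight"))
                  .SetInput("bias", Symbol::Variable("fc_bias"))
                  .CreateSymbol("fc1");

  // Cast back to float32 before the numerically sensitive softmax.
  Symbol fc32 = Operator("Cast")
                    .SetParam("dtype", "float32")
                    .SetInput("data", fc)
                    .CreateSymbol("fc1_fp32");

  return Operator("SoftmaxOutput")
             .SetInput("data", fc32)
             .SetInput("label", label)
             .CreateSymbol("softmax");
}
```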
@anirudh2290 @KellenSunderland Sorry, my test results show that with fp16 the speed is twice as fast as with fp32 when running inference on a ResNetV1-50 CNN model, but not with an OCR recognition model. I have updated my comment above. If you have experience with OCR inference using MXNet fp16, please give me some advice.
@leleamol those are two different APIs, correct?
But TensorRT doesn't support dynamic/variable input shapes, right?
@IvyGongoogle When I change c_predict_api.cc as you mentioned, I find the inference result is different from Python's. What is the reason?
@xizi I can get the same result. Please try it again.
@IvyGongoogle, this is the Python inference result.
@IvyGongoogle I have solved it by adding a cast layer after the final output layer.
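For anyone hitting the same mismatch, here is a sketch of that fix (file names are placeholders): load the trained fp16 symbol, append a Cast back to float32 on its output, and save the new JSON so the float32-typed predict path reads correct values.

```cpp
#include "mxnet-cpp/MxNetCpp.h"

using namespace mxnet::cpp;

int main() {
  // Placeholder file names; sketch only.
  Symbol net = Symbol::Load("fp16-model-symbol.json");

  // Append a cast so the final output comes back as float32, matching the
  // float32 output buffer used by the predict path.
  Symbol net_fp32 = Operator("Cast")
                        .SetParam("dtype", "float32")
                        .SetInput("data", net)
                        .CreateSymbol("output_fp32");

  net_fp32.Save("fp16-model-fp32out-symbol.json");
  return 0;
}
```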
Hello, there are some materials about how to train an MXNet model with fp16, but I cannot find how to run batch inference in fp16 through the C++ API. Can you give some advice?
Looking forward to your reply.