[IE CLDNN] Improve network outputs detection in quantized FP16+INT8 IR to avoid converting them to FP16 precision #3407
The fallback to FP16 for non-quantized layers in quantized FP16+INT8 IRs, introduced in #941, shouldn't happen for network outputs. However, the mechanism used to detect them only checked whether a given layer has no next layers; it did not take into account that a network output layer can still be used in other parts of the graph (e.g. the 1st output from a TopK primitive may become a network output, while the 2nd output from the same primitive is still consumed in the graph). As a result, such FP32 outputs get converted to FP16 inside clDNN, and since they were forced to FP32 precision during model read, we end up with a memory data type misalignment error.
This patch improves the network output detection mechanism to handle such cases and avoid converting those outputs to FP16 precision (see the sketch below).
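
For illustration, here is a minimal sketch of the two detection strategies. `Layer`, `Network`, `isNetworkOutputOld`, and `isNetworkOutputImproved` are hypothetical stand-ins, not the actual Inference Engine / clDNN API:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical node type: a layer may have several output ports, and
// nextLayers collects the consumers of *any* of those ports.
struct Layer {
    std::string name;
    std::vector<const Layer*> nextLayers;
};

// Hypothetical network type: outputNames holds the outputs declared on
// the network during model read.
struct Network {
    std::set<std::string> outputNames;
};

// Old check: a layer counts as a network output only when nothing consumes
// it. A TopK whose 1st output is a declared network output but whose 2nd
// output still feeds the graph fails this test, so its output would be
// wrongly converted to FP16.
bool isNetworkOutputOld(const Layer& layer) {
    return layer.nextLayers.empty();
}

// Improved check: consult the outputs actually declared on the network, so
// a layer is still recognized as a network output even when some of its
// other output ports are consumed elsewhere in the graph.
bool isNetworkOutputImproved(const Network& net, const Layer& layer) {
    return net.outputNames.count(layer.name) > 0;
}
```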
Alternatively, we could keep the current IE network output detection mechanism and instead ensure that the precision chosen for clDNN output primitives matches the precision forced during model read, along the lines of #3405.
JIRA: CVS-43902