Enhance gpu quantization #14094
Merged
Commits (changes shown are from 2 of the 12 commits):
17415db enhance gpu quantization
a240ec9 fix test and improve error message
519da15 resolve conflict
6b06d50 add check srctype to quantized_conv.cu
00d8099 improve infer type
44d959e fix lint
ab68668 add dtype check in quantize
8406726 revert check in python level and quantized_conv
7b43c18 Revert "add dtype check in quantize"
71999ef Merge remote-tracking branch 'upstream/master' into enhance-gpu-quant…
845c063 add dtype check in quantize
9c44eb5 fix quantize test case
@@ -450,6 +450,16 @@ def get_fp32_sym_with_multiple_outputs(length=1):
 @with_seed()
 def test_quantize_model():
     def check_quantize_model(qdtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantize_model for native cpu since it is not supported yet')
+            return
+        elif qdtype == 'int8' and is_test_for_mkldnn():
+            print('skipped testing quantize_model for mkldnn cpu int8 since it is not supported yet')
+            return
+        elif qdtype == 'uint8' and is_test_for_gpu():
+            print('skipped testing quantize_model for gpu uint8 since it is not supported yet')
+            return
Review comment: Please add an else clause.
+
         def check_params(params, qparams, qsym=None):
             if qsym is None:
                 assert len(params) == len(qparams)
Review comment: It would be better to add this error in the backend, as in the MKLDNN int8 case, so that we don't have to add error handling to other frontends when they gain quantization support.
Reply: Currently, only the Python frontend supports quantization, and the calibration process does not actually use backend-specific quantized operators, so I think it is fine to add the error message here for now.
Review comment: In QuantizeCompute (quantize-inl.h) you can check whether std::is_same<xpu, gpu>::value holds, inspect param.out_type, and throw an exception.
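
A minimal sketch of that suggestion, assuming the standard MXNet FCompute signature and the QuantizeParam struct from quantize-inl.h; the placement and error text are illustrative, not the PR's final code:

#include <type_traits>  // for std::is_same

template <typename xpu>
void QuantizeCompute(const nnvm::NodeAttrs& attrs,
                     const OpContext& ctx,
                     const std::vector<TBlob>& inputs,
                     const std::vector<OpReqType>& req,
                     const std::vector<TBlob>& outputs) {
  const QuantizeParam& param = nnvm::get<QuantizeParam>(attrs.parsed);
  // Compile-time check of the device type combined with a runtime check of
  // the requested output dtype: the GPU path only supports int8 output.
  if (std::is_same<xpu, gpu>::value && param.out_type == mshadow::kUint8) {
    LOG(FATAL) << "uint8 quantization is currently only supported on CPU; "
               << "use a CPU context or the int8 out_type on GPU";
  }
  // ... existing quantization kernel dispatch would follow here ...
}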
Reply: I don't think this modification can work, since the infer-type error

mxnet.base.MXNetError: [02:07:55] /home/ubuntu/experimentals/1.4_release/src/operator/quantization/../tensor/matrix_op-inl.h:250: Check failed: src.type_flag_ == ret.type_flag_ (3 vs. 5)

will occur before QuantizeCompute runs, and we cannot get the ctx information during the infer stage. So I think it is better to interrupt this action during the calibration stage.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isnt that called from the forward pass of quantized_conv ? The quantize forward pass should execute before this.
Reply: Added a src_type check in quantized_conv.cu; please take a look again.
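
For reference, a sketch of the kind of source-dtype guard meant here, assuming it sits near the top of the quantized convolution forward pass in quantized_conv.cu; the exact message and insertion point are illustrative:

// in_data[0] is the input tensor of the quantized convolution; the
// cuDNN-backed GPU kernel only accepts a signed 8-bit source type.
CHECK_EQ(in_data[0].type_flag_, mshadow::kInt8)
    << "currently, uint8 quantization is only supported by CPU, "
    << "please switch to the context of CPU or int8 data type for GPU";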