[OVQuantizer] Apply Fixes and Integrate into the Llama Example Workflow #9
Conversation
    :param target_node: FX node representing a weighted operation (e.g., Linear, Conv).
    :param nncf_graph: NNCFGraph used to determine weight port indices.
Done
    def _get_weight_edge(
        target_node: torch.fx.Node,
        nncf_graph: NNCFGraph,
    ):
Suggested change:
-    ):
+    ) -> tuple[torch.fx.Node, torch.fx.Node]:
Done
    :param graph: The underlying FX graph.
    :param nncf_graph: The corresponding NNCF graph.
    :param node_vs_torch_annotation: A mapping of FX nodes to quantization annotations.
Done
    model: torch.fx.GraphModule,
    graph: torch.fx.Graph,
    nncf_graph: NNCFGraph,
    node_vs_torch_annotation: DefaultDict[torch.fx.Node, QuantizationAnnotation],
Could you please create the defaultdicts in each function separately and remove the node_vs_torch_annotation parameter?
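A minimal sketch of the requested refactor, with the defaultdict created inside the annotating function rather than passed in; the function name and loop body here are illustrative, not the PR's actual code:

    from collections import defaultdict
    from typing import DefaultDict

    import torch.fx
    from torch.ao.quantization.quantizer import QuantizationAnnotation


    def _annotate_model(model: torch.fx.GraphModule) -> DefaultDict[torch.fx.Node, QuantizationAnnotation]:
        # Build the mapping locally instead of threading it through as a parameter.
        node_vs_torch_annotation: DefaultDict[torch.fx.Node, QuantizationAnnotation] = defaultdict(
            QuantizationAnnotation
        )
        for node in model.graph.nodes:
            # ... fill in per-node quantization annotations here ...
            pass
        return node_vs_torch_annotation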
    else:
        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
Suggested change:
-    else:
-        return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
+    return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
Done
        q_weight: torch.Tensor,
        original_weight: torch.Tensor,
    ) -> BaseWeightsDecompressor:
        if zero_point is not None:
What if we invert the condition here? IMHO `is None` is clearer than `is not None` :)
Suggested change:
-    if zero_point is not None:
+    if zero_point is None:
Done
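For context, combining this inversion with the earlier else-removal gives a guard-clause shape roughly like the following; the asymmetric branch, constructor signatures, and the NNCF import path are assumptions, not lines from the PR:

    from typing import Optional

    import torch
    from nncf.torch.quantization.layers import (  # assumed import path
        BaseWeightsDecompressor,
        INT8AsymmetricWeightsDecompressor,
        INT8SymmetricWeightsDecompressor,
    )


    def _create_decompressor(
        scale: torch.Tensor,
        zero_point: Optional[torch.Tensor],
        q_weight: torch.Tensor,
        original_weight: torch.Tensor,
    ) -> BaseWeightsDecompressor:
        # Guard clause first: no zero point means the weights are symmetric.
        if zero_point is None:
            return INT8SymmetricWeightsDecompressor(scale, original_weight.dtype)
        # Otherwise the quantization is asymmetric (assumed branch).
        return INT8AsymmetricWeightsDecompressor(scale, zero_point, original_weight.dtype)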
        q_weight: torch.Tensor,
        original_weight: torch.Tensor,
    ) -> BaseWeightsDecompressor:
        if zero_point is not None:
The same comment as above regarding the condition
Done
    observer: Type[UniformQuantizationObserverBase]

    extra_args: Dict[str, Any] = {}
Let's use `wc_param` as an actual keyword argument here. A dict is not needed.
Suggested change:
-    observer: Type[UniformQuantizationObserverBase]
-    extra_args: Dict[str, Any] = {}
+    observer: Type[WeightObserverBase]
Alright, Done
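A runnable before/after sketch of the agreed change; MinMaxObserver and eps stand in for the PR's observer class and its wc_param, assuming the observer keeps torch.ao's with_args helper:

    from typing import Any, Dict

    from torch.ao.quantization.observer import MinMaxObserver

    # Before: the keyword argument is collected into a dict and splatted.
    extra_args: Dict[str, Any] = {"eps": 1e-5}
    ctr_before = MinMaxObserver.with_args(**extra_args)

    # After: the single argument is passed as an explicit keyword; no dict needed.
    ctr_after = MinMaxObserver.with_args(eps=1e-5)

    assert float(ctr_before().eps) == float(ctr_after().eps)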
    )
    return QuantizationSpec(
        dtype=dtype,
        observer_or_fake_quant_ctr=observer.with_args(**extra_args),
Can we call the constructor directly here?
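One hedged reading of this question: torch.ao's QuantizationSpec also accepts an observer class itself as the constructor, so once no extra arguments need binding, the with_args() wrapper can be dropped entirely. A sketch with stand-in values:

    import torch
    from torch.ao.quantization.observer import MinMaxObserver
    from torch.ao.quantization.quantizer import QuantizationSpec

    # with_args() only exists to bind constructor arguments; with none left,
    # the class can be passed directly as the constructor.
    spec = QuantizationSpec(
        dtype=torch.int8,
        observer_or_fake_quant_ctr=MinMaxObserver,
    )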
    return qnn_quantizer, quant_dtype


    def get_ov_quantizer(
The ignored scope in this function is very model specific. I suggest naming this function `get_ov_quantizer_for_modelname` and adding a small docstring to it.
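A small sketch of the suggested rename plus docstring; the quantizer class, its constructor arguments, and the import paths are assumptions, and the ignored-scope entries are placeholders for the model-specific names in the PR:

    from nncf import IgnoredScope  # assumed import
    from executorch.backends.openvino.quantizer import OpenVINOQuantizer  # assumed import


    def get_ov_quantizer_for_llama() -> OpenVINOQuantizer:
        """Build an OpenVINOQuantizer configured for the Llama example model.

        The ignored scope keeps a hand-picked set of nodes at higher precision;
        the names are specific to this model and will not transfer to other
        architectures.
        """
        return OpenVINOQuantizer(
            ignored_scope=IgnoredScope(patterns=[]),  # model-specific entries go here
        )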
Co-authored-by: Daniil Lyakhov <[email protected]>
Merged commit 21c43fe into cavusmustafa:openvino_llama_support
Summary
The OpenVINO Quantizer is refactored, and mixed precision via a manually set ignored scope is added.
To use this OpenVINO quantizer path, pass `--pt2e_quantize openvino_8da4w` for INT4 weight compression or `--pt2e_quantize openvino_8da8w` for INT8 weight compression.