DOCS Update optimization docs with NNCF PTQ changes and deprecation of POT #17398

Merged · 99 commits · May 19, 2023
Changes shown from 56 of the 99 commits.

Commits
bdecc94
Update model_optimization_guide.md
MaximProshin May 6, 2023
16c2815
Update model_optimization_guide.md
MaximProshin May 6, 2023
f22f6c7
Update model_optimization_guide.md
MaximProshin May 6, 2023
f516b36
Update model_optimization_guide.md
MaximProshin May 6, 2023
c47cd01
Update model_optimization_guide.md
MaximProshin May 6, 2023
cb4dd95
Update model_optimization_guide.md
MaximProshin May 6, 2023
a7ced37
Update model_optimization_guide.md
MaximProshin May 6, 2023
d993aa1
Update home.rst
MaximProshin May 6, 2023
23388ee
Update ptq_introduction.md
MaximProshin May 6, 2023
40ecdb2
Update Introduction.md
MaximProshin May 6, 2023
cf8fb1f
Update Introduction.md
MaximProshin May 6, 2023
01cf75e
Update Introduction.md
MaximProshin May 6, 2023
4f94f65
Update ptq_introduction.md
MaximProshin May 6, 2023
1fd7027
Update ptq_introduction.md
MaximProshin May 6, 2023
50c4d0b
Update basic_quantization_flow.md
MaximProshin May 6, 2023
7aaac92
Update basic_quantization_flow.md
MaximProshin May 6, 2023
4a7e9ea
Update basic_quantization_flow.md
MaximProshin May 7, 2023
4c396d9
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
a11b566
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
f50bff7
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
2886ed4
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
3956dd0
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
180e0f5
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
c9dee08
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
d19b2dd
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
ab5cc02
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
227f294
Update basic_quantization_flow.md
MaximProshin May 7, 2023
a709d4e
Update basic_quantization_flow.md
MaximProshin May 7, 2023
83ba861
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
178ce95
Update basic_quantization_flow.md
MaximProshin May 7, 2023
46d8a6d
Update basic_quantization_flow.md
MaximProshin May 7, 2023
65d096a
Update model_optimization_guide.md
MaximProshin May 8, 2023
94410aa
Update ptq_introduction.md
MaximProshin May 8, 2023
a3e2d93
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
73e5415
Update model_optimization_guide.md
MaximProshin May 8, 2023
31e3260
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
1bdc5d5
Update model_optimization_guide.md
MaximProshin May 8, 2023
3438758
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
426a3f7
Update model_optimization_guide.md
MaximProshin May 8, 2023
80bb362
Update Introduction.md
MaximProshin May 8, 2023
e053339
Update basic_quantization_flow.md
MaximProshin May 8, 2023
c0263a3
Update basic_quantization_flow.md
MaximProshin May 8, 2023
f44e9fa
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
d4ee04d
Update ptq_introduction.md
MaximProshin May 9, 2023
15975ba
Update Introduction.md
MaximProshin May 9, 2023
a8e23c5
Update model_optimization_guide.md
MaximProshin May 11, 2023
adc2fb6
Update basic_quantization_flow.md
MaximProshin May 11, 2023
4931ac2
Update quantization_w_accuracy_control.md
MaximProshin May 11, 2023
20ad788
Update quantization_w_accuracy_control.md
MaximProshin May 11, 2023
8e4a1da
Update quantization_w_accuracy_control.md
MaximProshin May 11, 2023
75c9cff
Update Introduction.md
MaximProshin May 11, 2023
3484315
Update FrequentlyAskedQuestions.md
MaximProshin May 11, 2023
abfff0b
Update model_optimization_guide.md
MaximProshin May 11, 2023
3d7e028
Update Introduction.md
MaximProshin May 11, 2023
d4a330b
Update model_optimization_guide.md
MaximProshin May 11, 2023
86ee0cb
Update model_optimization_guide.md
MaximProshin May 11, 2023
8811267
Update model_optimization_guide.md
MaximProshin May 12, 2023
365dba9
Update model_optimization_guide.md
MaximProshin May 12, 2023
c2bd494
Update model_optimization_guide.md
MaximProshin May 12, 2023
992532e
Update ptq_introduction.md
MaximProshin May 12, 2023
aeede28
Update ptq_introduction.md
MaximProshin May 12, 2023
9510506
added code snippet (#1)
alexsu52 May 15, 2023
f86efa5
Update basic_quantization_flow.md
MaximProshin May 15, 2023
c6f0627
Update basic_quantization_flow.md
MaximProshin May 15, 2023
cf3eb93
Update quantization_w_accuracy_control.md
MaximProshin May 15, 2023
f1eb2cc
Update basic_quantization_flow.md
MaximProshin May 15, 2023
db29bff
Update basic_quantization_flow.md
MaximProshin May 15, 2023
2984914
Update ptq_introduction.md
MaximProshin May 15, 2023
d633886
Update model_optimization_guide.md
MaximProshin May 15, 2023
d9a29f2
Update basic_quantization_flow.md
MaximProshin May 15, 2023
82b0b7b
Update ptq_introduction.md
MaximProshin May 15, 2023
0602ebc
Update quantization_w_accuracy_control.md
MaximProshin May 15, 2023
fa73991
Update basic_quantization_flow.md
MaximProshin May 15, 2023
0f1d08d
Update basic_quantization_flow.md
MaximProshin May 15, 2023
a0b31eb
Update basic_quantization_flow.md
MaximProshin May 16, 2023
e7041d6
Update ptq_introduction.md
MaximProshin May 16, 2023
2fdff8d
Update ptq_introduction.md
MaximProshin May 16, 2023
822db41
Delete ptq_introduction.md
MaximProshin May 16, 2023
276698d
Update FrequentlyAskedQuestions.md
MaximProshin May 16, 2023
1254af1
Update Introduction.md
MaximProshin May 16, 2023
b27a6aa
Update quantization_w_accuracy_control.md
MaximProshin May 16, 2023
66c13a3
Update introduction.md
MaximProshin May 16, 2023
0c97070
Update basic_quantization_flow.md code blocks
tsavina May 17, 2023
ca1bc8c
Update quantization_w_accuracy_control.md code snippets
tsavina May 17, 2023
005a759
Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py
MaximProshin May 18, 2023
6ed85ab
Update model_optimization_guide.md
MaximProshin May 19, 2023
e093d73
Optimization docs proofreading (#2)
tsavina May 19, 2023
68ac01a
Update basic_quantization_flow.md
MaximProshin May 19, 2023
578e148
Update quantization_w_accuracy_control.md
MaximProshin May 19, 2023
6462797
Update images (#3)
tsavina May 19, 2023
22d2a98
Update model_optimization_guide.md
MaximProshin May 19, 2023
6da75da
Update docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py
MaximProshin May 19, 2023
ade83fd
Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py
MaximProshin May 19, 2023
50cad15
Update docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
MaximProshin May 19, 2023
0c2a803
Update docs/optimization_guide/nncf/ptq/code/ptq_aa_openvino.py
MaximProshin May 19, 2023
ad5a3f7
Update docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
MaximProshin May 19, 2023
ffcf095
table format fix
tsavina May 19, 2023
d7c9170
Update headers
tsavina May 19, 2023
068c2a6
Update qat.md code blocks
tsavina May 19, 2023
2 changes: 1 addition & 1 deletion docs/home.rst
@@ -69,7 +69,7 @@ You can integrate and offload to accelerators additional operations for pre- and
Model Quantization and Compression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Post-Training Optimization Tool and Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.
Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.

.. panels::
:card: homepage-panels
25 changes: 6 additions & 19 deletions docs/optimization_guide/model_optimization_guide.md
@@ -8,37 +8,24 @@

ptq_introduction
tmo_introduction
(Experimental) Protecting Model <pot_ranger_README>


Model optimization is an optional offline step of improving final model performance by applying special optimization methods, such as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
Model optimization is an optional offline step of improving the final model performance and reducing the model size by applying special optimization methods, such as 8-bit quantization, pruning, etc. OpenVINO offers two optimization paths implemented in `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__:

- :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` implements most of the optimization parameters to a model by default. Yet, you are free to configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocess of a model (:doc:`Embedding Preprocessing Computation <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`).
- :doc:`Post-training Quantization <ptq_introduction>` is designed to optimize the inference of deep learning models by applying the post-training 8-bit integer quantization that does not require model retraining or fine-tuning.

- :doc:`Post-training Quantization <pot_introduction>` is designed to optimize inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, for example, post-training 8-bit integer quantization.
- :doc:`Training-time Optimization <tmo_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods like Quantization-aware Training, Structured and Unstructured Pruning, etc.

- :doc:`Training-time Optimization <nncf_ptq_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods, like Quantization-aware Training and Filter Pruning. NNCF-optimized models can be inferred with OpenVINO using all the available workflows.
.. note:: OpenVINO also supports optimized models (for example, quantized) from source frameworks such as PyTorch, TensorFlow, and ONNX (in Q/DQ format).

Post-training Quantization is the fastest way to optimize a model and should be applied first, but it is limited in terms of the achievable accuracy-performance trade-off. When that trade-off is not sufficient, Training-time Optimization is an option.

Detailed workflow:
##################

To understand which development optimization tool you need, refer to the diagram:
Once the model is optimized using the aforementioned methods, it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

.. image:: _static/images/DEVELOPMENT_FLOW_V3_crunch.svg

Post-training methods are limited in terms of achievable accuracy-performance trade-off for optimizing models. In this case, training-time optimization with NNCF is an option.

Once the model is optimized using the aforementioned tools it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

.. image:: _static/images/WHAT_TO_USE.svg
Contributor:
I think this picture is confusing in document context.
There are two pictures one after another, which looks a bit like a draft. And it is not easy to distill message from this picture.
If I look at it I would think that training time quantization has best performance since pruning and sparsity are shown higher.
Secondly, methods are shown separately and perception that people can get is that it is either one or another.

I wonder if we should change the picture completely to show performance vs. accuracy chart or something similar? We can probably discuss verbally.

Contributor (author):
Pictures were updated in #17421, but I believe it doesn't address your comment. Let's discuss how to change these pictures.


Post-training methods are limited in terms of achievable accuracy, which may degrade for certain scenarios. In such cases, training-time optimization with NNCF may give better results.

Once the model has been optimized using the aforementioned tools, it can be used for inference using the regular OpenVINO inference workflow. No changes to the code are required.

If you are not familiar with model optimization methods, refer to :doc:`post-training methods <pot_introduction>`.

Additional Resources
####################
Show resolved Hide resolved

59 changes: 33 additions & 26 deletions docs/optimization_guide/nncf/ptq/basic_quantization_flow.md
@@ -5,10 +5,10 @@
Introduction
####################

The basic quantization flow is the simplest way to apply 8-bit quantization to the model. It is available for models in the following frameworks: PyTorch, TensorFlow 2.x, ONNX, and OpenVINO. The basic quantization flow is based on the following steps:
The basic quantization flow is the simplest way to apply 8-bit quantization to the model. It is available for models in the following frameworks: OpenVINO, PyTorch, TensorFlow 2.x, and ONNX. The basic quantization flow is based on the following steps:

* Set up an environment and install dependencies.
* Prepare the **calibration dataset** that is used to estimate quantization parameters of the activations within the model.
* Prepare the representative **calibration dataset** that is used to estimate quantization parameters of the activations within the model.
* Call the quantization API to apply 8-bit quantization to the model.

Set up an Environment
@@ -29,25 +29,25 @@ Install all the packages required to instantiate the model object, for example,
Prepare a Calibration Dataset
#############################

At this step, create an instance of the ``nncf.Dataset`` class that represents the calibration dataset. The ``nncf.Dataset`` class can be a wrapper over the framework dataset object that is used for model training or validation. The class constructor receives the dataset object and the transformation function. For example, if you use PyTorch, you can pass an instance of the ``torch.utils.data.DataLoader`` object.
At this step, create an instance of the ``nncf.Dataset`` class that represents the calibration dataset. The ``nncf.Dataset`` class can be a wrapper over the framework dataset object that is used for model training or validation. The class constructor receives the dataset object and the transformation function.

The transformation function is a function that takes a sample from the dataset and returns data that can be passed to the model for inference. For example, this function can take a tuple of a data tensor and labels tensor, and return the former while ignoring the latter. The transformation function is used to avoid modifying the dataset code to make it compatible with the quantization API. The function is applied to each sample from the dataset before passing it to the model for inference. The following code snippet shows how to create an instance of the ``nncf.Dataset`` class:

.. tab:: PyTorch
.. tab:: OpenVINO

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
:language: python
:fragment: [dataset]

.. tab:: ONNX
.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
:language: python
:fragment: [dataset]

.. tab:: OpenVINO
.. tab:: ONNX

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
:language: python
:fragment: [dataset]

@@ -57,30 +57,29 @@ The transformation function is a function that takes a sample from the dataset a
:language: python
:fragment: [dataset]


If there is no framework dataset object, you can create your own entity that implements the ``Iterable`` interface in Python and returns data samples feasible for inference. In this case, a transformation function is not required.
If there is no framework dataset object, you can create your own entity that implements the ``Iterable`` interface in Python, for example the list of images, and returns data samples feasible for inference. In this case, a transformation function is not required.
Contributor:
This probably sounds nitpicky, but it seems the dataset should not return data samples feasible for inference, because they should not include the batch dimension (because that's added by the data loader) and OpenVINO inference fails without that added. I think it would be good to be explicit about this (and I would love an example that does exactly this: load a list of images).

Contributor:
Hi Helena, I absolutely agree with you that "for example the list of images" is confusing. The main point of this sentence is that ``nncf.Dataset`` supports any iterable Python object over a dataset, and the user can implement or use any entity that implements the Python ``Iterable`` interface. If a custom or framework dataset returns data samples feasible for inference, then a transformation function is not required.

I would recommend to reformulate this sentence:

If there is no framework dataset object, you can create your own entity that implements the ``Iterable`` interface in Python. If dataset object returns data samples feasible for inference then a transformation function is not required.

I don't see any reason to add an example here because it would include the following code:

model_inputs = [...] # model(model_inputs[0])

calibration_dataset = nncf.Dataset(model_inputs)

Not sure if this will bring any additional information. If you think otherwise, we can do it.

Contributor (author):
@helena-intel , if you have a suggestion how to better formulate it, feel free to share it here.

Contributor:
I think an example of such an iterable would be very useful, also because the PTQ OpenVINO examples use an Ultralytics or PyTorch dataloader, and if you're not familiar with them, it may not be obvious what they return, how they handle batching, etc. I'll try to create the most basic NNCF PTQ example today. We can see how to incorporate that in either docs or examples, but that can be in a future PR.
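
For illustration, a minimal iterable calibration dataset might look like the sketch below (the shapes, sample count, and ``train_loader`` name are placeholders, not code from this PR):

.. code-block:: python

   import numpy as np
   import nncf

   # A plain Python list of preprocessed inputs is already an iterable
   # calibration dataset, so no transformation function is needed.
   model_inputs = [
       np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)
   ]
   calibration_dataset = nncf.Dataset(model_inputs)

   # With a framework data loader that yields (data, label) tuples, a
   # transformation function extracts the part the model consumes:
   # calibration_dataset = nncf.Dataset(train_loader, lambda item: item[0])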



Run a Quantized Model
Quantize a Model
#####################

Once the dataset is ready and the model object is instantiated, you can apply 8-bit quantization to it:

.. tab:: PyTorch
.. tab:: OpenVINO

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
:language: python
:fragment: [quantization]

.. tab:: ONNX
.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
:language: python
:fragment: [quantization]

.. tab:: OpenVINO
.. tab:: ONNX

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
:language: python
:fragment: [quantization]

@@ -93,14 +92,14 @@

.. note:: The ``model`` is an instance of the ``torch.nn.Module`` class for PyTorch, ``onnx.ModelProto`` for ONNX, and ``openvino.runtime.Model`` for OpenVINO.

After that the model can be exported into th OpenVINO Intermediate Representation if needed and run faster with OpenVINO.
After that the model can be exported into the OpenVINO Intermediate Representation if needed and run faster with OpenVINO.
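
As a rough end-to-end sketch for the OpenVINO backend (the IR paths are placeholders, and ``calibration_dataset`` is assumed to be an ``nncf.Dataset`` prepared as described above):

.. code-block:: python

   import nncf
   import openvino.runtime as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # placeholder path to the FP32 IR

   # Apply 8-bit post-training quantization with default settings
   quantized_model = nncf.quantize(model, calibration_dataset)

   # Save the result as OpenVINO IR; the inference code stays unchanged
   ov.serialize(quantized_model, "quantized_model.xml")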

Tune quantization parameters
############################

``nncf.quantize()`` function has several parameters that allow to tune quantization process to get more accurate model. Below is the list of parameters and their description:
The ``nncf.quantize()`` function has several optional parameters that allow tuning the quantization process to get a more accurate model. Below is a list of the parameters and their descriptions:

* ``model_type`` - used to specify quantization scheme required for specific type of the model. For example, **Transformer** models (BERT, distillBERT, etc.) require a special quantization scheme to preserve accuracy after quantization.
* ``model_type`` - used to specify a quantization scheme required for a specific type of model. ``Transformer`` is the only supported special quantization scheme; it preserves accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). ``None`` is the default, i.e. no specific scheme is defined.

.. code-block:: sh

@@ -115,7 +114,7 @@

nncf.quantize(model, dataset, preset=nncf.Preset.MIXED)

* ``fast_bias_correction`` - enables more accurate bias (error) correction algorithm that can be used to improve accuracy of the model. This parameter is available only for OpenVINO representation. ``True`` is used by default.
* ``fast_bias_correction`` - when set to ``False``, enables a more accurate bias (error) correction algorithm that can be used to improve the accuracy of the model. This parameter is available only for OpenVINO and ONNX representations. ``True`` is used by default to minimize quantization time.

.. code-block:: sh

Expand All @@ -127,7 +126,7 @@ Tune quantization parameters

nncf.quantize(model, dataset, subset_size=1000)

* ``ignored_scope`` - this parameter can be used to exclude some layers from quantization process. For example, if you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:
* ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. For example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:

* Exclude by layer name:

@@ -150,12 +149,20 @@
regex = '.*layer_.*'
nncf.quantize(model, dataset, ignored_scope=nncf.IgnoredScope(patterns=regex))

* ``target_device`` - defines the target device, the specificity of which will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``VPU``.
Contributor:
It would be great to learn more about the effects of setting this parameter. I recently noticed a tiny accuracy degradation on SPR compared to ICL. Will specifying target_device=CPU_SPR fix this? Will that cause lower accuracy on ICL though? (Not asking for an answer, but that's what I wonder when I read this). Will this influence accuracy, performance, or both?

Contributor (author):
The CPU_SPR option improves model performance on SPR devices due to the removal of some FQs. It doesn't improve the accuracy; at least, it doesn't solve the issue with bf16 enabled by default, which I guess is the main reason for the accuracy degradation in your case.


* ``advanced_parameters`` - used to specify advanced quantization parameters for fine-tuning the quantization algorithm. Defined by ``nncf.AdvancedQuantizationParameters`` class. ``None`` is default.
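
Taken together, a hypothetical call combining several of these parameters might look as follows (the values and the layer name are illustrative only, not from the PR):

.. code-block:: python

   import nncf

   quantized_model = nncf.quantize(
       model,
       calibration_dataset,
       model_type=nncf.ModelType.TRANSFORMER,  # special scheme for Transformer models
       fast_bias_correction=False,             # slower but more accurate bias correction
       subset_size=1000,                       # number of calibration samples to use
       ignored_scope=nncf.IgnoredScope(
           names=["final_dense"],              # placeholder layer name
           patterns=[".*layer_.*"],
       ),
       target_device=nncf.TargetDevice.CPU,
       # advanced_parameters=nncf.AdvancedQuantizationParameters(...),
   )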

If the accuracy of the quantized model is not satisfactory, you can try to use the :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` flow.

See also
####################
Examples of how to apply NNCF post-training quantization:
############################################################

* `Example of basic quantization flow in PyTorch <https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch/mobilenet_v2>`__
* `Post-Training Quantization of MobileNet v2 OpenVINO Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/mobilenet_v2/README.md>`__
* `Post-Training Quantization of YOLOv8 OpenVINO Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/yolov8/README.md>`__
* `Post-Training Quantization of MobileNet v2 PyTorch Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch/mobilenet_v2/README.md>`__
* `Post-Training Quantization of SSD PyTorch Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch/ssd300_vgg16/README.md>`__
* `Post-Training Quantization of MobileNet v2 ONNX Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/onnx/mobilenet_v2/README.md>`__
* `Post-Training Quantization of MobileNet v2 TensorFlow Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/tensorflow/mobilenet_v2/README.md>`__

@endsphinxdirective
10 changes: 5 additions & 5 deletions docs/optimization_guide/nncf/ptq/ptq_introduction.md
@@ -1,4 +1,4 @@
# Post-training Quantization with NNCF (new) {#nncf_ptq_introduction}
# Post-training Quantization with NNCF {#nncf_ptq_introduction}

@sphinxdirective

@@ -10,17 +10,17 @@
quantization_w_accuracy_control


Neural Network Compression Framework (NNCF) provides a new post-training quantization API available in Python that is aimed at reusing the code for model training or validation that is usually available with the model in the source framework, for example, PyTorch or TensroFlow. The API is cross-framework and currently supports models representing in the following frameworks: PyTorch, TensorFlow 2.x, ONNX, and OpenVINO.
Neural Network Compression Framework (NNCF) provides a post-training quantization API available in Python that is aimed at reusing the code for model training or validation that is usually available with the model in the source framework, for example, PyTorch or TensorFlow. The NNCF API is cross-framework and currently supports models in the following frameworks: OpenVINO, PyTorch, TensorFlow 2.x, and ONNX. Currently, post-training quantization for models in OpenVINO Intermediate Representation is the most mature in terms of supported methods and models coverage.

This API has two main capabilities to apply 8-bit post-training quantization:

* :doc:`Basic quantization <basic_quantization_flow>` - the simplest quantization flow that allows to apply 8-bit integer quantization to the model.
* :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` - the most advanced quantization flow that allows to apply 8-bit quantization to the model with accuracy control.
* :doc:`Basic quantization <basic_quantization_flow>` - the simplest quantization flow that allows applying 8-bit integer quantization to the model.
* :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` - the most advanced quantization flow that allows applying 8-bit quantization to the model with accuracy control.
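
The difference between the two flows can be sketched as follows (assuming an OpenVINO model; ``validate`` and ``validation_dataset`` are user-defined placeholders, and the parameter names reflect the NNCF API as the editor understands it):

.. code-block:: python

   import nncf

   # Basic quantization: only a calibration dataset is required
   quantized_model = nncf.quantize(model, calibration_dataset)

   # Quantization with accuracy control: additionally takes a validation
   # dataset and a metric function, and keeps the accuracy drop bounded
   quantized_model = nncf.quantize_with_accuracy_control(
       model,
       calibration_dataset=calibration_dataset,
       validation_dataset=validation_dataset,
       validation_fn=validate,  # returns a float metric for the model
       max_drop=0.01,           # maximum allowed accuracy drop
   )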

Additional Resources
####################

* `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
* :doc:`Optimizing Models at Training Time <tmo_introduction>`
* `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__

@endsphinxdirective