DOCS Update optimization docs with NNCF PTQ changes and deprecation of POT #17398

Merged · 99 commits · May 19, 2023
Changes shown from 56 of the 99 commits.

Commits
bdecc94
Update model_optimization_guide.md
MaximProshin May 6, 2023
16c2815
Update model_optimization_guide.md
MaximProshin May 6, 2023
f22f6c7
Update model_optimization_guide.md
MaximProshin May 6, 2023
f516b36
Update model_optimization_guide.md
MaximProshin May 6, 2023
c47cd01
Update model_optimization_guide.md
MaximProshin May 6, 2023
cb4dd95
Update model_optimization_guide.md
MaximProshin May 6, 2023
a7ced37
Update model_optimization_guide.md
MaximProshin May 6, 2023
d993aa1
Update home.rst
MaximProshin May 6, 2023
23388ee
Update ptq_introduction.md
MaximProshin May 6, 2023
40ecdb2
Update Introduction.md
MaximProshin May 6, 2023
cf8fb1f
Update Introduction.md
MaximProshin May 6, 2023
01cf75e
Update Introduction.md
MaximProshin May 6, 2023
4f94f65
Update ptq_introduction.md
MaximProshin May 6, 2023
1fd7027
Update ptq_introduction.md
MaximProshin May 6, 2023
50c4d0b
Update basic_quantization_flow.md
MaximProshin May 6, 2023
7aaac92
Update basic_quantization_flow.md
MaximProshin May 6, 2023
4a7e9ea
Update basic_quantization_flow.md
MaximProshin May 7, 2023
4c396d9
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
a11b566
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
f50bff7
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
2886ed4
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
3956dd0
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
180e0f5
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
c9dee08
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
d19b2dd
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
ab5cc02
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
227f294
Update basic_quantization_flow.md
MaximProshin May 7, 2023
a709d4e
Update basic_quantization_flow.md
MaximProshin May 7, 2023
83ba861
Update quantization_w_accuracy_control.md
MaximProshin May 7, 2023
178ce95
Update basic_quantization_flow.md
MaximProshin May 7, 2023
46d8a6d
Update basic_quantization_flow.md
MaximProshin May 7, 2023
65d096a
Update model_optimization_guide.md
MaximProshin May 8, 2023
94410aa
Update ptq_introduction.md
MaximProshin May 8, 2023
a3e2d93
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
73e5415
Update model_optimization_guide.md
MaximProshin May 8, 2023
31e3260
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
1bdc5d5
Update model_optimization_guide.md
MaximProshin May 8, 2023
3438758
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
426a3f7
Update model_optimization_guide.md
MaximProshin May 8, 2023
80bb362
Update Introduction.md
MaximProshin May 8, 2023
e053339
Update basic_quantization_flow.md
MaximProshin May 8, 2023
c0263a3
Update basic_quantization_flow.md
MaximProshin May 8, 2023
f44e9fa
Update quantization_w_accuracy_control.md
MaximProshin May 8, 2023
d4ee04d
Update ptq_introduction.md
MaximProshin May 9, 2023
15975ba
Update Introduction.md
MaximProshin May 9, 2023
a8e23c5
Update model_optimization_guide.md
MaximProshin May 11, 2023
adc2fb6
Update basic_quantization_flow.md
MaximProshin May 11, 2023
4931ac2
Update quantization_w_accuracy_control.md
MaximProshin May 11, 2023
20ad788
Update quantization_w_accuracy_control.md
MaximProshin May 11, 2023
8e4a1da
Update quantization_w_accuracy_control.md
MaximProshin May 11, 2023
75c9cff
Update Introduction.md
MaximProshin May 11, 2023
3484315
Update FrequentlyAskedQuestions.md
MaximProshin May 11, 2023
abfff0b
Update model_optimization_guide.md
MaximProshin May 11, 2023
3d7e028
Update Introduction.md
MaximProshin May 11, 2023
d4a330b
Update model_optimization_guide.md
MaximProshin May 11, 2023
86ee0cb
Update model_optimization_guide.md
MaximProshin May 11, 2023
8811267
Update model_optimization_guide.md
MaximProshin May 12, 2023
365dba9
Update model_optimization_guide.md
MaximProshin May 12, 2023
c2bd494
Update model_optimization_guide.md
MaximProshin May 12, 2023
992532e
Update ptq_introduction.md
MaximProshin May 12, 2023
aeede28
Update ptq_introduction.md
MaximProshin May 12, 2023
9510506
added code snippet (#1)
alexsu52 May 15, 2023
f86efa5
Update basic_quantization_flow.md
MaximProshin May 15, 2023
c6f0627
Update basic_quantization_flow.md
MaximProshin May 15, 2023
cf3eb93
Update quantization_w_accuracy_control.md
MaximProshin May 15, 2023
f1eb2cc
Update basic_quantization_flow.md
MaximProshin May 15, 2023
db29bff
Update basic_quantization_flow.md
MaximProshin May 15, 2023
2984914
Update ptq_introduction.md
MaximProshin May 15, 2023
d633886
Update model_optimization_guide.md
MaximProshin May 15, 2023
d9a29f2
Update basic_quantization_flow.md
MaximProshin May 15, 2023
82b0b7b
Update ptq_introduction.md
MaximProshin May 15, 2023
0602ebc
Update quantization_w_accuracy_control.md
MaximProshin May 15, 2023
fa73991
Update basic_quantization_flow.md
MaximProshin May 15, 2023
0f1d08d
Update basic_quantization_flow.md
MaximProshin May 15, 2023
a0b31eb
Update basic_quantization_flow.md
MaximProshin May 16, 2023
e7041d6
Update ptq_introduction.md
MaximProshin May 16, 2023
2fdff8d
Update ptq_introduction.md
MaximProshin May 16, 2023
822db41
Delete ptq_introduction.md
MaximProshin May 16, 2023
276698d
Update FrequentlyAskedQuestions.md
MaximProshin May 16, 2023
1254af1
Update Introduction.md
MaximProshin May 16, 2023
b27a6aa
Update quantization_w_accuracy_control.md
MaximProshin May 16, 2023
66c13a3
Update introduction.md
MaximProshin May 16, 2023
0c97070
Update basic_quantization_flow.md code blocks
tsavina May 17, 2023
ca1bc8c
Update quantization_w_accuracy_control.md code snippets
tsavina May 17, 2023
005a759
Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py
MaximProshin May 18, 2023
6ed85ab
Update model_optimization_guide.md
MaximProshin May 19, 2023
e093d73
Optimization docs proofreading (#2)
tsavina May 19, 2023
68ac01a
Update basic_quantization_flow.md
MaximProshin May 19, 2023
578e148
Update quantization_w_accuracy_control.md
MaximProshin May 19, 2023
6462797
Update images (#3)
tsavina May 19, 2023
22d2a98
Update model_optimization_guide.md
MaximProshin May 19, 2023
6da75da
Update docs/optimization_guide/nncf/ptq/code/ptq_tensorflow.py
MaximProshin May 19, 2023
ade83fd
Update docs/optimization_guide/nncf/ptq/code/ptq_torch.py
MaximProshin May 19, 2023
50cad15
Update docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
MaximProshin May 19, 2023
0c2a803
Update docs/optimization_guide/nncf/ptq/code/ptq_aa_openvino.py
MaximProshin May 19, 2023
ad5a3f7
Update docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
MaximProshin May 19, 2023
ffcf095
table format fix
tsavina May 19, 2023
d7c9170
Update headers
tsavina May 19, 2023
068c2a6
Update qat.md code blocks
tsavina May 19, 2023
2 changes: 1 addition & 1 deletion docs/home.rst
@@ -69,7 +69,7 @@ You can integrate and offload to accelerators additional operations for pre- and
Model Quantization and Compression
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Post-Training Optimization Tool and Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.
Boost your model’s speed even further with quantization and other state-of-the-art compression techniques available in OpenVINO’s Neural Network Compression Framework. These techniques also reduce your model size and memory requirements, allowing it to be deployed on resource-constrained edge hardware.

.. panels::
:card: homepage-panels
25 changes: 6 additions & 19 deletions docs/optimization_guide/model_optimization_guide.md
@@ -8,37 +8,24 @@

ptq_introduction
tmo_introduction
(Experimental) Protecting Model <pot_ranger_README>


Model optimization is an optional offline step of improving final model performance by applying special optimization methods, such as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
Model optimization is an optional offline step of improving the final model performance and reducing the model size by applying special optimization methods, such as 8-bit quantization, pruning, etc. OpenVINO offers two optimization paths implemented in `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__:

- :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` implements most of the optimization parameters to a model by default. Yet, you are free to configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocess of a model (:doc:`Embedding Preprocessing Computation <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`).
- :doc:`Post-training Quantization <ptq_introduction>` is designed to optimize the inference of deep learning models by applying the post-training 8-bit integer quantization that does not require model retraining or fine-tuning.

- :doc:`Post-training Quantization <pot_introduction>` is designed to optimize inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, for example, post-training 8-bit integer quantization.
- :doc:`Training-time Optimization <tmo_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods like Quantization-aware Training, Structured and Unstructured Pruning, etc.

- :doc:`Training-time Optimization <nncf_ptq_introduction>`, a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow 2.x. It supports methods, like Quantization-aware Training and Filter Pruning. NNCF-optimized models can be inferred with OpenVINO using all the available workflows.
.. note:: OpenVINO also supports optimized models (for example, quantized) from source frameworks such as PyTorch, TensorFlow, and ONNX (in Q/DQ format).

Post-training Quantization is the fastest way to optimize a model and should be applied first, but it is limited in terms of the achievable accuracy-performance trade-off. When that trade-off is not sufficient, Training-time Optimization is an option.

Detailed workflow:
##################

To understand which development optimization tool you need, refer to the diagram:
Once the model is optimized using the aforementioned methods, it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

.. image:: _static/images/DEVELOPMENT_FLOW_V3_crunch.svg

Post-training methods are limited in terms of achievable accuracy-performance trade-off for optimizing models. In this case, training-time optimization with NNCF is an option.

Once the model is optimized using the aforementioned tools it can be used for inference using the regular OpenVINO inference workflow. No changes to the inference code are required.

.. image:: _static/images/WHAT_TO_USE.svg
Contributor:
I think this picture is confusing in document context.
There are two pictures one after another, which looks a bit like a draft. And it is not easy to distill message from this picture.
If I look at it I would think that training time quantization has best performance since pruning and sparsity are shown higher.
Secondly, methods are shown separately and perception that people can get is that it is either one or another.

I wonder if we should change the picture completely to show performance vs. accuracy chart or something similar? We can probably discuss verbally.

Contributor (author):
Pictures were updated in #17421, but I believe it doesn't address your comment. Let's discuss how to change these pictures.


Post-training methods are limited in terms of achievable accuracy, which may degrade for certain scenarios. In such cases, training-time optimization with NNCF may give better results.

Once the model has been optimized using the aforementioned tools, it can be used for inference using the regular OpenVINO inference workflow. No changes to the code are required.

If you are not familiar with model optimization methods, refer to :doc:`post-training methods <pot_introduction>`.

Additional Resources
####################
Show resolved Hide resolved

59 changes: 33 additions & 26 deletions docs/optimization_guide/nncf/ptq/basic_quantization_flow.md
@@ -5,10 +5,10 @@
Introduction
####################

The basic quantization flow is the simplest way to apply 8-bit quantization to the model. It is available for models in the following frameworks: PyTorch, TensorFlow 2.x, ONNX, and OpenVINO. The basic quantization flow is based on the following steps:
The basic quantization flow is the simplest way to apply 8-bit quantization to the model. It is available for models in the following frameworks: OpenVINO, PyTorch, TensorFlow 2.x, and ONNX. The basic quantization flow is based on the following steps:

* Set up an environment and install dependencies.
* Prepare the **calibration dataset** that is used to estimate quantization parameters of the activations within the model.
* Prepare the representative **calibration dataset** that is used to estimate quantization parameters of the activations within the model.
* Call the quantization API to apply 8-bit quantization to the model.

Set up an Environment
@@ -29,25 +29,25 @@ Install all the packages required to instantiate the model object, for example,
Prepare a Calibration Dataset
#############################

At this step, create an instance of the ``nncf.Dataset`` class that represents the calibration dataset. The ``nncf.Dataset`` class can be a wrapper over the framework dataset object that is used for model training or validation. The class constructor receives the dataset object and the transformation function. For example, if you use PyTorch, you can pass an instance of the ``torch.utils.data.DataLoader`` object.
At this step, create an instance of the ``nncf.Dataset`` class that represents the calibration dataset. The ``nncf.Dataset`` class can be a wrapper over the framework dataset object that is used for model training or validation. The class constructor receives the dataset object and the transformation function.

The transformation function is a function that takes a sample from the dataset and returns data that can be passed to the model for inference. For example, this function can take a tuple of a data tensor and labels tensor, and return the former while ignoring the latter. The transformation function is used to avoid modifying the dataset code to make it compatible with the quantization API. The function is applied to each sample from the dataset before passing it to the model for inference. The following code snippet shows how to create an instance of the ``nncf.Dataset`` class:

.. tab:: PyTorch
.. tab:: OpenVINO

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
:language: python
:fragment: [dataset]

.. tab:: ONNX
.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
:language: python
:fragment: [dataset]

.. tab:: OpenVINO
.. tab:: ONNX

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
:language: python
:fragment: [dataset]

@@ -57,30 +57,29 @@ The transformation function is a function that takes a sample from the dataset a
:language: python
:fragment: [dataset]


If there is no framework dataset object, you can create your own entity that implements the ``Iterable`` interface in Python and returns data samples feasible for inference. In this case, a transformation function is not required.
If there is no framework dataset object, you can create your own entity that implements the ``Iterable`` interface in Python, for example the list of images, and returns data samples feasible for inference. In this case, a transformation function is not required.
Contributor:
This probably sounds nitpicky, but it seems the dataset should not return data samples feasible for inference, because they should not include the batch dimension (because that's added by the data loader) and OpenVINO inference fails without that added. I think it would be good to be explicit about this (and I would love an example that does exactly this: load a list of images).

Contributor:
Hi Helena, I absolutely agree with you that "for example the list of images" is confusing. The main point of this sentence is that ``nncf.Dataset`` supports any iterable Python object over a dataset, and the user can implement or use any entity that implements the Python ``Iterable`` interface. If a custom or framework dataset returns data samples feasible for inference, then a transformation function is not required.

I would recommend to reformulate this sentence:

If there is no framework dataset object, you can create your own entity that implements the ``Iterable`` interface in Python. If dataset object returns data samples feasible for inference then a transformation function is not required.

I don't see any reason to add an example here because it would include the following code:

model_inputs = [...] # model(model_inputs[0])

calibration_dataset = nncf.Dataset(model_inputs)

Not sure if this will bring any additional information. If you think otherwise, we can do it.

Contributor (author):
@helena-intel , if you have a suggestion how to better formulate it, feel free to share it here.

Contributor:
I think an example of such an iterable would be very useful, also because the PTQ OpenVINO examples use an Ultralytics or PyTorch dataloader, and if you're not familiar with them, it may not be obvious what they return, how they handle batching, etc. I'll try to create the most basic NNCF PTQ example today. We can see how to incorporate that in either docs or examples, but that can be in a future PR.
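
For illustration, a minimal iterable calibration dataset might look like the sketch below (the shapes, sample count, and ``train_loader`` name are placeholders, not code from this PR):

.. code-block:: python

   import numpy as np
   import nncf

   # A plain Python list of preprocessed inputs is already an iterable
   # calibration dataset, so no transformation function is needed.
   model_inputs = [
       np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)
   ]
   calibration_dataset = nncf.Dataset(model_inputs)

   # With a framework data loader that yields (data, label) tuples, a
   # transformation function extracts the part the model consumes:
   # calibration_dataset = nncf.Dataset(train_loader, lambda item: item[0])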



Run a Quantized Model
Quantize a Model
#####################

Once the dataset is ready and the model object is instantiated, you can apply 8-bit quantization to it:

.. tab:: PyTorch
.. tab:: OpenVINO

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
:language: python
:fragment: [quantization]

.. tab:: ONNX
.. tab:: PyTorch

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_torch.py
:language: python
:fragment: [quantization]

.. tab:: OpenVINO
.. tab:: ONNX

.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_openvino.py
.. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_onnx.py
:language: python
:fragment: [quantization]

@@ -93,14 +92,14 @@

.. note:: The ``model`` is an instance of the ``torch.nn.Module`` class for PyTorch, ``onnx.ModelProto`` for ONNX, and ``openvino.runtime.Model`` for OpenVINO.

After that the model can be exported into th OpenVINO Intermediate Representation if needed and run faster with OpenVINO.
After that the model can be exported into the OpenVINO Intermediate Representation if needed and run faster with OpenVINO.
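
As a rough end-to-end sketch for the OpenVINO backend (the IR paths are placeholders, and ``calibration_dataset`` is assumed to be an ``nncf.Dataset`` prepared as described above):

.. code-block:: python

   import nncf
   import openvino.runtime as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # placeholder path to the FP32 IR

   # Apply 8-bit post-training quantization with default settings
   quantized_model = nncf.quantize(model, calibration_dataset)

   # Save the result as OpenVINO IR; the inference code stays unchanged
   ov.serialize(quantized_model, "quantized_model.xml")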

Tune quantization parameters
############################

``nncf.quantize()`` function has several parameters that allow to tune quantization process to get more accurate model. Below is the list of parameters and their description:
The ``nncf.quantize()`` function has several optional parameters that allow tuning the quantization process to get a more accurate model. Below is a list of the parameters and their descriptions:

* ``model_type`` - used to specify quantization scheme required for specific type of the model. For example, **Transformer** models (BERT, distillBERT, etc.) require a special quantization scheme to preserve accuracy after quantization.
* ``model_type`` - used to specify a quantization scheme required for a specific type of model. ``Transformer`` is the only supported special quantization scheme; it preserves accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). ``None`` is the default, i.e. no specific scheme is defined.

.. code-block:: sh

@@ -115,7 +114,7 @@

nncf.quantize(model, dataset, preset=nncf.Preset.MIXED)

* ``fast_bias_correction`` - enables more accurate bias (error) correction algorithm that can be used to improve accuracy of the model. This parameter is available only for OpenVINO representation. ``True`` is used by default.
* ``fast_bias_correction`` - when set to ``False``, enables a more accurate bias (error) correction algorithm that can be used to improve the accuracy of the model. This parameter is available only for OpenVINO and ONNX representations. ``True`` is used by default to minimize quantization time.

.. code-block:: sh

Expand All @@ -127,7 +126,7 @@ Tune quantization parameters

nncf.quantize(model, dataset, subset_size=1000)

* ``ignored_scope`` - this parameter can be used to exclude some layers from quantization process. For example, if you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:
* ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy. For example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:

* Exclude by layer name:

@@ -150,12 +149,20 @@
regex = '.*layer_.*'
nncf.quantize(model, dataset, ignored_scope=nncf.IgnoredScope(patterns=regex))

* ``target_device`` - defines the target device, the specificity of which will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``VPU``.
Contributor:
It would be great to learn more about the effects of setting this parameter. I recently noticed a tiny accuracy degradation on SPR compared to ICL. Will specifying target_device=CPU_SPR fix this? Will that cause lower accuracy on ICL though? (Not asking for an answer, but that's what I wonder when I read this). Will this influence accuracy, performance, or both?

Contributor (author):
The CPU_SPR option improves model performance on SPR devices due to the removal of some FQs. It doesn't improve the accuracy; at least, it doesn't solve the issue with bf16 enabled by default, which I guess is the main reason for the accuracy degradation in your case.


* ``advanced_parameters`` - used to specify advanced quantization parameters for fine-tuning the quantization algorithm. Defined by ``nncf.AdvancedQuantizationParameters`` class. ``None`` is default.
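
Taken together, a hypothetical call combining several of these parameters might look as follows (the values and the layer name are illustrative only, not from the PR):

.. code-block:: python

   import nncf

   quantized_model = nncf.quantize(
       model,
       calibration_dataset,
       model_type=nncf.ModelType.TRANSFORMER,  # special scheme for Transformer models
       fast_bias_correction=False,             # slower but more accurate bias correction
       subset_size=1000,                       # number of calibration samples to use
       ignored_scope=nncf.IgnoredScope(
           names=["final_dense"],              # placeholder layer name
           patterns=[".*layer_.*"],
       ),
       target_device=nncf.TargetDevice.CPU,
       # advanced_parameters=nncf.AdvancedQuantizationParameters(...),
   )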

If the accuracy of the quantized model is not satisfactory, you can try to use the :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` flow.

See also
####################
Examples of how to apply NNCF post-training quantization:
############################################################

* `Example of basic quantization flow in PyTorch <https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch/mobilenet_v2>`__
* `Post-Training Quantization of MobileNet v2 OpenVINO Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/mobilenet_v2/README.md>`__
* `Post-Training Quantization of YOLOv8 OpenVINO Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/yolov8/README.md>`__
* `Post-Training Quantization of MobileNet v2 PyTorch Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch/mobilenet_v2/README.md>`__
* `Post-Training Quantization of SSD PyTorch Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch/ssd300_vgg16/README.md>`__
* `Post-Training Quantization of MobileNet v2 ONNX Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/onnx/mobilenet_v2/README.md>`__
* `Post-Training Quantization of MobileNet v2 TensorFlow Model <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/tensorflow/mobilenet_v2/README.md>`__

@endsphinxdirective
10 changes: 5 additions & 5 deletions docs/optimization_guide/nncf/ptq/ptq_introduction.md
@@ -1,4 +1,4 @@
# Post-training Quantization with NNCF (new) {#nncf_ptq_introduction}
# Post-training Quantization with NNCF {#nncf_ptq_introduction}

@sphinxdirective

@@ -10,17 +10,17 @@
quantization_w_accuracy_control


Neural Network Compression Framework (NNCF) provides a new post-training quantization API available in Python that is aimed at reusing the code for model training or validation that is usually available with the model in the source framework, for example, PyTorch or TensroFlow. The API is cross-framework and currently supports models representing in the following frameworks: PyTorch, TensorFlow 2.x, ONNX, and OpenVINO.
Neural Network Compression Framework (NNCF) provides a post-training quantization API available in Python that is aimed at reusing the code for model training or validation that is usually available with the model in the source framework, for example, PyTorch or TensorFlow. The NNCF API is cross-framework and currently supports models in the following frameworks: OpenVINO, PyTorch, TensorFlow 2.x, and ONNX. Currently, post-training quantization for models in OpenVINO Intermediate Representation is the most mature in terms of supported methods and models coverage.

This API has two main capabilities to apply 8-bit post-training quantization:

* :doc:`Basic quantization <basic_quantization_flow>` - the simplest quantization flow that allows to apply 8-bit integer quantization to the model.
* :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` - the most advanced quantization flow that allows to apply 8-bit quantization to the model with accuracy control.
* :doc:`Basic quantization <basic_quantization_flow>` - the simplest quantization flow that allows applying 8-bit integer quantization to the model.
* :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` - the most advanced quantization flow that allows applying 8-bit quantization to the model with accuracy control.
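
The difference between the two flows can be sketched as follows (assuming an OpenVINO model; ``validate`` and ``validation_dataset`` are user-defined placeholders, and the parameter names reflect the NNCF API as the editor understands it):

.. code-block:: python

   import nncf

   # Basic quantization: only a calibration dataset is required
   quantized_model = nncf.quantize(model, calibration_dataset)

   # Quantization with accuracy control: additionally takes a validation
   # dataset and a metric function, and keeps the accuracy drop bounded
   quantized_model = nncf.quantize_with_accuracy_control(
       model,
       calibration_dataset=calibration_dataset,
       validation_dataset=validation_dataset,
       validation_fn=validate,  # returns a float metric for the model
       max_drop=0.01,           # maximum allowed accuracy drop
   )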

Additional Resources
####################

* `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
* :doc:`Optimizing Models at Training Time <tmo_introduction>`
* `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__

@endsphinxdirective