Merge master (#22)
* Leaky relu transformation refactor (openvinotoolkit#2640)

* Refactored LeakyRelu transformation

* Added unit test for LeakyRelu transformation + removed duplicate test function valued_const

* nGraph implementation of NMS-5 (without `evaluate()`) (openvinotoolkit#2651)

* Written nGraph NMS-5 without evaluate().

* Used NGRAPH_RTTI_DECLARATION.

* setupvars.sh: Updated setting pyenv error to warning. (openvinotoolkit#2663)

* Fix itt build (openvinotoolkit#2662)

* Loop-5 operation specification (openvinotoolkit#2291)

The Loop-5 operation specification

* Time tests improvements (openvinotoolkit#2642)

* Remove extra functions from run_timetest.py

* Add `log.debug` of raw and aggregated statistics in run_timetest.py

* Implement storing of models locally for test_timetest.py

* Fixed CVS-35316 (openvinotoolkit#2072)

* Extend MO for operation GatherND (openvinotoolkit#2540)

* Extend MO for operation GatherND

* Update documentation

* Rename GatherNd.py to gathernd.py

Signed-off-by: Roman Kazantsev <[email protected]>

* Add hsigmoid op to ngraph (openvinotoolkit#2647)

* [IE CLDNN] Fixes for GatherTree and ReverseSequence  (openvinotoolkit#2660)

* ReorgYolo reference implementation (openvinotoolkit#2384)

* Align ReorgYolo to the spec (vector strides -> int stride)

* ReorgYolo ref impl

* ReorgYolo evaluate method

* ReorgYolo tests

* Tests update

* Style apply

* Add some comments

* Code refactor

* Comment update

* Style apply

* Build fix, mark evaluate as override

* Revert "Align ReorgYolo to the spec (vector strides -> int stride)"

* Use int_executable instead of evaluate

* Use char* instead of templates

* Code refactor

* Comment update

* Code review comment

* Add constructor aligned with spec

* Update shape validation

* Update attributes tests

* Add type_prop tests

* Update backend tests

* Add single layer tests

* Update the spec

* Remove wrong transformation test

* Add ReorgYolo to evaluates_map

* code style

Co-authored-by: Evgeny Lazarev <[email protected]>
Co-authored-by: Vladimir Gavrilov <[email protected]>
Co-authored-by: Artyom Anokhov <[email protected]>
Co-authored-by: Andrey Somsikov <[email protected]>
Co-authored-by: Vitaliy Urusovskij <[email protected]>
Co-authored-by: Anastasiya Ageeva <[email protected]>
Co-authored-by: Roman Kazantsev <[email protected]>
Co-authored-by: iliya mironov <[email protected]>
Co-authored-by: Vladimir Paramuzov <[email protected]>
Co-authored-by: Katarzyna Mitrus <[email protected]>
11 people authored Oct 15, 2020
1 parent 8e50854 commit a5153c2
Showing 67 changed files with 2,557 additions and 342 deletions.
3 changes: 2 additions & 1 deletion docs/MO_DG/prepare_model/Supported_Frameworks_Layers.md
@@ -158,7 +158,7 @@ Standard TensorFlow\* operations:
| FloorDiv | No |
| FusedBatchNorm | No |
| Gather | No |
| GatherNd | Supported if it can be replaced with Gather |
| GatherNd | No |
| GatherV2 | No |
| Greater | No |
| GreaterEqual | No |
@@ -337,6 +337,7 @@ Standard ONNX\* operators:
| Floor | No |
| GRU | No |
| Gather | No |
| GatherND | No |
| GatherTree | No |
| Gemm | No |
| GlobalAveragePool | No |
1 change: 1 addition & 0 deletions docs/doxygen/ie_docs.xml
@@ -175,6 +175,7 @@
<tab type="user" title="LogicalOr-1" url="@ref openvino_docs_ops_logical_LogicalOr_1"/>
<tab type="user" title="LogicalXor-1" url="@ref openvino_docs_ops_logical_LogicalXor_1"/>
<tab type="user" title="LogSoftmax-5" url="@ref openvino_docs_ops_activation_LogSoftmax_5"/>
<tab type="user" title="Loop-5" url="@ref openvino_docs_ops_infrastructure_Loop_5"/>
<tab type="user" title="MVN-1" url="@ref openvino_docs_ops_normalization_MVN_1"/>
<tab type="user" title="MatMul-1" url="@ref openvino_docs_ops_matrix_MatMul_1"/>
<tab type="user" title="MaxPool-1" url="@ref openvino_docs_ops_pooling_MaxPool_1"/>
4 changes: 2 additions & 2 deletions docs/install_guides/deployment-manager-tool.md
@@ -39,10 +39,10 @@ Interactive mode provides a user-friendly command-line interface that will guide
./deployment_manager.py
```
2. The target device selection dialog is displayed:
![Deployment Manager selection dialog](../img/selection_dialog.png "Deployment Manager selection dialog")
![Deployment Manager selection dialog](../img/selection_dialog.png)
Use the options provided on the screen to complete the selection of the target devices and press **Enter** to proceed to the package generation dialog. If you want to interrupt the generation process and exit the program, type **q** and press **Enter**.
3. Once you accept the selection, the package generation dialog is displayed:
![Deployment Manager configuration dialog](../img/configuration_dialog.png "Deployment Manager configuration dialog")
![Deployment Manager configuration dialog](../img/configuration_dialog.png)
1. The target devices you have selected at the previous step appear on the screen. If you want to change the selection, type **b** and press **Enter** to go back to the previous screen.

2. Use the options provided to configure the generation process, or use the default settings.
6 changes: 3 additions & 3 deletions docs/ops/detection/ReorgYolo_1.md
@@ -22,7 +22,7 @@

**Inputs**:

* **1**: 4D input tensor of any type and shape `[N, C, H, W]`. `H` and `W` should be divisible by `stride`. Required.
* **1**: 4D input tensor of any type and shape `[N, C, H, W]`. `H` and `W` should be divisible by `stride` and `C >= (stride*stride)`. **Required.**

**Outputs**:

@@ -31,7 +31,7 @@
**Example**

```xml
<layer id="89" name="ExtractImagePatches" type="ReorgYolo">
<layer id="89" name="reorg" type="ReorgYolo">
<data stride="2"/>
<input>
<port id="0">
@@ -50,4 +50,4 @@
</port>
</output>
</layer>
```
```
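The input constraints above (`H` and `W` divisible by `stride`, `C >= stride*stride`) follow from the reorg transformation itself, which folds each `stride x stride` spatial block into the channel dimension. A minimal sketch of the resulting shape computation (an illustrative helper, not part of the actual nGraph API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative ReorgYolo shape inference: spatial blocks of stride x stride
// elements move into channels, so C grows by stride^2 while H and W shrink
// by stride. Not the actual nGraph implementation.
std::vector<int64_t> reorg_yolo_output_shape(const std::vector<int64_t>& in,
                                             int64_t stride) {
    assert(in.size() == 4);                              // expects [N, C, H, W]
    assert(stride > 0);
    assert(in[2] % stride == 0 && in[3] % stride == 0);  // H, W divisible by stride
    assert(in[1] >= stride * stride);                    // C >= stride*stride
    return {in[0], in[1] * stride * stride, in[2] / stride, in[3] / stride};
}
```

For the `stride="2"` example above, a `[1, 64, 26, 26]` input would map to `[1, 256, 13, 13]`.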
181 changes: 181 additions & 0 deletions docs/ops/infrastructure/Loop_5.md
@@ -0,0 +1,181 @@
## Loop <a name="Loop"></a> {#openvino_docs_ops_infrastructure_Loop_5}

**Versioned name**: *Loop-5*

**Category**: Infrastructure

**Short description**: *Loop* operation performs recurrent execution of the network, which is described in the `body`, iterating through the data.
The operation has semantics similar to the ONNX* Loop [operation](https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Loop-13).

**Detailed description**

The body of the Loop can be executed zero or more times, depending on the values passed to the Loop operation inputs called "trip count" and "execution condition", and to the Loop body input called "current iteration".

These Loop operation inputs have the following meaning:
1. Trip count is an integer scalar or 1D tensor with 1 element specifying the maximum number of iterations. To simulate an infinite loop, the Constant `-1` can be provided as input.
2. Loop execution condition input is a boolean scalar or 1D tensor with 1 element specifying whether to run the first loop iteration. Note that the body of the Loop must yield the condition value for the consecutive iterations.

There are several combinations of these two inputs `(trip_count, execution condition)` which are described in the following code snippet:

```
input (-1, true) // infinite loop
bool cond = true;
for (int i = 0; cond; ++i)
{
cond = true; // sub-graph calculating condition must always return "true"!
}
input (-1, cond) // while loop
bool cond = ...;
for (int i = 0; cond; ++i)
{
cond = ...;
}
input (-1, true) // do-while loop
bool cond = true;
for (int i = 0; cond; ++i)
{
cond = ...;
}
input (trip_count, true) // for loop
int trip_count = ...;
bool cond = true;
for (int i = 0; i < trip_count; ++i)
{
cond = true; // sub-graph calculating condition must always return "true"!
}
input (trip_count, cond) // for with condition
int trip_count = ...;
bool cond = ...;
for (int i = 0; i < trip_count && cond; ++i)
{
cond = ...;
}
```

1. One of the body graph inputs, called "current iteration", is an integer scalar or 1D integer tensor with 1 element specifying the current iteration number. The iteration number starts from 0 and is incremented by one on each iteration. This input is optional and may be omitted if the iteration number value is not used in the body.
2. One of the body graph outputs, called "condition", is a boolean scalar or 1D tensor with 1 element. This value is used to decide whether to perform the next iteration.

Loop operation description in the IR has regular sections: `input` and `output`. They connect Loop body to the outer graph and specify condition(s).
Loop operation description in the IR also has several special sections: `body`, `port_map` and `back_edges` similar to the ones from the TensorIterator operation but having some important features described below.

1. A body operation that receives an input from the main graph should have an entry in the `port_map` section of the Loop operation. These edges connect input ports of the Loop with the body `Parameter`s.
2. A body operation producing a tensor to be used in the subsequent iterations (like in RNN models) should have a back edge described in the `back_edges` section of the operation. The back edge connects the respective body `Parameter` and `Result` operations. In this case the Loop operation node provides input for the first iteration, while the corresponding Loop operation output produces the tensor computed during the last iteration.
3. Output tensors produced by a particular body operation across all iterations can be concatenated and returned as a Loop operation output (this is a "scan output" according to the ONNX* Loop operation [specification](https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Loop-13)). The corresponding `output` entry in the `port_map` should have `axis` attribute specifying the axis to concatenate. Therefore, outputs from operations corresponding to `output` entries in the `port_map` without `axis` attribute are returned "as is" (without concatenation).
4. There is one body `Parameter` operation not connected through the `port_map`. This is a "current iteration" input. The Loop operation is responsible for providing the appropriate value for each iteration.
5. Connection of nodes inside the Loop body with the main graph should be done through `Parameter` and `Result` body operations. No other ways to connect graphs are allowed.
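The execution rules above can be illustrated with a small, self-contained sketch (hypothetical helper names, not the actual runtime API): a state value carried over a back edge, a "scan output" concatenated across iterations, and a body-produced execution condition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative model of Loop execution semantics (not the actual runtime):
// - `state` plays the role of a body Parameter/Result pair joined by a back edge;
// - `scan` collects the per-iteration Result marked with an `axis` attribute;
// - `cond` models the Result marked purpose="execution_condition".
struct LoopResult {
    int state;               // back-edge value after the last executed iteration
    std::vector<int> scan;   // "scan output": per-iteration results concatenated
};

LoopResult run_loop(int64_t trip_count, bool exec_cond, int initial_state) {
    LoopResult r{initial_state, {}};
    bool cond = exec_cond;   // ExecutionCondition input gates the first iteration
    for (int64_t i = 0; (trip_count < 0 || i < trip_count) && cond; ++i) {
        // The body receives `i` as the "current iteration" input.
        r.state += static_cast<int>(i);  // body computation feeding the back edge
        r.scan.push_back(r.state);       // Result concatenated along `axis`
        cond = r.state < 100;            // body yields the condition for the next iteration
    }
    return r;
}
```

With `trip_count = -1` this behaves as a while loop terminated only by the body condition; with a non-negative trip count and a constant `true` condition it degenerates into a plain for loop, matching the `(trip_count, execution condition)` combinations listed earlier.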

**Loop attributes**:

* **Body**:

`body` is a network that will be recurrently executed. The network is described operation by operation as a typical IR network.

* **Body attributes**:

No attributes available.

* **Port map**:

*port_map* is a set of rules to map input or output data tensors of the `Loop` operation onto `body` data tensors. The `port_map` entries can be `input` and `output`. Each entry describes a corresponding mapping rule.

* **Port map attributes**:

* *external_port_id*
* **Description**: *external_port_id* is a port ID of the `Loop` operation.
* **Range of values**: IDs of the *Loop* inputs or outputs
* **Type**: `int`
* **Default value**: None
* **Required**: *yes*

* *internal_layer_id*

* **Description**: *internal_layer_id* is a `Parameter` or `Result` operation ID inside the `body` network to map to.
* **Range of values**: IDs of the `Parameter` and `Result` operations inside the `body` network
* **Type**: `int`
* **Default value**: None
* **Required**: *yes*

* *axis*

* **Description**: *axis* is an axis to concatenate the body `Result` output across all iterations. Can be specified for `output` entry only.
* **Range of values**: an integer. Negative value means counting dimension from the end.
* **Type**: `int`
* **Default value**: None
* **Required**: *no*

* **Back edges**:

*back_edges* is a set of rules to transfer tensor values from `body` outputs at one iteration to `body` parameters at the next iteration. Back edge connects some `Result` operation in the `body` to `Parameter` operation in the same `body`.

* **Back edge attributes**:

* *from-layer*

* **Description**: *from-layer* is a `Result` operation ID inside the `body` network.
* **Range of values**: IDs of the `Result` operations inside the *Loop*
* **Type**: `int`
* **Default value**: None
* **Required**: *yes*

* *to-layer*

* **Description**: *to-layer* is a `Parameter` operation ID inside the `body` network that the back edge points to.
* **Range of values**: IDs of the `Parameter` operations inside the *Loop*
* **Type**: `int`
* **Default value**: None
* **Required**: *yes*

**Loop Inputs**

* **Trip count**: A scalar or 1D tensor with 1 element of `int64` or `int32` type specifying the maximum number of iterations. *Required*.

* **ExecutionCondition**: A scalar or 1D tensor with 1 element of `boolean` type specifying whether to execute the first iteration or not. `True` value means to execute the 1st iteration. *Required*.

* **Multiple other inputs**: tensors of different types and shapes. *Optional*.

**Loop Outputs**

* **Multiple outputs**: Results of execution of the `body`. Tensors of any type and shape.


**Body Inputs**

* **Multiple inputs**: tensors of different types and shapes except the one corresponding to the current iteration number. That input is marked in the port_map with attribute `purpose = "current_iteration"` and is a scalar or 1D tensor with 1 element of `int64` or `int32` type. *Optional*.


**Body Outputs**

* **Multiple outputs**: Results of execution of the `body`. Tensors of any type and shape except the one corresponding to the execution condition. That output is marked in the port_map with attribute `purpose = "execution_condition"`, is mandatory, and produces a scalar or 1D tensor with 1 element of `boolean` type. Other outputs are optional.


**Examples**

*Example 1: a typical Loop structure*
```xml
<layer type="Loop" ... >
<input> ... </input>
<output> ... </output>
<port_map>
<input external_port_id="0" internal_layer_id="0"/>
<input external_port_id="1" internal_layer_id="1"/>
<input external_port_id="-1" internal_layer_id="2" purpose="current_iteration"/>
...
<output external_port_id="3" internal_layer_id="4"/>
<output external_port_id="4" internal_layer_id="10" axis="1"/>
<output external_port_id="-1" internal_layer_id="22" purpose="execution_condition"/>
...
</port_map>
<back_edges>
<edge from-layer="1" to-layer="5"/>
...
</back_edges>
<body>
<layers> ... </layers>
<edges> ... </edges>
</body>
</layer>
```
1 change: 1 addition & 0 deletions docs/ops/opset5.md
@@ -76,6 +76,7 @@ declared in `namespace opset5`.
* [LogicalOr](logical/LogicalOr_1.md)
* [LogicalXor](logical/LogicalXor_1.md)
* [LogSoftmax](activation/LogSoftmax_5.md)
* [Loop](infrastructure/Loop_5.md)
* [LRN](normalization/LRN_1.md)
* [LSTMCell](sequence/LSTMCell_1.md)
* [LSTMSequence](sequence/LSTMSequence_1.md)
1 change: 1 addition & 0 deletions inference-engine/src/cldnn_engine/cldnn_common_utils.h
@@ -41,6 +41,7 @@ const auto CldnnTensorFromIEDims = [](const InferenceEngine::SizeVector& dims, i
inline cldnn::data_types DataTypeFromPrecision(InferenceEngine::Precision p) {
switch (p) {
case Precision::I16:
case Precision::U16:
case Precision::FP32:
return cldnn::data_types::f32;
case Precision::FP16:
13 changes: 9 additions & 4 deletions inference-engine/src/cldnn_engine/cldnn_engine.cpp
@@ -196,10 +196,15 @@ clDNNEngine::clDNNEngine() : m_defaultContext(nullptr) {
auto check_inputs = [](InferenceEngine::InputsDataMap _networkInputs) {
for (auto ii : _networkInputs) {
auto input_precision = ii.second->getTensorDesc().getPrecision();
if (input_precision != InferenceEngine::Precision::FP16 && input_precision != InferenceEngine::Precision::I16
&& input_precision != InferenceEngine::Precision::FP32 && input_precision != InferenceEngine::Precision::U8
&& input_precision != InferenceEngine::Precision::I32 && input_precision != InferenceEngine::Precision::I64
&& input_precision != InferenceEngine::Precision::I8 && input_precision != InferenceEngine::Precision::BOOL) {
if (input_precision != InferenceEngine::Precision::FP16 &&
input_precision != InferenceEngine::Precision::FP32 &&
input_precision != InferenceEngine::Precision::U8 &&
input_precision != InferenceEngine::Precision::I8 &&
input_precision != InferenceEngine::Precision::I16 &&
input_precision != InferenceEngine::Precision::U16 &&
input_precision != InferenceEngine::Precision::I32 &&
input_precision != InferenceEngine::Precision::I64 &&
input_precision != InferenceEngine::Precision::BOOL) {
THROW_IE_EXCEPTION << NOT_IMPLEMENTED_str
<< "Input image format " << input_precision << " is not supported yet...";
}
18 changes: 14 additions & 4 deletions inference-engine/src/cldnn_engine/cldnn_infer_request.cpp
@@ -41,6 +41,11 @@ Blob::Ptr CLDNNInferRequest::createInputBlob(const TensorDesc& desc, uint8_t* me
return make_shared_blob<int16_t>(desc, reinterpret_cast<int16_t*>(mem_ptr));
else
return make_shared_blob<int16_t>(desc);
case Precision::U16:
if (mem_ptr != nullptr)
return make_shared_blob<uint16_t>(desc, reinterpret_cast<uint16_t*>(mem_ptr));
else
return make_shared_blob<uint16_t>(desc);
case Precision::I32:
if (mem_ptr != nullptr)
return make_shared_blob<int32_t>(desc, reinterpret_cast<int32_t*>(mem_ptr));
@@ -586,7 +591,7 @@ void CLDNNInferRequest::AllocateInputs() {
cldnn::pointer<uint8_t> mem_ptr = inputsMemory.at(name).pointer<uint8_t>();
_inputs[name] = createInputBlob(desc, mem_ptr.data());

if (desc.getPrecision() == Precision::I16) {
if (desc.getPrecision() == Precision::I16 || desc.getPrecision() == Precision::U16) {
cldnn::layout layout_fp32 = layout;
layout_fp32.data_type = cldnn::data_types::f32;
input_alloc(name + fp32_suffix, layout_fp32);
@@ -609,7 +614,7 @@ void CLDNNInferRequest::AllocateInputsDyn() {
}

Blob::Ptr inputBlob = createInputBlob(desc);
if (desc.getPrecision() == Precision::I16) {
if (desc.getPrecision() == Precision::I16 || desc.getPrecision() == Precision::U16) {
desc.setPrecision(Precision::FP32);
auto fp32inputBlob = InferenceEngine::make_shared_blob<float>(desc);
fp32inputBlob->allocate();
@@ -910,11 +915,16 @@ void CLDNNInferRequest::PrepareInput(const cldnn::primitive_id &inputName, const
if (inputBlob.is<gpu::ClBlob>()) {
// no need to check for reuse
_nw_ptr->set_input_data(internalName, memory);
} else if (prec == Precision::I16) {
} else if (prec == Precision::I16 || prec == Precision::U16) {
// clDNN doesn't support I16 input precision, so we always have to convert input data to fp32 precision
const cldnn::memory& fp32_mem = inputsMemory.at(inputName+fp32_suffix);
cldnn::pointer<float> ptr = fp32_mem.pointer<float>();
copyToFloat<int16_t>(ptr.data(), &inputBlob);
if (prec == Precision::I16) {
copyToFloat<int16_t>(ptr.data(), &inputBlob);
} else {
copyToFloat<uint16_t>(ptr.data(), &inputBlob);
}

_nw_ptr->set_input_data(internalName, fp32_mem);
} else if (is_same_buffer(inputBlob, memory)) {
// If input memory was allocated by cldnn engine and wasn't overwritten by user set_input_data method won't copy input data.
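The hunk above routes both `I16` and `U16` inputs through the FP32 staging buffer, because clDNN has no 16-bit integer input format. A minimal, hedged sketch of such a widening copy (an illustrative `copy_to_float` helper, not the plugin's actual `copyToFloat` signature, which takes a `Blob`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Element-wise widening copy of 16-bit integer data into a float buffer.
// Every int16_t/uint16_t value is exactly representable in float, so the
// conversion is value-preserving.
template <typename T>
void copy_to_float(float* dst, const T* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = static_cast<float>(src[i]);
}
```

Instantiating the template per precision (as the `if (prec == Precision::I16)` branch does with the real helper) keeps signed and unsigned reinterpretation of the raw buffer explicit at the call site.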
@@ -58,5 +58,33 @@ class INFERENCE_ENGINE_API_CLASS(NonMaxSuppressionIE2) : public NonMaxSuppressio
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector & new_args) const override;
};

class INFERENCE_ENGINE_API_CLASS(NonMaxSuppressionIE3) : public Op {
public:
NGRAPH_RTTI_DECLARATION;

NonMaxSuppressionIE3(const Output<Node>& boxes,
const Output<Node>& scores,
const Output<Node>& max_output_boxes_per_class,
const Output<Node>& iou_threshold,
const Output<Node>& score_threshold,
const Output<Node>& soft_nms_sigma,
int center_point_box,
bool sort_result_descending,
const ngraph::element::Type& output_type = ngraph::element::i64);

void validate_and_infer_types() override;

bool visit_attributes(AttributeVisitor& visitor) override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector & new_args) const override;

int m_center_point_box;
bool m_sort_result_descending = true;
element::Type m_output_type;

private:
int64_t max_boxes_output_from_input() const;
};

} // namespace op
} // namespace ngraph