[GPU] Add GPU plugin metric to get statistics of GPU memory allocated by engine #7758
Conversation
Force-pushed 3c2b682 to b08b35b
Force-pushed e1bc728 to 910d95d
@vladimir-paramuzov @iefode @avladimi Could you review this PR?
Force-pushed 910d95d to ff02018
Force-pushed ff02018 to 66d3134
Force-pushed 66d3134 to 95c3795
if (m_context != nullptr) {
    auto impl = getContextImpl(m_context);
    impl->acquire_lock();
    std::shared_ptr<cldnn::engine> eng = impl->GetEngine();
I'm not a clDNNPlugin expert, but where is the loop over multiple graphs (streams)?
If this is NOT per-network, but the overall (plugin) memory footprint, as https://github.com/openvinotoolkit/openvino/pull/7758/files#diff-0931e76736f996144670536e6a3712a65ad7277637ff564b7511fe8b827223a1R112 suggests... then the metric should be for the plugin, not for the ExecNetwork.
The engine object of the cldnn library already stores the accumulated memory usage across multiple graphs. And since the requirement of this PR is to query memory usage statistics after LoadNetwork, we decided to implement it as an ExecNetwork API.
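For illustration, a rough sketch of the accumulation idea described here; the class, its member names, and the allocation_type values are stand-ins, not the actual cldnn engine internals (only get_used_device_memory mirrors a name from the quoted header):

#include <atomic>
#include <cstddef>
#include <cstdint>

// Illustrative only: the engine keeps one running counter per allocation type.
// Every graph/stream built on this engine bumps the shared counters, so a query
// after LoadNetwork reflects all graphs created by that engine so far.
class engine_accounting_sketch {
public:
    enum class allocation_type { cl_mem, usm_host, usm_device };  // hypothetical types

    void add_memory_used(allocation_type type, uint64_t size) {
        _used[index(type)] += size;            // accumulate across all graphs
    }
    uint64_t get_used_device_memory(allocation_type type) const {
        return _used[index(type)].load();      // current total for one allocation type
    }

private:
    static size_t index(allocation_type type) { return static_cast<size_t>(type); }
    std::atomic<uint64_t> _used[3] = {};       // one counter per allocation type
};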
@andrew-k-park I think @myshevts has a fair point, and @ilya-lavrenov mentioned the same thing in Jira. If it's device-level statistics, then the query should be added to Core.
For an executable network we can track the memory consumption of that particular network, but currently it's just a memory usage snapshot for the whole device.
As I understand it:
- [This PR] statistics as an ExecNetwork metric: this shows the entire memory footprint of the network, including all streams requested for that network.
- Statistics as a Core metric, if we add it in the future: this would show the entire memory footprint of all networks and all streams on the target device.
In this sense I could not understand what @vladimir-paramuzov meant by "but currently it's just memory usage snapshot for the whole device." ...
@yeonbok
As far as I can see, the current impl works as follows:
auto net1 = ie.LoadNetwork(model1);
auto stat1 = net1.GetMetric(GPU_METRIC_KEY(MEMORY_STATISTICS)); // mem consumption for net1 only
auto net2 = ie.LoadNetwork(model2);
auto stat2 = net2.GetMetric(GPU_METRIC_KEY(MEMORY_STATISTICS)); // mem consumption for net1 + net2
stat2 contains unexpected values for a metric of an executable network.
How it should work:
auto net1 = ie.LoadNetwork(model1);
auto stat1 = net1.GetMetric(GPU_METRIC_KEY(MEMORY_STATISTICS)); // mem consumption for net1 only
auto net2 = ie.LoadNetwork(model2);
auto stat2 = net2.GetMetric(GPU_METRIC_KEY(MEMORY_STATISTICS)); // mem consumption for net2 only
auto stat_global = core.GetMetric(GPU_METRIC_KEY(MEMORY_STATISTICS)); // mem consumption for net1+net2
The latter was our intention, too. @andrew-k-park Could you confirm the behavior?
@yeonbok Based on the current implementation, multiple executable networks share the same context impl, so the first case is the actual behavior. @vladimir-paramuzov My question is: is the latter your expected implementation to meet the requirement?
I thought that the context impl was created for each ExecNetwork. If that is not true and the current behavior is not what we intended, I think we need an additional update so that we stay aligned with the initial concept. The current behavior is somewhat strange in the sense that a device-wide statistic is exposed through the ExecNetwork. Let's come up with an additional fix. Thanks for the review @myshevts @vladimir-paramuzov
@@ -109,6 +109,10 @@ class engine {
    /// Returns the amount of GPU memory of the specified allocation @p type that is currently used by the engine
    uint64_t get_used_device_memory(allocation_type type) const;

    /// Returns statistics of GPU memory allocated by the engine in the current process for all allocation types.
If this is the current process only, how do I track the overall GPU device memory utilization (e.g. how much free memory I have left to load another network)?
Currently, it's impossible to query information about overall GPU memory across multiple networks or multiple processes, because only the total memory size can be retrieved from the GPU device, and memory usage is managed by the engine object in the cldnn library behind the recently called LoadNetwork API. So memory from multiple loaded networks must be tracked on the application side.
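A minimal application-side sketch of that kind of tracking, assuming GPU_METRIC_KEY(MEMORY_STATISTICS) returns a std::map<std::string, uint64_t> keyed by allocation type; the model paths, the header providing the metric key, and the snapshot handling are illustrative assumptions, not part of this PR:

#include <cstdint>
#include <map>
#include <string>
#include <vector>
#include <inference_engine.hpp>
#include <gpu/gpu_config.hpp>  // assumed location of GPU_METRIC_KEY(MEMORY_STATISTICS)

using Stats = std::map<std::string, uint64_t>;

int main() {
    InferenceEngine::Core core;
    std::vector<Stats> snapshots;  // one snapshot per loaded network

    for (const std::string& path : {"model1.xml", "model2.xml"}) {  // illustrative model paths
        auto exec = core.LoadNetwork(core.ReadNetwork(path), "GPU");
        // Query the statistics right after LoadNetwork; per the discussion above,
        // a snapshot may reflect everything the shared engine has allocated so far,
        // not just this particular network.
        snapshots.push_back(exec.GetMetric(GPU_METRIC_KEY(MEMORY_STATISTICS)).as<Stats>());
    }
    return 0;
}

This only covers allocations made by the current process, as noted above; it does not report free device memory.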
Force-pushed 95c3795 to a1ea29a
        memory_usage = iter->second.load();
    }
    return memory_usage;
}

void engine::get_memory_statistics(std::map<std::string, uint64_t>* statistics) const {
btw, why is statistics an argument of this method? Can it be std::map<std::string, uint64_t> engine::get_memory_statistics() const?
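A minimal sketch of the return-by-value signature suggested here; engine_sketch and its members (_mutex, _memory_usage) are stand-ins for illustration, not the actual cldnn::engine internals:

#include <atomic>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Illustrative only: return-by-value variant of get_memory_statistics(),
// copying a snapshot of the per-allocation-type counters under a lock.
class engine_sketch {
public:
    std::map<std::string, uint64_t> get_memory_statistics() const {
        std::lock_guard<std::mutex> lock(_mutex);
        std::map<std::string, uint64_t> statistics;
        for (const auto& entry : _memory_usage)
            statistics[entry.first] = entry.second.load();  // read each atomic counter
        return statistics;
    }

private:
    mutable std::mutex _mutex;
    std::map<std::string, std::atomic<uint64_t>> _memory_usage;  // hypothetical per-allocation-type counters
};

Returning by value avoids the out-parameter and makes it explicit that the caller receives a snapshot.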
…gine for each allocation type Signed-off-by: Andrew Kwangwoong Park <[email protected]>
Details:
Tickets: