[Runtime] Enable set_input_zero_copy in GraphRuntime #3416
Conversation
Can you add test cases for `set_input_zero_copy`?
Will do.
@hlu1 Added tests.
src/runtime/graph/graph_runtime.cc (Outdated)
```cpp
for (size_t i = 0; i < data_entry_.size(); ++i) {
  int storage_id = attrs_.storage_id[i];
  CHECK_LT(static_cast<size_t>(storage_id), storage_pool_.size());
  data_entry_[i] =
      storage_pool_[storage_id].CreateView(attrs_.shape[i], vtype[i]);
  const DLTensor* tmp = data_entry_[i].operator->();
  data_alignment_[i] = GetDataAlignment(*tmp);
  dltensor_entry_shapes_[i].resize(tmp->ndim);
```
The shape info is stored in `attrs_.shape`. There is no need to save it into `dltensor_entry_shapes_`.
src/runtime/graph/graph_runtime.cc (Outdated)
```cpp
// check the consistency of input shape
CHECK_EQ(data_alignment_[eid], GetDataAlignment(*data_ref));
CHECK(reinterpret_cast<size_t>(data_ref->data) % kAllocAlignment == 0);
```
Why is this necessary?
TVM assumes 64-byte alignment of memory addresses. Failing to guarantee that causes subtle core dumps, for example `vmovaps` on unaligned memory. When we set up an external input, we need to guard against this, because we don't know how the memory was allocated.
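As a minimal sketch, a guard along these lines could reject misaligned external buffers before they ever reach generated code (the helper name `IsAligned` is hypothetical; the 64-byte constant mirrors TVM's `kAllocAlignment`):

```cpp
#include <cstdint>
#include <dlpack/dlpack.h>

// Hypothetical helper: true iff the tensor's data pointer meets the
// runtime's minimum allocation alignment (64 bytes in TVM).
inline bool IsAligned(const DLTensor* t, size_t alignment) {
  return reinterpret_cast<uintptr_t>(t->data) % alignment == 0;
}
```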
src/runtime/graph/graph_runtime.cc (Outdated)
```cpp
    CHECK_EQ(shape[i], data_ref->shape[i]);
  }
} else {
  int64_t acc_prev =
```
Do we hit this case? Shouldn't the shapes match exactly?
Yeah, we don't need this any more.
```cpp
 * \param index The input index.
 * \param data_ref The input data that is referred.
 */
void GraphRuntime::SetInputZeroCopy(int index, DLTensor* data_ref) {
```
Please check that the `device_type` and `device_id` match as well, for the heterogeneous case.
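A minimal sketch of what that check could look like inside `SetInputZeroCopy` (assuming the old entry is available as `old_t`, as in the snippet further below; not necessarily the exact code that landed):

```cpp
// Reject tensors that live on a different device than the entry they
// replace, so heterogeneous (multi-device) graphs stay consistent.
CHECK_EQ(old_t->ctx.device_type, data_ref->ctx.device_type);
CHECK_EQ(old_t->ctx.device_id, data_ref->ctx.device_id);
```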
```diff
@@ -206,6 +244,12 @@ void GraphRuntime::LoadParams(dmlc::Stream* strm) {
   CHECK_EQ(data_entry_[eid].use_count(), 1);
   data_entry_[eid] = other.GetInput(GetInputIndex(names[i]));
   CHECK_GT(data_entry_[eid].use_count(), 1);
   const DLTensor* tmp = data_entry_[eid].operator->();
```
I'm not sure the `ShareParams()` function needs to be updated here. If the params are set by sharing with another graph runtime, they do not need to be set again. The shape and alignment should be the same as before the sharing.
I ran into test failures when I didn't do this.
```cpp
}

// Update the data pointer for each argument of each op
for (auto& op_arg : op_args_) {
```
I'm thinking we could change the API to accept an array of DLTensors, update the entries in `data_entries_` for all of them, and then call `SetupOpExecs()`. That way, you don't need to save the `input_entry_ids` in `op_args_`, and the code can be much cleaner.
I had an implementation like this, but it turned out to be more difficult:

- To update `data_entries_`, you need to create a DLManagedTensor, which is one more small allocation.
- Rerunning `SetupOpExecs()` seems a bit heavy.
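For reference, a hypothetical sketch of the batch API being discussed, illustrating both costs; the name `SetInputsZeroCopy` and its signature are illustrative, not code from this PR:

```cpp
// Hypothetical batch variant: rewrite all affected data entries,
// then rebuild every op executor in one pass.
void GraphRuntime::SetInputsZeroCopy(const std::vector<int>& indices,
                                     const std::vector<DLTensor*>& tensors) {
  CHECK_EQ(indices.size(), tensors.size());
  for (size_t i = 0; i < indices.size(); ++i) {
    uint32_t eid = this->entry_id(input_nodes_[indices[i]], 0);
    // Viewing an external DLTensor as an NDArray requires wrapping it
    // in a DLManagedTensor -- the extra small allocation noted above.
    auto* wrapper = new DLManagedTensor();
    wrapper->dl_tensor = *tensors[i];
    wrapper->manager_ctx = nullptr;
    wrapper->deleter = [](DLManagedTensor* self) { delete self; };
    data_entry_[eid] = NDArray::FromDLPack(wrapper);
  }
  // Heavyweight: re-creates the packed-function closures for all ops.
  this->SetupOpExecs();
}
```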
@ZihengJiang, @eqy, @kevinthesun, @icemelon9, can you take a look?
Looks good to me.
include/tvm/runtime/ndarray.h (Outdated)
```diff
@@ -33,6 +33,8 @@
 namespace tvm {
 namespace runtime {

 size_t GetDataAlignment(const DLTensor& arr);
```
Previously this API was private, so it was written in a not very thoughtful way. I am debating whether we should include it as a public API. Perhaps just have two inlined versions of this function internally for now, since the behavior was mainly to return `kTempAllocaAlignment`. If it is a public API, we need to document it properly.
Yeah, I can make them local functions.
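A minimal sketch of such a file-local helper, modeled on the alignment rule discussed above (assuming `kAllocAlignment` is the runtime's minimum allocation alignment):

```cpp
namespace {
// File-local helper: alignment implied by the dtype, clamped below
// by the runtime's minimum allocation alignment (64 bytes).
size_t GetDataAlignment(const DLTensor& arr) {
  size_t align = (arr.dtype.bits / 8) * arr.dtype.lanes;
  if (align < tvm::runtime::kAllocAlignment) {
    return tvm::runtime::kAllocAlignment;
  }
  return align;
}
}  // namespace
```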
src/runtime/graph/graph_runtime.cc (Outdated)
```cpp
const DLTensor* old_t = data_entry_[eid].operator->();

// check the consistency of input
CHECK_EQ(data_alignment_[eid], GetDataAlignment(*data_ref));
```
I am wondering if we need to introduce `data_alignment_`. It looks like we can get the alignment from `data_entry_[eid]` as well, right?
Yeah, I originally just wanted to avoid computing this repeatedly. What do you think? I don't have a strong opinion about this.
Yeah, I think we probably should not introduce extra members when not really necessary. The compute is cheap and used by the other field as well.
It's an extra dozen bytes of memory, but the trade-off is that you avoid doing the compute for each inference (we are talking about millions of them). Sounds reasonable?
@tqchen ping
* Enable set_input_zero_copy in GraphRuntime
* Fix LoadParams
* Fix
* lint
* Fix remote context issue
* Fix
* Remove LOG
* Remove unused variables
* Add tests
* works
* More test scenarios
* make it simpler
* Remove unnecessary changes
* Address comments
* More comments
* Address comments
* Fix build
Does it support GPU zero copy? We tried zero copy with a GPU context, and it generates an error.
When integrating with other frameworks such as PyTorch, it is desirable to avoid unnecessary copies of activations and weights when hooking up the TVM runtime.
cc: @ajtulloch @hlu1
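For illustration, a hedged sketch of calling the new entry point from C++ (assuming `mod` is a created graph runtime module and `input` is an external, 64-byte-aligned `DLTensor*`; the packed-function name matches the PR title, and the input name "data" is an assumption):

```cpp
// Point the graph input at an externally owned buffer (no copy),
// then run inference. The buffer must stay alive across run().
tvm::runtime::PackedFunc set_input_zero_copy =
    mod.GetFunction("set_input_zero_copy");
set_input_zero_copy("data", input);  // "data": assumed input name
tvm::runtime::PackedFunc run = mod.GetFunction("run");
run();
```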