This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

TVM bridge support to JIT NDArray Function by TVM #9880

Merged: 6 commits merged into apache:master on Feb 27, 2018

Conversation

@tqchen (Member) commented Feb 25, 2018

Support wrapping a TVM compiled function as an NDArray function. This enables using TVM as an RTC module for MXNet's async engine functions.

Example

def test():
    import mxnet as mx
    import topi
    import tvm
    import numpy as np
    from tvm.contrib.mxnet import to_mxnet_func

    # build a TVM function through topi
    n = 20
    shape = (20,)
    scale = tvm.var("scale", dtype="float32")
    x = tvm.placeholder(shape)
    y = tvm.placeholder(shape)
    z = topi.broadcast_add(x, y)
    zz = tvm.compute(shape, lambda *i: z(*i) * scale)

    # build the function
    target = tvm.target.cuda()
    with target:
        s = topi.generic.schedule_injective(zz)
        f = tvm.build(s, [x, y, zz, scale])

    # get an MXNet version of the function that runs on the async engine
    mxf = to_mxnet_func(f, const_loc=[0, 1])

    ctx = mx.gpu(0)
    xx = mx.nd.uniform(shape=shape, ctx=ctx)
    yy = mx.nd.uniform(shape=shape, ctx=ctx)
    zz = mx.nd.empty(shape=shape, ctx=ctx)

    # invoke mxf: this runs in the MXNet engine
    mxf(xx, yy, zz, 10.0)

    np.testing.assert_allclose(
        zz.asnumpy(), (xx.asnumpy() + yy.asnumpy()) * 10)

Technical Details

The bridge is quite natural since MXNet already uses the DLTensor representation, which TVM also uses. The hard part is that we need to use MXNet's engine to run the compiled function, instead of running it directly.

Since TVM relies on LLVM, it is a bit too early to introduce that as a direct dependency. This PR takes a different approach: the TVM bridge depends only on a header-only component of TVM and does not have to link against the TVM runtime.

When a user has TVM installed in their environment, TVM queries the MXTVMBridge function to obtain the wrapper logic and uses it to run the compiled function asynchronously through MXNet's engine. When a user does not have TVM installed, the bridge adds no additional link dependencies.
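
For illustration, here is a minimal sketch of the lookup pattern on the TVM side, assuming a POSIX dlopen/dlsym-based discovery; apart from the MXTVMBridge symbol name, all names here are hypothetical and the real TVM contrib code may differ:

// Sketch only: how a TVM-side module could discover MXNet's bridge entry point.
#include <dlfcn.h>

typedef void* TVMFunctionHandle;
typedef int (*MXTVMBridgeFunc)(TVMFunctionHandle);

int LoadMXTVMBridge(const char* libmxnet_path, TVMFunctionHandle fregister) {
  void* lib = dlopen(libmxnet_path, RTLD_LAZY | RTLD_LOCAL);
  if (lib == nullptr) return -1;
  // The symbol exists only when libmxnet.so was built with the bridge compiled in.
  void* sym = dlsym(lib, "MXTVMBridge");
  if (sym == nullptr) return -1;
  // Hand TVM's registration callback to MXNet so it can expose its wrapper logic.
  return reinterpret_cast<MXTVMBridgeFunc>(sym)(fregister);
}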

Because of this optional linking logic, I did not include a test case in MXNet's CI, but I have verified locally that the code works for both the GPU and CPU cases.

Restriction

MXNet and TVM need to be built with the same C++ ABI (because we pass PackedFunc objects around). This is somewhat of a restriction, but it makes code sharing easier by using the PackedFunc system. It can usually be satisfied by using the same C++ compiler: for example, g++ 4.8 and g++ 5.0 are not compatible, while the latest version of clang is usually compatible with the latest version of g++. Running with incompatible ABIs causes undefined behavior and possible segfaults. This restriction could possibly be removed by forcing a pure C ABI, but that requires additional work and may also affect the conciseness of the code.
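
To make the restriction concrete, a minimal sketch, assuming PackedFunc is backed by std::function (as the header-only runtime indicates); the function name is illustrative:

// Sketch only: a PackedFunc created inside libmxnet.so is a C++ object built on
// std::function. When libtvm.so later invokes it, the call crosses the shared
// library boundary, so both libraries must agree on the std::function layout,
// i.e. be built with a compatible C++ ABI; otherwise the call is undefined behavior.
#include <tvm/runtime/packed_func.h>

tvm::runtime::PackedFunc MakeWrapper() {
  return tvm::runtime::PackedFunc(
      [](tvm::runtime::TVMArgs args, tvm::runtime::TVMRetValue* rv) {
        // wrapper body elided
      });
}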

@tqchen (Member Author) commented Feb 25, 2018

This is based on an early version of work by @ZihengJiang.

@tqchen (Member Author) commented Feb 25, 2018

cc @piiswrong @sxjscience

@tqchen changed the title from "TVM bridge support" to "TVM bridge support to JIT NDArray Function by TVM" on Feb 25, 2018
@tqchen (Member Author) commented Feb 25, 2018

The TVM side of this PR: apache/tvm#930

@tqchen (Member Author) commented Feb 25, 2018

The tests now pass. @piiswrong @szha can you review?

Support wrap TVM compiled function as a NDArray function.

@szha (Member) left a comment

There should be at least one test for the integration, and it should be part of CI.

# pylint: disable= no-member, undefined-variable

@property
def _tvm_handle(self):
    return self.handle.value

Member

what's this for?

Member Author

This is a handle exposed for the PackedFunc convention interface of TVM, to allow arbitrary positional argument calls without adding a new C API. Specifically, the wrapped function is a TVM PackedFunc that recognizes NDArray as an extension object and passes the address of the NDArray handle correctly through the arguments.

It is later received here: https://github.com/apache/incubator-mxnet/pull/9880/files#diff-3aa2a3c799e125e086769bc1d5f6490aR74
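
As an illustration of how that handle travels, here is a hedged sketch of the receiving side; the type code value comes from this PR, but the helper function itself is hypothetical:

// Sketch only: the Python-side _tvm_handle exposes the raw NDArray handle address;
// the PackedFunc calling convention delivers it in TVMValue::v_handle together with
// the extension type code reserved for NDArray.
#include <mxnet/ndarray.h>
#include <tvm/runtime/c_runtime_api.h>

constexpr int kTVMNDArrayTypeCode = 19;  // reserved for MXNet/NNVM on the TVM side

mxnet::NDArray* AsNDArray(const TVMValue& value, int type_code) {
  if (type_code != kTVMNDArrayTypeCode) return nullptr;
  // v_handle carries the address that _tvm_handle returned on the Python side.
  return static_cast<mxnet::NDArray*>(value.v_handle);
}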

@tqchen (Member Author) commented Feb 25, 2018

I have detailed my reasoning for not yet adding a test case to this PR. The TVM bridge depends only on a header-only component of TVM and does not have to link against the TVM runtime, so merging this won't introduce any additional burden on MXNet's runtime.

This feature can only be used when TVM and MXNet are both available in the system.

If we are open to bringing TVM (with its LLVM dependency) into CI, we can propose another PR to change the Jenkinsfile (adding LLVM to the build) and bring the test case into MXNet's CI.

@marcoabreu (Contributor) left a comment

I'm heavily against splitting development and testing into two stages or PRs. Every PR must be covered by proper tests, and thus the modifications to the Jenkinsfile have to be part of this PR in order to get it approved by me.

While I'm aware that adding the TVM bridge is less impactful than, for example, the mkldnn PR, that case has clearly shown that tolerating the merging of PRs which are not entirely tested is harmful to the project; such merges should be absolute exceptions.

@tqchen (Member Author) commented Feb 25, 2018

Just to be clear, it is the way the TVM bridge works that creates this special situation. This PR requires joint changes in both repos, and the feature won't be available until the changes in TVM and MXNet are both made.

Unlike mkldnn, the user does not have to switch on USE_TVM as a hard dependency; they can directly use this feature when both TVM and MXNet are available. When the user does not have TVM, this won't affect them at all (unlike mkldnn, which requires the user to install mkldnn by default).

I can add a minimal MXNet-side test case that verifies the bridge function exists. A more thorough test, however, would require bringing TVM's build into MXNet's CI, which is a decision that I think needs more discussion.

@tqchen (Member Author) commented Feb 25, 2018

That being said, I totally agree that proper testing is important. That is why there are already test cases, verified locally, for these changes (an optional test on the TVM side that only runs when both are available). So the correctness and quality of the code change is covered by test_mxnet_bridge.py.

The only question is whether we want to bring that test case into this PR now; that involves bringing TVM's build into the current Jenkins pipeline, which I think deserves some discussion before we do so.

@marcoabreu (Contributor)

I don't see any issue with building a dependency; we do this in a lot of cases. The test execution would be part of the integration-test stage, while any compilation happens during the build stage.

Well, if we want to advertise MXNet as being compatible with TVM, then it should be properly tested. What kind of discussion would you expect?

@tqchen (Member Author) commented Feb 25, 2018

I just mean the cost of building TVM's LLVM dependency; I don't want to introduce an additional burden on CI while this is purely optional. Anyway, I get your point and will see if we can do a test with TVM's minimal dependency build.

@marcoabreu (Contributor)

Don't worry about that. We are currently looking into ccache integration, which should reduce the impact by a lot, especially if only GCC and not nvcc is being used.

@tqchen (Member Author) commented Feb 25, 2018

Test case and CI added.

@marcoabreu (Contributor) left a comment

Please see my comments. Also, please add more tests to cover all added APIs.

    import tvm.contrib.mxnet
    import topi
except ImportError:
    return

Contributor

Print message that test is not run because of missing tv

Contributor

TVM*

echo USE_RPC=1 >> config.mk
echo USE_GRAPH_RUNTIME=1 >> config.mk
echo CUDA_PATH=/usr/local/cuda >> config.mk
make -j10

Contributor

Please make use of all CPU cores


# Build and install TVM
cd /tmp
git clone https://github.com/dmlc/tvm/ --recursive

Contributor

Are you aware that the result of this script is cached indefinitely? In that case, it would be better to specify a stable version instead of master, as otherwise environments may differ across slaves.

Member Author

I am aware of that; changed to use a fixed tag.

@tqchen (Member Author) commented Feb 26, 2018

@marcoabreu I have addressed the comments. The current test case already covers the CPU and GPU API use cases of the async engine wrapping.

namespace mxnet {

// redefine DLPack enumeration to be backward compatible.
const int kCPU = kDLCPU;

Member

should this be constexpr? what keeps it from generating an integer in the data segment for each file compiled?

Member Author

good catch, will change to constexpr

const int kCPU = kDLCPU;
const int kGPU = kDLGPU;
// extension type code under TVM function.
const int kTVMNDArrayTypeCode = 19;

Member

would it make sense to make it an enumerator?

Member

Mostly because the 19 seems arbitrary, and maybe extensible to other numbers in the future? In that case, an enum could help to manage accidental overlap,
although my assumptions here may not be correct.

Member Author

This enumerator is allocated on the TVM side and reserved for the MXNet and NNVM projects, so it is not arbitrarily chosen: https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L97

Member Author

It is picked to be the last reserved enumerator for NNVM

Member Author

will add a comment about this in the updated code
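
For reference, a minimal sketch of what the revised declarations might look like after the constexpr change and the added comment (illustrative, not the exact merged code):

// Sketch of the revised constants discussed above.
#include <dlpack/dlpack.h>

namespace mxnet {
// Redefine the DLPack enumeration to stay backward compatible.
constexpr int kCPU = kDLCPU;
constexpr int kGPU = kDLGPU;
// Extension type code under the TVM PackedFunc convention.
// The value 19 is not arbitrary: it is the enumerator reserved for MXNet/NNVM
// in tvm/runtime/c_runtime_api.h (the last reserved NNVM code).
constexpr int kTVMNDArrayTypeCode = 19;
}  // namespace mxnet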

// by default assume we mutate the array.
if (const_loc_ptr < const_loc.size() &&
    i == const_loc[const_loc_ptr]) {
  const_vars->push_back(nd.var());

Member

is this called a lot in performance-sensitive areas? should we do a reserve()?

Member

(for all vectors here)

Member Author

We don't know the size of the vector beforehand.

Member

ik

Member

ok

// sorted position of constant arguments
std::vector<int> const_loc;
for (int i = 0; i < num_const; ++i) {
  const_loc.push_back(wrap_args[i + 3].operator int());

Member

reserve?

Member Author

This is not on the critical path (function construction rather than execution).
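
To make the const_loc discussion concrete, here is a hedged sketch of the classification step the snippets above implement: positions listed in const_loc become read-only engine dependencies, and everything else is assumed to be mutated. Names are illustrative rather than the exact bridge code.

// Sketch only: split the NDArray arguments into const vars and mutate vars
// before pushing the TVM function onto MXNet's async engine.
#include <cstddef>
#include <vector>
#include <mxnet/engine.h>
#include <mxnet/ndarray.h>

void CollectVars(const std::vector<mxnet::NDArray>& args,
                 const std::vector<int>& const_loc,  // sorted positions of const args
                 std::vector<mxnet::Engine::VarHandle>* const_vars,
                 std::vector<mxnet::Engine::VarHandle>* mutate_vars) {
  size_t const_loc_ptr = 0;
  for (size_t i = 0; i < args.size(); ++i) {
    if (const_loc_ptr < const_loc.size() &&
        static_cast<int>(i) == const_loc[const_loc_ptr]) {
      const_vars->push_back(args[i].var());   // read-only dependency
      ++const_loc_ptr;
    } else {
      // By default assume the TVM function mutates the array.
      mutate_vars->push_back(args[i].var());
    }
  }
}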

@cjolivier01 (Member)

LGTM

    import tvm.contrib.mxnet
    import topi
except ImportError:
    print("TVM bridge test skipped because TVM is missing...")

Contributor

Use logging.warn instead

@marcoabreu (Contributor) left a comment

Besides that one minor requested change about logging, approved from my side


// C callback that can be used by TVM to extract
// the WrapAsyncCall function.
extern "C" MXNET_DLL int MXTVMBridge(TVMFunctionHandle pregister) {

Contributor

Is this a C API? Should it be put in the c_api folder?

Member Author

This is queried by TVM, so it is not a public-facing C API. I feel it is better to put it here, but we can move it to the c_api folder.
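
For context, a hedged sketch of what the entry point does, following the description in this thread (the real tvm_bridge.cc may differ in detail): the registration callback passed in by TVM is itself a PackedFunc, and MXNet calls it with a name and the wrapper function.

// Sketch only: MXNet's side of the handshake. TVM passes in a registration
// PackedFunc; MXNet uses it to hand back its async wrapper. WrapAsyncCall here
// stands in for the actual wrapper implemented in tvm_bridge.cc.
#define TVM_RUNTIME_HEADER_ONLY 1
#include <tvm/runtime/packed_func.h>

using tvm::runtime::PackedFunc;
using tvm::runtime::TVMArgs;
using tvm::runtime::TVMRetValue;

static void WrapAsyncCall(TVMArgs args, TVMRetValue* rv) {
  // Build a PackedFunc that pushes the compiled TVM function onto MXNet's
  // engine with the right const/mutate vars (details elided).
}

extern "C" int MXTVMBridge(TVMFunctionHandle pregister) {
  // pregister is a PackedFunc supplied by TVM; calling it with (name, func)
  // registers func on the TVM side.
  const PackedFunc& fregister = *static_cast<PackedFunc*>(pregister);
  fregister("WrapAsyncCall", PackedFunc(WrapAsyncCall));
  return 0;
}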

f = tvm.build(s, [x, y, zz, scale])

# get a mxnet version
mxf = tvm.contrib.mxnet.to_mxnet_func(f, const_loc=[0, 1])

Contributor

This design feels weird. MXNet and TVM are mutually referencing each other:
the MXNet backend references TVM with MXTVMBridge, and TVM references MXNet with contrib.mxnet.

Member Author

The benefit of this design is that MXNet does not have to link libtvm.so, which comes with the LLVM dependency, and it works out of the box when MXNet and TVM are both available.

Member Author

Logically, we can view this as both TVM and MXNet depending on a header-only fraction of the TVM runtime, with the TVM contrib library calling into MXNet to retrieve the bridge function.

Member

Sorry to chime in here, I was just wondering: does TVM look for this function specifically, or does MXNet call a TVM function beforehand, passing it a pointer to this callback (which would seem less circular)?

@tqchen (Member Author) Feb 26, 2018

TVM looks for this function specifically. By this function I mean MXTVMBridge.

@tqchen (Member Author) commented Feb 26, 2018

@piiswrong Can you check and merge, or provide a list of action items that you think should change?

@marcoabreu (Contributor) left a comment

Would it be possible to document the number 19 a bit more and add documentation that allows cross referencing? Maybe add a link to the TVM implementation that actually defines this number.

@tqchen (Member Author) commented Feb 26, 2018

I added a comment at the declaration point.

@marcoabreu (Contributor)

Not in the Python part, and there's no link to where this number actually comes from. I personally would not know where to look in TVM.

@tqchen (Member Author) commented Feb 26, 2018

OK, will do an update in the python part as well

@tqchen (Member Author) commented Feb 26, 2018

Thanks for the reviews! If there are no requests for changes today, I am going to merge this in tomorrow.

*
* We do require TVM and MXNet to be built with same C++ ABI of std::function
*/
#define TVM_RUNTIME_HEADER_ONLY 1

Contributor

should you undefine it at the end?

Member Author

Since it is in a .cc file, this is not necessary.
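
For clarity, the inclusion pattern under discussion, as a sketch: the macro is defined at the top of the bridge's translation unit before including TVM's packed_func.h, so it never leaks into other headers and needs no #undef.

// Sketch only: header-only use of PackedFunc inside a single .cc file.
#define TVM_RUNTIME_HEADER_ONLY 1
#include <tvm/runtime/packed_func.h>
// ... bridge implementation follows; no other file includes this one,
// so the macro does not need to be undefined.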

@piiswrong (Contributor)

I understand that this is a quick hack to plug TVM support into MXNet NDArray, and I think we should merge it for the time being.
But for the record, I think we need better integration in the future.

@tqchen (Member Author) commented Feb 27, 2018

This design is certainly unconventional, and it took me a while to come up with it, in the sense that it tries to achieve something that would otherwise be impossible: introducing TVM support for MXNet NDArray without a direct link dependency, so a user can try it out anytime using the default binary distro.

The complication (and benefit) brought by MXNet's async execution makes the callback-wrapping style a bit tricky. If the function call had been simple CUDA calls, things would be simpler. But thankfully the logic is clear enough.

With this, we can do things like writing customized update rules in TVM while still JIT compiling to get the best speedups.

@tqchen tqchen merged commit 3545697 into apache:master Feb 27, 2018

@tornadomeet (Contributor) commented Feb 28, 2018

Building with commit 48749a5d43864a41653ccd8746cdccf1477b2ae4, an error occurs during make:

tvm/runtime/packed_func.h: No such file or directory

@tqchen (Member Author) commented Feb 28, 2018

@tornadomeet do git submodule update --recursive

@tqchen (Member Author) commented Feb 28, 2018

for the first time, do git submodule update --recursive --init

@tornadomeet (Contributor)

@tqchen yes, that fixed it.

@jwfromm commented Mar 10, 2018

I really like this feature, but am having some trouble getting it to work. Even with MXNet and TVM both built from source, I'm getting the error

AttributeError: /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so: undefined symbol: MXTVMBridge

When I try to call to_mxnet_func. Is there some other dependency needed to make it work?

@tqchen (Member Author) commented Mar 10, 2018

@jwfromm We do need to build MXNet from source for now, before the next release.

This should work out of the box if you build MXNet and TVM from source.

The error is likely due to the fact that your /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so does not yet contain this symbol. Please confirm that you built from the latest source correctly.

Specifically, please confirm that tvm_bridge.cc is built and linked into the shared library (https://github.com/apache/incubator-mxnet/blob/master/src/nnvm/tvm_bridge.cc#L174), and use nm -g to see whether the symbol is indeed there.

@jwfromm commented Mar 11, 2018

You're absolutely right; I was accidentally overwriting the built-from-source library with a pip install. It works great now!

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* TVM bridge support.
Support wrap TVM compiled function as a NDArray function.

* Testcases and CI to include TVM as dependency

* address review comments

* Add more comments, change to constexpr

* change to log warn

* update comment on the type code
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* TVM bridge support.
Support wrap TVM compiled function as a NDArray function.

* Testcases and CI to include TVM as dependency

* address review comments

* Add more comments, change to constexpr

* change to log warn

* update comment on the type code

@merrymercy (Member)

@tqchen
I can use this feature by building MXNet from source, but I met errors when using packages from pip (v1.2, v1.3).

sometimes the error is

terminate called after throwing an instance of 'dmlc::Error'
  what():  TVMCall CFunc Error:
Traceback (most recent call last):
  File "tvm/_ffi/_cython/./function.pxi", line 38, in tvm._ffi._cy3.core.tvm_callback
TypeError: _list() missing 2 required positional arguments: 'name' and 'func'

It seems we cannot call

https://github.com/apache/incubator-mxnet/blob/ff39cf11fdbc2036b59a88f0173e4a6ff481aa08/src/nnvm/tvm_bridge.cc#L178
correctly.

sometimes the error is


Segmentation fault: 11

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x1d86a2) [0x7f8c64e396a2]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x32eb91e) [0x7f8c67f4c91e]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f8c88d584b0]
[bt] (3) /sampa/home/mercy/tvm/python/tvm/_ffi/_cy3/core.cpython-35m-x86_64-linux-gnu.so(+0x15b2c) [0x7f8bf07b8b2c]
[bt] (4) /sampa/home/mercy/tvm/build/libtvm.so(+0x60ea72) [0x7f8c1f639a72]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(MXTVMBridge+0xd8) [0x7f8c67a46638]
[bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f8c87bace20]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f8c87bac88b]
[bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f8c87ba701a]
[bt] (9) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f8c87b9afcb]

@wkcn (Member) commented Jan 30, 2019

@tqchen Hi!
I would like to use MXTVMBridge to accelerate my project, but there is a flaky problem.

I wrote a minimal reproducible example.
In this example, tvm_packed_func.h is simplified from TVM.

Steps to reproduce

  1. clone the example
  2. change the path of libmxnet.so on line 55 of the code.
  3. make
  4. ./test

In the function SetMXTVMBridge of the example,
args.num_args is wrong and args.values[0].v_str is sometimes wrong when MXNet is installed via pip, namely pip install mxnet --pre.
However, the result is correct when MXNet is built by myself (make -j 5 USE_OPENCV=1 USE_BLAS=openblas on the latest MXNet source).
The problem may be similar to the bug @merrymercy met.

Thanks.
