This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

TVM bridge support to JIT NDArray Function by TVM #9880

Merged: 6 commits merged into apache:master on Feb 27, 2018

Conversation

@tqchen (Member) commented Feb 25, 2018

Support wrapping a TVM compiled function as an NDArray function. This enables using TVM as an RTC module for MXNet's async engine functions.

Example

def test():
    import mxnet as mx
    import topi
    import tvm
    import numpy as np
    from tvm.contrib.mxnet import to_mxnet_func

    # build a TVM function through topi
    n = 20
    shape = (20,)
    scale = tvm.var("scale", dtype="float32")
    x = tvm.placeholder(shape)
    y = tvm.placeholder(shape)
    z = topi.broadcast_add(x, y)
    zz = tvm.compute(shape, lambda *i: z(*i) * scale)

    # build the function
    target = tvm.target.cuda()
    with target:
        s = topi.generic.schedule_injective(zz)
        f = tvm.build(s, [x, y, zz, scale])

    # get an MXNet version of the function that runs on the async engine
    mxf = to_mxnet_func(f, const_loc=[0, 1])

    ctx = mx.gpu(0)
    xx = mx.nd.uniform(shape=shape, ctx=ctx)
    yy = mx.nd.uniform(shape=shape, ctx=ctx)
    zz = mx.nd.empty(shape=shape, ctx=ctx)

    # invoke mxf: this runs in the MXNet engine
    mxf(xx, yy, zz, 10.0)

    np.testing.assert_allclose(
        zz.asnumpy(), (xx.asnumpy() + yy.asnumpy()) * 10)

Technical Details

The bridge is quite natural since MXNet already uses the DLTensor representation, which TVM also uses. The hard part is that we need to use MXNet's engine to run the compiled function, instead of running it directly.

Since TVM relies on LLVM, it is a bit too early to introduce that as a direct dependency. This PR takes a different approach: the TVM bridge depends only on a header-only component of TVM and does not have to link against the TVM runtime.

When a user has TVM installed in their environment, TVM queries the MXTVMBridge function to obtain the wrapper logic and uses it to run the compiled function asynchronously through MXNet's engine. When a user does not have TVM installed, the bridge adds no additional link dependencies.
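
For illustration, here is a minimal sketch of the lookup pattern on the TVM side, assuming a POSIX dlopen/dlsym-based discovery; apart from the MXTVMBridge symbol name, all names here are hypothetical and the real TVM contrib code may differ:

// Sketch only: how a TVM-side module could discover MXNet's bridge entry point.
#include <dlfcn.h>

typedef void* TVMFunctionHandle;
typedef int (*MXTVMBridgeFunc)(TVMFunctionHandle);

int LoadMXTVMBridge(const char* libmxnet_path, TVMFunctionHandle fregister) {
  void* lib = dlopen(libmxnet_path, RTLD_LAZY | RTLD_LOCAL);
  if (lib == nullptr) return -1;
  // The symbol exists only when libmxnet.so was built with the bridge compiled in.
  void* sym = dlsym(lib, "MXTVMBridge");
  if (sym == nullptr) return -1;
  // Hand TVM's registration callback to MXNet so it can expose its wrapper logic.
  return reinterpret_cast<MXTVMBridgeFunc>(sym)(fregister);
}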

Because of this optional linking logic, I did not include a test case in MXNet's CI, but I have verified locally that the code works for both the GPU and CPU cases.

Restriction

MXNet and TVM need to be built with the same C++ ABI (because we pass PackedFunc objects around). This is somewhat of a restriction, but it makes code sharing easier by using the PackedFunc system. It can usually be satisfied by using the same C++ compiler: for example, g++ 4.8 and g++ 5.0 are not compatible, while the latest version of clang is usually compatible with the latest version of g++. Running with incompatible ABIs causes undefined behavior and possible segfaults. This restriction could possibly be removed by forcing a pure C ABI, but that requires additional work and may also affect the conciseness of the code.
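
To make the restriction concrete, a minimal sketch, assuming PackedFunc is backed by std::function (as the header-only runtime indicates); the function name is illustrative:

// Sketch only: a PackedFunc created inside libmxnet.so is a C++ object built on
// std::function. When libtvm.so later invokes it, the call crosses the shared
// library boundary, so both libraries must agree on the std::function layout,
// i.e. be built with a compatible C++ ABI; otherwise the call is undefined behavior.
#include <tvm/runtime/packed_func.h>

tvm::runtime::PackedFunc MakeWrapper() {
  return tvm::runtime::PackedFunc(
      [](tvm::runtime::TVMArgs args, tvm::runtime::TVMRetValue* rv) {
        // wrapper body elided
      });
}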

@tqchen (Member Author) commented Feb 25, 2018

This is based on an early version of work by @ZihengJiang.

@tqchen (Member Author) commented Feb 25, 2018

cc @piiswrong @sxjscience

@tqchen changed the title from "TVM bridge support" to "TVM bridge support to JIT NDArray Function by TVM" on Feb 25, 2018
@tqchen (Member Author) commented Feb 25, 2018

The TVM side of this PR: apache/tvm#930

@tqchen (Member Author) commented Feb 25, 2018

The tests now pass. @piiswrong @szha can you review?

Support wrap TVM compiled function as a NDArray function.

@szha (Member) left a comment

There should be at least one test for the integration, and it should be part of CI.

# pylint: disable= no-member, undefined-variable

@property
def _tvm_handle(self):
    return self.handle.value

Member

what's this for?

Member Author

This is a handle exposed for the PackedFunc convention interface of TVM, to allow arbitrary positional argument calls without adding a new C API. Specifically, the wrapped function is a TVM PackedFunc that recognizes NDArray as an extension object and passes the address of the NDArray handle correctly through the arguments.

It is later received here: https://github.com/apache/incubator-mxnet/pull/9880/files#diff-3aa2a3c799e125e086769bc1d5f6490aR74
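
As an illustration of how that handle travels, here is a hedged sketch of the receiving side; the type code value comes from this PR, but the helper function itself is hypothetical:

// Sketch only: the Python-side _tvm_handle exposes the raw NDArray handle address;
// the PackedFunc calling convention delivers it in TVMValue::v_handle together with
// the extension type code reserved for NDArray.
#include <mxnet/ndarray.h>
#include <tvm/runtime/c_runtime_api.h>

constexpr int kTVMNDArrayTypeCode = 19;  // reserved for MXNet/NNVM on the TVM side

mxnet::NDArray* AsNDArray(const TVMValue& value, int type_code) {
  if (type_code != kTVMNDArrayTypeCode) return nullptr;
  // v_handle carries the address that _tvm_handle returned on the Python side.
  return static_cast<mxnet::NDArray*>(value.v_handle);
}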

@tqchen (Member Author) commented Feb 25, 2018

I have detailed my reasoning for not yet adding a test case to this PR. The TVM bridge depends only on a header-only component of TVM and does not have to link against the TVM runtime, so merging this won't introduce any additional burden on MXNet's runtime.

This feature can only be used when TVM and MXNet are both available in the system.

If we are open to bringing TVM (with its LLVM dependency) into CI, we can propose another PR to change the Jenkinsfile (adding LLVM to the build) and bring the test case into MXNet's CI.

@marcoabreu (Contributor) left a comment

I'm heavily against splitting development and testing into two stages or PRs. Every PR must be covered by proper tests, and thus the modifications to the Jenkinsfile have to be part of this PR in order to get it approved by me.

While I'm aware that adding the TVM bridge is less impactful than, for example, the mkldnn PR, that case has clearly shown that tolerating the merging of PRs which are not entirely tested is harmful to the project; such merges should be absolute exceptions.

@tqchen (Member Author) commented Feb 25, 2018

Just to be clear, it is the way the TVM bridge works that creates this special situation. This PR requires joint changes in both repos, and the feature won't be available until the changes in TVM and MXNet are both made.

Unlike mkldnn, the user does not have to switch on USE_TVM as a hard dependency; they can directly use this feature when both TVM and MXNet are available. When the user does not have TVM, this won't affect them at all (unlike mkldnn, which requires the user to install mkldnn by default).

I can add a minimal MXNet-side test case that verifies the bridge function exists. A more thorough test, however, would require bringing TVM's build into MXNet's CI, which is a decision that I think needs more discussion.

@tqchen (Member Author) commented Feb 25, 2018

That being said, I totally agree that proper testing is important. That is why there are already test cases, verified locally, for these changes (an optional test on the TVM side that only runs when both are available). So the correctness and quality of the code change is covered by test_mxnet_bridge.py.

The only question is whether we want to bring that test case into this PR now; that involves bringing TVM's build into the current Jenkins pipeline, which I think deserves some discussion before we do so.

@marcoabreu (Contributor)

I don't see any issue with building a dependency; we do this in a lot of cases. The test execution would be part of the integration-test stage, while any compilation happens during the build stage.

Well, if we want to advertise MXNet as being compatible with TVM, then it should be properly tested. What kind of discussion would you expect?

@tqchen (Member Author) commented Feb 25, 2018

I just mean the cost of building TVM's LLVM dependency; I don't want to introduce an additional burden on CI while this is purely optional. Anyway, I get your point and will see if we can do a test with TVM's minimal dependency build.

@marcoabreu (Contributor)

Don't worry about that. We are currently looking into ccache integration, which should reduce the impact by a lot, especially if only GCC and not nvcc is being used.

@tqchen (Member Author) commented Feb 25, 2018

Test case and CI added.

@marcoabreu (Contributor) left a comment

Please see my comments. Also, please add more tests to cover all added APIs.

    import tvm.contrib.mxnet
    import topi
except ImportError:
    return

Contributor

Print message that test is not run because of missing tv

Contributor

TVM*

echo USE_RPC=1 >> config.mk
echo USE_GRAPH_RUNTIME=1 >> config.mk
echo CUDA_PATH=/usr/local/cuda >> config.mk
make -j10

Contributor

Please make use of all CPU cores


# Build and install TVM
cd /tmp
git clone https://github.com/dmlc/tvm/ --recursive

Contributor

Are you aware that the result of this script is cached indefinitely? In that case, it would be better to specify a stable version instead of master, as otherwise environments may differ across slaves.

Member Author

I am aware of that; changed to use a fixed tag.

@tqchen (Member Author) commented Feb 26, 2018

@marcoabreu I have addressed the comments. The current test case already covers the CPU and GPU API use cases of the async engine wrapping.

namespace mxnet {

// redefine DLPack enumeration to be backward compatible.
const int kCPU = kDLCPU;

Member

should this be constexpr? what keeps it from generating an integer in the data segment for each file compiled?

Member Author

good catch, will change to constexpr

const int kCPU = kDLCPU;
const int kGPU = kDLGPU;
// extension type code under TVM function.
const int kTVMNDArrayTypeCode = 19;

Member

would it make sense to make it an enumerator?

Member

Mostly because the 19 seems arbitrary, and maybe extensible to other numbers in the future? In that case, an enum could help to manage accidental overlap,
although my assumptions here may not be correct.

Member Author

This enumerator is allocated on the TVM side and reserved for the MXNet and NNVM projects, so it is not arbitrarily chosen: https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L97

Member Author

It is picked to be the last reserved enumerator for NNVM

Member Author

will add a comment about this in the updated code
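
For reference, a minimal sketch of what the revised declarations might look like after the constexpr change and the added comment (illustrative, not the exact merged code):

// Sketch of the revised constants discussed above.
#include <dlpack/dlpack.h>

namespace mxnet {
// Redefine the DLPack enumeration to stay backward compatible.
constexpr int kCPU = kDLCPU;
constexpr int kGPU = kDLGPU;
// Extension type code under the TVM PackedFunc convention.
// The value 19 is not arbitrary: it is the enumerator reserved for MXNet/NNVM
// in tvm/runtime/c_runtime_api.h (the last reserved NNVM code).
constexpr int kTVMNDArrayTypeCode = 19;
}  // namespace mxnet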

// by default assume we mutate the array.
if (const_loc_ptr < const_loc.size() &&
    i == const_loc[const_loc_ptr]) {
  const_vars->push_back(nd.var());

Member

is this called a lot in performance-sensitive areas? should we do a reserve()?

Member

(for all vectors here)

Member Author

We don't know the size of the vector beforehand.

Member

ik

Member

ok

// sorted position of constant arguments
std::vector<int> const_loc;
for (int i = 0; i < num_const; ++i) {
  const_loc.push_back(wrap_args[i + 3].operator int());

Member

reserve?

Member Author

This is not on the critical path (function construction rather than execution).
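
To make the const_loc discussion concrete, here is a hedged sketch of the classification step the snippets above implement: positions listed in const_loc become read-only engine dependencies, and everything else is assumed to be mutated. Names are illustrative rather than the exact bridge code.

// Sketch only: split the NDArray arguments into const vars and mutate vars
// before pushing the TVM function onto MXNet's async engine.
#include <cstddef>
#include <vector>
#include <mxnet/engine.h>
#include <mxnet/ndarray.h>

void CollectVars(const std::vector<mxnet::NDArray>& args,
                 const std::vector<int>& const_loc,  // sorted positions of const args
                 std::vector<mxnet::Engine::VarHandle>* const_vars,
                 std::vector<mxnet::Engine::VarHandle>* mutate_vars) {
  size_t const_loc_ptr = 0;
  for (size_t i = 0; i < args.size(); ++i) {
    if (const_loc_ptr < const_loc.size() &&
        static_cast<int>(i) == const_loc[const_loc_ptr]) {
      const_vars->push_back(args[i].var());   // read-only dependency
      ++const_loc_ptr;
    } else {
      // By default assume the TVM function mutates the array.
      mutate_vars->push_back(args[i].var());
    }
  }
}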

@cjolivier01 (Member)

LGTM

    import tvm.contrib.mxnet
    import topi
except ImportError:
    print("TVM bridge test skipped because TVM is missing...")

Contributor

Use logging.warn instead

@marcoabreu (Contributor) left a comment

Besides that one minor requested change about logging, approved from my side


// C callback that can be used by TVM to extract
// the WrapAsyncCall function.
extern "C" MXNET_DLL int MXTVMBridge(TVMFunctionHandle pregister) {

Contributor

Is this a C API? Should it be put in the c_api folder?

Member Author

This is queried by TVM, so it is not a public-facing C API. I feel it is better to put it here, but we can move it to the c_api folder.
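
For context, a hedged sketch of what the entry point does, following the description in this thread (the real tvm_bridge.cc may differ in detail): the registration callback passed in by TVM is itself a PackedFunc, and MXNet calls it with a name and the wrapper function.

// Sketch only: MXNet's side of the handshake. TVM passes in a registration
// PackedFunc; MXNet uses it to hand back its async wrapper. WrapAsyncCall here
// stands in for the actual wrapper implemented in tvm_bridge.cc.
#define TVM_RUNTIME_HEADER_ONLY 1
#include <tvm/runtime/packed_func.h>

using tvm::runtime::PackedFunc;
using tvm::runtime::TVMArgs;
using tvm::runtime::TVMRetValue;

static void WrapAsyncCall(TVMArgs args, TVMRetValue* rv) {
  // Build a PackedFunc that pushes the compiled TVM function onto MXNet's
  // engine with the right const/mutate vars (details elided).
}

extern "C" int MXTVMBridge(TVMFunctionHandle pregister) {
  // pregister is a PackedFunc supplied by TVM; calling it with (name, func)
  // registers func on the TVM side.
  const PackedFunc& fregister = *static_cast<PackedFunc*>(pregister);
  fregister("WrapAsyncCall", PackedFunc(WrapAsyncCall));
  return 0;
}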

f = tvm.build(s, [x, y, zz, scale])

# get a mxnet version
mxf = tvm.contrib.mxnet.to_mxnet_func(f, const_loc=[0, 1])

Contributor

This design feels weird. MXNet and TVM are mutually referencing each other:
the MXNet backend references TVM with MXTVMBridge, and TVM references MXNet with contrib.mxnet.

Member Author

The benefit of this design is that MXNet does not have to link libtvm.so, which comes with the LLVM dependency, and it works out of the box when MXNet and TVM are both available.

Member Author

Logically, we can view this as both TVM and MXNet depending on a header-only fraction of the TVM runtime, with the TVM contrib library calling into MXNet to retrieve the bridge function.

Member

Sorry to chime in here, I was just wondering: does TVM look for this function specifically, or does MXNet call a TVM function beforehand, passing it a pointer to this callback (which would seem less circular)?

@tqchen (Member Author) Feb 26, 2018

TVM looks for this function specifically. By this function I mean MXTVMBridge.

@tqchen (Member Author) commented Feb 26, 2018

@piiswrong Can you check and merge, or provide a list of action items that you think should change?

@marcoabreu (Contributor) left a comment

Would it be possible to document the number 19 a bit more and add documentation that allows cross referencing? Maybe add a link to the TVM implementation that actually defines this number.

@tqchen (Member Author) commented Feb 26, 2018

I added a comment at the declaration point.

@marcoabreu (Contributor)

Not in the Python part, and there's no link to where this number actually comes from. I personally would not know where to look in TVM.

@tqchen (Member Author) commented Feb 26, 2018

OK, will do an update in the python part as well

@tqchen (Member Author) commented Feb 26, 2018

Thanks for the reviews! If there are no requests for changes today, I am going to merge this in tomorrow.

*
* We do require TVM and MXNet to be built with same C++ ABI of std::function
*/
#define TVM_RUNTIME_HEADER_ONLY 1

Contributor

should you undefine it at the end?

Member Author

Since it is in a .cc file, this is not necessary.
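
For clarity, the inclusion pattern under discussion, as a sketch: the macro is defined at the top of the bridge's translation unit before including TVM's packed_func.h, so it never leaks into other headers and needs no #undef.

// Sketch only: header-only use of PackedFunc inside a single .cc file.
#define TVM_RUNTIME_HEADER_ONLY 1
#include <tvm/runtime/packed_func.h>
// ... bridge implementation follows; no other file includes this one,
// so the macro does not need to be undefined.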

@piiswrong (Contributor)

I understand that this is a quick hack to plug TVM support into MXNet NDArray, and I think we should merge it for the time being.
But for the record, I think we need better integration in the future.

@tqchen (Member Author) commented Feb 27, 2018

This design is certainly unconventional, and it took me a while to come up with it, in the sense that it tries to achieve something that would otherwise be impossible: introducing TVM support for MXNet NDArray without a direct link dependency, so a user can try it out anytime using the default binary distro.

The complication (and benefit) brought by MXNet's async execution makes the callback-wrapping style a bit tricky. If the function call had been simple CUDA calls, things would be simpler. But thankfully the logic is clear enough.

With this, we can do things like writing customized update rules in TVM while still JIT compiling to get the best speedups.

@tqchen tqchen merged commit 3545697 into apache:master Feb 27, 2018

@tornadomeet (Contributor) commented Feb 28, 2018

Building with commit 48749a5d43864a41653ccd8746cdccf1477b2ae4, an error occurs during make:

tvm/runtime/packed_func.h: No such file or directory

@tqchen (Member Author) commented Feb 28, 2018

@tornadomeet do git submodule update --recursive

@tqchen (Member Author) commented Feb 28, 2018

for the first time, do git submodule update --recursive --init

@tornadomeet (Contributor)

@tqchen yes, that fixed it.

@jwfromm commented Mar 10, 2018

I really like this feature, but am having some trouble getting it to work. Even with MXNet and TVM both built from source, I'm getting the error

AttributeError: /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so: undefined symbol: MXTVMBridge

When I try to call to_mxnet_func. Is there some other dependency needed to make it work?

@tqchen (Member Author) commented Mar 10, 2018

@jwfromm We do need to build MXNet from source for now, before the next release.

This should work out of the box if you build MXNet and TVM from source.

The error is likely due to the fact that your /opt/conda/lib/python3.6/site-packages/mxnet/libmxnet.so does not yet contain this symbol. Please confirm that you built from the latest source correctly.

Specifically, please confirm that tvm_bridge.cc is built and linked into the shared library (https://github.com/apache/incubator-mxnet/blob/master/src/nnvm/tvm_bridge.cc#L174), and use nm -g to see whether the symbol is indeed there.

@jwfromm commented Mar 11, 2018

You're absolutely right; I was accidentally overwriting the built-from-source library with a pip install. It works great now!

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* TVM bridge support.
Support wrap TVM compiled function as a NDArray function.

* Testcases and CI to include TVM as dependency

* address review comments

* Add more comments, change to constexpr

* change to log warn

* update comment on the type code
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* TVM bridge support.
Support wrap TVM compiled function as a NDArray function.

* Testcases and CI to include TVM as dependency

* address review comments

* Add more comments, change to constexpr

* change to log warn

* update comment on the type code

@merrymercy (Member)

@tqchen
I can use this feature by building MXNet from source, but I met errors when using packages from pip (v1.2, v1.3).

sometimes the error is

terminate called after throwing an instance of 'dmlc::Error'
  what():  TVMCall CFunc Error:
Traceback (most recent call last):
  File "tvm/_ffi/_cython/./function.pxi", line 38, in tvm._ffi._cy3.core.tvm_callback
TypeError: _list() missing 2 required positional arguments: 'name' and 'func'

It seems we cannot call

https://github.com/apache/incubator-mxnet/blob/ff39cf11fdbc2036b59a88f0173e4a6ff481aa08/src/nnvm/tvm_bridge.cc#L178
correctly.

sometimes the error is


Segmentation fault: 11

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x1d86a2) [0x7f8c64e396a2]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x32eb91e) [0x7f8c67f4c91e]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f8c88d584b0]
[bt] (3) /sampa/home/mercy/tvm/python/tvm/_ffi/_cy3/core.cpython-35m-x86_64-linux-gnu.so(+0x15b2c) [0x7f8bf07b8b2c]
[bt] (4) /sampa/home/mercy/tvm/build/libtvm.so(+0x60ea72) [0x7f8c1f639a72]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(MXTVMBridge+0xd8) [0x7f8c67a46638]
[bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f8c87bace20]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f8c87bac88b]
[bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f8c87ba701a]
[bt] (9) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f8c87b9afcb]

@wkcn (Member) commented Jan 30, 2019

@tqchen Hi!
I would like to use MXTVMBridge to accelerate my project, but there is a flaky problem.

I wrote a minimal reproducible example.
In this example, tvm_packed_func.h is simplified from TVM.

Steps to reproduce

  1. clone the example
  2. change the path of libmxnet.so on line 55 of the code.
  3. make
  4. ./test

In the function SetMXTVMBridge of the example,
args.num_args is wrong and args.values[0].v_str is sometimes wrong when MXNet is installed via pip, namely pip install mxnet --pre.
However, the result is correct when MXNet is built by myself (make -j 5 USE_OPENCV=1 USE_BLAS=openblas on the latest MXNet source).
The problem may be similar to the bug @merrymercy met.

Thanks.
