
Infra for tvm op runtime dispatch #16100

Merged · 10 commits · Oct 28, 2019

Conversation

@hzfan (Contributor) commented Sep 5, 2019

Description

This PR implements infrastructure that lets users dispatch the execution of a tvm operator to different schedules according to the runtime input shapes, which helps with acceleration.

A gemm example can be found in

  • Kernel definition: contrib/tvmop/core/multiarray.py
  • Operator registry and dispatch: src/operator/contrib/tvmop/dot.cc
  • Benchmark: benchmark/python/tvmop/benchmark_tvmop.py

The following are some experimental results for matrix multiplication between two n × n matrices. Note that the benchmark results cannot be reproduced until this PR gets merged.

n    | After Dispatch (ms) | Before Dispatch (ms)
1024 | 177                 | 482
1056 | 190                 | 366
1088 | 200                 | 424

The example schedule is roughly equivalent to the Blocking optimization. More optimizations (such as vectorization, loop permutation, array packing, write caching for blocks, and parallelization) can be applied for further acceleration.
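To make the Blocking reference concrete, here is a minimal, self-contained sketch of a blocked GEMM schedule. This is not the code in this PR; it assumes a TVM version with the te API where tvm.build accepts a te schedule, and bn/factor are illustrative values.

import tvm
from tvm import te

def blocked_gemm(n, bn=32, factor=4):
    # C = A x B, where all matrices are n x n.
    A = te.placeholder((n, n), name="A")
    B = te.placeholder((n, n), name="B")
    k = te.reduce_axis((0, n), name="k")
    C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = te.create_schedule(C.op)
    # Blocking: tile the output into bn x bn blocks and split the reduction
    # axis, so each block works on data that stays in cache.
    xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    ko, ki = s[C].split(k, factor=factor)
    s[C].reorder(xo, yo, ko, ki, xi, yi)
    return tvm.build(s, [A, B, C], target="llvm")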

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, the API doc string has been updated.
      • For new C++ functions in header files, their functionality and arguments are documented.
      • For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
      • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Add dispatch infra
  • Add an example dot operator

Comments

@ZhennanQin (Contributor)

Just curious: as far as I know, the tvm op kernels are pre-compiled and then linked into MXNet. How can they be configured according to the runtime input shapes?

@hzfan (Contributor, Author) commented Sep 6, 2019

> Just curious: as far as I know, the tvm op kernels are pre-compiled and then linked into MXNet. How can they be configured according to the runtime input shapes?

Yes, the kernels are pre-compiled. At compile time, several different schedules (kernels) are defined and compiled for a single op. At runtime, the most suitable kernel is chosen based on the runtime input shape.

In other words, although each kernel is pre-compiled, multiple kernels are available for a single op, so we can choose the most efficient one given the runtime input shape.

@ZhennanQin (Contributor)

@hzfan Thanks for the explanation. My next question is: how do we know which schedule is the best one for a certain input shape? Is it defined by static rules or tuned at runtime?

@hzfan (Contributor, Author) commented Sep 9, 2019

> @hzfan Thanks for the explanation. My next question is: how do we know which schedule is the best one for a certain input shape? Is it defined by static rules or tuned at runtime?

That's a good question. We have actually considered both options, and for now we use simple static rules. To be more specific, I currently require the size of a for-loop to be a multiple of its splitting factor (if the for-loop is split). This eliminates an if-condition inside the loop and thus makes the kernel faster.
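As an illustration, here is a minimal sketch of such a static rule. The names below are hypothetical and not the actual helpers in this PR.

def select_kernel(shape, factor=4):
    # The specialized kernel assumes every dimension is a multiple of the
    # split factor, so the loop split produces no remainder and the kernel
    # needs no if-condition for a tail block.
    if all(dim % factor == 0 for dim in shape):
        return "dot_specialized"
    # Otherwise fall back to a generic kernel that handles arbitrary sizes.
    return "dot_fallback"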

Runtime tuning has also been considered, but it is not implemented in this version. The idea is to try all the available schedules for every runtime shape, measure their performance, and cache the best choice, which is quite similar to autotvm (a rough sketch follows the list below).

  • Pros: the choice is optimal
  • Cons: the first run with a shape that has not been encountered before will be slow
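A minimal sketch of that measure-and-cache idea (not part of this PR; all names here are hypothetical, and shapes are assumed to be passed as tuples so they can be used as dictionary keys):

import time

_best_kernel = {}  # cache: input shape -> fastest kernel seen so far

def dispatch(shape, candidates, run):
    if shape not in _best_kernel:
        timings = {}
        for kernel in candidates:
            start = time.perf_counter()
            run(kernel, shape)  # execute this candidate once and time it
            timings[kernel] = time.perf_counter() - start
        _best_kernel[shape] = min(timings, key=timings.get)
    return _best_kernel[shape]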

@yzhliu (Member) commented Sep 17, 2019

Also cc @icemelon9 @kevinthesun

return diff / repeat


def test_tvm_dot():
Review comment (Member):

who uses this? for testing only?

@hzfan (Contributor, Author):

Yes, for reproducing the benchmark results. The other code under benchmark/ serves only this purpose as well.

conf_path = [p for p in candidates_path if os.path.exists(p) and os.path.isfile(p)]
if len(conf_path) == 0:
    raise RuntimeError('Cannot find the TVM op config.\n' +
                       'List of candidates:\n' + str('\n'.join(candidates_path)))
Review comment (Member):

Can we fall back to the default behavior if the config file is missing?

@hzfan (Contributor, Author) commented Sep 18, 2019

Yes, that would just take a little more code, I think.

In which case would the config file be missing? It is generated at compile time (even if no tunable parameters are needed, a nearly empty config is still generated).
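If such a fallback were added, it could look roughly like the following. This is a hypothetical sketch continuing from the excerpt above; load_config and the empty-config default are assumptions, not part of this PR.

conf_path = [p for p in candidates_path if os.path.isfile(p)]
if conf_path:
    config = load_config(conf_path[0])  # use the generated TVM op config
else:
    config = {}  # config missing: fall back to the default schedules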

contrib/tvmop/compile.py — outdated review thread, resolved
def dot(dtype, fallback):
    cfg = autotvm.get_config()
    cfg.define_knob("bn", [64] if fallback else [64, 32])
    cfg.define_knob("factor", [4] if fallback else [4])
Review comment (Contributor):

Seems it's always [4] no matter what fallback is?

@hzfan (Contributor, Author):

Yes. The difference is that when fallback is false, the shape comes with a hint indicating that it is a multiple of 4.

This factor means that in any case I want to split the loop by a factor of 4. With fallback there is no guarantee that the loop size is a multiple of 4, while without fallback there is.

Review comment (Contributor):

I understand what you are trying to get at here. My point is that this line is equivalent to the following, correct?

cfg.define_knob("factor", [4])

@hzfan (Contributor, Author):

Yes.

src/operator/tvmop/op_module.h — outdated review thread, resolved
from collections import OrderedDict
import numpy as _np

class OtherOptionSpace(object):
Review comment (Contributor):

Can we call this GeneralOptionSpace? Same for other places: other -> general.

@hzfan (Contributor, Author):

Actually, OtherOptionSpace comes from tvm/python/tvm/autotvm/task/space.py. Besides OtherOptionSpace there are SplitSpace, ReorderSpace and AnnotateSpace. The other three spaces may be needed in the future, so I keep the name consistent with tvm.

Review comment (Contributor):

Fair enough.

if op.dispatch is True:
    config_space = autotvm.ConfigSpace()
    with autotvm.task.ApplyConfig(config_space):
        sch, args = op.func(fallback=False, **each_kwargs)
Review comment (Contributor):

This requires fallback to be a mandatory parameter of op.func, which is not ideal in terms of usability, in my opinion. We should support compiling whatever users define and treat the fallback knob as an advanced feature for performance tuning.

One way to achieve this is to inspect the signature of op.func for the keyword fallback. If the keyword does not exist, we just compile the op using the default schedule, e.g.
if 'fallback' in str(inspect.signature(op.func)):
    sch, args = op.func(fallback=False, **each_kwargs)
else:
    sch, args = op.func(**each_kwargs)

@yzhliu What do you think?

@hzfan (Contributor, Author):

Done. I set self.dispatchable = 'fallback' in inspect.signature(self.func).parameters in opdef.py.
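For reference, a minimal sketch of what that check might look like; the class and constructor here are illustrative, not the exact code in opdef.py.

import inspect

class OpDef(object):
    def __init__(self, func, name, dispatch=False):
        self.func = func
        self.name = name
        self.dispatch = dispatch
        # The op is dispatchable only if its kernel definition accepts a
        # `fallback` keyword argument.
        self.dispatchable = 'fallback' in inspect.signature(func).parameters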

@hzfan force-pushed the autotvm_pr branch 2 times, most recently from 2fe9f81 to 587812d, on October 26, 2019.
@yzhliu (Member) left a comment:

LGTM

@reminisce merged commit 9322864 into apache:master on Oct 28, 2019.
yajiedesign pushed a commit to yajiedesign/mxnet that referenced this pull request Nov 6, 2019
* infra for dispatch tvm op

* fix ci and sanity error

* disable shape with hint and fix coding style

* rename to avoid conflict with original dot

* update tvm and use soft link

* config file moves to lib/ when using Makefile

* add tvmop.conf to ci

* fix rebase

* fix rebase

* use inspect to detect dispatchable func
@eric-haibin-lin (Member):

Do we have a developer guide for using tvm op?

@hzfan (Contributor, Author) commented Dec 23, 2019

> Do we have a developer guide for using tvm op?

It seems @yzhliu is working on one.
