[Bug] tutorials do not build from a clean source tree #9013

Closed
areusch opened this issue Sep 15, 2021 · 18 comments

@areusch
Contributor

areusch commented Sep 15, 2021

I think this might be a compiler caching bug (cc @tqchen @jroesch @mbs-octoml). It isn't caught in the regression because the regression rebuilds only the changed tutorials.

Steps to reproduce:

  1. git checkout dc2f70e3c8a9b14b9e414ecf768ad32e6c3c3960
  2. rm -rf build
  3. docker/bash.sh ci_gpu tests/scripts/task_config_build_gpu.sh
  4. docker/bash.sh ci_gpu tests/scripts/task_build.sh build -j16
  5. docker/bash.sh ci_gpu bash -c 'cd docs && make clean'
  6. docker/bash.sh ci_gpu tests/scripts/task_ci_setup.sh
  7. docker/bash.sh ci_gpu tests/scripts/task_python_docs.sh

This will show the traceback below somewhere along the way. micro_autotune was just trying to build a Relay model, and the shapes look correct to me. Rerunning task_python_docs.sh should cause the tutorials to build.

conv2d: requires that `0`, the input channels (0) divided by groups (1), must match the input channels of the weight `3`, where the weight shape is ([6, 3, 5, 5]).
The type inference pass was unable to infer a type for this expression.
This usually occurs when an operator call is under constrained in some way, check other reported errors for hints of what may of happened.
WARNING: /home/areusch/ws/tvm4/tutorials/micro/micro_autotune.py failed to execute correctly: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/sphinx_gallery/gen_rst.py", line 480, in _memory_usage
    out = func()
  File "/usr/local/lib/python3.6/dist-packages/sphinx_gallery/gen_rst.py", line 465, in __call__
    exec(self.code, self.globals)
  File "/home/areusch/ws/tvm4/tutorials/micro/micro_autotune.py", line 179, in <module>
    lowered = tvm.relay.build(relay_mod, target=TARGET, params=params)
  File "../../python/tvm/relay/build_module.py", line 358, in build
    mod=ir_mod, target=target, params=params, executor=executor, mod_name=mod_name
  File "../../python/tvm/relay/build_module.py", line 172, in build
    self._build(mod, target, target_host, executor, mod_name)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 323, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 267, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 163, in tvm._ffi._cy3.core.CALL
tvm.error.DiagnosticError: Traceback (most recent call last):
  27: TVMFuncCall
        at /home/areusch/ws/tvm4/src/runtime/c_runtime_api.cc:474
  26: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1151
  25: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /usr/include/c++/7/bits/std_function.h:706
  24: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /usr/include/c++/7/bits/std_function.h:316
  23: tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:181
  22: tvm::relay::backend::RelayBuildModule::Build(tvm::IRModule, tvm::runtime::Map<tvm::Integer, tvm::Target, void, void> const&, tvm::Target const&, tvm::runtime::String, tvm::runtime::String)
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:288
  21: tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&, tvm::runtime::String)
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:479
  20: tvm::relay::backend::RelayBuildModule::Optimize(tvm::IRModule, tvm::runtime::Map<tvm::Integer, tvm::Target, void, void> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:329
  19: tvm::transform::Pass::operator()(tvm::IRModule) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:255
  18: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:267
  17: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:481
  16: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:267
  15: tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/relay/ir/transform.cc:160
  14: tvm::transform::Pass::operator()(tvm::IRModule) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:255
  13: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:267
  12: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:415
  11: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::IRModule, tvm::transform::PassContext) const
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1498
  10: tvm::IRModule tvm::runtime::detail::typed_packed_call_dispatcher<tvm::IRModule>::run<tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1444
  9: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::IRModule, tvm::transform::PassContext>(tvm::IRModule&&, tvm::transform::PassContext&&) const
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1369
  8: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /usr/include/c++/7/bits/std_function.h:706
  7: _M_invoke
        at /usr/include/c++/7/bits/std_function.h:316
  6: operator()
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1492
  5: unpack_call<tvm::IRModule, 2, tvm::relay::transform::InferType()::<lambda(tvm::IRModule, const PassContext&)> >
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1421
  4: run<>
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1382
  3: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1382
  2: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1397
  1: operator()
        at /home/areusch/ws/tvm4/src/relay/transforms/type_infer.cc:857
  0: tvm::DiagnosticContext::Render()
        at /home/areusch/ws/tvm4/src/ir/diagnostic.cc:105
  File "/home/areusch/ws/tvm4/src/ir/diagnostic.cc", line 105
areusch changed the title from "[Bug] docs do not build from a clean source tree" to "[Bug] tutorials do not build from a clean source tree" Sep 15, 2021
areusch self-assigned this Sep 15, 2021
@mbs-octoml
Contributor

@electriclilies Lily, in your spelunking through the build, did you see any obvious global compile engine caching?

@areusch
Contributor Author

areusch commented Sep 20, 2021

@mbs-octoml
Contributor

No luck on a local repro (using nvidia docker and the ci_gpu image) as nvcc.py is failing for some reason.

@areusch
Contributor Author

areusch commented Sep 21, 2021

@mbs-octoml I think you can repro like this:

  1. git checkout 44d3934be5d33590ba63139f9b756b05aec9d5c5
  2. rm -rf build
  3. docker/bash.sh ci_gpu tests/scripts/task_config_build_gpu.sh
  4. docker/bash.sh ci_gpu tests/scripts/task_build.sh build -j16 # note: adjust -j16 on your box
  5. docker/bash.sh -it ci_gpu bash -c 'cd docs && make clean && TVM_TUTORIAL_EXEC_PATTERN="(micro)|(dev/use_pass_infra)" make html'

It definitely will not repro if you've already built the docs once and don't run cd docs && make clean.

@mbs-octoml
Contributor

I can repro with my local config just with make clean && make html, no need for Docker etc. Good, that's easier.

@zxybazh
Member

zxybazh commented Sep 21, 2021

Same here in my PR's CI. #9053

@mikepapadim
Contributor

mikepapadim commented Sep 21, 2021

I had the issue in the staging Jenkins as well.

@mbs-octoml
Contributor

I'm looking again now.

@mbs-octoml
Contributor

mbs-octoml commented Sep 21, 2021

On the BAD runs:

[14:56:21] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:135: AlterOpLayout: Input module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, padding=[2, 2, 2, 2], kernel_size=[5, 5], out_dtype="float32") /* ty=Tensor[(1, 6, 10, 10), float32] */
}


[14:56:21] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:159: AlterOpLayout: Output module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  %0 = layout_transform(%data, src_layout="NCHW", dst_layout="NCHW16c");
  %1 = nn.conv2d(%0, meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, padding=[2, 2, 2, 2], kernel_size=[5, 5], data_layout="NCHW16c", out_dtype="float32");
  layout_transform(%1, src_layout="NCHW16c", dst_layout="NCHW")
}

I can't find any matching rewrite on the GOOD runs -- they're all of the form:

[14:53:31] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:135: AlterOpLayout: Input module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, padding=[2, 2, 2, 2], kernel_size=[5, 5], out_dtype="float32") /* ty=Tensor[(1, 6, 10, 10), float32] */
}


[14:53:31] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:159: AlterOpLayout: Output module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  %0 = layout_transform(%data, src_layout="NCHW", dst_layout="NCHW3c");
  %1 = layout_transform(meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, src_layout="OIHW", dst_layout="OIHW3i3o");
  %2 = nn.contrib_conv2d_NCHWc(%0, %1, padding=[2, 2, 2, 2], channels=6, kernel_size=[5, 5], data_layout="NCHW3c", kernel_layout="OIHW3i3o", out_layout="NCHW3c", out_dtype="float32");
  layout_transform(%2, src_layout="NCHW3c", dst_layout="NCHW")
}

Eh?
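
The NCHW16c form in the BAD run also can't type-check: the input has only 3 channels, so they can't be tiled into blocks of 16, which matches the conv2d channel-count error in the traceback at the top of this issue. A minimal sketch (assuming the standard Relay Python API; not taken from the tutorials) that rebuilds the BAD module by hand and should hit the same error:

import numpy as np
import tvm
from tvm import relay

# Mimic the BAD rewrite above by hand, then run type inference on it.
data = relay.var("data", shape=(1, 3, 10, 10), dtype="float32")
weight = relay.const(np.zeros((6, 3, 5, 5), dtype="float32"))
x = relay.layout_transform(data, src_layout="NCHW", dst_layout="NCHW16c")
y = relay.nn.conv2d(
    x, weight, padding=(2, 2, 2, 2), kernel_size=(5, 5),
    data_layout="NCHW16c", out_dtype="float32",
)
mod = tvm.IRModule.from_expr(relay.Function([data], y))
# Expected to raise a DiagnosticError: 3 input channels can't be split into
# chunks of 16, so the conv2d channel check can't be satisfied.
mod = relay.transform.InferType()(mod)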

@zxybazh
Member

zxybazh commented Sep 21, 2021

So the only difference is dst_layout of NCHW3c vs. NCHW16c?

@mbs-octoml
Contributor

As expected, all is well if I disable AlterOpLayout. I need to log whatever hidden state is driving that rewrite.
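
One hedged way to surface that hidden state (assuming the alter hook lives under the FTVMAlterOpLayout op attribute, which is what register_alter_op_layout populates): ask the op registry what is currently attached to nn.conv2d in this process.

from tvm import relay

# Print whatever alter-op-layout hook is registered for nn.conv2d right now;
# a leftover registration from an earlier tutorial should show up here.
conv2d_op = relay.op.get("nn.conv2d")
print(conv2d_op.get_attr("FTVMAlterOpLayout"))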

@mbs-octoml
Contributor

OK, after getting lost in AlterOpLayout I see that dev/use_pass_infra.py has @relay.op.register_alter_op_layout("nn.conv2d"), which is obviously sticky and still visible to the later micro_autotune.py. Almost certainly that definition is ill-formed in some way.
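
Roughly the shape of that registration (illustrative, not the tutorial's exact code): the decorator writes into a process-wide op registry, so it stays in effect for every later relay.build() in the same Python process, including the unrelated micro_autotune.py that sphinx_gallery runs via exec().

from tvm import relay

# Illustrative sketch of a tutorial-style registration; the level and body are
# assumptions, not copied from dev/use_pass_infra.py.
@relay.op.register_alter_op_layout("nn.conv2d", level=101)
def alter_conv2d(attrs, inputs, tinfos, out_type):
    data, weight = inputs
    new_attrs = dict(attrs)
    # Hard-codes a 16-channel tiling, which can't be satisfied by the
    # 3-channel input used in micro_autotune.py.
    new_attrs["data_layout"] = "NCHW16c"
    return relay.nn.conv2d(data, weight, **new_attrs)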

@mbs-octoml
Contributor

So the root problem is that our tutorials need to be hermetic, but there's no 'unregister' mechanism or ability to register under some 'with TvmRegistrationScope()' statement.

At least making that layout transform valid will let us hobble along a bit longer, though.
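
A hypothetical sketch of that 'with TvmRegistrationScope()' idea, assuming Op.get_attr / Op.set_attr / Op.reset_attr behave as their names suggest; wrapped around a tutorial's registration, it would restore the previous state on exit instead of leaking into the next tutorial:

from tvm import relay

class TempAlterOpLayout:
    # Hypothetical scoped registration: save the current FTVMAlterOpLayout hook
    # for an op, install a temporary one, and restore the old state on exit.
    def __init__(self, op_name, alter_fn, level=101):
        self.op = relay.op.get(op_name)
        self.alter_fn = alter_fn
        self.level = level

    def __enter__(self):
        self.saved = self.op.get_attr("FTVMAlterOpLayout")
        self.op.reset_attr("FTVMAlterOpLayout")
        self.op.set_attr("FTVMAlterOpLayout", self.alter_fn, self.level)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.op.reset_attr("FTVMAlterOpLayout")
        if self.saved is not None:
            self.op.set_attr("FTVMAlterOpLayout", self.saved, self.level)

Usage would look like wrapping the tutorial body in with TempAlterOpLayout("nn.conv2d", alter_conv2d): so the hook disappears when the block ends.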

@mbs-octoml
Contributor

Even better: stop using sphinx_gallery.

@areusch
Contributor Author

areusch commented Sep 22, 2021

Thanks for the detailed investigation @mbs-octoml! I do think we should make the compiler work multiple times in a row. Certainly our unit tests require this, and we will expose a bunch of problems with xdist after it starts reordering them. :)

@mbs-octoml
Contributor

#9076

@mbs-octoml
Contributor

This is fixed -- I don't have edit rights on issues.

@jroesch
Member

jroesch commented Sep 23, 2021

Thanks for fixing this one @mbs-octoml!

jroesch closed this as completed Sep 23, 2021