[Bug] tutorials do not build from a clean source tree #9013

Closed
areusch opened this issue Sep 15, 2021 · 18 comments

@areusch
Contributor

areusch commented Sep 15, 2021

I think this might be a compiler caching bug (cc @tqchen @jroesch @mbs-octoml). It isn't caught in the regression because the regression rebuilds only the changed tutorials.

Steps to reproduce:

  1. git checkout dc2f70e3c8a9b14b9e414ecf768ad32e6c3c3960
  2. rm -rf build
  3. docker/bash.sh ci_gpu tests/scripts/task_config_build_gpu.sh
  4. docker/bash.sh ci_gpu tests/scripts/task_build.sh build -j16
  5. docker/bash.sh ci_gpu bash -c 'cd docs && make clean'
  6. docker/bash.sh ci_gpu tests/scripts/task_ci_setup.sh
  7. docker/bash.sh ci_gpu tests/scripts/task_python_docs.sh

This will show the traceback below somewhere along the way. micro_autotune was just trying to build a Relay model, and the shapes look correct to me. Rerunning task_python_docs.sh should cause the tutorials to build.

conv2d: requires that `0`, the input channels (0) divided by groups (1), must match the input channels of the weight `3`, where the weight shape is ([6, 3, 5, 5]).
The type inference pass was unable to infer a type for this expression.
This usually occurs when an operator call is under constrained in some way, check other reported errors for hints of what may of happened.
WARNING: /home/areusch/ws/tvm4/tutorials/micro/micro_autotune.py failed to execute correctly: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/sphinx_gallery/gen_rst.py", line 480, in _memory_usage
    out = func()
  File "/usr/local/lib/python3.6/dist-packages/sphinx_gallery/gen_rst.py", line 465, in __call__
    exec(self.code, self.globals)
  File "/home/areusch/ws/tvm4/tutorials/micro/micro_autotune.py", line 179, in <module>
    lowered = tvm.relay.build(relay_mod, target=TARGET, params=params)
  File "../../python/tvm/relay/build_module.py", line 358, in build
    mod=ir_mod, target=target, params=params, executor=executor, mod_name=mod_name
  File "../../python/tvm/relay/build_module.py", line 172, in build
    self._build(mod, target, target_host, executor, mod_name)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 323, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 267, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 163, in tvm._ffi._cy3.core.CALL
tvm.error.DiagnosticError: Traceback (most recent call last):
  27: TVMFuncCall
        at /home/areusch/ws/tvm4/src/runtime/c_runtime_api.cc:474
  26: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1151
  25: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /usr/include/c++/7/bits/std_function.h:706
  24: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
        at /usr/include/c++/7/bits/std_function.h:316
  23: tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:181
  22: tvm::relay::backend::RelayBuildModule::Build(tvm::IRModule, tvm::runtime::Map<tvm::Integer, tvm::Target, void, void> const&, tvm::Target const&, tvm::runtime::String, tvm::runtime::String)
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:288
  21: tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&, tvm::runtime::String)
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:479
  20: tvm::relay::backend::RelayBuildModule::Optimize(tvm::IRModule, tvm::runtime::Map<tvm::Integer, tvm::Target, void, void> const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)
        at /home/areusch/ws/tvm4/src/relay/backend/build_module.cc:329
  19: tvm::transform::Pass::operator()(tvm::IRModule) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:255
  18: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:267
  17: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:481
  16: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:267
  15: tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/relay/ir/transform.cc:160
  14: tvm::transform::Pass::operator()(tvm::IRModule) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:255
  13: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:267
  12: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
        at /home/areusch/ws/tvm4/src/ir/transform.cc:415
  11: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::IRModule, tvm::transform::PassContext) const
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1498
  10: tvm::IRModule tvm::runtime::detail::typed_packed_call_dispatcher<tvm::IRModule>::run<tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::IRModule&&, tvm::transform::PassContext&&)
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1444
  9: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::IRModule, tvm::transform::PassContext>(tvm::IRModule&&, tvm::transform::PassContext&&) const
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1369
  8: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /usr/include/c++/7/bits/std_function.h:706
  7: _M_invoke
        at /usr/include/c++/7/bits/std_function.h:316
  6: operator()
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1492
  5: unpack_call<tvm::IRModule, 2, tvm::relay::transform::InferType()::<lambda(tvm::IRModule, const PassContext&)> >
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1421
  4: run<>
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1382
  3: run<tvm::runtime::TVMMovableArgValueWithContext_>
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1382
  2: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
        at /home/areusch/ws/tvm4/include/tvm/runtime/packed_func.h:1397
  1: operator()
        at /home/areusch/ws/tvm4/src/relay/transforms/type_infer.cc:857
  0: tvm::DiagnosticContext::Render()
        at /home/areusch/ws/tvm4/src/ir/diagnostic.cc:105
  File "/home/areusch/ws/tvm4/src/ir/diagnostic.cc", line 105
areusch changed the title from "[Bug] docs do not build from a clean source tree" to "[Bug] tutorials do not build from a clean source tree" Sep 15, 2021
areusch self-assigned this Sep 15, 2021
@mbs-octoml
Contributor

@electriclilies Lily, in your spelunking through the build, did you see any obvious global compile engine caching?

@areusch
Contributor Author

areusch commented Sep 20, 2021

@mbs-octoml
Contributor

No luck on a local repro (using nvidia docker and the ci_gpu image) as nvcc.py is failing for some reason.

@areusch
Contributor Author

areusch commented Sep 21, 2021

@mbs-octoml I think you can repro like this:

  1. git checkout 44d3934be5d33590ba63139f9b756b05aec9d5c5
  2. rm -rf build
  3. docker/bash.sh ci_gpu tests/scripts/task_config_build_gpu.sh
  4. docker/bash.sh ci_gpu tests/scripts/task_build.sh build -j16 # note: adjust -j16 on your box
  5. docker/bash.sh -it ci_gpu bash -c 'cd docs && make clean && TVM_TUTORIAL_EXEC_PATTERN="(micro)|(dev/use_pass_infra)" make html'

It definitely will not repro if you've already built the docs once and don't run cd docs && make clean.

@mbs-octoml
Contributor

I can repro with my local config just with make clean && make html, no need for Docker etc. Good, that's easier.

@zxybazh
Member

zxybazh commented Sep 21, 2021

Same here in my PR's CI. #9053

@mikepapadim
Contributor

mikepapadim commented Sep 21, 2021

I had the issue in the staging Jenkins as well.

@mbs-octoml
Contributor

I'm looking again now.

@mbs-octoml
Contributor

mbs-octoml commented Sep 21, 2021

On the BAD runs:

[14:56:21] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:135: AlterOpLayout: Input module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, padding=[2, 2, 2, 2], kernel_size=[5, 5], out_dtype="float32") /* ty=Tensor[(1, 6, 10, 10), float32] */
}


[14:56:21] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:159: AlterOpLayout: Output module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  %0 = layout_transform(%data, src_layout="NCHW", dst_layout="NCHW16c");
  %1 = nn.conv2d(%0, meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, padding=[2, 2, 2, 2], kernel_size=[5, 5], data_layout="NCHW16c", out_dtype="float32");
  layout_transform(%1, src_layout="NCHW16c", dst_layout="NCHW")
}

I can't find any matching rewrite on the GOOD runs -- they're all of the form:

[14:53:31] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:135: AlterOpLayout: Input module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, padding=[2, 2, 2, 2], kernel_size=[5, 5], out_dtype="float32") /* ty=Tensor[(1, 6, 10, 10), float32] */
}


[14:53:31] /home/mbs/github/mbs-tvm/src/relay/ir/transform.cc:159: AlterOpLayout: Output module:
def @main(%data: Tensor[(1, 3, 10, 10), float32]) -> Tensor[(1, 6, 10, 10), float32] {
  %0 = layout_transform(%data, src_layout="NCHW", dst_layout="NCHW3c");
  %1 = layout_transform(meta[relay.Constant][0] /* ty=Tensor[(6, 3, 5, 5), float32] */, src_layout="OIHW", dst_layout="OIHW3i3o");
  %2 = nn.contrib_conv2d_NCHWc(%0, %1, padding=[2, 2, 2, 2], channels=6, kernel_size=[5, 5], data_layout="NCHW3c", kernel_layout="OIHW3i3o", out_layout="NCHW3c", out_dtype="float32");
  layout_transform(%2, src_layout="NCHW3c", dst_layout="NCHW")
}

Eh?
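
The NCHW16c form in the BAD run also can't type-check: the input has only 3 channels, so they can't be tiled into blocks of 16, which matches the conv2d channel-count error in the traceback at the top of this issue. A minimal sketch (assuming the standard Relay Python API; not taken from the tutorials) that rebuilds the BAD module by hand and should hit the same error:

import numpy as np
import tvm
from tvm import relay

# Mimic the BAD rewrite above by hand, then run type inference on it.
data = relay.var("data", shape=(1, 3, 10, 10), dtype="float32")
weight = relay.const(np.zeros((6, 3, 5, 5), dtype="float32"))
x = relay.layout_transform(data, src_layout="NCHW", dst_layout="NCHW16c")
y = relay.nn.conv2d(
    x, weight, padding=(2, 2, 2, 2), kernel_size=(5, 5),
    data_layout="NCHW16c", out_dtype="float32",
)
mod = tvm.IRModule.from_expr(relay.Function([data], y))
# Expected to raise a DiagnosticError: 3 input channels can't be split into
# chunks of 16, so the conv2d channel check can't be satisfied.
mod = relay.transform.InferType()(mod)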

@zxybazh
Member

zxybazh commented Sep 21, 2021

So the only difference is dst_layout of NCHW3c vs. NCHW16c?

@mbs-octoml
Contributor

As expected, all is well if I disable AlterOpLayout. I need to log whatever hidden state is driving that rewrite.
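
One hedged way to surface that hidden state (assuming the alter hook lives under the FTVMAlterOpLayout op attribute, which is what register_alter_op_layout populates): ask the op registry what is currently attached to nn.conv2d in this process.

from tvm import relay

# Print whatever alter-op-layout hook is registered for nn.conv2d right now;
# a leftover registration from an earlier tutorial should show up here.
conv2d_op = relay.op.get("nn.conv2d")
print(conv2d_op.get_attr("FTVMAlterOpLayout"))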

@mbs-octoml
Contributor

OK, after getting lost in AlterOpLayout I see that dev/use_pass_infra.py has @relay.op.register_alter_op_layout("nn.conv2d"), which is obviously sticky and still visible to the later micro_autotune.py. Almost certainly that definition is ill-formed in some way.
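
Roughly the shape of that registration (illustrative, not the tutorial's exact code): the decorator writes into a process-wide op registry, so it stays in effect for every later relay.build() in the same Python process, including the unrelated micro_autotune.py that sphinx_gallery runs via exec().

from tvm import relay

# Illustrative sketch of a tutorial-style registration; the level and body are
# assumptions, not copied from dev/use_pass_infra.py.
@relay.op.register_alter_op_layout("nn.conv2d", level=101)
def alter_conv2d(attrs, inputs, tinfos, out_type):
    data, weight = inputs
    new_attrs = dict(attrs)
    # Hard-codes a 16-channel tiling, which can't be satisfied by the
    # 3-channel input used in micro_autotune.py.
    new_attrs["data_layout"] = "NCHW16c"
    return relay.nn.conv2d(data, weight, **new_attrs)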

@mbs-octoml
Contributor

So the root problem is that our tutorials need to be hermetic, but there's no 'unregister' mechanism or ability to register under some 'with TvmRegistrationScope()' statement.

At least making that layout transform valid will let us hobble along a bit longer, though.
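
A hypothetical sketch of that 'with TvmRegistrationScope()' idea, assuming Op.get_attr / Op.set_attr / Op.reset_attr behave as their names suggest; wrapped around a tutorial's registration, it would restore the previous state on exit instead of leaking into the next tutorial:

from tvm import relay

class TempAlterOpLayout:
    # Hypothetical scoped registration: save the current FTVMAlterOpLayout hook
    # for an op, install a temporary one, and restore the old state on exit.
    def __init__(self, op_name, alter_fn, level=101):
        self.op = relay.op.get(op_name)
        self.alter_fn = alter_fn
        self.level = level

    def __enter__(self):
        self.saved = self.op.get_attr("FTVMAlterOpLayout")
        self.op.reset_attr("FTVMAlterOpLayout")
        self.op.set_attr("FTVMAlterOpLayout", self.alter_fn, self.level)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.op.reset_attr("FTVMAlterOpLayout")
        if self.saved is not None:
            self.op.set_attr("FTVMAlterOpLayout", self.saved, self.level)

Usage would look like wrapping the tutorial body in with TempAlterOpLayout("nn.conv2d", alter_conv2d): so the hook disappears when the block ends.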

@mbs-octoml
Contributor

Even better: stop using sphinx_gallery.

@areusch
Contributor Author

areusch commented Sep 22, 2021

Thanks for the detailed investigation @mbs-octoml! I do think we should make the compiler work multiple times in a row. Certainly our unit tests require this, and we will expose a bunch of problems with xdist after it starts reordering them. :)

@mbs-octoml
Contributor

#9076

@mbs-octoml
Contributor

This is fixed -- I don't have edit rights on issues.

@jroesch
Member

jroesch commented Sep 23, 2021

Thanks for fixing this one @mbs-octoml!

jroesch closed this as completed Sep 23, 2021