[VTA][OpenCL] intelfocl #6126
Changes from all commits: 98be683, 64a5a5c, 83f2bdb, a1c720a, 73dea7b, ab11df7, 53a90fe
```diff
@@ -88,16 +88,6 @@ class BuiltinLower : public StmtExprMutator {
     op = stmt.as<AllocateNode>();
     // Get constant allocation bound.
     int64_t nbytes = GetVectorBytes(op->dtype);
-    if (device_type_.defined()) {
-      if (const auto* dev_type = device_type_.as<IntImmNode>()) {
-        if (dev_type->value == kDLCPU) {
-          int32_t constant_size = op->constant_allocation_size();
-          if (constant_size > 0 && constant_size * nbytes < runtime::kMaxStackAlloca) {
-            return stmt;
-          }
-        }
-      }
-    }
     PrimExpr total_bytes = make_const(op->extents[0].dtype(), nbytes);
     for (size_t i = 0; i < op->extents.size(); ++i) {
       total_bytes = total_bytes * op->extents[i];
```
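For orientation, here is a minimal Python sketch (a hypothetical paraphrase, mirroring the C++ condition in the diff above, not code from the PR) of what the deleted branch decided: small constant-size CPU allocations were returned unchanged so they could stay on the stack, while everything else fell through to the workspace-allocation path.

```python
# Hypothetical paraphrase of the deleted C++ branch, for illustration only.
# kDLCPU follows the dlpack device-type enum; the value of kMaxStackAlloca
# is an assumption here, mirroring runtime::kMaxStackAlloca in the diff.
kDLCPU = 1
kMaxStackAlloca = 1024  # assumed stack-allocation cap

def keep_on_stack(device_type, constant_size, nbytes):
    """True if the allocation would have been left for stack allocation."""
    return (device_type == kDLCPU
            and constant_size > 0
            and constant_size * nbytes < kMaxStackAlloca)
```

With the branch removed, every allocation on every device is lowered to a runtime workspace allocation, which is what the discussion below turns on.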
do you mind explaining the reasoning behind this deletion?

This removes the special handling for kDLCPU. Otherwise it may cause an LLVM parameter-mismatch error:

```
Traceback (most recent call last):
File "vta/tutorials/frontend/deploy_classification.py", line 210, in <module>
params=params, target_host=env.target_host)
File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 251, in build
graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
File "/4pd/home/zhanghao/workspace/tvm-2/tvm/python/tvm/relay/build_module.py", line 120, in build
self._build(mod, target, target_host)
File "tvm/_ffi/_cython/./packed_func.pxi", line 321, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 256, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 245, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 160, in tvm._ffi._cy3.core.CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (8) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(TVMFuncCall+0x4c) [0x7f385ac9bc1c]
[bt] (7) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x316) [0x7f385ab2a566]
[bt] (6) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0xe31) [0x7f385ab29c11]
[bt] (5) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::runtime::String, tvm::IRModule, void, void> const&, tvm::Target const&)+0x3c4) [0x7f385a4322d4]
[bt] (4) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)+0x326) [0x7f385a4318c6]
[bt] (3) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target const&)+0x67a) [0x7f385a74f68a]
[bt] (2) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(+0x1277ea1) [0x7f385ac7eea1]
[bt] (1) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(tvm::codegen::LLVMModuleNode::Init(tvm::IRModule const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x1388) [0x7f385ac82c68]
[bt] (0) /4pd/home/zhanghao/workspace/tvm-2/tvm/build/libtvm.so(+0x1276a57) [0x7f385ac7da57]
File "/4pd/home/zhanghao/workspace/tvm-2/tvm/src/target/llvm/llvm_module.cc", line 230
TVMError: LLVM module verification failed with the following errors:
Call parameter type does not match function signature!
%.sub = getelementptr inbounds [4 x <8 x float>], [4 x <8 x float>]* %3, i64 0, i64 0
i8* %34 = call i8* @VTABufferCPUPtr(i8* %17, <8 x float>* nonnull %.sub)
Call parameter type does not match function signature!
%.sub = getelementptr inbounds [8 x float], [8 x float]* %3, i64 0, i64 0
  i8* %31 = call i8* @VTABufferCPUPtr(i8* %14, float* nonnull %.sub)
```

The raised error comes from this LLVM check (lib/IR/Verifier.cpp):

```cpp
2598   // Verify that all arguments to the call match the function type.
2599   for (unsigned i = 0, e = FTy->getNumParams(); i != e; ++i)
2600     Assert(CS.getArgument(i)->getType() == FTy->getParamType(i),
2601            "Call parameter type does not match function signature!",
2602            CS.getArgument(i), FTy->getParamType(i), I);
```

LLVM raises this error if the special handling for kDLCPU is there. I think it is because the signature for the AllocateNode is not consistent with the parameter? Any ideas about an alternative fix?

@tqchen perhaps you'd have some input on why this code was needed in the first place?

@tmoreau89 @tqchen @zhanghaohit I've been searching for a bug introduced by this PR that somehow doesn't show up in CI. I've tested it locally with the docker image and still see the failure. Anyway, if I run python/tests/onnx/test_forward.py:test_loop on main locally, it fails. If I revert the change to this file, it passes. I'm tempted to revert this PR until we can find a better way to fix this for VTA. Do you have a better suggestion?

@mbrookhart I am in favor of reverting the changes applied to this file; in a separate PR we can ensure that the error encountered by @zhanghaohit is resolved while making sure that …

I don't think we would need to revert the entire PR, rather introduce a PR that just reverts the changes applied to this file. Given that the intelfocl backend is not CI-tested, it's not going to break unit tests in TVM.

It seems that the error is due to the `_concatenate_shape_func`, where an … The changes of this PR may introduce … One quick fix is to remove the …

@jwfromm took a look at the IR generated with and without this code snippet; this is what he got: …
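Since the thread keeps returning to shape functions, here is an illustrative paraphrase (not the verbatim TVM source; names and details are approximate) of what a hybrid-script shape function in the style of `_concatenate_shape_func` looks like. The relevant point is that it runs on the host CPU and contains a data-dependent `assert`, which is the assert mentioned in the next comment.

```python
# Illustrative paraphrase of a concatenate-style shape function; not the
# exact TVM implementation. `inputs` is a list of 1-D shape tensors.
from tvm.te.hybrid import script

@script
def concat_shape(inputs, axis):
    ndim = inputs[0].shape[0]
    out = output_tensor((ndim,), "int64")
    for i in const_range(ndim):
        if i != axis:
            out[i] = inputs[0][i]
            for j in const_range(1, len(inputs)):
                # The host-side assert under discussion: non-concat
                # dimensions must agree across all inputs.
                assert out[i] == inputs[j][i], "Dims mismatch in concatenate inputs."
        else:
            out[i] = int64(0)
            for j in const_range(len(inputs)):
                out[i] += inputs[j][i]
    return out
```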
It looks like this is fundamentally changing how shape functions get lowered; I don't think that just removing the assert is the right way to go about it.
:/ It should not be possible to pass …
do you mind explaining the changes made to this file?
Original code will fail if there are multiple workloads in one schedule. For example, in `fused_nn_conv2d_add_add_right_shift_clip_cast_31`, the `conv2d` and `add` may both have `workload` attrs. We have to get the correct workload by comparing the `task_name`. Previously this worked fine, as `add` is not a tunable op. But since we also want to put middle ALU-only nodes (residual blocks) on VTA, such as `fused_cast_cast_add_nn_relu_clip_cast_3`, we create a VTA schedule for `add` (see `add.alu`).
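A minimal sketch of that lookup, assuming autotvm's convention that a workload is a tuple whose first element is the task name; the helper itself is hypothetical, not the PR's actual code.

```python
# Hypothetical helper: when several ops in a fused group carry a
# "workload" attribute, pick the one whose task name matches.
def find_workload(ops, task_name):
    for op in ops:
        if op.attrs is not None and "workload" in op.attrs:
            wkl = op.attrs["workload"]
            if wkl[0] == task_name:  # workload[0] is the task name in autotvm
                return wkl
    return None
```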
Thanks for clarifying. How do we guard against extracting `add` as a standalone op for other backends in this case?
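For context, one guard that already existed at the time is autotvm's task extraction, which only creates tasks for the relay ops you explicitly pass in; whether that suffices here is exactly the open question above. A sketch, with `mod`, `params`, and `target` assumed to come from the surrounding build script:

```python
# Sketch: restrict task extraction to conv2d so a standalone `add` is not
# extracted as a tunable task on non-VTA backends. mod, params, and target
# are assumed to be defined by the caller's build script.
from tvm import autotvm, relay

tasks = autotvm.task.extract_from_program(
    mod["main"], params=params, target=target,
    ops=(relay.op.get("nn.conv2d"),),
)
```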