[TIR][VM] Revert a change to lower_tvm_builtin.cc from #6126 #8274
Conversation
I guess, unsurprisingly, this revision causes the error that led to the code being removed in #6126. We'll have to either fix it a different way or temporarily disable that portion of VTA testing.
We don't want to disable the VTA tests, because that backend will quickly bitrot. At this point, we can either:
I tried to reproduce the failing VTA test with the changes applied by @mbrookhart, but I wasn't able to reproduce the error: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8274/1/pipeline#step-458-log-1910. To be continued.
In my opinion, I admit that the removal of the following optimization may cause some performance regressions:

if (device_type_.defined()) {
  if (const auto* dev_type = device_type_.as<IntImmNode>()) {
    if (dev_type->value == kDLCPU) {
      int32_t constant_size = op->constant_allocation_size();
      if (constant_size > 0 && constant_size * nbytes < runtime::kMaxStackAlloca) {
        return stmt;
      }
    }
  }
}

But this optimization raises LLVM function signature errors when there are multiple targets (for now, I cannot find an alternative fix to make it work). So there may be two bugs in the current codebase:
If this problem blocks something urgent to merge, I think we can do a quick fix first (choose either one of the following two) and open a bug issue for a further fix:
As for the two hidden bugs, I think it will take some time to find and fix them. What do you think?
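For context on the trade-off discussed above: when the kDLCPU early return does not fire, lower_tvm_builtin.cc rewrites the Allocate into calls to TVM's C backend workspace API. Below is a minimal sketch of the shape of the resulting code, using the TVMBackendAllocWorkspace/TVMBackendFreeWorkspace signatures from include/tvm/runtime/c_backend_api.h; the function name, buffer size, and dtype hints are made up for illustration:

#include <tvm/runtime/c_backend_api.h>

// Sketch: a small constant-size CPU allocation routed through the workspace
// API instead of staying as a stack alloca (example_kernel is hypothetical).
int example_kernel(int device_type, int device_id) {
  void* buf = TVMBackendAllocWorkspace(device_type, device_id,
                                       /*nbytes=*/256,
                                       /*dtype_code_hint=*/2,  // kDLFloat
                                       /*dtype_bits_hint=*/32);
  if (buf == nullptr) return -1;
  // ... compute on buf ...
  return TVMBackendFreeWorkspace(device_type, device_id, buf);
}

Keeping the early return skips this rewrite for small CPU buffers (no runtime call per allocation), at the cost of the multi-target signature issue described above.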
ping @tmoreau89 @zhanghaohit any update on this?
I took this suggestion to disable the tutorial.
@tmoreau89 @zhanghaohit those comments did not, in fact, prevent the tutorial from being run, so this still failed. I'm fixing the ONNX test issues in CI today, and this is now blocking that. Any thoughts on how to proceed?
See this: #8643
@mbrookhart I think maybe you can apply the changes from this PR (#8643) to see whether it works, since we already discussed a lot here. I will delete #8643 later.
Thanks @mbrookhart and @zhanghaohit, the fix has been merged.
@zhanghaohit would you be able to create an issue that tracks the two hidden bugs that require deeper investigation?
@@ -113,6 +113,16 @@ class BuiltinLower : public StmtExprMutator {
     op = stmt.as<AllocateNode>();
     // Get constant allocation bound.
     int64_t nbytes = GetVectorBytes(op->dtype);
+    if (device_type_.defined()) {
Hi @mbrookhart,
Can you explain the rationale for not generating TVMBAW calls for static-size allocates for DLCPU?
@tmoreau89 @areusch @mbrookhart, IIUC, this is problematic for micro because the constant-sized allocates are now forced onto the stack (bypassing the TVMPlatformAllocate abstraction), since that is how codegen_c lowers the allocate (see tvm/src/target/source/codegen_c.cc, lines 860 to 877 at 1b99adc).
I don't think this is desired.
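To make the contrast concrete, here is a hypothetical sketch of the two lowering outcomes for a small constant-size buffer (not verbatim codegen_c output; pad_temp and all sizes are made up):

#include <tvm/runtime/c_backend_api.h>

void with_stack_path() {
  // With the kDLCPU early return, the Allocate reaches codegen_c unchanged
  // and is emitted as a stack array, bypassing the runtime allocator.
  float pad_temp[64];
  (void)pad_temp;
}

void with_workspace_path(int device_id) {
  // Without it, the Allocate is rewritten into workspace calls, which a
  // microTVM runtime can back with its platform memory allocator.
  float* pad_temp = static_cast<float*>(TVMBackendAllocWorkspace(
      /*device_type=*/1 /* kDLCPU */, device_id, /*nbytes=*/256,
      /*dtype_code_hint=*/2, /*dtype_bits_hint=*/32));
  // ... use pad_temp ...
  TVMBackendFreeWorkspace(/*device_type=*/1, device_id, pad_temp);
}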
Hi @manupa-arm, I think we have some conflict between what the VM needs to run on CPU and what microprocessors need. Can we talk about this on the issues @zhanghaohit created, #8977 and #8978? I think @jroesch might have some thoughts on separating the needs.
* main: (102 commits)
  Implementation of relay_to_tir target hook (apache#8423)
  [Onnx] Fix NLL Loss tests (apache#8971)
  [Bugfix] Fix other div zero errors also in rewrite_simplify (apache#8983)
  [ONNX] enable the onnx tests after PR apache#8274 merged (apache#9019)
  [Hexagon] Disable `thread_local` on Hexagon (apache#9025)
  [Hexagon] Allow undefined symbols in libtvm_runtime.so on Hexagon (apache#9024)
  [Onnx] Add momentum (apache#9000)
  fix (apache#9021)
  [Community] @AndrewZhaoLuo -> Reviewer (apache#9020)
  [Hexagon] Implement model launcher (apache#8986)
  [Relay][Pass] Add ExtractOperators pass (apache#8996)
  [BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration (apache#8808)
  [ONNX] Add Einsum converter (apache#8985)
  Add standalone_crt/ to be part of the wheel package, when available. (apache#9005)
  [Relay] Remove memory planing from LowerTEPass (apache#8974)
  [Hexagon] Treat floats as float32 when passing args to offloaded kernels (apache#9010)
  [Runtime] Pipeline Executor Initial patch. (apache#8702)
  [Hexagon] `llvm-options` attribute is an array of strings (apache#9011)
  disable cuda int8 schedule for non-cuda gpu target (apache#9014)
  [Torch] Add an option to make imported models compatible with the Relay text parser (apache#9015)
  ...
…pache#8274)
* revert a change to lower_tvm_builtin.cc from apache#6126
* disable sim target on VTA tutorial
fix bad refactor try again

* enable the onnx tests after PR apache#8274 merged
* fix lint
This change is causing a regression in tests/python/frontend/onnx/test_forward.py::test_loop, and it causes the generated IR for the shape function to be very different.
I'm really, really confused about why this didn't fail in CI; it fails in every local setup and CI Docker image we've tried outside of CI.
cc @jwfromm @tmoreau89 @tqchen @zhanghaohit
this PR:
current main: