[TIR][VM] Revert a change to lower_tvm_builtin.cc from #6126 #8274
Conversation
I guess, unsurprisingly, this revision causes the error that led to the code being removed in #6126. We'll have to either fix it a different way or temporarily disable that portion of VTA testing.
We don't want to disable the VTA tests, because that backend will quickly bitrot. At this point, we can either:
I tried to reproduce the failing VTA test with the changes applied by @mbrookhart, but I wasn't able to reproduce the error: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8274/1/pipeline#step-458-log-1910. To be continued.
In my opinion, I admit that the removal of the following optimization may cause some performance regressions:

if (device_type_.defined()) {
  if (const auto* dev_type = device_type_.as<IntImmNode>()) {
    if (dev_type->value == kDLCPU) {
      int32_t constant_size = op->constant_allocation_size();
      if (constant_size > 0 && constant_size * nbytes < runtime::kMaxStackAlloca) {
        return stmt;
      }
    }
  }
}

But this optimization raises LLVM function signature errors when there are multiple targets (for now, I cannot find an alternative fix to make it work). So there may be two bugs in the current codebase:
If this problem blocks something urgent to merge, I think we can do a quick fix first (choose either one of the following two) and open a bug issue for a further fix:
As for the two hidden bugs, I think it will take some time to find and fix them. What do you think?
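For context on the trade-off discussed above: when the kDLCPU early return does not fire, lower_tvm_builtin.cc rewrites the Allocate into calls to TVM's C backend workspace API. Below is a minimal sketch of the shape of the resulting code, using the TVMBackendAllocWorkspace/TVMBackendFreeWorkspace signatures from include/tvm/runtime/c_backend_api.h; the function name, buffer size, and dtype hints are made up for illustration:

#include <tvm/runtime/c_backend_api.h>

// Sketch: a small constant-size CPU allocation routed through the workspace
// API instead of staying as a stack alloca (example_kernel is hypothetical).
int example_kernel(int device_type, int device_id) {
  void* buf = TVMBackendAllocWorkspace(device_type, device_id,
                                       /*nbytes=*/256,
                                       /*dtype_code_hint=*/2,  // kDLFloat
                                       /*dtype_bits_hint=*/32);
  if (buf == nullptr) return -1;
  // ... compute on buf ...
  return TVMBackendFreeWorkspace(device_type, device_id, buf);
}

Keeping the early return skips this rewrite for small CPU buffers (no runtime call per allocation), at the cost of the multi-target signature issue described above.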
ping @tmoreau89 @zhanghaohit any update on this?
I took this suggestion to disable the tutorial.
@tmoreau89 @zhanghaohit those comments did not, in fact, prevent the tutorial from being run, so this still failed. I'm fixing the ONNX test issues in CI today, and this is now blocking that. Any thoughts on how to proceed?
See this: #8643
@mbrookhart I think maybe you can apply the changes from this PR (#8643) to see whether it works, since we already discussed a lot here. I will delete #8643 later.
Thanks @mbrookhart and @zhanghaohit, the fix has been merged.
@zhanghaohit would you be able to create an issue that tracks the two hidden bugs that require deeper investigation?
@@ -113,6 +113,16 @@ class BuiltinLower : public StmtExprMutator {
     op = stmt.as<AllocateNode>();
     // Get constant allocation bound.
     int64_t nbytes = GetVectorBytes(op->dtype);
+    if (device_type_.defined()) {
Hi @mbrookhart,
Can you explain the rationale for not generating TVMBAW calls for static-size allocates for DLCPU?
@tmoreau89 @areusch @mbrookhart, IIUC, this is problematic for micro because the constant-sized allocates are now forced onto the stack (bypassing the TVMPlatformAllocate abstraction), since that is how codegen_c lowers the allocate (see tvm/src/target/source/codegen_c.cc, lines 860 to 877 at 1b99adc).
I don't think this is desired.
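To make the contrast concrete, here is a hypothetical sketch of the two lowering outcomes for a small constant-size buffer (not verbatim codegen_c output; pad_temp and all sizes are made up):

#include <tvm/runtime/c_backend_api.h>

void with_stack_path() {
  // With the kDLCPU early return, the Allocate reaches codegen_c unchanged
  // and is emitted as a stack array, bypassing the runtime allocator.
  float pad_temp[64];
  (void)pad_temp;
}

void with_workspace_path(int device_id) {
  // Without it, the Allocate is rewritten into workspace calls, which a
  // microTVM runtime can back with its platform memory allocator.
  float* pad_temp = static_cast<float*>(TVMBackendAllocWorkspace(
      /*device_type=*/1 /* kDLCPU */, device_id, /*nbytes=*/256,
      /*dtype_code_hint=*/2, /*dtype_bits_hint=*/32));
  // ... use pad_temp ...
  TVMBackendFreeWorkspace(/*device_type=*/1, device_id, pad_temp);
}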
Hi @manupa-arm, I think we have some conflict between what the VM needs to run on CPU and what microprocessors need. Can we talk about this on the issues @zhanghaohit created, #8977 and #8978? I think @jroesch might have some thoughts on separating the needs.
* main: (102 commits)
  Implementation of relay_to_tir target hook (apache#8423)
  [Onnx] Fix NLL Loss tests (apache#8971)
  [Bugfix] Fix other div zero errors also in rewrite_simplify (apache#8983)
  [ONNX] enable the onnx tests after PR apache#8274 merged (apache#9019)
  [Hexagon] Disable `thread_local` on Hexagon (apache#9025)
  [Hexagon] Allow undefined symbols in libtvm_runtime.so on Hexagon (apache#9024)
  [Onnx] Add momentum (apache#9000)
  fix (apache#9021)
  [Community] @AndrewZhaoLuo -> Reviewer (apache#9020)
  [Hexagon] Implement model launcher (apache#8986)
  [Relay][Pass] Add ExtractOperators pass (apache#8996)
  [BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration (apache#8808)
  [ONNX] Add Einsum converter (apache#8985)
  Add standalone_crt/ to be part of the wheel package, when available. (apache#9005)
  [Relay] Remove memory planing from LowerTEPass (apache#8974)
  [Hexagon] Treat floats as float32 when passing args to offloaded kernels (apache#9010)
  [Runtime] Pipeline Executor Initial patch. (apache#8702)
  [Hexagon] `llvm-options` attribute is an array of strings (apache#9011)
  disable cuda int8 schedule for non-cuda gpu target (apache#9014)
  [Torch] Add an option to make imported models compatible with the Relay text parser (apache#9015)
  ...
…pache#8274)
* revert a change to lower_tvm_builtin.cc from apache#6126
* disable sim target on VTA tutorial
fix bad refactor try again

* enable the onnx tests after PR apache#8274 merged
* fix lint
This change is causing a regression in tests/python/frontend/onnx/test_forward.py::test_loop, and it causes the generated IR for the shape function to be very different.
I'm really, really confused about why this didn't fail in CI; it fails in every local setup and CI Docker image we've tried outside of CI.
cc @jwfromm @tmoreau89 @tqchen @zhanghaohit
this PR:
current main: