Conversation

@haoyang9804
Contributor
When relay.build compiles an IRModule containing a function whose return value includes another function, TVM crashes with a segmentation fault. Initially I assumed this behavior was disallowed by TVM, but the following function in test_analysis_basic_block_normal_form.py shows that it is tolerated:

@pytest.mark.xfail(raises=tvm.error.TVMError)
def test_func():
    x = relay.var("x", shape=(1,), dtype="float32")  # , a)
    y = relay.var("y", shape=(1,), dtype="float32")  # , a)
    z = relay.var("z", shape=(1,), dtype="float32")  # , a)
    x2 = relay.add(x, x)
    func_a = relay.Function([y], relay.add(x2, y))  # , a, [a])
    func_b = relay.Function([z], relay.add(x2, z))  # , a, [a])
    body = relay.Tuple([func_a, func_b])
    body = relay.Function([x], body)
    """
    fn (%x: Tensor[(1), float32]) {
      %1 = fn (%y: Tensor[(1), float32]) {
        %0 = add(%x, %x);
        add(%0, %y)
      };
      %2 = fn (%z: Tensor[(1), float32]) {
        add(%0, %z)
      };
      (%1, %2)
    }
    """
    check_basic_block_normal_form(body)

So I wrote the following script with a simpler structure to fuzz TVM:

import tvm
from tvm import relay
mod = tvm.IRModule()
x = relay.var("x", shape=(1,), dtype="float32")
x2 = relay.add(x, x)
f = relay.Function([x], x2)
# body = relay.Tuple([f,])
body = relay.Function([], f)
mod['main'] = body
mod = relay.transform.InferType()(mod)
print(mod.astext(show_meta_data=False))
graph, lib, params = relay.build(mod, target='llvm')

The Relay IR is simple, containing only a function whose return value is another function:

#[version = "0.0.5"]
def @main() -> fn (Tensor[(1), float32]) -> Tensor[(1), float32] {
  fn (%x: Tensor[(1), float32]) -> Tensor[(1), float32] {
    add(%x, %x) /* ty=Tensor[(1), float32] */
  }
}

As expected, a segmentation fault showed up.
By bug localization, I found that the problem occurs during memory allocation in graph_executor_codegen. More concretely, the function CalculateRelayExprSizeBytes does not handle the case where the expression's type is a FuncType: it unconditionally casts the type node to TensorTypeNode, so the subsequent auto shape = tensor_type->shape; dereferences a null pointer and triggers the crash.

@haoyang9804
Contributor Author

haoyang9804 commented Mar 6, 2022

I don't know who to cc. In the TVM community, I had some discussion with @masahi on this problem. So... could you please take a look at my fix? I'm not sure whether it violates some basic principle of TVM.

@masahi
Member

masahi commented Mar 6, 2022

I don't think returning a function is something we want to support in graph runtime. What do you want to do with it?

@haoyang9804
Contributor Author

haoyang9804 commented Mar 7, 2022

I don't think returning a function is something we want to support in graph runtime. What do you want to do with it?

@masahi Hi, thanks for your reply. I made this change from the perspective of TVM users: if returning functions is allowed when we play with Relay, then I think it should also be compilable with relay.build, hence the changes above. Otherwise, if that is not what the developers want, I think we should add an ICHECK to report that it is disallowed.

If you agree with the latter, I can revert the change and add an ICHECK instead.

@masahi
Member

masahi commented Mar 7, 2022

I mean, have you done anything with the returned function from graph, lib, params = relay.build(mod, target='llvm')?

I don't think we can return a function to python and do anything with it.

@haoyang9804
Contributor Author

I mean, have you done anything with the returned function from graph, lib, params = relay.build(mod, target='llvm')?

I don't think we can return a function to python and do anything with it.

I haven't. I get your point. But I think adding an ICHECK to give users a clear error message, instead of a raw segmentation fault, is worthwhile. If you agree, I will add one.

Member

@masahi masahi left a comment


But I think adding a ICheck to give users a reasonable warning message instead of directly showing segmentation fault is reasonable

Sure, see my comment below.

- ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
-     << "Only functions supported by custom codegen";
+ // ICHECK(op->GetAttr<String>(attr::kCompiler).defined())
+ //     << "Only functions supported by custom codegen";

Remove this diff

return 0;
}
auto tensor_type = expr_type.as<TensorTypeNode>();
auto shape = tensor_type->shape;

Remove the above diff and add ICHECK(tensor_type).

@masahi masahi merged commit 22abfc4 into apache:main Mar 7, 2022
ziqiangxu8457 pushed a commit to ziqiangxu8457/tvm that referenced this pull request Mar 9, 2022
* fix InferType bug

* fix InferType related bug

* support returned function in relay.build

* support returned function in relay.build

* support returned function in relay.build

* support returned function in relay.build

* add warning about function returning function
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022