Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RUNTIME] Scaffold structured error handling. #2838

Merged
merged 2 commits into from
Mar 19, 2019
Merged

Conversation

tqchen
Copy link
Member

@tqchen tqchen commented Mar 18, 2019

Given that there is some enough interest in structured error handling #2279 and we want to have a guideline to resolve the tension of simplicity and structures. This PR attempts to provide a minimum scaffold toward that step.

  • Introduce tvm.error which include specific error class
  • Introduce the error handling convention in C++
  • Detect and rewrite error messages so that c++ and python stack trace are unified and not redundant.
    • Remove the unnecessary stack-trace back to python(as they will be part of python trace)
    • In python side, print the trace in the most recent call last order.
  • Introduce an initial guideline for error handling

New Error Messages

One thing that I like to highlight is that this PR allows us to simply use
LOG(FATAL) << "ErrorType: message to throw a specific error out.

We also combine the python/c++ stack trace in a reasonable way so that the error message is more human-friendly. The error message itself contains the traceback in C++ function and can be read naturally when combined with the python traceback. We also make it work for callbacks.

Example C++ code:

  // src/api_test.cc
  void ErrorTest(int x, int y) {
    CHECK_EQ(x, y) << "ValueError: expect x and y to be equal."
    if (x == 1) {
      LOG(FATAL) << "InternalError: cannot reach here";
    }
  }

The above function is registered as PackedFunc into the python frontend,
under the name tvm._api_internal._ErrorTest.
Here is what will happen if we call the registered function:

  >>> import tvm
  >>> tvm._api_internal._ErrorTest(0, 1)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/path/to/tvm/python/tvm/_ffi/_ctypes/function.py", line 190, in __call__
      raise get_last_ffi_error()
  ValueError: Traceback (most recent call last):
    [bt] (3) /path/to/tvm/build/libtvm.so(TVMFuncCall+0x48) [0x7fab500b8ca8]
    [bt] (2) /path/to/tvm/build/libtvm.so(+0x1c4126) [0x7fab4f7f5126]
    [bt] (1) /path/to/tvm/build/libtvm.so(+0x1ba2f8) [0x7fab4f7eb2f8]
    [bt] (0) /path/to/tvm/build/libtvm.so(+0x177d12) [0x7fab4f7a8d12]
    File "/path/to/tvm/src/api/api_test.cc", line 80
  ValueError: Check failed: x == y (0 vs. 1) : expect x and y to be equal.
  >>>
  >>> tvm._api_internal._ErrorTest(1, 1)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/path/to/tvm/python/tvm/_ffi/_ctypes/function.py", line 190, in __call__
      raise get_last_ffi_error()
  tvm.error.InternalError: Traceback (most recent call last):
    [bt] (3) /path/to/tvm/build/libtvm.so(TVMFuncCall+0x48) [0x7fab500b8ca8]
    [bt] (2) /path/to/tvm/build/libtvm.so(+0x1c4126) [0x7fab4f7f5126]
    [bt] (1) /path/to/tvm/build/libtvm.so(+0x1ba35c) [0x7fab4f7eb35c]
    [bt] (0) /path/to/tvm/build/libtvm.so(+0x177d12) [0x7fab4f7a8d12]
    File "/path/to/tvm/src/api/api_test.cc", line 83
  InternalError: cannot reach here
  TVM hint: You hit an internal error. Please open a thread on https://discuss.tvm.ai/ to report it.

It also works with recursive callbacks between python->c++->python

def test_deep_callback():
    def error_callback():
        raise ValueError("callback error")
    # Implemented in c++
    # wrap1 = lambda : error_callback()
    wrap1 = tvm._api_internal._test_wrap_callback(error_callback)
    def flevel2():
        wrap1()
    wrap2 = tvm._api_internal._test_wrap_callback(flevel2)
    wrap2()
Traceback (most recent call last):
  File "tests/python/unittest/test_runtime_error.py", line 58, in <module>
    test_deep_callback()
  File "tests/python/unittest/test_runtime_error.py", line 38, in test_deep_callback
    wrap2()
  File "/home/tqchen/github/tvm/python/tvm/_ffi/_ctypes/function.py", line 190, in __call__
    raise get_last_ffi_error()
ValueError: Traceback (most recent call last):
  [bt] (2) /home/tqchen/github/tvm/build/libtvm.so(TVMFuncCall+0x48) [0x7f6a422e9cb8]
  [bt] (1) /home/tqchen/github/tvm/build/libtvm.so(+0x1bab28) [0x7f6a41a1cb28]
  [bt] (0) /home/tqchen/github/tvm/build/libtvm.so(+0xa88ded) [0x7f6a422eaded]
  File "/home/tqchen/github/tvm/python/tvm/_ffi/_ctypes/function.py", line 55, in cfun
    rv = local_pyfunc(*pyargs)
  File "tests/python/unittest/test_runtime_error.py", line 36, in flevel2
    wrap1()
  File "/home/tqchen/github/tvm/python/tvm/_ffi/_ctypes/function.py", line 190, in __call__
    raise get_last_ffi_error()
  [bt] (2) /home/tqchen/github/tvm/build/libtvm.so(TVMFuncCall+0x48) [0x7f6a422e9cb8]
  [bt] (1) /home/tqchen/github/tvm/build/libtvm.so(+0x1bab28) [0x7f6a41a1cb28]
  [bt] (0) /home/tqchen/github/tvm/build/libtvm.so(+0xa88ded) [0x7f6a422eaded]
  File "/home/tqchen/github/tvm/python/tvm/_ffi/_ctypes/function.py", line 55, in cfun
    rv = local_pyfunc(*pyargs)
  File "tests/python/unittest/test_runtime_error.py", line 33, in error_callback
    raise ValueError("callback error")
ValueError: callback error

@tqchen
Copy link
Member Author

tqchen commented Mar 18, 2019

@yidawang
Copy link
Contributor

Thanks @tqchen for the effort. This is amazing! I think this is a great setup for unified error messages. I particularly like the way to unify C++ and Python stack traces.

@yidawang
Copy link
Contributor

yidawang commented Mar 18, 2019

cc @KoinFlyp @wweic

@junrushao
Copy link
Member

Sounds like a plan

@tqchen
Copy link
Member Author

tqchen commented Mar 18, 2019

see also wrapper vs less-abstraction discussion in #2279

@junrushao
Copy link
Member

This is a perfect solution when all exceptions defined don't carry extra information besides their own types, and I don't think there is usecase for now that an exception would carry extra information. So I agree with the high-level idea.

Rushing for a paper deadline, will review the detailed implementation tomorrow night.



@register_error
class OpNotImplemented(OpError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like how the Python and C++ error handling is unified. Should OpNotImplemented also inherit NotImplementedError? In case one is trying to catch specific error types, e.g., to gather a list of all operators or attributes that are unsupported.

Copy link
Contributor

@markrogersjr markrogersjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some suggestions regarding the exception hierarchy.


.. code:: python

def preferred():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not completely sure about which of these are preferred. Is only the first option valid? Option 2 is not preferred, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more discussions in #2279, i think this is something that maybe worth some more discussion


def _valid_error_name(name):
"""Check whether name is a valid error name."""
return all(x.isalnum() or x == '_' or x == '.' for x in name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: return all(x.isalnum() or x in '_.' for x in name)


How to choose an Error Type
---------------------------
You can go through the error types are listed below, try to us common
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: us->use

raise OpNotImplemented("Operator relu is not implemented in the MXNet fronend")

def _op_not_implemented(op_name):
return OpNotImpelemented("Operator {} is not implemented.").format(op_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Impelemented->Implemented


@register_error
class OpError(TVMError):
"""Base class of all operator errors in fronends."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: fronends->frontends



@register_error
class OpNotImplemented(OpError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest class OpNotImplementedError(OpError, NotImplementedError):



@register_error
class OpAttributeRequired(OpError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest class OpAttributeRequiredError(OpError, AttributeError):



@register_error
class OpAttributeInvalid(OpError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest class OpAttributeInvalidError(OpError, AttributeError):



@register_error
class OpAttributeUnimplemented(OpError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest class OpAttributeNotImplementedError(OpError, AttributeError, NotImplementedError):

Copy link
Contributor

@markrogersjr markrogersjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tqchen tqchen merged commit f63631f into apache:master Mar 19, 2019
@tqchen
Copy link
Member Author

tqchen commented Mar 19, 2019

Thanks, @yidawang @KoinFlyp @junrushao1994, this is now merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants