[ONNX][CI] Parametrize ONNX Unit tests #8621
Conversation
Thanks for doing this @mbrookhart! Given the difficulty surrounding the docker image update (we've been trying for around a month to get the update through), I think that getting this refactor in is probably the fastest way to prevent further regressions.
Force-pushed from 74db414 to 2578ad0
This is a great change. Thanks for all the hard work. Is the plan to split CPU and GPU tests across the different nodes once we get ONNX building on the CPU node? Also, have you had a chance to see how much longer tests take with this change?
👏, thank you @mbrookhart!
```python
@tvm.testing.parametrize_targets
def test_forward_arg_min_max(target, dev):
    if "cuda" in target:
        pytest.skip("Fails on CUDA")
```
Suggested change:

```diff
-@tvm.testing.parametrize_targets
-def test_forward_arg_min_max(target, dev):
-    if "cuda" in target:
-        pytest.skip("Fails on CUDA")
+@tvm.testing.known_failing_targets("cuda")
+@tvm.testing.parametrize_targets
+def test_forward_arg_min_max(target, dev):
```
Perhaps consider using `known_failing_targets` here and elsewhere.
I tried that on multiple tests, and they still ran on CUDA and failed. The decorator doesn't seem to work.
Can you try using `@tvm.testing.exclude_targets('cuda')` instead? `known_failing_targets` applies `pytest.mark.xfail`, which still runs a test to see if it succeeds but doesn't count a failure of that test as a failure of the test suite. This makes sense if it is an assertion failure, but not if the test results in a segfault. The `exclude_targets` decorator removes those tests entirely from the ones being run.
`@tvm.testing.exclude_targets('cuda')` fails in the same way: the test runs, fails, and is reported as a failure.
Follow-up: it looks like the `@tvm.testing.parametrize_targets` decorator currently doesn't respect the settings in `known_failing_targets` or `exclude_targets`. There's a slightly different code path between the explicit parametrization and the auto-parametrization. This is resolved with #8542, which merges these two paths.
I'm not planning on it unless we start an effort to do it for all frameworks; PyTorch was segfaulting on CPU when I tried.
It's a minimal change versus running the tests locally before (perhaps 1-2 minutes of extra reference kernel calls). Unfortunately, many of these tests were disabled before, so re-enabling them will increase the CI time.
Thanks @tkonolige @Lunderberg @jroesch @electriclilies @csullivan! I've synced with @Lunderberg; we've decided to merge this, and he will rebase #8542 and include
#7438 accidentally disabled the ONNX unit tests on CPU, and since then a few regressions have crept in. We've been attempting to enable the ONNX tests in a CPU-only job at #8390, but we've hit issues with PyTorch in the CPU docker image.
This PR instead refactors the ONNX test file to use the `parametrize_targets` flow. This will re-enable a bunch of tests that had been disabled in CI and allow us to more easily skip tests that fail on only one target.
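The parametrized flow looks roughly like the sketch below. The target list and the `parametrize_targets` helper are illustrative stand-ins for the `tvm.testing` internals, not the real implementation; the point is that one test body runs once per (target, dev) pair, so a single-target failure can be skipped without disabling the whole test.

```python
import pytest

# Illustrative (target, dev) pairs; the real list comes from tvm.testing.
TARGETS = [("llvm", "cpu"), ("cuda", "cuda")]

def parametrize_targets(func):
    """Simplified stand-in for @tvm.testing.parametrize_targets."""
    return pytest.mark.parametrize("target,dev", TARGETS)(func)

@parametrize_targets
def test_forward_abs(target, dev):
    # The same body runs per target; a target-specific failure
    # is skipped here rather than disabling the whole test.
    if "cuda" in target:
        pytest.skip("example: skip a known CUDA-only failure")
```

With this shape, pytest reports one result per target (e.g. `test_forward_abs[llvm-cpu]`, `test_forward_abs[cuda-cuda]`), which is what makes per-target skips and exclusions possible.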
Thank you!
cc @electriclilies @Lunderberg @tkonolige @jwfromm @masahi
P.S. This conflicts somewhat with #8574, but I really like the ONNX node file changes in there.