
[ONNX][CI] Parametrize ONNX Unit tests #8621

Merged: 1 commit merged into apache:main on Aug 3, 2021

Conversation

@mbrookhart (Contributor) commented Aug 2, 2021

#7438 accidentally disabled the ONNX unit tests on CPU, and since then a few regressions have crept in. We've been attempting to enable the ONNX tests in a CPU-only job at #8390, but we've hit issues with PyTorch in the CPU Docker image.

This PR instead refactors the ONNX test file to use the parametrize_targets flow. It:

  1. adds that decorator to every test
  2. inlines helper functions that are only used in one test as closures
  3. removes the for loops over enabled targets so pytest will run every test on each target individually
  4. removes -m gpu from the invocation of the pytest call in CI

This will re-enable a bunch of tests that have been disabled in CI and allow us to more easily skip tests that fail only on one target.
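For illustration, here is a minimal sketch (not code from this PR; the test name, operator, and helper are invented) of the pattern items 1-3 describe: the decorator parametrizes the test over the enabled targets, and the per-test helper becomes a closure instead of looping over targets itself.

import numpy as np
import tvm
import tvm.testing
from tvm import relay


@tvm.testing.parametrize_targets
def test_forward_add(target, dev):
    # Helper used only by this test, inlined as a closure so it sees the
    # parametrized (target, dev) pair directly; pytest now runs one test case
    # per enabled target instead of a single test looping over all targets.
    def verify(shape):
        x = relay.var("x", shape=shape, dtype="float32")
        mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(x, x)))
        data = np.random.uniform(size=shape).astype("float32")
        result = relay.create_executor(
            "graph", mod=mod, device=dev, target=target
        ).evaluate()(data)
        tvm.testing.assert_allclose(result.numpy(), data + data)

    verify((1, 3, 8))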

Thank you!

cc @electriclilies @Lunderberg @tkonolige @jwfromm @masahi

P.S. This conflicts somewhat with #8574, but I really like the ONNX node file changes in there.

@electriclilies (Contributor) commented Aug 2, 2021

Thanks for doing this @mbrookhart! Given the difficulty surrounding the docker image update (we've been trying for around a month to get the update through), I think that getting this refactor in is probably the fastest way to prevent further regressions.

@tkonolige (Contributor):

This is a great change. Thanks for all the hard work. Is the plan to split CPU and GPU tests across the different nodes once we get ONNX building on the CPU node?

Also, have you had a chance to see how much longer tests take with this change?

@csullivan (Contributor) left a comment

👏, thank you @mbrookhart!

Comment on lines +1569 to +1572
@tvm.testing.parametrize_targets
def test_forward_arg_min_max(target, dev):
    if "cuda" in target:
        pytest.skip("Fails on CUDA")

Suggested change:
- @tvm.testing.parametrize_targets
- def test_forward_arg_min_max(target, dev):
-     if "cuda" in target:
-         pytest.skip("Fails on CUDA")
+ @tvm.testing.known_failing_targets("cuda")
+ @tvm.testing.parametrize_targets
+ def test_forward_arg_min_max(target, dev):

Perhaps consider using known_failing_targets here and elsewhere

@mbrookhart (Author) commented Aug 3, 2021

I tried that on multiple tests, and they still ran on CUDA and failed. This decorator doesn't seem to work here.

Contributor:

Can you try using @tvm.testing.exclude_targets('cuda') instead? The known_failing_targets applies pytest.mark.xfail, which still runs a test to see if it succeeds, but doesn't count a failure of that test as a failure of the test suite. This makes sense if it is an assertion failure, but not if the test results in a segfault. The exclude_targets decorator removes those tests entirely from the ones being run.
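A rough sketch (with invented test names) of the intended difference between the two decorators described above; note the follow-up below that, at the time, neither decorator was honored by the explicit parametrize_targets path:

import tvm.testing


# known_failing_targets: the cuda test case still runs, but a cuda failure is
# recorded as an expected failure (xfail) rather than failing the suite.
@tvm.testing.known_failing_targets("cuda")
@tvm.testing.parametrize_targets
def test_marked_xfail_on_cuda(target, dev):
    pass


# exclude_targets: no cuda test case is generated at all, so even a
# segfaulting kernel cannot take down the pytest process.
@tvm.testing.exclude_targets("cuda")
@tvm.testing.parametrize_targets
def test_not_generated_for_cuda(target, dev):
    pass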

@mbrookhart (Author):

@tvm.testing.exclude_targets('cuda') fails in the same way: the test runs, fails, and is reported as a failure.

Contributor:

Follow-up: it looks like the @tvm.testing.parametrize_targets decorator currently doesn't respect the settings in known_failing_targets or exclude_targets. There's a slightly different code path between the explicit parametrization and the auto-parametrization. This is resolved by #8542, which merges these two paths.
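To make the two paths concrete, a rough sketch (hypothetical test names), assuming tvm.testing's pytest integration parametrizes any test that accepts the target/dev fixtures:

# Explicit parametrization: the decorator itself generates the per-target
# test cases. This is the path the PR uses throughout the ONNX test file.
@tvm.testing.parametrize_targets
def test_explicitly_parametrized(target, dev):
    pass


# Auto-parametrization: a test that simply accepts the target/dev fixtures is
# parametrized over the enabled targets without the explicit decorator.
def test_auto_parametrized(target, dev):
    pass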

@mbrookhart (Author):

> This is a great change. Thanks for all the hard work. Is the plan to split CPU and GPU tests across the different nodes once we get ONNX building on the CPU node?

I'm not planning on it unless we start an effort to do it for all frameworks; PyTorch was segfaulting on CPU when I tried.

> Also, have you had a chance to see how much longer tests take with this change?

It's a minimal change vs. running the tests locally before (perhaps 1-2 minutes of extra reference kernel calls). Unfortunately, many of these tests had been disabled in CI, so re-enabling them will increase the CI time.

@mbrookhart merged commit 5140d90 into apache:main on Aug 3, 2021
@mbrookhart deleted the parameterize_onnx_tests branch on August 3, 2021 at 15:28
@mbrookhart (Author):

Thanks @tkonolige @Lunderberg @jroesch @electriclilies @csullivan!

I've synced with @Lunderberg; we've decided to merge this, and he will rebase #8542 and include @tvm.testing.known_failing_targets("cuda") in that PR, which fixes the inconsistencies in the current API.
