@Lunderberg
Contributor
The `test_pass_fq2i_avg_pool2d.py::test_avgpool_conv2d` test is sensitive to rounding errors, and failed about a third of the time (42 / 100 runs). This was first noticed as CI failures in unrelated PRs (e.g. https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-hexagon/detail/PR-16184/6/tests). This commit marks the flaky portions of the test with `pytest.mark.xfail`, to avoid breaking CI for other PRs.

To minimize the extent of the disabled test cases, this commit breaks up each of the unit tests. Where previously a single test performed both hardware/simulation checks and Relay graph comparisons, these are now done in separate test functions. The hardware/simulation tests use `tvm.testing.assert_allclose` with a tolerance of `1e-02`, while the graph-comparison tests use `tvm.ir.structural_equal` and require identical floating-point values. Only the graph-comparison test is disabled here.

The other two test cases in `test_pass_fq2i_avg_pool2d.py` do not show this sensitivity, with no failures seen in 100 executions.
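The splitting pattern described above can be sketched as follows — a minimal, self-contained illustration, not the actual TVM test: the `quantize_dequantize` helper and its values are hypothetical stand-ins for the fake-quantization pipeline, used only to show how a tolerant numerical check can be kept while the bit-exact comparison is marked `xfail`.

```python
import pytest


def quantize_dequantize(x, scale=0.1):
    # Hypothetical stand-in for the computation under test; round-tripping
    # through a quantized representation introduces small rounding error.
    return round(x / scale) * scale


def test_avgpool_numerical():
    # Hardware/simulation-style check: tolerant numerical comparison,
    # analogous to tvm.testing.assert_allclose with a 1e-02 tolerance.
    # This half of the split test stays enabled.
    assert abs(quantize_dequantize(0.304) - 0.3) <= 1e-02


@pytest.mark.xfail(reason="Sensitive to floating-point rounding", strict=False)
def test_avgpool_structural():
    # Graph-comparison-style check: requires bit-exact values, analogous to
    # tvm.ir.structural_equal. Marked xfail so intermittent rounding
    # failures do not break CI; strict=False lets it pass when lucky.
    assert quantize_dequantize(0.304) == 0.3
```

With this split, a rounding-induced mismatch only affects the `xfail`-marked structural test, while genuine numerical regressions beyond the tolerance still fail the first test.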

@Lunderberg
Contributor Author

Lunderberg commented Jan 2, 2024

@rasagna-quic Can you take a look at this test case? It was introduced in #15599, and is failing about 1/3 of the time. This PR is a stopgap to avoid impacting other work, but a better long-term fix is still needed.

@rasagna-quic
Contributor

@Lunderberg Thank you for these changes; they look good to me. I will create a new PR to fix this issue.


Member

@junrushao junrushao left a comment

LGTM!

@junrushao junrushao merged commit 42b4f21 into apache:main Jan 3, 2024
@Lunderberg Lunderberg deleted the hexagon_disable_flaky_qnn_test branch January 3, 2024 17:30