[Feature request] Detect flaky distribution build failures and integration test failures #4171
Comments
To avoid repeatedly creating and closing issues, we should introduce a circuit breaker in the createGithubIssue library: before creating an issue, it should query for recently closed issues with the same title. Example: https://github.com/opensearch-project/cross-cluster-replication/issues?q=is%3Aissue+%5BAUTOCUT%5D+Distribution+Build+Failed+for+cross-cluster-replication-3.0.0+is%3Aclosed+closed%3A2023-10-15..2023-10-22+
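A minimal sketch of that pre-creation check, assuming the search mirrors the example URL above (closed issues with the same title inside a lookback window). The function names, the 7-day default, and the threshold are hypothetical and not part of the createGithubIssue library; the query string uses standard GitHub issue search qualifiers and would still need to be sent to GitHub by the caller.

```python
from datetime import date, timedelta


def build_recently_closed_query(
    repo: str, title: str, today: date, lookback_days: int = 7
) -> str:
    """Build a GitHub issue search query for recently closed issues with this title.

    lookback_days is an assumed default; the example URL spans 7 days.
    """
    start = today - timedelta(days=lookback_days)
    return (
        f'repo:{repo} is:issue is:closed in:title "{title}" '
        f"closed:{start.isoformat()}..{today.isoformat()}"
    )


def circuit_open(recently_closed_count: int, threshold: int = 1) -> bool:
    """Return True when a new issue should NOT be created.

    The caller would then reopen or comment on the existing issue instead.
    threshold is a hypothetical knob, not an existing library setting.
    """
    return recently_closed_count >= threshold
```

With the breaker open, the library could reopen the most recently closed issue and comment on it rather than opening a fresh one.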
[Untriage] @gaiksaya, please take a look and close this issue if you think this solves the problem. Thank you.
Thanks @prudhvigodithi. Looks good. The comment needs more details, but that can be tracked in another issue.
We should add a flaky-test label when a test passes and fails between different runs. CC: @prudhvigodithi @gaiksaya
@rishabh6788 is going to work on a POC to record, track, and surface flaky integration tests for OpenSearch core before implementing it for plugins. Note: For now, we will focus only on Gradle-based projects.
We now have Gradle Check insights on failed and flaky tests in the OpenSearch Gradle Check Metrics dashboard. Moving forward, we can set up similar metrics for distribution build and integration test failures as required. Based on this data and trend (part of the metrics initiative), we can go with the solution @gaiksaya described of creating, updating, or commenting on issues.
Is your feature request related to a problem? Please describe
The GitHub issues created at the distribution level for build failures and integration test failures lack the intelligence to detect whether the build or tests are flaky. Currently, the logic blindly closes the issue if the build passes for, say, one distribution, and opens a new one if it fails for another platform.
Example: https://github.com/opensearch-project/cross-cluster-replication/issues?q=is%3Aissue++%5BAUTOCUT%5D+Distribution+Build+Failed+for+cross-cluster-replication-3.0.0+
Describe the solution you'd like
The GH issue creation should be smart enough to detect the following:
If yes, it should label the issue or comment on it saying this is flaky and should not be closed unless addressed
The time span for detecting an issue as flaky can be 3-4 hours, assuming 3-4 runs within that time frame.
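The windowed check described above could look roughly like the sketch below. It treats a component as flaky when its runs inside the window include both a pass and a failure. The `Run` shape, the function name, and the defaults (a 4-hour window and at least 3 runs, taken from the numbers above) are assumptions for illustration, not existing build-library code.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (time the run finished, whether it passed); a hypothetical record shape
Run = Tuple[datetime, bool]


def is_flaky(
    runs: List[Run],
    now: datetime,
    window: timedelta = timedelta(hours=4),
    min_runs: int = 3,
) -> bool:
    """Return True if recent runs show mixed pass/fail results."""
    # Keep only the results of runs that finished inside the window.
    recent = [passed for finished, passed in runs if now - window <= finished <= now]
    if len(recent) < min_runs:
        return False  # not enough data to call it flaky
    # Flaky means at least one pass and at least one failure.
    return any(recent) and not all(recent)
```

When this returns true, the issue could be labeled or commented on as flaky and left open instead of being closed on the next passing run.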
Describe alternatives you've considered
No response
Additional context
No response