[Feature request] Detect flaky distribution build failures and integration test failures #4171

gaiksaya · 2023-10-24T21:56:04Z

Is your feature request related to a problem? Please describe

The GitHub issues created at the distribution level for build failures and integration test failures lack the logic to detect whether the build or tests are flaky. Currently, the automation closes an issue as soon as the build passes for one distribution and opens a new one when it fails for another platform.
Example: https://github.com/opensearch-project/cross-cluster-replication/issues?q=is%3Aissue++%5BAUTOCUT%5D+Distribution+Build+Failed+for+cross-cluster-replication-3.0.0+

Describe the solution you'd like

The GH issue creation should be smart enough to detect the following:

Is the build flaky? Are the failures consistent with a particular platform and architecture? Add a comment to the existing issue instead of closing it and creating a new one.
Are the integration tests flaky? Are the failures consistent with a particular platform and architecture? Add a comment to the existing issue instead of closing it and creating a new one.

If yes, it should label the issue, or comment on it, to say that the failure is flaky and that the issue should not be closed until it is addressed.

The time span for classifying a failure as flaky can be 3-4 hours, which covers 3-4 runs.
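
The detection described above could be sketched roughly as follows. This is only an illustration, not existing library code: the `BuildRun` record, the `classify_failures` function, the verdict names, and the default window of 4 hours with a minimum of 3 runs are all assumptions chosen to match the numbers suggested in this request.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BuildRun:
    """One distribution build or integration test run (hypothetical record shape)."""
    platform: str
    architecture: str
    finished_at: datetime
    passed: bool


def classify_failures(runs, now, window=timedelta(hours=4), min_runs=3):
    """Label each (platform, architecture) pair using only runs inside the window.

    Returns a dict mapping (platform, architecture) to one of
    'passing', 'failing', 'flaky', or 'insufficient-data'.
    """
    recent = defaultdict(list)
    for run in runs:
        if now - window <= run.finished_at <= now:
            recent[(run.platform, run.architecture)].append(run.passed)

    verdicts = {}
    for target, outcomes in recent.items():
        if len(outcomes) < min_runs:
            verdicts[target] = "insufficient-data"
        elif all(outcomes):
            verdicts[target] = "passing"
        elif not any(outcomes):
            verdicts[target] = "failing"  # consistent failure on this target
        else:
            verdicts[target] = "flaky"    # mixed results: comment, do not close
    return verdicts
```

With a verdict of `flaky`, the automation would comment on (or label) the existing issue rather than close it and open a new one.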

Describe alternatives you've considered

No response

Additional context

No response

prudhvigodithi · 2023-10-25T02:28:08Z

To avoid repeatedly creating and closing issues, we should add a circuit breaker to the createGithubIssue library. Before creating an issue, it should query for AUTOCUT issues for the release version; if one was closed within the last 24-48 hours, it should reopen that issue and update it with the failed build information.

Example: https://github.com/opensearch-project/cross-cluster-replication/issues?q=is%3Aissue+%5BAUTOCUT%5D+Distribution+Build+Failed+for+cross-cluster-replication-3.0.0+is%3Aclosed+closed%3A2023-10-15..2023-10-22+
Take the latest issue, re-open it, and update it with the build failure.
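
A minimal sketch of that decision, written in Python for illustration rather than as the library's actual code. It assumes the issues have already been fetched as dicts carrying the `number`, `title`, `state`, and `closed_at` fields that the GitHub REST API returns for issues. The function names, the 48-hour default, and the rule of commenting on an issue that is already open are assumptions.

```python
from datetime import datetime, timedelta, timezone


def pick_issue_to_reopen(issues, title, now, max_age=timedelta(hours=48)):
    """Return the most recently closed issue with this title that was
    closed within max_age, or None if there is no such issue."""
    candidates = []
    for issue in issues:
        if issue["title"] != title or issue["state"] != "closed":
            continue
        # closed_at is an ISO 8601 UTC timestamp such as "2023-10-22T10:00:00Z".
        closed_at = datetime.fromisoformat(issue["closed_at"].replace("Z", "+00:00"))
        if now - closed_at <= max_age:
            candidates.append((closed_at, issue))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


def plan_action(issues, title, now):
    """Decide what to do for a new failure: comment, reopen, or create."""
    for issue in issues:
        if issue["title"] == title and issue["state"] == "open":
            return ("comment", issue["number"])  # assumed: reuse the open issue
    recent = pick_issue_to_reopen(issues, title, now)
    if recent is not None:
        return ("reopen", recent["number"])
    return ("create", None)
```

The caller would then perform the chosen action (comment, reopen and update, or create) through whatever GitHub client the pipeline already uses.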

prudhvigodithi · 2023-11-01T16:36:14Z

[Untriage]
We have now updated the library so that it re-opens AUTOCUT issues instead of always creating a new one.
opensearch-project/common-utils#556 (comment)

@gaiksaya take a look and close this issue if you think this solves the problem.

Thank you

gaiksaya · 2023-11-01T17:17:27Z

Thanks @prudhvigodithi, looks good. The comment needs more details, but that can be tracked in another issue.
Closing the issue.

bbarani · 2024-02-29T20:09:39Z

We should add a flaky-test label when a test passes and fails between different runs. CC: @prudhvigodithi @gaiksaya
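
As a rough illustration of that rule, the sketch below flags any test that has both passed and failed across a set of runs. The input shape (one dict per run, mapping test name to a pass/fail boolean) and the function name `find_flaky_tests` are hypothetical, and it assumes the runs being compared exercise the same code.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the sorted names of tests that both passed and failed.

    `runs` is an iterable of dicts mapping test name -> True (passed)
    or False (failed), one dict per run.
    """
    outcomes = defaultdict(set)
    for run in runs:
        for test_name, passed in run.items():
            outcomes[test_name].add(passed)
    return sorted(name for name, seen in outcomes.items() if seen == {True, False})
```

Tests returned by this function would be the candidates for a flaky-test label; tests that only ever fail are consistent failures and are left out.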

bbarani · 2024-03-05T17:44:33Z

@rishabh6788 is going to work on a POC to record, track and surface flaky integration tests for OpenSearch core before implementing it for plugins.

Note: For now, we will focus only on Gradle-based projects.

prudhvigodithi · 2024-06-04T16:44:21Z

We now have the Gradle Check insights on failed and flaky tests in the OpenSearch Gradle Check Metrics dashboard.
https://github.com/opensearch-project/OpenSearch/blob/main/DEVELOPER_GUIDE.md#gradle-check-metrics-dashboard

As required, moving forward we can have a similar setup and metrics for distribution build and integration test failures. Based on this data and its trends (part of the metrics initiative), we can go with the solution @gaiksaya described of creating, updating, or commenting on issues.

@getsaurabh02 @dblock

gaiksaya added the enhancement and untriaged labels Oct 24, 2023

This was referenced Oct 30, 2023

Update createGithubIssue to re-open recently closed issue opensearch-project/opensearch-build-libraries#347

Merged

Update version to latest release 5.11.1: Solves creation and closing of multiple AUTOCUT issues #4188

Merged

prudhvigodithi removed the untriaged label Nov 1, 2023

gaiksaya closed this as completed Nov 1, 2023

prudhvigodithi self-assigned this Nov 1, 2023

prudhvigodithi added this to OpenSearch Engineering Effectiveness Nov 6, 2023

github-project-automation bot moved this to Backlog in OpenSearch Engineering Effectiveness Nov 6, 2023

bbarani moved this from Backlog to Done in OpenSearch Engineering Effectiveness Nov 6, 2023

prudhvigodithi mentioned this issue Nov 15, 2023

[Meta] OpenSearch Release Improvements #4216

Open

32 tasks

bbarani reopened this Feb 29, 2024

github-project-automation bot moved this from Done to Not started in OpenSearch Engineering Effectiveness Feb 29, 2024

github-actions bot added the untriaged label Feb 29, 2024

bbarani removed the untriaged label Mar 1, 2024

prudhvigodithi removed this from OpenSearch Engineering Effectiveness Jun 6, 2024

prudhvigodithi mentioned this issue Jul 19, 2024

[FEATURE] OpenSearch Project Distribution Build and Integration Test Analytics opensearch-project/opensearch-metrics#56

Closed

gaiksaya mentioned this issue Sep 18, 2024

Revamp build and test failure notification system #5038

Closed
