Add additional details on Gradle Check failures autocut issues #13950
Comments
I added a comment on #11217 (comment), highlighting that we need to avoid tainted data from tests broken by changes in an open PR, but that get fixed before the PR is merged. (Short story: I added a commit to fix 1 test and broke over 1000 other tests. Nobody else saw it though, because my mistake was limited to my open PR.) Maybe a good heuristic could look at tests that fail across multiple PRs in some time window? Alternatively, if we have a job that just runs Gradle check continuously (without changing any code), it could be a good canary to collect "true" failures. |
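The multi-PR heuristic suggested above could be sketched roughly as follows. This is only an illustration, not the metrics project's actual logic: the function name `likely_flaky`, the tuple layout, the 7-day window, and the threshold of two distinct PRs are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def likely_flaky(failures, now, window=timedelta(days=7), min_prs=2):
    """Return tests that failed in at least `min_prs` distinct PRs within `window`.

    `failures` is an iterable of (test_name, pr_number, failed_at) tuples.
    This record shape is hypothetical, not the dashboard's real schema.
    """
    prs_by_test = defaultdict(set)
    for test_name, pr_number, failed_at in failures:
        # Only count failures that fall inside the time window.
        if now - failed_at <= window:
            prs_by_test[test_name].add(pr_number)
    # A test broken by a single open PR shows up under one PR number only,
    # so it never reaches the threshold.
    return sorted(t for t, prs in prs_by_test.items() if len(prs) >= min_prs)
```

With this shape, 1000 failures caused by one bad commit in one open PR would all be filtered out, because each of those tests is associated with a single PR.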
Thanks @msfroh, the question I have is for |
Correct -- it was caught while the PR was still open. I noticed that those 1000+ test failures showed up on the dashboard (and made it look like |
Hey @msfroh, thanks for your input. The idea is to create an issue only when a test fails in a post-merge action (after the PR is merged), not for tests that fail while a PR is still open. So in your case, where the failures were caught while the PR was open, the automation will not create issues. Once the maintainers merge the PR after the gradle check is green, one more gradle check is triggered on the merge commit. Any test that fails on that commit is considered flaky, because it had just passed for the PR to get merged. The metrics project is collecting this data and will use it for issue creation. |
Hey @reta coming from #14012 I can see the test |
Hey @prudhvigodithi, this is a fair point. I will close #14012 for now since the code is indeed not merged, although there should not be any correlation. Thanks for bringing it up! |
Thanks @reta, but there is a chance that the test was lucky and has not failed in post-merge actions :) We should also have a mechanism to flag these kinds of flaky failures that show up in open PRs. |
I will start with an automation by creating issues at class level in the following format. I have noticed one class can have multiple failing tests so rather than creating multiple issues for each failing test we can group at class level. For example the class Title:[AUTOCUT] The Gradle check encountered flaky failures with Noticed the
The other pull requests, besides those involved in post-merge actions, that contain failing tests with the For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard. |
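Grouping at class level, as proposed above, might look something like this. It is a minimal sketch that assumes failing tests arrive as fully qualified `package.Class.method` strings; the function name `group_by_class` and that input format are assumptions, and the example does not try to reproduce the actual issue title or body.

```python
from collections import defaultdict


def group_by_class(failed_tests):
    """Group failing test names by their class, one entry per class.

    `failed_tests` is an iterable of 'package.Class.method' strings
    (a hypothetical input format chosen for illustration).
    """
    grouped = defaultdict(set)
    for name in failed_tests:
        # Split on the last dot: everything before it is the class name.
        class_name, _, method = name.rpartition(".")
        grouped[class_name].add(method)
    # One entry per class, so one issue can cover all of its failing tests.
    return {cls: sorted(methods) for cls, methods in sorted(grouped.items())}
```

Each key of the returned dictionary would correspond to one issue, with the failing methods listed in its body.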
Thanks @prudhvigodithi , it looks awesome to start with |
Hey Just an update on this, I have the issues created in my fork repo with automation Also what I noticed is the label used today is |
I know we use |
That answer is probably lost to history. I don't think it has any special meaning as far as I know. |
The automation flagged and created the following issues (46 of them), which it identified as flaky tests from the past month. #14332 Adding @andrross @reta @dblock @msfroh @getsaurabh02 to please take a look. |
I just created a PR #14334 to remove the issue creation from the gradle check workflow. |
The second time it runs, the automation won't create new issues; instead it updates the existing issue body if an issue for a flaky test already exists. FYI, I re-ran the job to validate this: https://build.ci.opensearch.org/job/gradle-check-flaky-test-detector/4/console. |
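The create-or-update behaviour described above can be illustrated with a small sketch. A plain dictionary stands in for the issue tracker here; `upsert_issue` and its arguments are hypothetical names, and the real job talks to GitHub rather than to an in-memory structure.

```python
def upsert_issue(open_issues, class_name, body):
    """Create an issue for `class_name`, or update its body if one exists.

    `open_issues` maps class name -> issue body and is only a stand-in
    for the tracker (an assumption made for illustration).
    """
    action = "updated" if class_name in open_issues else "created"
    # Either way the latest report becomes the issue body,
    # so re-running the job never produces duplicates.
    open_issues[class_name] = body
    return action
```

Running it twice for the same class leaves a single entry whose body reflects the latest run, which matches the re-run behaviour reported in the comment.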
Hey, FYI, I have added new visualizations to the OpenSearch Gradle Check Metrics dashboard to track the trend of these flaky issues. |
It looks pretty cool, thanks @prudhvigodithi ! Just one question, I noticed the |
Should we find a way to link/close the existing manually created flaky test issues and assign the automatically cut ones to the same devs? |
Hey @reta I just referred this issue in past #14255 and added |
Sure @dblock, we should do that. |
That would help to preserve any existing boards / filters, thanks @prudhvigodithi |
@prudhvigodithi done |
Thanks @dblock. |
Hey @andrross @dblock @reta, since we have the automation in place please let me know if we can close this issue ? For:
There are 145 manually created issues for flaky tests; we should go over them and close any that already have an automation issue for the same flaky test. |
I think we could close this issue indeed, thanks @prudhvigodithi |
Closing. Thanks @prudhvigodithi! |
Is your feature request related to a problem? Please describe
Coming from #11217 and #3713, we now have failure analytics for OpenSearch Gradle Check failures. For more details, check the Developer Guide. The next step is to highlight the flaky failures as GitHub Issues so they can be prioritized and fixed. Additionally, when a test fails on a new PR but is not related to the PR's code changes, this process will help make the contributor aware of the existing flaky test.
Describe the solution you'd like
From the OpenSearch Gradle Check Metrics dashboard, create an issue (with a specific format) for failing tests that are part of post-merge actions. Tests that fail in post-merge actions, which are executed after the PR is merged, can be considered flaky.
Related component
Build
Describe alternatives you've considered
Today the gradle check failure issues are created from https://github.com/opensearch-project/OpenSearch/blob/main/.github/workflows/gradle-check.yml#L161-L168, which sometimes fails to execute (see https://github.com/opensearch-project/OpenSearch/actions/runs/9320653340/job/25657907035), so we should clean up the issues and disable this functionality in gradle-check.yml.
Additional context
Coming from @dblock #11217 (comment) we should also