This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-644] Automated flaky test detection #11991

Closed
wants to merge 23 commits

Conversation

cetsai
Contributor

@cetsai cetsai commented Aug 2, 2018

Description

This PR adds the necessary components for an automated flaky test detection mechanism, the design of which is detailed on the wiki.

These components (diff collator, dependency analyzer, and flakiness checker) are used by the check_flakiness script, which will be run in a Jenkins pipeline to automatically check PRs for flaky tests. Once active, the tool will mark PRs that introduce flaky tests so that they can be fixed before being merged into master.
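For orientation, here is a rough sketch of how the three components might be chained together; this is illustrative only, and the function names, the simplified dependency analysis, and the use of nosetests are assumptions rather than the actual code in this PR.

import logging
import subprocess

TEST_PREFIX = "test_"  # assumed convention for identifying test functions

def changed_files(base, head):
    """Diff collator step: list the files touched between two commits."""
    out = subprocess.check_output(["git", "diff", "--name-only", base, head])
    return out.decode().splitlines()

def select_tests(files):
    """Dependency analyzer step (simplified): keep only changed test files."""
    return [f for f in files if f.split("/")[-1].startswith(TEST_PREFIX)]

def check_flakiness(base, head):
    """Flakiness checker step: re-run the selected tests and report the result."""
    tests = select_tests(changed_files(base, head))
    if not tests:
        logging.info("No changed tests detected; nothing to check.")
        return True
    logging.info("Checking for flakiness: %s", tests)
    # The real checker re-runs each test many times within a time budget.
    return all(subprocess.call(["nosetests", t]) == 0 for t in tests)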

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Moved flakiness_checker.py to flaky_tests folder, along with other improvements
  • Added script, check_flakiness.py, which will be used in a Jenkins pipeline to check commits for flaky tests
  • Added Jenkinsfile and docker run-time function to automate the checking of commits for flaky tests

@cetsai cetsai requested a review from szha as a code owner August 2, 2018 02:21
@cetsai
Contributor Author

cetsai commented Aug 2, 2018

@marcoabreu @haojin2

@haojin2
Contributor

haojin2 commented Aug 2, 2018

Why do you have some updates in mshadow and tvm? Did you do git submodule update --init --recursive before committing any changes?

@cetsai cetsai force-pushed the flaky_test_bot branch 4 times, most recently from 9b3edb9 to 8ec8f08 Compare August 2, 2018 17:06
Contributor

@apeforest apeforest left a comment

Some reviews for now

return [(filename, test)
        for filename in deps.keys()
        for test in deps[filename]
        if test.startswith("test_")]
Contributor

Make this "test_" a constant header and document this clearly.

Contributor Author

Done
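For reference, the resolved version presumably looks something along these lines (a sketch; the TEST_PREFIX name and the enclosing select_tests function are assumptions):

# Test-name prefix kept as a documented module-level constant.
TEST_PREFIX = "test_"

def select_tests(deps):
    """Return (filename, test) pairs for every dependency that is a test function."""
    return [(filename, test)
            for filename in deps
            for test in deps[filename]
            if test.startswith(TEST_PREFIX)]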

for t in tests:
    total_time += time_test(t)

n = int(TIME_BUDGET / total_time)
Contributor

Need to handle divide-by-zero.

Contributor Author

Done
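A minimal sketch of the requested guard, assuming time_test and TIME_BUDGET behave as in the snippet above (the fallback value of zero trials is an assumption):

total_time = sum(time_test(t) for t in tests)

# Guard against a zero total (e.g. no tests selected) before dividing
# the time budget by it.
if total_time > 0:
    n = int(TIME_BUDGET / total_time)
else:
    n = 0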



def output_results(flaky, nonflaky):
    print("Following tests failed flakiness checker:")
Contributor

Why not use logger?

Contributor Author

@cetsai cetsai Aug 3, 2018

Usually my approach is to use logging for information about program execution and print for the actual output. However, I'd be fine with switching over to logging if that's more in line with what MXNet does.

Contributor

Using logging for everything is preferred in general

Contributor Author

done
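The switch to logging presumably ended up along these lines (a sketch; the logger setup and message format are assumptions):

import logging

logger = logging.getLogger(__name__)

def output_results(flaky, nonflaky):
    """Report which tests failed and which passed the flakiness check."""
    logger.info("Following tests failed flakiness checker:")
    for test in flaky:
        logger.info("FAIL: %s", test)
    logger.info("Following tests passed flakiness checker:")
    for test in nonflaky:
        logger.info("PASS: %s", test)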

tools/flaky_tests/dependency_analyzer.py (review thread resolved)
@cetsai
Contributor Author

cetsai commented Aug 6, 2018

@marcoabreu, could you at least take a look at the Jenkinsfile to see if I'm missing anything?

node('mxnetlinux-gpu') {
  ws('workspace/flakiness_check') {
    init_git()
    docker_run('ubuntu_gpu', 'run_flakiness_checker', true)
Contributor

Does it not need a compiled version of MXNet?

Contributor Author

Right, for now I'll build MXNet each time we run this, but really we should only do so when the dependency analyzer detects changed tests. That would take some refactoring, however, and I think it would be good to get a version of this tool running ASAP.

@marcoabreu
Contributor

Sorry, I'm currently swamped with high priority tasks and I don't have time to review your pull request.

cetsai and others added 10 commits August 8, 2018 15:36
reorganized code

changed diff collator output

added logging and improved command-line options

Removed extra space, added some comments

fixed documentation

create folder
wip on check_branch

changed logging and added support for cross-file dependencies

finished basic check_branch
check_branch is demo-ready

renamed test_selector to dependency_analyzer

fixed check_branch output

refactoring

improved logging

wip ci deployment

changed logging levels

code improvement
changed Jenkinsfile to use redesigned system

included config file for dependency analyzer

minor fixes
// only continue if some tests were selected
if( ! tests ) {
    currentBuild.result = 'SUCCESS'
    return
Contributor

Please don't return here. The wrapping logic (utils.main_wrapper) is taking care of propagating the results properly. Just do nothing in that case.

utils.init_git()
utils.docker_run('ubuntu_cpu', 'select_tests', false)
tests = fileExists('tests.tmp')
stash name:'flaky_tests', includes:'tests.tmp'
Contributor

Excellent!

node(NODE_LINUX_CPU) {
  ws('workspace/fc-preprocessing') {
    utils.init_git()
    utils.docker_run('ubuntu_cpu', 'select_tests', false)
Contributor

select_tests does not exist

Contributor Author

nice catch, fixed

tools/flaky_tests/Jenkinsfile (two outdated review threads resolved)
@cetsai
Contributor Author

cetsai commented Sep 4, 2018

Thanks for the reviews @marcoabreu. Anything else needed in order to merge?

@lebeg
Contributor

lebeg commented Sep 6, 2018

There is an error on the run job: http://jenkins.mxnet-ci-dev.amazon-ml.com/job/flaky-test-detector/view/change-requests/job/PR-11991/4/console

+ NOSE_COVERAGE_ARGUMENTS='--with-coverage --cover-inclusive --cover-xml --cover-branches --cover-package=mxnet'
+ set +x
+ tool/flaky_test_bot/test_selector.py -b HEAD~1 HEAD
/work/runtime_functions.sh: line 1012: tool/flaky_test_bot/test_selector.py: No such file or directory

@cetsai would you be able to look into it?

@cetsai
Contributor Author

cetsai commented Sep 6, 2018

@lebeg it was fixed in the last commit.

Edit: sorry, I forgot I changed the directory name.

@lebeg
Contributor

lebeg commented Sep 6, 2018

Have you tried to test your changes locally? The commands would be:

ci/build.py -p ubuntu_cpu /work/runtime_functions.sh flaky_check_select_tests

and

ci/build.py -p ubuntu_gpu /work/runtime_functions.sh flaky_check_run_flakiness_checker

@cetsai cetsai force-pushed the flaky_test_bot branch 3 times, most recently from 72b63e3 to a442e7a Compare September 6, 2018 20:04
}
flaky_check_run_flakiness_checker() {
    set -ex
    export PYTHONPATH=./python/
Contributor

Contributor Author

Not sure. I was looking at the Python 2 unit test runtime function, which uses export PYTHONPATH=./python/: https://github.com/apache/incubator-mxnet/blob/64566872a28a9426f3ec20bcf0210ebb608854f8/ci/docker/runtime_functions.sh#L639

Contributor

But the pipeline is failing here because it is not able to import mxnet: http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/flaky-test-detector/detail/PR-11991/11/pipeline/67. Maybe we should switch to /work/mxnet/python/?

Contributor Author

Perhaps. However, that run was triggered before the last commit, so I don't know which of these options we should go with. Maybe @marcoabreu can help?

Contributor

@marcoabreu could you give some help here to get this to work?

Contributor

@haojin2 @cetsai @lebeg @marcoabreu requesting an update on this

Contributor

How can I help?

Contributor Author

@haojin2 I'm having trouble running this locally, can you try?

Member

@haojin2 @cetsai Were you able to run this locally?

Contributor

Sorry, I don't have any cycles for this at the moment...

@roywei
Member

roywei commented Oct 29, 2018

ping @cetsai any updates?

@anirudhacharya
Member

@nswamy @sandeep-krishnamurthy can you please close this PR?

@cetsai feel free to reopen the PR once the changes are ready.

@stu1130
Contributor

stu1130 commented Nov 20, 2018

@nswamy @sandeep-krishnamurthy can you please close this PR?
