Flaky tests: test_gluon_model_zoo_gpu.test_training #13221
Comments
@mxnet-label-bot add [Gluon, GPU, Model Zoo, MKLDNN] |
@pengxin99 will take a look at this issue :) |
@pengzhao-intel @TaoLv Does this error mean that the output of the CPU model parameters does not match that of the GPU? We ran this test case on our machine and can't reproduce the error. Our envs:
We got an OK result as follows:
|
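For readers following the question above: a CPU/GPU consistency check generally compares the outputs of the same model on two devices within a numerical tolerance. The sketch below uses only NumPy and is not the actual MXNet test; the function name `outputs_match`, the sample arrays, and the tolerance values are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def outputs_match(cpu_out, gpu_out, rtol=1e-3, atol=1e-3):
    """Return True if two output arrays agree within the given tolerances.

    The tolerance values are illustrative assumptions, not those of the real test.
    """
    return bool(np.allclose(cpu_out, gpu_out, rtol=rtol, atol=atol))

# Hypothetical stand-ins for model outputs computed on two devices.
rng = np.random.default_rng(0)
cpu_out = rng.standard_normal((2, 10)).astype(np.float32)
gpu_out = cpu_out + np.float32(1e-5)  # small numerical drift, within tolerance
bad_out = cpu_out + np.float32(0.1)   # large deviation, outside tolerance

print(outputs_match(cpu_out, gpu_out))
print(outputs_match(cpu_out, bad_out))
```

A check of this shape can fail intermittently when the inputs or initial weights are random and the tolerance is tight, which is one reason a failure may not reproduce on another machine or with another seed.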
@pengxin99 Though I think you have the right setup, we are testing the build with the following parameters:
I see that |
@lebeg I want to test if the
could you tell me what |
The flag |
@lebeg Thanks. I built with OpenBLAS but still can't reproduce this issue. Does it appear every time in your environment?
|
Maybe it is not an issue anymore, feel free to introduce a PR that reverts the disabling of the test. |
Flaky test found - http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-13901/2/pipeline in unrelated PR |
@pengxin99 could you check this test again? |
@pengzhao-intel @ChaiBapchya build code: test seed set: and got test ok as follows:
|
Yet another time
|
@TaoLv please let ruilin take a look at this issue. |
@pengzhao-intel @ChaiBapchya @TaoLv
My MXNet build environment:
Ran 2 tests in 24.117s OK |
@wuxun-zhang please double check whether the issue still exists, thanks. |
@pengzhao-intel Sure. Will take a look at this. |
@pengzhao-intel Just tested with mxnet master (commit: ef19b09) on V100 and cannot reproduce the issue. |
Alright, then we can close this (can reopen if it resurfaces) |
Description
Test fails on master:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1901/pipeline
Related issues
test_gluon_model_zoo_gpu.test_training out of memory
#10323
Flaky test_gluon_model_zoo_gpu.test_training @ Python3: MKLDNN-GPU
#9820