This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CI showing random pylint failures #16401

Closed
DickJC123 opened this issue Oct 9, 2019 · 14 comments

Comments

@DickJC123
Contributor

Description

The 'sanity' CI runner is randomly failing with the following pylint error:

python/mxnet/numpy_op_signature.py:21:0: E0402: Attempted relative import beyond top-level package (relative-beyond-top-level)

Some recent examples from different PRs:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-16397/2/pipeline

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-16391/2/pipeline

While there may be a direct fix to numpy_op_signature.py, the bigger question is why some pipelines show this error and others do not. Is there an inconsistency in the pylint run by python3 -m pylint ...?
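For context, here is a minimal sketch of the kind of import that E0402 is meant to flag: a relative import that climbs above its own top-level package. The package name `demo_pkg_e0402` and the helper `relative_import_error` are made up for illustration and are not MXNet code. The flagged line in numpy_op_signature.py does not appear to be of this kind.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def relative_import_error():
    """Import a module that reaches above its top-level package.

    Returns the interpreter's error text, or None if no error was raised.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg_e0402"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # demo_pkg_e0402 is already the top-level package, so ".." has
        # nowhere to go. This is the pattern E0402 is designed to catch.
        (pkg / "mod.py").write_text("from .. import sibling\n")
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg_e0402.mod")
        except (ImportError, ValueError) as exc:
            # Older interpreters raise ValueError, newer ones ImportError.
            return str(exc)
        finally:
            sys.path.remove(root)
            sys.modules.pop("demo_pkg_e0402.mod", None)
            sys.modules.pop("demo_pkg_e0402", None)
    return None
```

Calling `relative_import_error()` returns the runtime counterpart of the pylint message, which makes it easier to judge whether a reported E0402 corresponds to a real problem in the file.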

@larroy

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Bug, CI

@larroy
Contributor

larroy commented Oct 9, 2019

Thanks for reporting this, we will have a look.

@reminisce
Contributor

It's a pylint bug. I have deleted that line in the master branch. The flakiness should not show up again.

@DickJC123
Contributor Author

DickJC123 commented Oct 9, 2019

One of the failures I posted (the one from my PR) is built off of a fairly recent master. It's only missing your 'add boolean ndarray' PR from 10/8/2019, which does not touch the numpy_op_signature.py file. @reminisce, what commit introduced the fix you mention?

I assume that the pylint bug was specific to certain pylint versions (as opposed to being truly non-deterministic). My point to @larroy was that I think the CI runners should use a consistent version of pylint, so the results are the same throughout the many pipelines we launch as we improve our PRs.
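As a rough sketch of what a consistency check could look like, a CI step might compare the installed pylint against a pinned version before linting. The pin `2.3.1` and the helper names below are assumptions for illustration; this thread does not say which pylint version the runners used.

```python
from importlib import metadata

# Hypothetical pin: the thread does not state which version CI should use.
EXPECTED_PYLINT = "2.3.1"


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package, expected):
    """Return (ok, message) describing whether `package` matches `expected`."""
    found = installed_version(package)
    if found is None:
        return False, f"{package} is not installed"
    if found != expected:
        return False, f"{package} {found} found, expected {expected}"
    return True, f"{package} {found} matches the pin"


# Report the state of the local environment without failing the script.
print(check_pin("pylint", EXPECTED_PYLINT)[1])
```

A runner could fail fast when the first element of the returned tuple is False, so that every pipeline lints with the same version.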

@DickJC123
Contributor Author

After merging the most recent master into my PR, I still get a failure:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-16391/6/pipeline

@reminisce Could you tell us more about the pylint bug as you understand it?

@reminisce
Contributor

@DickJC123 I am not an expert in root-causing pylint errors, but my guess is that CI still reports this because of code cached in the system from before you merged the master tip into your branch. I have two reasons:

  1. It's the same message you posted before I deleted the line from __future__ import absolute_import.
  2. The line that the error points to does not contain what pylint describes. It matches the code from before you updated your branch to the latest.

I am not sure whether this is because you merged the master branch into yours. Maybe you can try to rebase all of your commits on top of the master branch and see how sanity check goes.

@DickJC123
Contributor Author

To answer my own question from above, #16305 introduced the file ./python/mxnet/numpy_op_signature.py with initial lines:

from __future__ import absolute_import
import sys

#16370 attempted to correct the pylint issues, changing the lines to:

from __future__ import absolute_import # pylint: disable=reimported
from sys import version, version_info

And finally the most recent change to numpy_op_signature.py came from commit:

4940ec0e7  2019-10-06  reminisce              Fix random op signature

... which simplified the lines to:

import sys

So why have I seen the pylint problem with my PR #16391? I rebased the changes of my PR onto a master that already had the 4940ec0 commit. I don't understand the mechanism by which there could be "some cached code in the system". Thoughts on this @larroy ?

@reminisce
Contributor

@DickJC123 The following is just my guess.

I looked at your branch for PR 16391. Your commits are interleaved with the master branch commits, which suggests it's a merge rather than a rebase. It may require a rebase, which means putting your commits on top of the master branch tip, to refresh the CI cache (if there is any). But again, this is just my speculation. I have submitted three PRs today and have not seen the pylint error come back.

@DickJC123
Contributor Author

DickJC123 commented Oct 10, 2019

I can confirm that a rebase of my commits to the latest master has corrected this problem. Thanks @reminisce!

@larroy Should I close this issue, or do you want to use my experience to diagnose the root cause of this behavior? In my mind, there's a bug in the way MXNet creates the repo used to test against. Perhaps it's confused by commit time-stamps. Does MXNet have a home-grown script to create the repo or is it an approach used by many git projects?

@larroy
Contributor

larroy commented Oct 10, 2019

I'm having this in my PR: #16430 (comment)

I just restarted it.

@DickJC123
Contributor Author

It appears that the output of pylint differs between the case of jobs==1 and the case of jobs==8 (as set in the ./ci/other/pylintrc file). I can get the error python/mxnet/numpy_op_signature.py:21:0: E0402: Attempted relative import beyond top-level package (relative-beyond-top-level) to appear consistently when I set jobs==1. Perhaps jobs==1 should be the setting for our CI, since the parallel setting is masking errors and is otherwise (apparently) reporting them non-deterministically.
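A minimal sketch of how the two settings could be compared locally, assuming pylint is installed and using its `--rcfile` and `--jobs` options to override the value in the rc file. The helper names are made up for illustration, and the lint target is left as a parameter because the exact target CI uses is not quoted in this thread.

```python
import re
import subprocess
import sys

# Matches default-format messages such as
# "python/mxnet/numpy_op_signature.py:21:0: E0402: ..."
MESSAGE_RE = re.compile(r"^\S+:\d+:\d+: [CRWEFI]\d{4}: ")


def pylint_command(rcfile, target, jobs):
    """Build a pylint invocation that overrides the rc file's jobs value."""
    return [sys.executable, "-m", "pylint",
            f"--rcfile={rcfile}", f"--jobs={jobs}", target]


def message_lines(output):
    """Keep only the lines of pylint output that are individual messages."""
    return {line for line in output.splitlines() if MESSAGE_RE.match(line)}


def run_pylint(rcfile, target, jobs):
    """Run pylint once and return its set of message lines."""
    result = subprocess.run(pylint_command(rcfile, target, jobs),
                            capture_output=True, text=True, check=False)
    return message_lines(result.stdout)


def only_in_serial(serial, parallel):
    """Messages reported with jobs=1 that the parallel run did not report."""
    return sorted(serial - parallel)
```

For example, `only_in_serial(run_pylint(rc, target, 1), run_pylint(rc, target, 8))` would list the messages that only the serial run reports, where `rc` is the path to the pylintrc and `target` is whatever package the sanity job lints.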

@ddavydenko
Contributor

@mxnet-label-bot , add [Build]

@DickJC123
Contributor Author

Issue fixed by now-merged #16462.
