Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix flakey pylint CI failures #16462

Merged
merged 7 commits into from
Oct 15, 2019
Merged

Conversation

DickJC123
Copy link
Contributor

Description

As reported in #16401, the pylint check performed by the 'sanity' CI runner has been flakey. Toward correcting this, the planned steps of this PR are:

  1. demo that the pylint failure can be made solid if the pylint parallelism is reduced from the current 8 jobs to 1 job.
  2. fix the source code issues exposed
  3. merge the PR, leaving the pylint parallelism at jobs==1 (at a runtime expense of about 1 minute, a small price to pay for complete and reliable error reporting)

If others want to jump in to diagnose the pylint issues, the CI command is:

python3 -m pylint --rcfile=./ci/other/pylintrc --ignore-patterns=".*\.so$,.*\.dll$,.*\.dylib$" python/mxnet tools/caffe_converter/*.py

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@marcoabreu
Copy link
Contributor

Reducing parallelism for the cost of a minute sounds fine to me. Good call.

@DickJC123 DickJC123 requested a review from szha as a code owner October 13, 2019 02:26
@DickJC123
Copy link
Contributor Author

DickJC123 commented Oct 13, 2019

My 2nd commit successfully eliminated the E0402 error triggered by numpy_op_signature.py.
Since @reminisce, @stu1130 and @larroy were involved in prior tweeking this file for this issue, I invite any of them to weigh-in on my fix, which basically involved massaging the syntax of the code without really changing the functionality.

So it seems that the jobs==1 setting generates many more errors than jobs==N. For our codebase, it introduced too-many-lines, duplicate-code and cyclic-import errors. I could see how the duplicate-code and cyclic-import errors might be turned off with jobs=N, since one really needs a view of the entire codebase to report all errors correctly. I'm not sure why suddenly too-many-lines errors are cropping up, but it isn't in the scope of this PR to fix our in-some-cases massive modules.

Bottom line: with my third commit, I've disabled too-many-lines, duplicate-code and cyclic-import, and now pylint --jobs=1 is passing. I'll trigger the CI again to gain some confidence in the solution.

Removing WIP. Please review.

@DickJC123 DickJC123 changed the title [WIP] fix flakey pylint Ci failures Fix flakey pylint Ci failures Oct 13, 2019
@DickJC123 DickJC123 changed the title Fix flakey pylint Ci failures Fix flakey pylint CI failures Oct 13, 2019
@@ -19,6 +19,7 @@

import sys
import warnings
import inspect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting the efforts on taming pylint. I'm curious why this reorganization of the code would make the pylint error disappear?

Copy link
Contributor Author

@DickJC123 DickJC123 Oct 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, first I should mention that as I developed the PR on my local machine, I was seeing slightly different pylint output than the upstream CI. This could be due to a different pylint version. In the comment that follows, I'll refer to the pylint command I posted above in the PR description as <pylint_cmd>:

With the import inspect statement in its original location, I saw:

$ <pylint_cmd> | grep numpy_op_signature
************* Module mxnet.numpy_op_signature
python/mxnet/numpy_op_signature.py:C0415:63,4: _register_op_signatures: Import outside toplevel (inspect)

With the import inspect statement moved, no such output line was generated. Since I was touching the file numpy_op_signature.py anyway to eliminate the E0402 error, I thought I'd make the file totally clean w.r.t pylint by moving the import inspect.

Introducing the line op_module = root_module was the fix that really mattered (eliminated the E0402 error). Since it didn't change the logic of the code materially, I view this as working around the quirks of pylint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, pylint is pinned in some cases inside the ci folder:

install/requirements
30:pylint==2.3.1; python_version >= '3.0'

We should use just one version of pylint...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI appears to pull in pylint 2.3.1. The most current pylint is 2.4.2.

too-many-statements,
too-many-lines,
duplicate-code,
cyclic-import
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cyclic import not a valid thing to consider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, however there are over 90 cyclic-import errors reported. Analyzing the python software architecture and possibly refactoring it to eliminate cyclic imports is beyond the scope of this PR. I'll file an issue to see if one of the python major code owners will look into this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks

@ptrendx ptrendx merged commit f9359c3 into apache:master Oct 15, 2019
aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request Oct 16, 2019
* Change pylint parallel jobs from 8 to 1

* Non-functional tweeks to keep pylint happy

* Disable pylint too-many-lines, duplicate-code, cyclic-import

* Trigger CI

* Trigger CI

* Fighting unrelated CI failures.  Trigger CI.

* Fighting unrelated CI failures.  Trigger CI.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants