Skip to content

[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4#22400

Closed
srowen wants to merge 4 commits intoapache:masterfrom
srowen:SPARK-25238.2
Closed

[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4#22400
srowen wants to merge 4 commits intoapache:masterfrom
srowen:SPARK-25238.2

Conversation

@srowen
Copy link
Copy Markdown
Member

@srowen srowen commented Sep 12, 2018

(This change is a subset of the changes needed for the JIRA; see #22231)

What changes were proposed in this pull request?

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying Python's non-escaping of non-reserved chars in normal strings. Also, fix a few long lines.

How was this patch tested?

Existing tests, and some manual double-checking of the behavior of regexes in Python 2/3 to be sure.

…h also avoids warnings from pycodestyle about accidentally relying Python's non-escaping of non-reserved chars in normal strings
@srowen
Copy link
Copy Markdown
Member Author

srowen commented Sep 12, 2018

CC @cclauss @holdenk


failure_note_by_errcode = {
1: 'executing the `dev/run-tests` script', # error to denote run-tests script failures
1: 'executing the dev/run-tests script', # error to denote run-tests script failures
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back-ticks invoke repr or something, I think? not the intent here so I removed them to quiet the warning

Copy link
Copy Markdown
Member

@HyukjinKwon HyukjinKwon Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, I think this is a part of the Jenkins test result message. How about # noqa if it complains?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it wouldn't matter as it's just a message printed in Github/Jenkins messages. But yeah noqa is easy.


class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable):
"""
r"""
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few docstrings have backslash or backticks in them. This should make sure they don't have surprising effects some day.

@since(2.1)
def approx_count_distinct(col, rsd=None):
"""Aggregate function: returns a new :class:`Column` for approximate distinct count of column `col`.
"""Aggregate function: returns a new :class:`Column` for approximate distinct count of
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line too long

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2018

Test build #95973 has finished for PR 22400 at commit fc4b49e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2018

Test build #95975 has finished for PR 22400 at commit 9c9178b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None,
enforceSchema=None, emptyValue=None):
"""Loads a CSV file stream and returns the result as a :class:`DataFrame`.
r"""Loads a CSV file stream and returns the result as a :class:`DataFrame`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: there are two spaces within result as a :class:Da`

@HyukjinKwon
Copy link
Copy Markdown
Member

retest this please

Copy link
Copy Markdown
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise


def is_release(commit_title):
return re.findall("\[release\]", commit_title.lower()) or \
return re.findall(r"\[release\]", commit_title.lower()) or \
Copy link
Copy Markdown

@cclauss cclauss Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could we use parens to remove line terminating backslashes as recommended in PEP8?
  2. Could we get rid of the use of re in this instance with
    return ("[release]" in commit_title.lower() or
         "preparing spark release" in commit_title.lower() or
         "preparing development version" in commit_title.lower() or
         "CHANGES.txt" in commit_title)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh yeah not sure why it ended up as a regex, actually

Copy link
Copy Markdown

@cclauss cclauss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2018

Test build #95979 has finished for PR 22400 at commit 9c9178b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2018

Test build #95978 has finished for PR 22400 at commit 9c9178b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


failure_note_by_errcode = {
1: 'executing the `dev/run-tests` script', # error to denote run-tests script failures
1: 'executing the dev/run-tests script', # error to denote run-tests script failures
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it wouldn't matter as it's just a message printed in Github/Jenkins messages. But yeah noqa is easy.


def is_release(commit_title):
return re.findall("\[release\]", commit_title.lower()) or \
return re.findall(r"\[release\]", commit_title.lower()) or \
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh yeah not sure why it ended up as a regex, actually

print("Previous release tag: %s" % PREVIOUS_RELEASE_TAG)
print("Number of commits in this range: %s" % len(new_commits))
print
print("")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works in Python 2 and 3

# Extract spark component(s):
# Look for alphanumeric chars, spaces, dashes, periods, and/or commas
pattern = re.compile(r'(\[[\w\s,-\.]+\])', re.IGNORECASE)
pattern = re.compile(r'(\[[\w\s,.-]+\])', re.IGNORECASE)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues: \. is unnecessary, and ,-. is probably misleading as it's a range, not three characters in the class. As it happens the range from comma to period is exactly those chars in ASCII!

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2018

Test build #95996 has finished for PR 22400 at commit f750e08.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Copy Markdown
Member

retest this please

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 12, 2018

Test build #95998 has finished for PR 22400 at commit 3112b33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Sep 13, 2018

Test build #96006 has finished for PR 22400 at commit 3112b33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Copy Markdown
Member

Merged to master and branch-2.4.

asfgit pushed a commit that referenced this pull request Sep 13, 2018
(This change is a subset of the changes needed for the JIRA; see #22231)

## What changes were proposed in this pull request?

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying Python's non-escaping of non-reserved chars in normal strings. Also, fix a few long lines.

## How was this patch tested?

Existing tests, and some manual double-checking of the behavior of regexes in Python 2/3 to be sure.

Closes #22400 from srowen/SPARK-25238.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(cherry picked from commit 08c76b5)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
@asfgit asfgit closed this in 08c76b5 Sep 13, 2018
fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Sep 13, 2018
(This change is a subset of the changes needed for the JIRA; see apache#22231)

## What changes were proposed in this pull request?

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying Python's non-escaping of non-reserved chars in normal strings. Also, fix a few long lines.

## How was this patch tested?

Existing tests, and some manual double-checking of the behavior of regexes in Python 2/3 to be sure.

Closes apache#22400 from srowen/SPARK-25238.2.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
@srowen srowen deleted the SPARK-25238.2 branch September 20, 2018 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants