Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions dev/create-release/releaseutils.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@
sys.exit(-1)

if sys.version < '3':
input = raw_input
input = raw_input # noqa

# Contributors list file name
contributors_file_name = "contributors.txt"
Expand Down Expand Up @@ -152,7 +152,11 @@ def get_commits(tag):
if not is_valid_author(author):
author = github_username
# Guard against special characters
author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
try: # Python 2
author = unicode(author, "UTF-8")
except NameError: # Python 3
author = str(author)
author = unidecode.unidecode(author).strip()
Copy link
Member

@BryanCutler BryanCutler Mar 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you just put this above:

if sys.version > '3':
    unicode = str

and then just do?

author = unidecode.unidecode(unicode(author)).strip()

I don't think you need to specify "UTF-8" because either way it will be a unicode object, but I'm not too sure how this conversion is supposed to work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler I think we do need to specify it because it won't be unicode type in Python 2, we get these from run_cmd which call's Popen communicate which returns a regular string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... My other worry in Py2 would be if sys.setdefaultencoding() was set to somthing other that utf-8. That method was also thankfully dropped in Py3.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that we are first casting author this to unicode already with unicode(author) and it doesn't really matter if it is "UTF-8" or not because we then immediately decode it into ASCII with unidecode, which can handle it even it it wasn't "UTF-8", so the end result should be the same I believe. It was just to clean up a little, so not a big deal either way. The way it is now replicates the old behavior, so it's probably safer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it is now [...] it's probably safer.

Let's agree to leave this as is in this PR. EOL of Python 2 in 500 daze away so safe is better.

commit = Commit(_hash, author, title, pr_number)
commits.append(commit)
return commits
Expand Down
2 changes: 1 addition & 1 deletion dev/merge_spark_pr.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
JIRA_IMPORTED = False

if sys.version < '3':
input = raw_input
input = raw_input # noqa

# Location of your Spark git development area
SPARK_HOME = os.environ.get("SPARK_HOME", os.getcwd())
Expand Down
5 changes: 4 additions & 1 deletion python/pyspark/sql/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,9 @@
from pyspark import since, _NoValue
from pyspark.rdd import ignore_unicode_prefix

if sys.version_info[0] >= 3:
basestring = str


class RuntimeConfig(object):
"""User-facing configuration API, accessible through `SparkSession.conf`.
Expand Down Expand Up @@ -59,7 +62,7 @@ def unset(self, key):

def _checkType(self, obj, identifier):
"""Assert that an object is of type str."""
if not isinstance(obj, str) and not isinstance(obj, unicode):
if not isinstance(obj, basestring):
Copy link
Member

@HyukjinKwon HyukjinKwon Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine since we rely on short-circuiting but I guess it's fine if it complains.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope

raise TypeError("expected %s '%s' to be a string (was '%s')" %
(identifier, obj, type(obj).__name__))

Expand Down
5 changes: 1 addition & 4 deletions python/pyspark/sql/streaming.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,7 @@
import json

if sys.version >= '3':
intlike = int
basestring = unicode = str
else:
intlike = (int, long)
basestring = str

from py4j.java_gateway import java_import

Expand Down
2 changes: 2 additions & 0 deletions python/pyspark/streaming/dstream.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@

if sys.version < "3":
from itertools import imap as map, ifilter as filter
else:
long = int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for it? Seems only used once and shouldn't be difficult to add a test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a call to slice in tests.py should be enough to test this. I'm surprised we haven't caught this before, but I suppose this isn't a very frequently exercised code path.


from py4j.protocol import Py4JJavaError

Expand Down
34 changes: 33 additions & 1 deletion python/pyspark/streaming/tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -179,7 +179,7 @@ def func(dstream):
self._test_func(input, func, expected)

def test_flatMap(self):
"""Basic operation test for DStream.faltMap."""
"""Basic operation test for DStream.flatMap."""
input = [range(1, 5), range(5, 9), range(9, 13)]

def func(dstream):
Expand All @@ -206,6 +206,38 @@ def func(dstream):
expected = [[len(x)] for x in input]
self._test_func(input, func, expected)

def test_slice(self):
"""Basic operation test for DStream.slice."""
import datetime as dt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets remove the import here since it is at the top already

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, not your doing but I noticed this spelling error on ln 183 "DStream.faltMap" should be "DStream.flatMap", would you mind changing that while we are here?
https://github.com/apache/spark/pull/20838/files#diff-ca4ec8dece48511b915cad6a801695c1R183

self.ssc = StreamingContext(self.sc, 1.0)
self.ssc.remember(4.0)
input = [[1], [2], [3], [4]]
stream = self.ssc.queueStream([self.sc.parallelize(d, 1) for d in input])

time_vals = []

def get_times(t, rdd):
if rdd and len(time_vals) < len(input):
time_vals.append(t)

stream.foreachRDD(get_times)

self.ssc.start()
self.wait_for(time_vals, 4)
begin_time = time_vals[0]

def get_sliced(begin_delta, end_delta):
begin = begin_time + dt.timedelta(seconds=begin_delta)
end = begin_time + dt.timedelta(seconds=end_delta)
rdds = stream.slice(begin, end)
result_list = [rdd.collect() for rdd in rdds]
return [r for result in result_list for r in result]

self.assertEqual(set([1]), set(get_sliced(0, 0)))
self.assertEqual(set([2, 3]), set(get_sliced(1, 2)))
self.assertEqual(set([2, 3, 4]), set(get_sliced(1, 4)))
self.assertEqual(set([1, 2, 3, 4]), set(get_sliced(0, 4)))

def test_reduce(self):
"""Basic operation test for DStream.reduce."""
input = [range(1, 5), range(5, 9), range(9, 13)]
Expand Down
3 changes: 3 additions & 0 deletions sql/hive/src/test/resources/data/scripts/dumpdata_script.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@
#
import sys

if sys.version_info[0] >= 3:
xrange = range

for i in xrange(50):
for j in xrange(5):
for k in xrange(20022):
Expand Down