Skip to content
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -482,11 +482,17 @@ case class Sort(

/** Factory for constructing new `Range` nodes. */
object Range {
def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = {
def apply(start: Long, end: Long, step: Long, numSlices: Option[Int])
: LeafNode with MultiInstanceRelation = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this signature looks weird...

val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes
new Range(start, end, step, numSlices, output)
if (start == end || (start < end ^ 0 < step)) {
LocalRelation(output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have something like Range.empty to still return a Range?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for review, @cloud-fan . In that case, which spec can we use for that empty Range which is returned from Range.empty? For the following cases, it seems that we need to remove one of the following precondition again.

  require(step != 0, s"step ($step) cannot be 0")
  require(start != end, s"start ($step) cannot be equal to end ($end)")
  require(start < end ^ step < 0, s"the sign of step ($step) is invalid for range ($start, $end)")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree that an empty Range seems a more reasonable choose than a LocalRelation to return for those cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the values of parameters, start, end, step?

Copy link
Member

@viirya viirya Jun 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit weird to see a LocalRelation node when constructing a Range node.

If possible, I'd let the empty Range keeps its wrong parameters as is. So the logical plan can be consistent with input query. And only turning it to empty relation when planning the query.

Of course this involves few change in Range's planning. I'm not sure if it's acceptable to others. So I'm ok with current solution too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, at the first commit, I handled this invalid 'Range' in a new optimizer, 'RemoveInvalidRange'. That would be a similar approach.

I will wait more comment on this~ Thank you for review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think doing it in optimization sounds not ok for me too. It's not an optimization actually. Anyway, let's wait for other comments. Maybe there's other good ways to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like empty Range keeps its wrong parameters as is, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then do you mean removing the 'require' statements back? For the invalid parameters, 'require' raises exceptions now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiszk . What is the difference in your empty Range and invalid Range? It looks to me the same in your suggestion.

} else {
new Range(start, end, step, numSlices, output)
}
}
def apply(start: Long, end: Long, step: Long, numSlices: Int): Range = {
def apply(start: Long, end: Long, step: Long, numSlices: Int)
: LeafNode with MultiInstanceRelation = {
Range(start, end, step, Some(numSlices))
}
}
Expand All @@ -500,6 +506,8 @@ case class Range(
extends LeafNode with MultiInstanceRelation {

require(step != 0, s"step ($step) cannot be 0")
require(start != end, s"start ($step) cannot be equal to end ($end)")
require(start < end ^ step < 0, s"the sign of step ($step) is invalid for range ($start, $end)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allow to return an empty relation/empty range in Range.apply, why we won't allow it too here? Although it's no-op, but looks like it's legal query?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This precheck statements are added by review comments. This explains the assumptions used in the implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? I saw the comments we should generate no-op code if the query is no-op? And you also said we cannot raise exceptions on Dataset operations inconsistently?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already prevent them in the above factory method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You assume we should only use factory Range to construct the nodes.


val numElements: BigInt = {
val safeStart = BigInt(start)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,17 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventuall
checkAnswer(sql("SELECT * FROM range(3)"), Row(0) :: Row(1) :: Row(2) :: Nil)
}
}

test("SPARK-21041 SparkSession.range()'s behavior is inconsistent with SparkContext.range()") {
val start = java.lang.Long.MAX_VALUE - 3
val end = java.lang.Long.MIN_VALUE + 2
Seq("false", "true").foreach { value =>
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> value) {
assert(spark.sparkContext.range(start, end, 1).collect.length == 0)
assert(spark.range(start, end, 1).collect.length == 0)
Copy link
Member

@viirya viirya Jun 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also test the case start = end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

}
}
}
}

object DataFrameRangeSuite {
Expand Down