Skip to content

Conversation

@sandeep-katta
Copy link
Contributor

@sandeep-katta sandeep-katta commented May 21, 2020

What changes were proposed in this pull request?

IntegralDivide operator returns Long DataType, so integer overflow case should be handled.
If the operands are of type Int it will be casted to Long

Why are the changes needed?

As IntegralDivide returns Long datatype, integer overflow should not happen

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT and also tested in the local cluster

After fix

image

SQL Test

After fix
image

Before Fix
image

@sandeep-katta sandeep-katta changed the title [SPARK-16323][SQL] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator [SPARK-31761][SQL] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator May 21, 2020
@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #122943 has finished for PR 28600 at commit cac15d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #122969 has finished for PR 28600 at commit b2eea1f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #122979 has finished for PR 28600 at commit 7cdc7b5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #122982 has finished for PR 28600 at commit e2d8e1f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

there is a failure in org.apache.spark.sql.catalyst.expressions.ArithmeticExpressionSuite./ (Divide) for integral type

@sandeep-katta
Copy link
Contributor Author

sandeep-katta commented May 22, 2020

this is because I changed the acceptedType to Long and Decimal

checkEvaluation(IntegralDivide(Literal(1.toByte), Literal(2.toByte)), 0L) checkEvaluation(IntegralDivide(Literal(1.toShort), Literal(2.toShort)), 0L) checkEvaluation(IntegralDivide(Literal(1), Literal(2)), 0L) checkEvaluation(IntegralDivide(Literal(1.toLong), Literal(2.toLong)), 0L) checkEvaluation(IntegralDivide(positiveShortLit, negativeShortLit), 0L) checkEvaluation(IntegralDivide(positiveIntLit, negativeIntLit), 0L) checkEvaluation(IntegralDivide(positiveLongLit, negativeLongLit), 0L)

from the test case shall I remove checkEvaluation for byte,short and int ?

@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #122988 has finished for PR 28600 at commit d9f4616.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #122992 has finished for PR 28600 at commit 351e49a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

from the test case shall I remove checkEvaluation for byte,short and int ?

Yes please

@cloud-fan
Copy link
Contributor

org.scalatest.exceptions.TestFailedException: "struct<([3 div 2]):bigint>" did not equal "struct<([CAST(3 AS BIGINT) div CAST(2 AS BIGINT)]):bigint>" Schema did not match for query SELECT 3 div 2

OK so the problem here is we change the auto-generated column name of 3 div 2, because we add cast.

It's not a big deal, we can just update all the tests. But I'm thinking if it's better to embed the cast in the IntegralDivide itself. e.g.

case class IntegralDivide(...) {
  private lazy val div: (Any, Any) => Any = {
    val integral = left.dataType match {
      case _: IntegralType =>
        LongType.integral.asInstanceOf[Integral[Any]] // see the change in this line
      case d: DecimalType =>
        d.asIntegral.asInstanceOf[Integral[Any]]
    }
    (x, y) => {
      val res = integral.quot(x, y)
      if (res == null) {
        null
      } else {
        integral.asInstanceOf[Integral[Any]].toLong(res)
      }
    }
  }

  override def operationCode(v1: String, v2: String): String = {
    if (decimalType) ... else {
      s"((long) $v1) / $v2"
    }
  }
}

This is probably more efficient as there are less expressions

@sandeep-katta
Copy link
Contributor Author

org.scalatest.exceptions.TestFailedException: "struct<([3 div 2]):bigint>" did not equal "struct<([CAST(3 AS BIGINT) div CAST(2 AS BIGINT)]):bigint>" Schema did not match for query SELECT 3 div 2

OK so the problem here is we change the auto-generated column name of 3 div 2, because we add cast.

It's not a big deal, we can just update all the tests. But I'm thinking if it's better to embed the cast in the IntegralDivide itself. e.g.

case class IntegralDivide(...) {
  private lazy val div: (Any, Any) => Any = {
    val integral = left.dataType match {
      case _: IntegralType =>
        LongType.integral.asInstanceOf[Integral[Any]] // see the change in this line
      case d: DecimalType =>
        d.asIntegral.asInstanceOf[Integral[Any]]
    }
    (x, y) => {
      val res = integral.quot(x, y)
      if (res == null) {
        null
      } else {
        integral.asInstanceOf[Integral[Any]].toLong(res)
      }
    }
  }

  override def operationCode(v1: String, v2: String): String = {
    if (decimalType) ... else {
      s"((long) $v1) / $v2"
    }
  }
}

This is probably more efficient as there are less expressions

this was my first approach, now I am updating the golden files.

Let me know which way we need to do, accordingly I will update the code

@SparkQA
Copy link

SparkQA commented May 22, 2020

Test build #123005 has finished for PR 28600 at commit db605e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the tests passed.

@SparkQA
Copy link

SparkQA commented May 23, 2020

Test build #123026 has finished for PR 28600 at commit 3a1a061.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sandeep-katta
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented May 23, 2020

Test build #123029 has finished for PR 28600 at commit 3a1a061.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@HyukjinKwon
Copy link
Member

Merged to master.

@sandeep-katta, it has a conflict against branch-3.0. Although it's minor, can you create a PR to backport it? Looks like these SQL files have some diff in the tests results lately. Let's just create a PR to backport just to doubly make sure.

@sandeep-katta
Copy link
Contributor Author

okay will raise BackPort PR

sandeep-katta added a commit to sandeep-katta/spark that referenced this pull request May 24, 2020
…IntegralDivide operator

`IntegralDivide` operator returns Long DataType, so integer overflow case should be handled.
If the operands are of type Int it will be casted to Long

As `IntegralDivide` returns Long datatype, integer overflow should not happen

No

Added UT and also tested in the local cluster

After fix

![image](https://user-images.githubusercontent.com/35216143/82603361-25eccc00-9bd0-11ea-9ca7-001c539e628b.png)

SQL Test

After fix
![image](https://user-images.githubusercontent.com/35216143/82637689-f0250300-9c22-11ea-85c3-886ab2c23471.png)

Before Fix
![image](https://user-images.githubusercontent.com/35216143/82637984-878a5600-9c23-11ea-9e47-5ce2fb923c01.png)

Closes apache#28600 from sandeep-katta/integerOverFlow.

Authored-by: sandeep katta <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants