-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8218][SQL] Add binary log math function #6725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
f373bac
c795342
c6c187f
21c3bfd
605574d
ebc9929
23c54a3
5b39c02
3d75bfc
1750034
0634ef7
db7dc38
bc89597
8cf37b7
6089d11
beed631
fd01863
102070d
bf96bd9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -255,3 +255,23 @@ case class Pow(left: Expression, right: Expression) | |
| """ | ||
| } | ||
| } | ||
|
|
||
| case class Logarithm(left: Expression, right: Expression) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We seem to be doing this throughout the file, but it seems pretty confusing to me to be using
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh - one thing is that left/right is coming from BinaryExpression
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, that is what I meant saying it wasn't worth whatever code reuse we are getting. The other option would be to name the arguments and have
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Inheriting from |
||
| extends BinaryMathExpression((c1, c2) => math.log(c2) / math.log(c1), "LOG") { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we need to override the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think Due to support the case when the left is null in that 10 base logarithm is applied, to override
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, but this is confusing. Is it this lambda that is being used for evaluation or the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should be ok because there is other binary math expression
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would argue that that function is also implemented in a confusing way. We should not shoehorn things into the class hierarchy if its going to result hard to follow code. I'd rather we have small amounts of code duplication.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. Sounds reasonable. I think I can refactor this part of codes a little. |
||
| def this(child: Expression) = { | ||
| this(EulerNumber(), child) | ||
| } | ||
|
|
||
| override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = { | ||
| val logCode = if (left.isInstanceOf[EulerNumber]) { | ||
| defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2)") | ||
| } else { | ||
| defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2) / java.lang.Math.log($c1)") | ||
| } | ||
| logCode + s""" | ||
| if (Double.valueOf(${ev.primitive}).isNaN()) { | ||
| ${ev.isNull} = true; | ||
| } | ||
| """ | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1083,6 +1083,22 @@ object functions { | |
| */ | ||
| def log(columnName: String): Column = log(Column(columnName)) | ||
|
|
||
| /** | ||
| * Returns the first argument-base logarithm of the second argument. | ||
| * | ||
| * @group math_funcs | ||
| * @since 1.4.0 | ||
| */ | ||
| def log(base: Double, a: Column): Column = Logarithm(lit(base).expr, a.expr) | ||
|
|
||
| /** | ||
| * Returns the first argument-base logarithm of the second argument. | ||
| * | ||
| * @group math_funcs | ||
| * @since 1.4.0 | ||
| */ | ||
| def log(base: Double, columnName: String): Column = log(base, Column(columnName)) | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why don't we support specify the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is suggested by @rxin. I think it is reasonable because it is hard to have a use case to returns the logarithm of one column with another column as base. Usually you want to compute the logarithm values for a column with the same base. |
||
| /** | ||
| * Computes the logarithm of the given value in base 10. | ||
| * | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -236,6 +236,19 @@ class MathExpressionsSuite extends QueryTest { | |
| testOneToOneNonNegativeMathFunction(log1p, math.log1p) | ||
| } | ||
|
|
||
| test("binary log") { | ||
| val df = Seq[(Integer, Integer)]((123, null)).toDF("a", "b") | ||
| checkAnswer( | ||
| df.select(org.apache.spark.sql.functions.log("a"), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we remove the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It will conflict with Logger. |
||
| org.apache.spark.sql.functions.log(2.0, "a"), | ||
| org.apache.spark.sql.functions.log("b")), | ||
| Row(math.log(123), math.log(123) / math.log(2), null)) | ||
|
|
||
| checkAnswer( | ||
| df.selectExpr("log(a)", "log(2.0, a)", "log(b)"), | ||
| Row(math.log(123), math.log(123) / math.log(2), null)) | ||
| } | ||
|
|
||
| test("abs") { | ||
| val input = | ||
| Seq[(java.lang.Double, java.lang.Double)]((null, null), (0.0, 0.0), (1.5, 1.5), (-2.5, 2.5)) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.5