Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion python/pyspark/sql/functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,8 @@ def _():
'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
'polar coordinates (r, theta).',
'hypot': 'Computes `sqrt(a^2^ + b^2^)` without intermediate overflow or underflow.',
'pow': 'Returns the value of the first argument raised to the power of the second argument.'
'pow': 'Returns the value of the first argument raised to the power of the second argument.',
'logarithm': 'Returns the first argument-based logarithm of the second argument',
}

_window_functions = {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,7 @@ object FunctionRegistry {
expression[Tanh]("tanh"),
expression[ToDegrees]("todegrees"),
expression[ToRadians]("toradians"),
expression[Logarithm]("logarithm"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should name this log to make it compatible. (you'd need to make the logarithm expression itself being able to handle both)


// aggregate functions
expression[Average]("avg"),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -202,3 +202,14 @@ case class Pow(left: Expression, right: Expression)
"""
}
}

case class Logarithm(left: Expression, right: Expression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be doing this throughout the file, but it seems pretty confusing to me to be using left and right instead of something like value and base. I'm not sure this is worth whatever code reuse we are getting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - one thing is that left/right is coming from BinaryExpression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what I meant saying it wasn't worth whatever code reuse we are getting. The other option would be to name the arguments and have def left = base, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheriting from BinaryMathExpression seems like a bad idea to me. It is forcing the arguments to have weird names and is resulting in dead code.

extends BinaryMathExpression((c1, c2) => math.log(c2) / math.log(c1), "LOG") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to override the eval(), probably it's better to inherit from BinaryArithmetic, not BinaryMathExpression, then we needn't to pass the additional function object for its parent class.
Or can we remove the eval() from this class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think BinaryArithmetic is more limited in its semantic meaning. As I can tell, it is more suitable for the binary expression between two expressions of same type. BinaryMathExpression is more like to represent math function with two expressions.

Due to support the case when the left is null in that 10 base logarithm is applied, to override eval() is needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but this is confusing. Is it this lambda that is being used for evaluation or the eval function below? We should not have both if only one is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok because there is other binary math expression Atan2 that overrides eval because its eval behavior is different than the default one in BinaryMathExpression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that that function is also implemented in a confusing way. We should not shoehorn things into the class hierarchy if its going to result hard to follow code. I'd rather we have small amounts of code duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Sounds reasonable. I think I can refactor this part of codes a little.

override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2) / java.lang.Math.log($c1)") + s"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's generate two versions of the code, i.e. if there is only one argument, generate the old version directly. When there are two arguments, generate the new version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean, when there is only one argument, Logarithm generates the codes as the same as Log?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if left == EulerNumber, can we generate the code to just use java.lang.Math.log?

if (Double.valueOf(${ev.primitive}).isNaN()) {
${ev.isNull} = true;
}
"""
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -176,4 +176,8 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
testBinary(Atan2, math.atan2)
}

test("binary log") {
testBinary(Logarithm, (c1, c2) => math.log(c2) / math.log(c1),
(1 to 20).map(v => (v * 0.1, v * 0.2)))
}
}
64 changes: 64 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/functions.scala
Original file line number Diff line number Diff line change
Expand Up @@ -1299,6 +1299,70 @@ object functions {
*/
def toRadians(columnName: String): Column = toRadians(Column(columnName))

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(l: Column, r: Column): Column = Logarithm(l.expr, r.expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's name the 1st argument as "base", and the second as "a".

and also only keep the following 2 versions:

log(base: Double, a: Column)

log(base: Double, a: String)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I think it is OK to just name this log, and we should move them to where log is right now.


/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(l: Column, rightName: String): Column = logarithm(l, Column(rightName))

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(leftName: String, r: Column): Column = logarithm(Column(leftName), r)

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(leftName: String, rightName: String): Column =
logarithm(Column(leftName), Column(rightName))

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(l: Column, r: Double): Column = logarithm(l, lit(r).expr)

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(leftName: String, r: Double): Column = logarithm(Column(leftName), r)

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(l: Double, r: Column): Column = logarithm(lit(l).expr, r)

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def logarithm(l: Double, rightName: String): Column = logarithm(l, Column(rightName))

//////////////////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////////////////
Expand Down