Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion python/pyspark/sql/functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
"""
A collections of builtin functions
"""
import math
import sys

if sys.version < "3":
Expand Down Expand Up @@ -143,7 +144,7 @@ def _():
'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
'polar coordinates (r, theta).',
'hypot': 'Computes `sqrt(a^2^ + b^2^)` without intermediate overflow or underflow.',
'pow': 'Returns the value of the first argument raised to the power of the second argument.'
'pow': 'Returns the value of the first argument raised to the power of the second argument.',
}

_window_functions = {
Expand Down Expand Up @@ -403,6 +404,21 @@ def when(condition, value):
return Column(jc)


@since(1.4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.5

def log(col, base=math.e):
"""Returns the first argument-based logarithm of the second argument.

>>> df.select(log(df.age, 10.0).alias('ten')).map(lambda l: str(l.ten)[:7]).collect()
['0.30102', '0.69897']

>>> df.select(log(df.age).alias('e')).map(lambda l: str(l.e)[:7]).collect()
['0.69314', '1.60943']
"""
sc = SparkContext._active_spark_context
jc = sc._jvm.functions.log(base, _to_java_column(col))
return Column(jc)


@since(1.4)
def lag(col, count=1, default=None):
"""
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ object FunctionRegistry {
expression[Expm1]("expm1"),
expression[Floor]("floor"),
expression[Hypot]("hypot"),
expression[Logarithm]("log"),
expression[Log]("ln"),
expression[Log10]("log10"),
expression[Log1p]("log1p"),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -255,3 +255,23 @@ case class Pow(left: Expression, right: Expression)
"""
}
}

case class Logarithm(left: Expression, right: Expression)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be doing this throughout the file, but it seems pretty confusing to me to be using left and right instead of something like value and base. I'm not sure this is worth whatever code reuse we are getting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - one thing is that left/right is coming from BinaryExpression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what I meant saying it wasn't worth whatever code reuse we are getting. The other option would be to name the arguments and have def left = base, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheriting from BinaryMathExpression seems like a bad idea to me. It is forcing the arguments to have weird names and is resulting in dead code.

extends BinaryMathExpression((c1, c2) => math.log(c2) / math.log(c1), "LOG") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to override the eval(), probably it's better to inherit from BinaryArithmetic, not BinaryMathExpression, then we needn't to pass the additional function object for its parent class.
Or can we remove the eval() from this class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think BinaryArithmetic is more limited in its semantic meaning. As I can tell, it is more suitable for the binary expression between two expressions of same type. BinaryMathExpression is more like to represent math function with two expressions.

Due to support the case when the left is null in that 10 base logarithm is applied, to override eval() is needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but this is confusing. Is it this lambda that is being used for evaluation or the eval function below? We should not have both if only one is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok because there is other binary math expression Atan2 that overrides eval because its eval behavior is different than the default one in BinaryMathExpression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that that function is also implemented in a confusing way. We should not shoehorn things into the class hierarchy if its going to result hard to follow code. I'd rather we have small amounts of code duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Sounds reasonable. I think I can refactor this part of codes a little.

def this(child: Expression) = {
this(EulerNumber(), child)
}

override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
val logCode = if (left.isInstanceOf[EulerNumber]) {
defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2)")
} else {
defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.log($c2) / java.lang.Math.log($c1)")
}
logCode + s"""
if (Double.valueOf(${ev.primitive}).isNaN()) {
${ev.isNull} = true;
}
"""
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -204,4 +204,22 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
testBinary(Atan2, math.atan2)
}

test("binary log") {
val f = (c1: Double, c2: Double) => math.log(c2) / math.log(c1)
val domain = (1 to 20).map(v => (v * 0.1, v * 0.2))

domain.foreach { case (v1, v2) =>
checkEvaluation(Logarithm(Literal(v1), Literal(v2)), f(v1 + 0.0, v2 + 0.0), EmptyRow)
checkEvaluation(Logarithm(Literal(v2), Literal(v1)), f(v2 + 0.0, v1 + 0.0), EmptyRow)
checkEvaluation(new Logarithm(Literal(v1)), f(math.E, v1 + 0.0), EmptyRow)
}
checkEvaluation(
Logarithm(Literal.create(null, DoubleType), Literal(1.0)),
null,
create_row(null))
checkEvaluation(
Logarithm(Literal(1.0), Literal.create(null, DoubleType)),
null,
create_row(null))
}
}
16 changes: 16 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/functions.scala
Original file line number Diff line number Diff line change
Expand Up @@ -1083,6 +1083,22 @@ object functions {
*/
def log(columnName: String): Column = log(Column(columnName))

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def log(base: Double, a: Column): Column = Logarithm(lit(base).expr, a.expr)

/**
* Returns the first argument-base logarithm of the second argument.
*
* @group math_funcs
* @since 1.4.0
*/
def log(base: Double, columnName: String): Column = log(base, Column(columnName))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we support specify the base from a column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is suggested by @rxin. I think it is reasonable because it is hard to have a use case to returns the logarithm of one column with another column as base. Usually you want to compute the logarithm values for a column with the same base.

/**
* Computes the logarithm of the given value in base 10.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,19 @@ class MathExpressionsSuite extends QueryTest {
testOneToOneNonNegativeMathFunction(log1p, math.log1p)
}

test("binary log") {
val df = Seq[(Integer, Integer)]((123, null)).toDF("a", "b")
checkAnswer(
df.select(org.apache.spark.sql.functions.log("a"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the org.apache.spark.sql.functions. ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will conflict with Logger.

org.apache.spark.sql.functions.log(2.0, "a"),
org.apache.spark.sql.functions.log("b")),
Row(math.log(123), math.log(123) / math.log(2), null))

checkAnswer(
df.selectExpr("log(a)", "log(2.0, a)", "log(b)"),
Row(math.log(123), math.log(123) / math.log(2), null))
}

test("abs") {
val input =
Seq[(java.lang.Double, java.lang.Double)]((null, null), (0.0, 0.0), (1.5, 1.5), (-2.5, 2.5))
Expand Down