[SPARK-21007][SQL]Add SQL function - RIGHT && LEFT #18228
128d3e5 to e227528
ok to test
jenkins add to whitelist
Test build #77824 has finished for PR 18228 at commit
Are these ANSI SQL functions? If they are just esoteric MySQL functions, I don't think we should add them.
Both MySQL and SQL Server support these two functions; Oracle does not.
141a42f to 2136c1b
Test build #78090 has finished for PR 18228 at commit
It can be implemented using RuntimeReplaceable. For example, NullIf
Test build #78422 has finished for PR 18228 at commit
Test build #78437 has finished for PR 18228 at commit
As we already have
Please change it to something like
val pos = xyz
val len = xyz
string.asInstanceOf[UTF8String].substringSQL(pos, len)
OK, thanks
Do not split the code in the middle.
Could you post the outputs of MySQL when the length is negative?
@gatorsmile Thanks
mysql> select right("sparksql",null);
mysql> select left("sparksql",null);
Test build #79432 has finished for PR 18228 at commit
Shall we use RuntimeReplaceable? I think both Left and Right can be implemented with Substring.
OK, I will do that, thanks.
Left is now implemented with Substring, but implementing Right with Substring may not work as well:
case class Right(str: Expression, len: Expression, child: Expression)
  extends RuntimeReplaceable {
  def this(str: Expression, len: Expression) = {
    this(str, len, Substring(str, If(LessThanOrEqual(len, Literal(0)),
      Literal(Integer.MAX_VALUE), UnaryMinus(len)), len))
  }
  override def flatArguments: Iterator[Any] = Iterator(str, len)
  override def sql: String = s"$prettyName(${str.sql}, ${len.sql})"
}
Substring supports negative positions, so we can implement Right as Substring(str, UnaryMinus(len)).
For example:
select right("sparksql",-2);
In this case we expect "", but if we implement Right as Substring(str, UnaryMinus(len)), the result will be "parksql".
OK, so we should do: If(LessThanOrEqual(len, Literal(0)), Literal(UTF8String.EMPTY_UTF8), Substring(str, UnaryMinus(len))). A complex expression is OK; after codegen, it should be almost the same as a custom implementation.
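The difference between the two proposals can be checked outside Spark. The sketch below is a plain-Scala model, not Catalyst code: substringSQL here is a simplified stand-in for UTF8String.substringSQL that works on java.lang.String (UTF-16 units rather than UTF-8 characters), and RightSketch, naiveRight and guardedRight are hypothetical names used only for illustration.

```scala
object RightSketch {
  // Simplified stand-in for SQL substring semantics (an assumption, not Spark's code):
  // pos is 1-based, pos == 0 means the first character, and a negative pos counts
  // back from the end of the string.
  def substringSQL(s: String, pos: Int, len: Int): String = {
    val n = s.length
    val start = if (pos > 0) pos - 1 else if (pos < 0) n + pos else 0
    // Compute the end in Long and clamp it so start + len cannot overflow Int.
    val rawEnd = start.toLong + len.toLong
    val end = math.max(Int.MinValue.toLong, math.min(rawEnd, Int.MaxValue.toLong)).toInt
    if (end <= start || start >= n) ""
    else s.substring(math.max(start, 0), math.min(end, n))
  }

  // Models Substring(str, UnaryMinus(len)) with the default "rest of string" length.
  def naiveRight(s: String, len: Int): String =
    substringSQL(s, -len, Int.MaxValue)

  // Models If(len <= 0, "", Substring(str, UnaryMinus(len))).
  def guardedRight(s: String, len: Int): String =
    if (len <= 0) "" else naiveRight(s, len)
}
```

With these definitions, RightSketch.naiveRight("sparksql", -2) gives "parksql", while RightSketch.guardedRight("sparksql", -2) gives "", which is the result expected in the comment above.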
OK, thank you very much.
Test build #79456 has finished for PR 18228 at commit
fa81e44 to 1cb0448
Shall we also explain the behavior when len is less than or equal to 0?
Also mention that len can be of string type. BTW, is it common for other databases to support a string-type len?
Yes, MySQL supports a string-type len, too.
ditto
Is this the correct answer?
There may be an issue with this test case: left("abcd", 'a')
In MySQL:
mysql> select left("abcd", -2), left("abcd", 0), left("abcd", 'a');
+------------------+-----------------+-------------------+
| left("abcd", -2) | left("abcd", 0) | left("abcd", 'a') |
+------------------+-----------------+-------------------+
|                  |                 |                   |
+------------------+-----------------+-------------------+
mysql> select right("abcd", -2), right("abcd", 0), right("abcd", 'a');
+-------------------+------------------+--------------------+
| right("abcd", -2) | right("abcd", 0) | right("abcd", 'a') |
+-------------------+------------------+--------------------+
|                   |                  |                    |
+-------------------+------------------+--------------------+
Substring behaves the same as Left.
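For the integer cases in the MySQL output above, the expected Left behavior can be written down as a small plain-Scala model. LeftSketch and left are hypothetical names, this is not the Catalyst implementation, and the string-typed case left("abcd", 'a') is deliberately not modeled because it depends on how the length argument is cast.

```scala
object LeftSketch {
  // Leftmost `len` characters; a non-positive length yields an empty string,
  // matching the MySQL output for left("abcd", -2) and left("abcd", 0).
  def left(s: String, len: Int): String =
    if (len <= 0) "" else s.take(len)
}
```

A length larger than the string simply returns the whole string, since take stops at the end of its input.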
Test build #79510 has finished for PR 18228 at commit
Test build #79511 has finished for PR 18228 at commit
Test build #79518 has finished for PR 18228 at commit
int: ... the result is an empty string
ok, thanks
I prefer If(LessThanOrEqual(len, Literal(0)), Literal(UTF8String.EMPTY_UTF8), Substring(str, UnaryMinus(len))).
The reason is that your expression ends up calling UTF8String.substringSQL(Int.Max, ...), which scans every byte of the UTF8String and wastes time.
right(null, -10)
I agree with you, but there is a problem with this test case:
we expect null, but the result is an empty string.
We can do the null check first, e.g.
If(
  IsNull(str),
  Literal(null, StringType),
  If(
    LessThanOrEqual(len, Literal(0)),
    Literal(UTF8String.EMPTY_UTF8, StringType),
    new Substring(str, UnaryMinus(len))
  )
)
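The branch order of the nested If above can be mirrored in plain Scala to see why right(null, -10) comes out as null rather than an empty string. This is only a sketch: None stands in for SQL NULL, len is assumed to be a non-null Int, and NullSafeRightSketch and right are hypothetical names, not part of Spark.

```scala
object NullSafeRightSketch {
  // Same branch order as the nested If: null check first, then the length guard,
  // then the actual "rightmost len characters" case.
  def right(str: Option[String], len: Int): Option[String] = str match {
    case None                => None                   // NULL input stays NULL
    case Some(_) if len <= 0 => Some("")               // non-positive length: empty string
    case Some(s)             => Some(s.takeRight(len)) // rightmost len characters
  }
}
```

Because the null check runs before the length guard, NullSafeRightSketch.right(None, -10) is None, while NullSafeRightSketch.right(Some("sparksql"), -2) is Some("").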
Test build #79548 has finished for PR 18228 at commit
Test build #79550 has finished for PR 18228 at commit
retest this please
Test build #79551 has finished for PR 18228 at commit
LGTM, merging to master!
What changes were proposed in this pull request?
Add SQL function - RIGHT && LEFT, same as MySQL:
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_left
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_right
How was this patch tested?
unit test