Skip to content

Conversation

@07ARB
Copy link
Contributor

@07ARB 07ARB commented Nov 12, 2019

What changes were proposed in this pull request?

rpad and lpad should return NULL when padstring parameter is empty

Why are the changes needed?

Returns str, right-padded or left-padded with pad to a length of length. If str is longer than length, the return value is shortened to length characters. In case of empty pad string, the return value is null.

Different behaviours of rpad and lpad function in case of empty pad string :

Mysql : rpad ('hi', 5, '') => NULL,   lpad ('hi', 5, '') => NULL
Hive : rpad ('hi', 5, '') => NULL,  lpad ('hi', 5, '') => NULL
Oracle : rpad('hi', 5, '') => NULL,   lpad ('hi', 5, '') => NULL
Postgresql : rpad ('hi', 5, '') => hi,  lpad ('hi', 5, '') => hi
presto> select rpad('hi', 5, '');  select lpad('hi', 5, '');
Query 20191225_040937_00004_htqqc failed: Padding string must not be empty

Note : Implemented RPAD and LPAD function as per the definition.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Old unit tests correct as per this jira.

@07ARB
Copy link
Contributor Author

07ARB commented Nov 12, 2019

@HyukjinKwon , please verify this PR.

@07ARB
Copy link
Contributor Author

07ARB commented Nov 13, 2019

@HyukjinKwon, this we can fix it now ?

@07ARB 07ARB reopened this Dec 24, 2019
@07ARB
Copy link
Contributor Author

07ARB commented Dec 24, 2019

@HyukjinKwon , please verify this PR.

@07ARB
Copy link
Contributor Author

07ARB commented Dec 24, 2019

I refer other DBMS and analyse in other project like HIVE they have fixed this issue.
It's good to handle in spark.

@maropu
Copy link
Member

maropu commented Dec 24, 2019

Can you add end-to-end tests, too, before accepting tests?

@07ARB
Copy link
Contributor Author

07ARB commented Dec 24, 2019

ok @maropu

@07ARB
Copy link
Contributor Author

07ARB commented Dec 24, 2019

@maropu and @cloud-fan , i have modified exiting end to end test cases , please review again.
kindly inform if i missed anything

@maropu
Copy link
Member

maropu commented Dec 25, 2019

ok to test

@maropu
Copy link
Member

maropu commented Dec 25, 2019

Also, you need to update the migration guide because this is a behaviour change.

@07ARB
Copy link
Contributor Author

07ARB commented Dec 25, 2019

sure i will update the migration guide also.

@07ARB 07ARB changed the title [SPARK-29776][SQL] rpad returning invalid value when parameter is empty [SPARK-29776][SQL] rpad should return NULL when padstring parameter is empty Dec 25, 2019
@maropu
Copy link
Member

maropu commented Dec 25, 2019

@maropu
Copy link
Member

maropu commented Dec 25, 2019

cc: @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Dec 25, 2019

Test build #115758 has finished for PR 26477 at commit 69cd167.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@07ARB
Copy link
Contributor Author

07ARB commented Dec 25, 2019

@maropu , I have fixed all the review comments, please check.

@SparkQA
Copy link

SparkQA commented Dec 25, 2019

Test build #115762 has finished for PR 26477 at commit 628c070.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 26, 2019

Test build #115813 has finished for PR 26477 at commit def8a0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Dec 26, 2019

What about lpad too?

@07ARB
Copy link
Contributor Author

07ARB commented Dec 26, 2019

@srowen , for lpad SPARK-29853 this jira is there, after this i will reopen PR for lpad.

@srowen
Copy link
Member

srowen commented Dec 26, 2019

These are so closely related that they should be one PR and JIRA

@07ARB
Copy link
Contributor Author

07ARB commented Dec 26, 2019

ok then i will combine both PR.

@07ARB 07ARB changed the title [SPARK-29776][SQL] rpad should return NULL when padstring parameter is empty [SPARK-29776][SQL] rpad and lpad should return NULL when padstring parameter is empty Dec 26, 2019
@07ARB
Copy link
Contributor Author

07ARB commented Dec 26, 2019

@srowen and @maropu , i have updated the code for lpad function also, please review.
(I have handled all the above review comments during lpad function implementation).

@SparkQA
Copy link

SparkQA commented Dec 26, 2019

Test build #115818 has finished for PR 26477 at commit 62604e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 26, 2019

Test build #115820 has finished for PR 26477 at commit b7d232f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 27, 2019

Test build #115821 has finished for PR 26477 at commit e5be2fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 27, 2019

Test build #115822 has finished for PR 26477 at commit 51247b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assertEquals(fromString("数据砖头"), fromString("数据砖头").rpad(5, EMPTY_UTF8));
assertEquals(fromString("数据砖"), fromString("数据砖头").rpad(3, EMPTY_UTF8));
assertEquals(EMPTY_UTF8, EMPTY_UTF8.rpad(3, EMPTY_UTF8));
assertEquals(fromString(null), fromString("数据砖头").rpad(5, EMPTY_UTF8));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the previous behaivour was like PostgreSQL before but you propose to match it to Hive, Mysql and Oracle?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, I am less sure why we should necessarily change. The lpad and rpad implementations seem different per DBMS implementation. Spark's case at least has one reference and the current behaviour makes sense as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is reasonable to change this because as per description of this function
In case of empty pad string, the return value should be null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon, I feel reasonable because in other DBMS like hive, Mysql they handle this issue after they find this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function behaviour varies per DBMS implementations and we have a reference as you described, PostgreSQL. Can you guys elaborate why it looks reasonable to you guys? I don't see a strong reason to change the current behaviour.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NULL is the better choice, that way we don't return unpadded data.
NULL makes sense with invalid input.
Please refer the JIRA how Hive handled
https://issues.apache.org/jira/browse/HIVE-15792
So as per me we should follow this as per Hive.

@07ARB 07ARB requested a review from HyukjinKwon December 27, 2019 05:49
@cloud-fan
Copy link
Contributor

I agree that NULL makes more sense in this case, but returning the original string is not a bad idea. I don't have a strong opinion on this one. cc @gatorsmile @viirya @dongjoon-hyun

@srowen
Copy link
Member

srowen commented Dec 27, 2019

Same here, I guess I wouldn't change it if it's not clear whether before or after is better.

@viirya
Copy link
Member

viirya commented Dec 27, 2019

I have the same option. This looks not a strong reason to change. We can change if it is a standard. Seems there are different implementation across different DBs.

@07ARB
Copy link
Contributor Author

07ARB commented Dec 28, 2019

Thank you all, for such a good discussion, If we are not finding strong reason to change old behaviours, then we can close this PR.
One more issue related to Lpad/Rpad, please review #27024

@HyukjinKwon
Copy link
Member

Closing this as discussed.

@maropu
Copy link
Member

maropu commented Dec 28, 2019

+1 for the close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants