Skip to content

Fix translate doesn't work on CHAR values#14478

Closed
albericgenius wants to merge 2 commits intotrinodb:masterfrom
albericgenius:trinoTranslate
Closed

Fix translate doesn't work on CHAR values#14478
albericgenius wants to merge 2 commits intotrinodb:masterfrom
albericgenius:trinoTranslate

Conversation

@albericgenius
Copy link
Copy Markdown
Contributor

@albericgenius albericgenius commented Oct 5, 2022

Description

Fix #14038
Fix translate doesn't work on CHAR values

This PR is used to proposal the solution, there are some test cases not passed.
if the direction is correct, I will fix the test cases asap.

Non-technical explanation

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`14038`)

@albericgenius albericgenius requested a review from findepi October 5, 2022 14:30
@cla-bot cla-bot bot added the cla-signed label Oct 5, 2022
@albericgenius albericgenius force-pushed the trinoTranslate branch 2 times, most recently from 2f71209 to c86d9d6 Compare October 12, 2022 08:21
@albericgenius
Copy link
Copy Markdown
Contributor Author

@findepi if we need to replace the deprecated assertFunction by assertThat(assertions.function in TestStringFunctions.java.
If there is open issue for that, could you help to assign to me.
Thanks

@findepi
Copy link
Copy Markdown
Member

findepi commented Oct 12, 2022

if we need to replace the deprecated assertFunction by assertThat(assertions.function in TestStringFunctions.java.
If there is open issue for that, could you help to assign to me.

@martint might be the best person to speak to this

@albericgenius
Copy link
Copy Markdown
Contributor Author

commit to fix the CI issue.

@martint
Copy link
Copy Markdown
Member

martint commented Oct 12, 2022

@albericgenius, there's no issue for it, but feel free to do it if you want and have time. There are dozens of classes that still rely on the deprecated assertFunction and any help to migrate those is appreciated. Take a look at #14528 for some examples.

@albericgenius
Copy link
Copy Markdown
Contributor Author

albericgenius commented Oct 13, 2022

@albericgenius, there's no issue for it, but feel free to do it if you want and have time. There are dozens of classes that still rely on the deprecated assertFunction and any help to migrate those is appreciated. Take a look at #14528 for some examples.

@martint I have a look at #14528, Thanks for sharing.I have time and interested in Trino, but sometime i do not know where I could start. please feel free to assign the task to me.

@findepi I have rebased the lasted code, but there is still one Product test failed because of "Test execution exceeded timeout of 2.00h". Which is not relative to my commit(I think :) ). I just added a translate function for char.

I have changed to use assertFunction in the test, when we migrate to new assertions, i will do it later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

translateChar

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add cases with trailing spaces in the result

        assertFunction("translate(CHAR 'abc', 'c', ' ')", VARCHAR, "ab ");
        assertFunction("translate(CHAR 'abc ', 'c', 'z')", VARCHAR, "abz ");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, add tests where the translation strings contain more than one character.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid unrelated changes

@findepi
Copy link
Copy Markdown
Member

findepi commented Oct 13, 2022

Code looks generally OK, but per #14038 (comment) i think the proper fix would be to do #9031.

@albericgenius albericgenius force-pushed the trinoTranslate branch 2 times, most recently from 37b2781 to 2ffd87d Compare October 13, 2022 13:50
@albericgenius
Copy link
Copy Markdown
Contributor Author

albericgenius commented Oct 14, 2022

Code looks generally OK, but per #14038 (comment) i think the proper fix would be to do #9031.

@findepi i agree with you, and I have tried to fix via char/varchar coercion in the beginning. But seems it affected a lot of functions, so i go this way to follow the same logic as charSubstring, castCharToCodePoints and concat.

For my opinion, at least it make sense for translate(CHAR 'a bc ', CHAR ' ', CHAR '_')

There are 47 checks cancelled, I will rebase to fix this.

Make pr ready to merge first. :)

@findepi
Copy link
Copy Markdown
Member

findepi commented Oct 14, 2022

I have tried to fix via char/varchar coercion in the beginning. But seems it affected a lot of functions,

yes, this is not easy

@albericgenius
Copy link
Copy Markdown
Contributor Author

I have tried to fix via char/varchar coercion in the beginning. But seems it affected a lot of functions,

yes, this is not easy

Hi @findepi
I try to work on #9031 as following. I have went though EXACT and IMPLICIT_COERCION resolveFunction logic.

+ return Optional.of(createVarcharType(((CharType) sourceType).getLength())); in coerceTypeBase function
+ use the min length for bindings. 

But for typeCoercion.getCommonSuperType, I do not know how to fit all requirement(all existed functions). So i probably track on #9031 and learn how to address this issue.

For current commit, add translateChar, for my side, it is good to merge, because for the case of translate(CHAR 'a bc ', CHAR ' ', CHAR '_'), translateChar will be hit by EXACT resolveFunction first, it will be more efficient compare to IMPLICIT_COERCION. What is your opinion?

@findepi findepi requested a review from martint October 17, 2022 08:36
@albericgenius
Copy link
Copy Markdown
Contributor Author

@martint sorry to bother again, I think this fix is good for translate(CHAR 'a bc ', CHAR ' ', CHAR '_') case, so still need your help to review this.

Copy link
Copy Markdown
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an ok workaround until the underlying issue is fixed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, add tests where the translation strings contain more than one character.

@albericgenius
Copy link
Copy Markdown
Contributor Author

@martint :) , I addressed the code style issues, Thanks for share this for me, if there any other issue, I will update as soon as possible.

@martint
Copy link
Copy Markdown
Member

martint commented Jan 14, 2023

Sorry for the late reply. Can you please add a few tests with various combinations of CHAR and VARCHAR? The existing tests only cover all values being of the same type or only one value being CHAR.

For instance, we should test translate(CHAR, CHAR, VARCHAR), translate(CHAR, VARCHAR, VARCHAR), translate(VARCHAR, CHAR, VARCHAR), and so on.

@zhaner08
Copy link
Copy Markdown
Contributor

zhaner08 commented Mar 21, 2023

Sorry for the late reply. Can you please add a few tests with various combinations of CHAR and VARCHAR? The existing tests only cover all values being of the same type or only one value being CHAR.

For instance, we should test translate(CHAR, CHAR, VARCHAR), translate(CHAR, VARCHAR, VARCHAR), translate(VARCHAR, CHAR, VARCHAR), and so on.

Just curious, by testing do you mean Trino wants to support those permutations as well or those tests should be added to ensure other permutations are not supported?

I understand the rational that input should be the same type as output, however the public document describe the inputs to be string instead of particular char/varchar and schemas and queries have to be updated to be compatible unless auto coercion is happening.
Returns the source string translated by replacing characters found in the from string with the corresponding characters in the to string

@albericgenius
Copy link
Copy Markdown
Contributor Author

@findepi @martint you are right, when char 'z', the z.intValue() is 1, when varchar 'z', the z.intValue() is 65536. it have not resolved the root cause.

@mosabua
Copy link
Copy Markdown
Member

mosabua commented Jan 12, 2024

👋 @albericgenius @findepi @martint - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 4, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 4, 2024
@mosabua
Copy link
Copy Markdown
Member

mosabua commented Sep 10, 2024

@martint and @findepi .. how do you want to proceed here with @albericgenius

@github-actions github-actions bot removed the stale label Sep 10, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Oct 1, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Oct 1, 2024
@github-actions
Copy link
Copy Markdown

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

translate doesn't work on CHAR values

5 participants