Add built-in functions UNICODE_CHAR and UNICODE_VAL to convert between Unicode code point and character #6798
Comments
It would be nice if UNICODE_VAL internally worked a little more cleverly than "any charset -> Unicode -> UTF-8 -> Unicode -> take the first code point"...
What exactly do you mean by "a little more clever"?
At least "any charset -> Unicode -> take the first code point". Best of all would be to take only one character from the source string, taking surrogate pairs into account.
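A minimal Python sketch of the "any charset -> Unicode -> take the first code point" idea: decode directly from the source charset, with no UTF-8 round trip, and stop as soon as one character has been produced. Python's incremental decoders stand in for Firebird's charset conversion here (an assumption), and the function name first_code_point is hypothetical.

```python
import codecs


def first_code_point(data: bytes, encoding: str):
    """Return the code point of the first character of `data`, or None if empty.

    Decodes straight from `encoding` to Unicode and feeds the decoder one byte
    at a time, so only the bytes of the first character are ever decoded.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for i in range(len(data)):
        text = decoder.decode(data[i:i + 1])
        if text:  # a complete character is available
            return ord(text[0])
    # Input ended: flush; raises UnicodeDecodeError if it ended mid-character.
    text = decoder.decode(b"", final=True)
    return ord(text[0]) if text else None
```

For example, first_code_point(b"\xe9abc", "cp1252") decodes only the first byte and yields 0xE9 (é).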
It would also be good to have some form of string literal that supports Unicode escape sequences (characters given by number). If we do that with a function, that function could transform fixed strings at compile time. BTW, the idea of functions doing things at compile time may be done for
That sounds good to me as well. I'm not sure how you want to account for surrogate pairs, because you can either resolve the code point of each individual surrogate or the code point of the combined pair. The latter might be "more correct", but might be a bit of a hassle.
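The two options can be illustrated with a small Python sketch over UTF-16 code units: either report each surrogate as its own value, or combine a high/low pair using the UTF-16 formula from the Unicode standard, 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). The function name and the combine_pairs flag are hypothetical.

```python
def utf16_code_points(units, combine_pairs=True):
    """Convert a list of UTF-16 code units (ints) into a list of values.

    combine_pairs=True  -> a valid high/low surrogate pair becomes one code point.
    combine_pairs=False -> every surrogate is reported individually.
    Unpaired surrogates are always reported as-is.
    """
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        if (combine_pairs and 0xD800 <= u <= 0xDBFF
                and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            result.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            result.append(u)
            i += 1
    return result
```

For U+1F600 (UTF-16 units 0xD83D 0xDE00), the combined form gives 0x1F600, while the individual form gives the two surrogate values.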
For that we would need to add support for Unicode string literals (see section 5.3, <literal>, <Unicode character string literal>, in the SQL:2016-2 standard).
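As a rough illustration, the standard Unicode literal has the form U&'...', where an escape character (backslash by default, changeable with a UESCAPE clause) is followed either by 4 hex digits or by "+" and 6 hex digits, and a doubled escape character stands for itself; this is the form PostgreSQL documents, for instance. The Python sketch below assumes that syntax and only resolves the body of the literal; the function name is hypothetical, and parsing of the U&'...' wrapper and the UESCAPE clause is left out.

```python
import re

HEX4 = re.compile(r"[0-9A-Fa-f]{4}")
HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def resolve_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Resolve escapes in the body of a U&'...' literal (quotes already stripped)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt == escape:
            out.append(escape)  # doubled escape character = the character itself
            i += 2
        elif nxt == "+" and HEX6.fullmatch(body[i + 2:i + 8]):
            out.append(chr(int(body[i + 2:i + 8], 16)))  # \+XXXXXX form
            i += 8
        elif HEX4.fullmatch(body[i + 1:i + 5]):
            out.append(chr(int(body[i + 1:i + 5], 16)))  # \XXXX form
            i += 5
        else:
            raise ValueError(f"invalid Unicode escape at position {i}")
    return "".join(out)
```

With fixed strings, a resolution like this could run once at compile time rather than per row.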
That would be an interesting optimization, but I think that should be done separately outside the scope of this request. |
Firebird considers a surrogate pair as a single character in
Ok, then
…o convert between Unicode code point and character.
Did you test it with a 1 GB BLOB?
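The BLOB question matters because, in UTF-8, the first character occupies at most 4 bytes, so an implementation only needs to read that prefix rather than convert the whole blob. A Python sketch of that idea follows, with io.BytesIO standing in for a blob stream (an assumption); the function name is hypothetical.

```python
import io


def first_utf8_code_point(stream):
    """Read only the first UTF-8 character from a binary stream.

    Consumes 1 to 4 bytes, regardless of how large the stream is.
    Returns None for an empty stream.
    """
    lead = stream.read(1)
    if not lead:
        return None
    b0 = lead[0]
    if b0 < 0x80:
        return b0  # ASCII: one byte
    elif 0xC2 <= b0 <= 0xDF:
        length = 2
    elif 0xE0 <= b0 <= 0xEF:
        length = 3
    elif 0xF0 <= b0 <= 0xF4:
        length = 4
    else:
        raise ValueError("invalid UTF-8 lead byte")
    data = lead + stream.read(length - 1)
    # Strict decoding rejects bad continuation bytes, overlongs and truncation.
    return ord(data.decode("utf-8"))
```

For a stream of "é" followed by a large amount of data, only 2 bytes are read.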
Currently, Firebird has ASCII_CHAR and ASCII_VAL, which allow conversion between ASCII code points and ASCII characters. It would be helpful to have equivalent functions UNICODE_CHAR and UNICODE_VAL to convert between Unicode code points and CHAR(1) CHARACTER SET UTF8 characters.

The input of UNICODE_CHAR would be an integer value in the range 0x00 to 0x10FFFF, and the result would be a CHAR(1) CHARACTER SET UTF8 with the equivalent character.

The input of UNICODE_VAL would be any string type (including blobs) with character set UTF8 (character strings of other character sets should be converted to UTF8), and it would return the Unicode code point of the first character of the string.
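The proposed semantics can be modeled with a short Python analogue, treating a CHAR(1) CHARACTER SET UTF8 value as a one-character Python str. The lowercase names are hypothetical, and returning None for an empty string is an assumption, since the proposal does not say what UNICODE_VAL should return in that case.

```python
def unicode_char(code_point: int) -> str:
    """Analogue of the proposed UNICODE_CHAR: code point -> one character."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in the range 0x00..0x10FFFF")
    return chr(code_point)


def unicode_val(text: str):
    """Analogue of the proposed UNICODE_VAL: code point of the first character."""
    if not text:
        return None  # assumption: empty-input behavior is not specified
    return ord(text[0])
```

The two functions are inverses for a single character: unicode_val(unicode_char(0x1F600)) returns 0x1F600.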