-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add string.toByte() and string.toCodepoint() #13553
Conversation
@mppf - Would you mind reviewing this? |
var nbytes: c_int; | ||
var multibytes = localThis.buff: c_string; | ||
var maxbytes = localThis.len: ssize_t; | ||
qio_decode_char_buf(cp, nbytes, multibytes, maxbytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC, why isn't the return value of qio_decode_char_buf()
checked here or in the rest of the file? It looks like it has error cases that can leave nbytes
looking reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In expected operation with UTF8 strings (which is all this is supposed to handle), it will return an error code and set the codepoint to the unicode replacement character. That's reasonable in some ways. But beyond that, the design is that a string
can only hold valid UTF-8, so if there is an invalid UTF-8 character in it, that should have been raised as an error somewhere in the process of creating the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about checking the return code, but I ended up not doing it because of the reasons Michael gave, and I convinced myself that nbytes
would return what I wanted even in an error condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! I have some minor requests.
compiler/resolution/postFold.cpp
Outdated
@@ -410,7 +417,14 @@ static Expr* postFoldPrimop(CallExpr* call) { | |||
|
|||
INT_ASSERT(ie && ie->symbol()->isParameter() && get_int(ie, &val)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, get_int
is computing something that is used later in the function in the control flow, but it's being called from an INT_ASSERT
. That's bad because one might reasonably assume that the INT_ASSERT
can be removed at only the cost of less error checking.
Could you move the get_int
call to a separate line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move the
get_int
call to a separate line?
Done.
var nbytes: c_int; | ||
var multibytes = localThis.buff: c_string; | ||
var maxbytes = localThis.len: ssize_t; | ||
qio_decode_char_buf(cp, nbytes, multibytes, maxbytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In expected operation with UTF8 strings (which is all this is supposed to handle), it will return an error code and set the codepoint to the unicode replacement character. That's reasonable in some ways. But beyond that, the design is that a string
can only hold valid UTF-8, so if there is an invalid UTF-8 character in it, that should have been raised as an error somewhere in the process of creating the string.
@@ -0,0 +1,4 @@ | |||
#! /bin/bash | |||
|
|||
sed -e 's/^.*String\.chpl:[0-9]*: //' <$2 >$2.tmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that these error line numbers should be in the user's code. I believe you can achieve that with pragma "insert line file info" pragma "always propagate line file info"
on the function in question but this case might need some other handling to do with the primitive error handling. But couldn't we make the compiler code with the primitive raise only internal errors and make the user-facing compilerError
s occur in the modules code implementing ascii
and toByte
? It looks like the -1
case which is handled that way is giving the right line number for the error...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the user errors must remain within the primitive because we don't know a string's length in param Chapel code.
I tried the pragmas, but they don't seem to do anything in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't know a string's length in param Chapel code.
I don't think that's right.
primitive.cpp shows "string_length"
next to "ascii"
, and in fact it is implemented in postFold just above ascii.
And this program works for me:
param len = "aa".length;
var tup:len*int;
writeln(len);
writeln(tup);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's very strange. I can't use string.len
in param code, but it does accept string.length
which is a proc that references string.len
. I will see what I can do with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what's going on. There is a separate param string.length
that calls the primitive.
I was trying to avoid string.length
in case its meaning changes to count codepoints, which would break this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could call the primitive directly. It's likely that we'll need different primitives for the codepoint vs bytes cases, once we get the compiler working with those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fixed. Thanks. I will retest today and merge this weekend.
# --fast sets --no-checks and this test is explicitly checking that one | ||
# of our runtime checks is working correctly. There's been debate as to | ||
# whether tests like this should be suppressed or skipped. For now we | ||
# suppress with the argument that we're testing that --fast actually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this argument actually explains why you should suppressif it. I think you should skipif it. I think we should use suppressif for cases that we expect to be resolved on their own at some point (e.g. backend C compiler issues) and .futures for things that will only resolve with some effort on our part. But this case seems to be neither of those - I don't have any expectation that --fast
will ever stop throwing --no-checks
.
All that said it doesn't bother me if you want to keep this a .suppressif for some reason. I just don't understand the reason to do so from what is written here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't care which way it goes myself. I copied this suppressif file byte-for-byte from one of Brad's, which is how I decided to suppress instead of skip. I think I'll leave it as a suppression only because it came from Brad. I don't mind if the whole issue gets decided differently later, in which case several suppressif files in various parts of the tests will need to be changed to skipifs.
The deprecated
ascii()
function was replaced bystring.byte(1)
, which provided extra functionality by varying its argument, but looked a little heavy-weight for the purposes for whichascii()
was used. This change addsstring.toByte()
as a better direct replacement forascii()
(while still retaining the more generalstring.byte()
). This change also addsstring.toCodepoint()
as the codepoint analog.The key difference between
ascii()
andstring.toByte()
is that the latter will issue an error message if the string is not exactly one byte long. Likewise,string.toCodepoint()
will issue an error message if the string is not exactly one codepoint long.Some existing code needed a param version of
ascii()
, so a param version ofstring.toByte()
has been created as well. There has not yet been a need for a paramstring.toCodepoint()
, but that could be added in the future. (See https://github.com/Cray/chapel-private/issues/348 .)Existing Chapel code that had been changed from using
ascii()
tostring.byte(1)
is changed here to usestring.toByte()
wherever it was operating on a string that was required to be one byte long.Some tests have been added to fill in gaps.
At some point, the String module should be migrated to use HaltWrappers. However, HaltWrappers are not yet able to handle some of the error messages in String, so the migration has not occurred yet. Consequently, this change stays consistent with the rest of String by using
halt()
and leaves the migration to a future PR. (See https://github.com/Cray/chapel-private/issues/349 .)Note:
modules/packages/LAPACK.chpl
is a very large diff. The only change to that file is to change all occurrences of.byte(1)
to.toByte()
. These were all former uses ofascii()
.Passed full local and full GASNet testing on linux64.
Passed Cray XC testing (particularly LAPACK) as well as
--fast
.Retested after review.
Closes #13167
Closes https://github.com/Cray/chapel-private/issues/80