Char reinterpret is (kind of) broken #29181

Closed
StefanKarpinski opened this issue Sep 14, 2018 · 3 comments
Labels
bug

Comments

@StefanKarpinski
Member

julia> reinterpret(UInt32, reinterpret(Char, 0x00ff0000))
0x00000000

julia> u2c(u::UInt32) = reinterpret(Char, u)
u2c (generic function with 1 method)

julia> c2u(c::Char) = reinterpret(UInt32, c)
c2u (generic function with 1 method)

julia> u2c(0x00ff0000)
'\0': ASCII/Unicode U+0000 (category Cc: Other, control)

julia> c2u(u2c(0x00ff0000))
0x00000000

julia> u2u(u) = c2u(u2c(u))
u2u (generic function with 1 method)

julia> u2u(0x00ff0000)
0x00ff0000

julia> c2u(u2c(0x00ff0000))
0x00000000

The fortunate thing is that it doesn't seem to be broken in a way that affects any Char values that can be produced naturally: the only way to produce the values for which this is broken is through reinterpret of UInt32 values.
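To illustrate what "produced naturally" means here, the following is a minimal sketch that prints the bit patterns of a few Chars obtained by iterating a string. The helper names `bits` and `is_ascii_pattern` are made up for this illustration and are not part of Base. The sketch relies on Julia storing a Char's UTF-8 bytes left-aligned in a 32-bit value, so a Char decoded from a string with a leading byte below 0x80 should have its lower three bytes equal to zero; the pattern `0x00ff0000` from this issue does not fit that shape.

```julia
# Hypothetical helper: the raw 32-bit pattern stored in a Char.
bits(c::Char) = reinterpret(UInt32, c)

# Chars decoded from a string: UTF-8 bytes are left-aligned in the UInt32.
for c in "a\u00e9\u20ac"
    println(repr(c), " => 0x", string(bits(c), base = 16, pad = 8))
end

# Hypothetical predicate: true when the pattern is a plain ASCII character,
# i.e. leading byte below 0x80 and the remaining three bytes all zero.
is_ascii_pattern(u::UInt32) = (u >> 24) < 0x80 && (u & 0x00ffffff) == 0

# The pattern from this issue: zero leading byte, nonzero second byte.
odd = 0x00ff0000
println(is_ascii_pattern(bits('a')))  # ASCII Char decoded normally
println(is_ascii_pattern(odd))        # only reachable through reinterpret
```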

StefanKarpinski added the bug label Sep 14, 2018
@chethega
Contributor

I think the problem is weirder, i.e. it looks like a wrong optimization somewhere? So far, I cannot reproduce it outside of the REPL.

julia> f(x)=reinterpret(UInt32, reinterpret(Char, x));
julia> f(0x00ff0000)
0x00ff0000
julia> reinterpret(UInt32, reinterpret(Char, 0x00ff0000))
0x00000000
julia> x=0x00ff0000;
julia> @noinline to_i(x)=reinterpret(UInt32,x);
julia> @noinline to_c(x)=reinterpret(Char, x);
julia> to_i(to_c(x))
0x00000000
julia> g(x)=to_i(to_c(x));
julia> g(x)
0x00ff0000

@StefanKarpinski
Member Author

It's actually the opposite of a mis-optimization: the optimized version is correct, it's the version that's told not to optimize that gives the wrong answer. @vtjnash said on Slack that he thought it might have to do with a bug in our boxed_char_cache except that we don't have one... except that we do. So that might be it.

@StefanKarpinski
Member Author

Yup, that's it. Fix incoming.

StefanKarpinski added a commit that referenced this issue Sep 14, 2018
This code was assuming that character values only have bit-patterns
that decoding a string can produce, but of course `reinterpret` can
produce any bit pattern in a `Char` whatsoever. The fix doesn't use
that assumption and only uses the cache for actual ASCII characters.
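
The commit message above can be illustrated with a small model. This is only a sketch in Julia, not the actual runtime code: the names `CACHE`, `buggy_box` and `fixed_box` are hypothetical, and the "cache" simply stores bit patterns rather than boxed objects. It assumes the faulty lookup consulted only the leading byte of the Char's bits, which is consistent with `0x00ff0000` coming back as `0x00000000` in the report.

```julia
# Hypothetical model of a cache with one entry per ASCII character.
# Entry b + 1 holds the Char bit pattern for byte b (UTF-8 byte left-aligned).
const CACHE = [UInt32(b) << 24 for b in 0x00:0x7f]

# Assumed buggy lookup: decides from the leading byte alone and ignores
# the lower 24 bits, so any pattern 0x00xxxxxx maps to the entry for '\0'.
buggy_box(u::UInt32) = (u >> 24) < 0x80 ? CACHE[(u >> 24) + 1] : u

# Lookup without that assumption: the cache is used only when the whole
# pattern is an actual ASCII character; everything else passes through.
function fixed_box(u::UInt32)
    if (u & 0x00ffffff) == 0 && (u >> 24) < 0x80
        return CACHE[(u >> 24) + 1]
    end
    return u
end

println(buggy_box(0x00ff0000) == 0x00ff0000)  # round trip lost in the model
println(fixed_box(0x00ff0000) == 0x00ff0000)  # round trip preserved
println(fixed_box(0x61000000) == 0x61000000)  # 'a' still served from the cache
```

In this model the buggy version is only observable when a value actually goes through the lookup, which may be why the compiled, unboxed code paths shown earlier in the thread returned the right answer.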
StefanKarpinski added a commit that referenced this issue Sep 16, 2018
StefanKarpinski added a commit that referenced this issue Sep 17, 2018
KristofferC pushed a commit that referenced this issue Oct 6, 2018
…29192)

This code was assuming that character values only have bit-patterns
that decoding a string can produce, but of course `reinterpret` can
produce any bit pattern in a `Char` whatsoever. The fix doesn't use
that assumption and only uses the cache for actual ASCII characters.

(cherry picked from commit 88f74b7)
KristofferC pushed a commit that referenced this issue Oct 10, 2018
KristofferC pushed a commit that referenced this issue Feb 11, 2019
KristofferC pushed a commit that referenced this issue Feb 20, 2020