sqrt_fx16_16_to_fx16_16 overflowing. #3
Thank you for testing my code and reporting an issue. 0x61a80000 is a negative number since bit 31 is 1. The computation is indeed overflowing, but that is normal since we can only compute square roots of positive numbers. Do you need to compute the square root of an unsigned floating point?
It is positive. The bit 31 is 0. The limit seems to be at 0x4fffffff, or 20479.99998
Weirdly, at 0x64000000 good answers return again.
You are right, sorry. 0x61A80000 is a positive number; bit 31 is zero. You also found a problem. I ran a brute-force test by computing all square root values from 0 to 0x7FFFFFFF, and indeed I get invalid values once 0x40000000 is reached. Every value smaller than that yields a valid result. Here is the code I used, in case we need it later. You may start the loop a bit before 0x40000000 to check.
// -- test computation
for (i = 0; i < 0x7FFFFFFF; i++) {
int32_t x = i, y = sqrt_fx16_16_to_fx16_16(x), e;
double xf, yf, ef;
xf = (double)i/k;
yf = (double)y/k;
ef = sqrt(xf);
e = (int32_t)(ef*k);
if ((i&0xFFFFF) == 0) {
printf("0x%08X -------------\n", i);
}
if (e != y) {
printf("x: 0x%08X (%f) y: 0x%08X (%f) e: 0x%08X (%f)\n", x, xf, y, yf, e, ef);
}
}
I came to the same observation. We are missing just one bit. There is no problem when using int64_t variables, or by using the 64-bit integer
fx16_16_t sqrt_fx16_16_to_fx16_16(fx16_16_t v) {
    return (fx16_16_t)sqrt_i64((int64_t)v << 16);
}
Your suggestion works but is a hack and is slightly less performant. I'll update the code.
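For reference, the 64-bit workaround quoted above can be sketched as follows. The body of `sqrt_i64` is not shown in this thread, so the bit-by-bit version here is only a hypothetical stand-in, and the `fx16_16_t` typedef is an assumption (Q16.16 stored in an `int32_t`). The idea is that sqrt(v / 2^16) * 2^16 equals sqrt(v * 2^16), so widening to 64 bits before the shift keeps the bit that the 32-bit computation loses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Q16.16 fixed point stored in a signed 32-bit integer. */
typedef int32_t fx16_16_t;

/* Hypothetical stand-in for sqrt_i64 (the repository's version is not
   shown in this thread): floor of the square root of a non-negative
   64-bit integer, computed one result bit per iteration. */
static int64_t sqrt_i64(int64_t v)
{
    assert(v >= 0);                   /* only non-negative inputs are valid */
    uint64_t r = (uint64_t)v;         /* running remainder */
    uint64_t q = 0;                   /* partial root */
    uint64_t b = (uint64_t)1 << 62;   /* largest power of four that fits */

    while (b > r)                     /* skip leading zero bit pairs */
        b >>= 2;
    while (b != 0) {
        if (r >= q + b) {
            r -= q + b;
            q = (q >> 1) + b;
        } else {
            q >>= 1;
        }
        b >>= 2;
    }
    return (int64_t)q;
}

/* Widen before shifting so that v << 16 cannot overflow. */
static fx16_16_t sqrt_fx16_16_to_fx16_16(fx16_16_t v)
{
    assert(v >= 0);
    return (fx16_16_t)sqrt_i64((int64_t)v << 16);
}
```

With this sketch, the input from the report, 0x61a80000 (25000.0), yields a result whose integer part is 158 instead of 129.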
Repo has been updated. Tell me if it works for you. The tests have been updated to test all integers in the range 0 to 0x7FFFFFF inclusive.
Thanks. Another idea is to make a copy of
I didn't understand. Could you send me a PR?
Never mind that. I am pretty satisfied with this:
except the obvious failure with v = 1, which can be looked up instead. Yes, there is a loss of precision, which is unavoidable for large arguments anyway. Or, perhaps, q can be initialized more cleverly to recover that bit. Corrected: initialized q so that it works well for small odd v = {1, 3, 5, ...}. Edited: added one extra cycle and rounded the return value. Note that it seems to work for the full range of 32-bit unsigned inputs, including bit_31 = 1.
FYI, this is what was adopted in FreeType, which is thoroughly fixed-point. It is used on special occasions only.
If speed is important, as is probably the case for FreeType, then it could be interesting to remove the
The instructions
if( r >= t ) {
    r -= t;
    q += b;
}
may be replaced by
m = (uint32_t)(r < t) - 1; // m is 0xFFFFFFFF when r >= t, 0 otherwise
r -= t & m;
q |= b & m;
Unwinding the loop is another way to increase speed. It is then possible to add special handling to avoid the overflow. Unfortunately, I don't have time to test and measure its performance.
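To make the mask trick above concrete, here is a minimal sketch that applies it to a plain 32-bit integer square root. The surrounding loop is an assumption for illustration (a textbook bit-by-bit routine, not the repository's or FreeType's code). Only the names r, t, q, b and m come from the comment above. In this particular loop the bits of q and b never overlap when they are combined, which is why `q |= b` can stand in for `q += b`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical integer square root with an ordinary conditional. */
static uint32_t isqrt32_branch(uint32_t x)
{
    uint32_t r = x, q = 0, b = (uint32_t)1 << 30;
    while (b != 0) {
        uint32_t t = q + b;
        q >>= 1;
        if (r >= t) {
            r -= t;
            q += b;
        }
        b >>= 2;
    }
    return q;
}

/* Same loop with the conditional replaced by a mask. */
static uint32_t isqrt32_mask(uint32_t x)
{
    uint32_t r = x, q = 0, b = (uint32_t)1 << 30;
    while (b != 0) {
        uint32_t t = q + b;
        uint32_t m = (uint32_t)(r < t) - 1; /* 0xFFFFFFFF when r >= t, else 0 */
        q >>= 1;
        r -= t & m;
        q |= b & m;
        b >>= 2;
    }
    return q;
}

/* Small self-check: both variants must agree on a given input. */
static void check_same_result(uint32_t x)
{
    assert(isqrt32_branch(x) == isqrt32_mask(x));
}
```

Whether the masked form is actually faster depends on the compiler and target, since many compilers already emit a conditional move for the branchy form. It would need to be measured.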
sqrt_fx16_16_to_fx16_16( 0x61a80000 ) returns 0x81a54a, which corresponds to sqrt(25000) = 129.6457. Ouch! The correct answer is 158.1139.
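The numbers in this report can be checked with a small conversion sketch. The helper names below are hypothetical and not part of the library. The sketch only assumes that a 16.16 value is an `int32_t` whose real value is the integer divided by 65536.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: real value represented by a 16.16 fixed-point integer. */
static double fx16_16_to_double(int32_t v)
{
    return (double)v / 65536.0;
}

/* Hypothetical helper: convert a real value to 16.16, truncating toward zero. */
static int32_t fx16_16_from_double(double x)
{
    assert(x >= -32768.0 && x < 32768.0);  /* representable range of 16.16 */
    return (int32_t)(x * 65536.0);
}

/* Bit 31 is the sign bit: 1 for negative values, 0 otherwise. */
static int fx16_16_sign_bit(int32_t v)
{
    return (int)(((uint32_t)v >> 31) & 1u);
}
```

Under these assumptions, 0x61a80000 decodes to 25000.0 with the sign bit clear, and 0x0081a54a decodes to roughly 129.6457, matching the figures quoted in the report.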