- 
                Notifications
    You must be signed in to change notification settings 
- Fork 83
fix 128bits ctlz intrinsincs UB #635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| It looks like the some CI jobs are failing on which is some code I didn't touch. Can my changes have some side effects ? Any hints ? | 
| Not sure it's related. I opened a new dummy PR to check if the tests pass, though. | 
| The CI passed, so the problem is in here. I suspect this is because you changed the return type: it used to return an array type and your change makes it return a  | 
| The return type of the  which will get an element in the array and cast it as result_type which is  Edit: formatting | 
| Ok, finally found the line pointing in my change  The problem I see is that the if branch should have been taken. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay: I've been very busy.
        
          
                src/intrinsic/mod.rs
              
                Outdated
          
        
      | let result = self.current_func() | ||
| .new_local(None, array_type, "count_loading_zeroes_results"); | ||
|  | ||
| // Algorithm from: https://stackoverflow.com/a/28433850/389119 updated to check for high 64bits being 0 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems your new algorithm significantly diverges from the one linked here.
If so, could you please replace this link with another pointing to the algorithm you used? If you can't, could you please add comments below to make this easier to understand?
| Hi, sorry for the delay, I missed your reply notification and got caught up by IRL stuff. I'm going to add a comment on the new algorithm which is: This adds an if to the generated code but we need to check for 0 before calling  I think there is also a problem of UB in intrinsic/simd.rs                 let index = bx.context.new_rvalue_from_long(bx.i32_type, i as i64);
                let value = bx.extract_element(vector, index).to_rvalue();
                let value_type = value.get_type();
                let element = if name == sym::simd_ctlz {
                    bx.count_leading_zeroes(value_type.get_size() as u64 * 8, value)
                } else {
                    bx.count_trailing_zeroes(value_type.get_size() as u64 * 8, value)
                };where 0 is not special cased like in intrinsic/mod.rs. So I'll move the 0 special case handling in  As found by @FractalFir, the same problems apply to trailing_zeros / cttz | 
1f5f68e    to
    68a69d8      
    Compare
  
    | There is still a diff in the handling of clz and ctz intrinsics in the      fn count_leading_zeroes(&mut self, width: u64, arg: RValue<'gcc>) -> RValue<'gcc> {
...
            if arg_type.is_uint(self.cx) {
                "__builtin_clz"
            }vs     fn count_trailing_zeroes(&mut self, _width: u64, arg: RValue<'gcc>) -> RValue<'gcc> {
...
            if arg_type.is_uchar(self.cx) || arg_type.is_ushort(self.cx) || arg_type.is_uint(self.cx) {
                // NOTE: we don't need to & 0xFF for uchar because the result is undefined on zero.
                ("__builtin_ctz", self.cx.uint_type)
            }They both have the same comment             // TODO(antoyo): write a new function Type::is_compatible_with(&Type) and use it here
            // instead of using is_uint().Should I add the  | 
| @antoyo Do you have any hints to reproduce the  | 
| 
 If you compile gcc yourself, you can apply this patch. Otherwise, you can try setting this variable to false, but I'm not sure if this will have the same behavior. | 
226574c    to
    7831340      
    Compare
  
    2f49779    to
    97aaccd      
    Compare
  
    | I finally made through it once I realized  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good!
Again, I'm very sorry for the long delay it took me to review this.
Thanks a lot for your contribution.
| } | ||
|  | ||
| fn count_trailing_zeroes(&mut self, _width: u64, arg: RValue<'gcc>) -> RValue<'gcc> { | ||
| fn count_trailing_zeroes(&mut self, width: u64, arg: RValue<'gcc>) -> RValue<'gcc> { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the code of this method is duplicated with count_leading_zeroes.
Would there be a way to refactor it to avoid this duplication?
| ctlz_then_block.end_with_jump(self.location, ctlz_after_block); | ||
|  | ||
| self.switch_to_block(ctlz_else_block); | ||
| let high_leading_zeroes = | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you mean:
| let high_leading_zeroes = | |
| let high_trailing_zeroes = | 
| .add_assignment(self.location, second_elem, second_value); | ||
| // __buildin_ctzll is UB when called with 0, so call it on the 64 low bits if they are not 0, | ||
| // else call it on the 64 high bits and add 64. In the else case, 64 high bits can't be 0 | ||
| // because arg is not 0. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please specify that arg is not 0 since we're in the case of count_trailing_zeroes_nonzero?
This would make this comment a bit clearer, in my opinion.
| let array_type = self.context.new_array_type(None, arg_type, 3); | ||
| // __buildin_clzll is UB when called with 0, so call it on the 64 high bits if they are not 0, | ||
| // else call it on the 64 low bits and add 64. In the else case, 64 low bits can't be 0 | ||
| // because arg is not 0. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, please specify that arg is not 0 because we're in count_leading_zeroes_nonzero.
| // TODO(antoyo): do the correct verification in libgccjit to avoid an error at the | ||
| // compilation stage. | ||
| let index = self.context.new_cast(self.location, index, self.i32_type); | ||
| let result = self.current_func() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a lot duplication with count_leading_zeroes_nonzero here, but I'm not sure a refactoring would make it clearer since we need to use high first in one method and low first in the other.
So, I'll let you judge if a refactoring would help clarity here.
| } | ||
|  | ||
| fn count_leading_zeroes(&mut self, width: u64, arg: RValue<'gcc>) -> RValue<'gcc> { | ||
| let func = self.current_func(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to this method to describe what it does (return 0 if arg is zero, otherwise call the leading_zero builtin).
|  | ||
| fn count_trailing_zeroes(&mut self, _width: u64, arg: RValue<'gcc>) -> RValue<'gcc> { | ||
| fn count_trailing_zeroes(&mut self, width: u64, arg: RValue<'gcc>) -> RValue<'gcc> { | ||
| let func = self.current_func(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, please add a comment.
| self.switch_to_block(after_block); | ||
|  | ||
| result.to_rvalue() | ||
| sym::ctlz => self.count_leading_zeroes(width, args[0].immediate()), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a UI test that tests this and now pass?
If not, could you please add a test in tests/run?
Fixes #604
This is the GIMPLE code generated