Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: add a test that builds with clang14 #1498

Merged
merged 4 commits into from
May 4, 2022

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented May 2, 2022

A few minor changes were needed to address new warnings.

clang14 warns if you use bitwise | or & in an if, but there are
some places where we do that purposely to avoid branches.

Also some batch shading related vectorized loops fail to vectorize
with clang14 -- though they worked fine with clang12 -- and so we
needed to mark those as "complex" SIMD loops. Is this a regression
with clang14?

Signed-off-by: Larry Gritz [email protected]

@@ -106,6 +106,10 @@ namespace // Unnamed
bool
are_op_results_always_implicitly_varying(ustring opname)
{
OIIO_PRAGMA_WARNING_PUSH
#if OIIO_CLANG_VERSION >= 140000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels pretty noisy in the source, why not just disable it via the command line flag -Wno-bitwise-instead-of-logical?
Every use of | or & should have been an explicit decision.

Copy link
Collaborator Author

@lgritz lgritz May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I could do that I suppose.

I wasn't sure of the merits of disabling the warning for the whole codebase, which of course would have the effect of NOT catching any times we accidentally used bitwise ops when we meant logical. That warning, for the vast majority of code, is not a bad idea. I think it serves a good purpose.

On the other hand, we never had the warning before, and now that we've turned it on, the only places it caught were instances where we purposely used the bitwise ops. So maybe it's trying to catch a mistake that we never make in practice, so it's always a false positive.

Let me mull this over (@fpsunflower or others, do you have an opinion?) but I think you have me leaning toward thinking that you are right and I should just disable the warning wholesale.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the warning it self doesn't can't actually determine if you intended to skip the short-circuit behavior of logical || and &&, IMO it shouldn't be on by default. If tracking down a bug, maybe it would be nice to get such a report of all places bitwise was used to verify they were all intentional, but certainly not as part of a -Werror build configuration.
As a typical use is
if (ptr && ptr->foobar())
If that were just a bitwise &, then both sides will execute and the order is undefined:
if (ptr & ptr->foobar())
then a crash might occur. However, C++ has made it 37 years without this warning,
mostly because the call stack should lead them right to the offending line. However if not a simple invalid pointer some incorrect behavior of the "order undefined" and "both sides" execute behavior could happen.

The warning itself actually says
note: cast one or both operands to int to silence this warning
Cast to an int? But it was already a bool and adding additional casts just to make this warning happy doesn't make sense. Bitwise | and & on bools is legitimate, and not a cause for warning. Maybe there should be a -Wlook-for-common-typos flag that does this (and others like assignment inside if statement).

For performance code we often want to skip the short circuit and if an additional branch is generated is a performance degradation, so there might be a trade off between warning for a "potential" typo and performance.

Take a look at example
https://godbolt.org/z/bnKMKnn5K
We can see a jmp in the assembly, change MY_OR at the top to bitwise | and it goes away. Branch prediction penalties and additional management of SIMD masks and mask registers are real costs. So developer should choose || or | depending on their needs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually in favor of adopting all warnings, but this one doesn't seem to be fully thought out. I'm fine disabling this one for now.

@@ -5933,7 +5933,7 @@ LLVMGEN (llvm_gen_gettextureinfo)
FuncSpec func_spec("get_textureinfo_uniform");
const char * funcName = rop.build_name(func_spec);

bool requires_binning = (!Filename.is_uniform()) | (!Dataname.is_uniform()) | use_coords;
bool requires_binning = (!Filename.is_uniform()) || (!Dataname.is_uniform()) || use_coords;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were all changed to bitwise from logical to avoid generating branching code. As this is during compilation, probably not measurable impact. But if we disable the check with "-Wno-bitwise-instead-of-logical" then these files would not even need to be in this pull request.

Copy link
Collaborator Author

@lgritz lgritz May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the guards (i.e. kept the bitwise ops) in all code paths that are called during shader execution or that are in code we are trying to vectorize.

There were a few instances -- including this one, which is where we're generating IR -- where the code is not trying to be vectorized, not called by the shaders themselves, and not especially performance sensitive. And in those circumstances, I reverted to the logical operations because that's the real meaning of this statement and there is nothing to gain from eliminating a single branch here.

I also looked at godbolt and see that whether && and || are turned into branchless & and | automatically by the C++ optimizer depends heavily on the specific compiler and version. Modern icc and clang are pretty good at it in many circumstances. gcc is not. So I think the use of bitwise ops is only important in areas that are extremely performance sensitive and we don't want to take any chances that the compiler will fail to optimize away the branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that the change for batched_llvm_gen.cpp and other places isn't performance critical. And that change is fine. Just noting as they were all bools it is somewhat equivalent when the compiler does automatically turn them into & and |, kinda shows that they are valid to be & and | to begin with.

Also a compiler "might" turn && and || are turned into branchless & and | automatically. Turning optimization off though probably won't turn && and || are turned into branchless & and |. So one might prefer to never generate the branch in the first place, to better ensure similar execution during debug. Also the example I posted, clearly a jmp is still generated with the || at O3, because it is contractually obligated to by language rules around short-circuit. And Clang might be good and that automatic conversion if it can prove no side effects (which is good feature), but other compilers or optimization levels may not be and still generate jmps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to revert all the individual line-by-line pragmas in favor of just disabling this warning for all of liboslexec.

@@ -309,13 +309,13 @@ wide_transformc (ColorSystem cs, StringParam fromspace, StringParam tospace,
Wide<COLOR> wCrgb(bCrgb);
if (fromspace == STRING_PARAMS(RGB) || fromspace == STRING_PARAMS(rgb)
|| fromspace == STRING_PARAMS(linear) || fromspace == cs.colorspace()) {
OSL_OMP_PRAGMA(omp simd simdlen(__OSL_WIDTH))
OSL_OMP_COMPLEX_SIMD_LOOP(simdlen(__OSL_WIDTH))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not loose vectorization for clang 13 and below, you might want to conditionally use

#if OSL_CLANG_VERSION > 140000
OSL_OMP_COMPLEX_SIMD_LOOP(simdlen(__OSL_WIDTH))
#else
OSL_OMP_PRAGMA(omp simd simdlen(__OSL_WIDTH))
#endif

This does appear to be a regression in Clang14 of some kind. Its odd that these color operations, especially this super simple assignment loop, are having issues (where as much more complex noises and spline operations continue to vectorize). So I suspect it has something to do with the underlying Color3 class itself, of the organization/optimization of wide_transformc. Given that other functions in wide_opcolor.cpp are vectorizing fine using Color3, that leads me to think its wide_transformc itself.
I don't have an easy way to reproduce without moving to clang14 myself.

Some experiments to try:

  1. Change ColorSystem parameters to be const & vs. copy by value (this looks like it might be an existing issue):
wide_transformc (ColorSystem cs, ...

to

wide_transformc (const ColorSystem &cs, ...
  1. SIMD loop isolation. I have seen compilers try to do loop fusion and other magic that messes up otherwise valid vectorizing loops. We do this in wide_opmatrix.cpp utilizing a simple noinline helper functor called invoke. This "isolation" technique can also help with register allocation by limiting the scope of what the optimizer has to deal with in a single function:
namespace {

// invoke is helper to ensure a functor is compiled inside a
// actual function call vs. inlining.
// Useful for keeping the slow path from fusing
// with a streamlined SIMD loop or influencing
// register usage
template<typename FunctorT>
OSL_NOINLINE void
invoke(FunctorT f);

template<typename FunctorT>
void
invoke(FunctorT f)
{
    f();
}
}

then inside of wide_transformc wrap each simd loop in its own invoke call with a lambda expression

        invoke([=]() -> void {
            OSL_OMP_PRAGMA(omp simd simdlen(__OSL_WIDTH))
            for (int lane = 0; lane < __OSL_WIDTH; ++lane) {
                COLOR C = wInput[lane];
                wCrgb[lane] = C;
            }
        });

for the loops that use the ColorSystem argument cs might want to add that as a reference in the lamba capture clause.

        invoke([=,&cs]() -> void {

Copy link
Contributor

@AlexMWells AlexMWells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Once merged, might want to come back and experiment to see if vectorization can be restored for wide_opcolor.cpp with clang14.

lgritz added 3 commits May 3, 2022 21:22
A few minor changes were needed to address new warnings.

clang14 warns if you use bitwise `|` or `&` in an `if`, but there are
some places where we do that purposely to avoid branches.

Also some batch shading related vectorized loops fail to vectorize
with clang14 -- though they worked fine with clang12 -- and so we
needed to mark those as "complex" SIMD loops. Is this a regression
with clang14?

Signed-off-by: Larry Gritz <[email protected]>
Revert the invididual line-by-line pragmas, do it for the whole library.

Signed-off-by: Larry Gritz <[email protected]>
Signed-off-by: Larry Gritz <[email protected]>
@lgritz lgritz merged commit 10b7264 into AcademySoftwareFoundation:main May 4, 2022
@lgritz lgritz deleted the lg-clang14 branch May 4, 2022 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants