[QUESTION] Color blend #1172

feverzsj · 2022-01-17T11:03:18Z

feverzsj
Jan 17, 2022

Color blending is quite common in image processing and fits SIMD well. For example, to blend two premultiplied rgba8 images, the pixels have their channels cast and deinterleaved into uint16_t vectors, then the actual blend operations are applied, and finally the channels are cast back, interleaved, and stored in the original pixel format. You may take a look at Skia's implementation.

Blend operations are mostly trivial for eve, apart from interleave/deinterleave. Is there anything I missed?
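
For reference, a minimal scalar sketch of the pipeline described above, assuming a source-over blend on premultiplied RGBA8 with R, G, B, A byte order. The names `div255` and `blend_src_over` are hypothetical and are not taken from Skia or eve; the rounding division is one possible choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rounding division by 255 of a product of two 8-bit values (hypothetical helper).
inline std::uint16_t div255(std::uint16_t v) {
    return static_cast<std::uint16_t>((v + 127u) / 255u);
}

// Source-over blend of two premultiplied RGBA8 buffers, 4 bytes per pixel.
// Assumed formula: out = s + d * (255 - src_alpha) / 255, applied to all four channels.
std::vector<std::uint8_t> blend_src_over(const std::vector<std::uint8_t>& src,
                                         const std::vector<std::uint8_t>& dst) {
    assert(src.size() == dst.size() && src.size() % 4 == 0);
    std::vector<std::uint8_t> out(src.size());
    for (std::size_t i = 0; i < src.size(); i += 4) {
        // Widen to 16 bits before any arithmetic so products cannot overflow.
        const std::uint16_t inv_sa = static_cast<std::uint16_t>(255 - src[i + 3]);
        for (std::size_t c = 0; c < 4; ++c) {
            const std::uint16_t s = src[i + c];
            const std::uint16_t d = dst[i + c];
            const std::uint16_t v =
                static_cast<std::uint16_t>(s + div255(static_cast<std::uint16_t>(d * inv_sa)));
            // Narrow back to 8 bits; the clamp only matters for invalid premultiplied input.
            out[i + c] = static_cast<std::uint8_t>(std::min<std::uint16_t>(v, 255));
        }
    }
    return out;
}
```

A SIMD version would do the same steps on whole vectors: one vector per channel after deinterleaving, the same arithmetic on 16-bit lanes, then interleave and narrow on store.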

Answered by DenisYaroshevskiy

Feb 1, 2022

I think this is as good as it is going to get any time soon: https://godbolt.org/z/1rd1E6nE1
Feel free to experiment and report if you find better ways of doing it.

DenisYaroshevskiy · 2022-01-17T11:32:06Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

Tbh, the code is quite large - any chance you can sketch a scalar version to have a look at?

For most of our algorithms so far we assume uniformity of data (so SoA, and not AoS, which is how images are laid out).

We don't have shuffle for 2 registers implemented as a general utility yet, though we do have interleave
https://godbolt.org/z/PYYj63q3G

interleave was added recently, so it is probably not in package managers yet, but that's solvable.

We were definitely going to add support for that at some point, if you can tell me what exactly it is you need - I can have a look at prioritising that (no promises).

DenisYaroshevskiy · 2022-01-17T11:43:01Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

Wait! I googled, and it seems like you just want a formula for the resulting rgba from 2 input rgba. I'll write you a possible implementation a bit later.

DenisYaroshevskiy · 2022-01-17T12:12:33Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

Generated code for shuffles is pretty terrible but that's an easy fix.

Can you have a look and see if this is sort of what you wanted? https://godbolt.org/z/fGd6Wrcoa

DenisYaroshevskiy · 2022-01-17T13:14:37Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

Some of the bad codegen is me messing up the pattern.
Some of it comes from the multiplication of bytes.

What's the formula you actually need?

jfalcou · 2022-01-17T13:27:53Z

jfalcou
Jan 17, 2022
Maintainer

Updated code with proper masking + shuffle pattern:
https://godbolt.org/z/zq8fY16oT

The "bad" codegen is due to a * a_alpha in uint8 having no proper codegen anyway.
I suspect you want to convert the 8-bit data to something else before doing any computation?

Similar code working on 32-bit pixel data:
https://godbolt.org/z/9edv6rGb9

Main loop is:

.L4:
        vpermd  ymm1, ymm3, YMMWORD PTR [rcx+rax]
        vpermd  ymm0, ymm3, YMMWORD PTR [rdi+rax]
        vpaddd  ymm2, ymm0, ymm1
        vpmulld ymm0, ymm0, YMMWORD PTR [rdi+rax]
        vpmulld ymm1, ymm1, YMMWORD PTR [rcx+rax]
        vpaddd  ymm0, ymm0, ymm1
        vpsrld  ymm2, ymm2, 1
        vpblendvb       ymm0, ymm0, ymm2, ymm4
        vmovdqu YMMWORD PTR [r8+rax], ymm0
        add     rax, 32
        cmp     rax, rdx
        jne     .L4

To get further than that, we need the actual formula on the pixels/alpha channels to make a fair comparison.

feverzsj · 2022-01-17T13:41:37Z

feverzsj
Jan 17, 2022
Author

> Updated code with proper masking + shuffle pattern:
> https://godbolt.org/z/zq8fY16oT

Yes, it's much better, though I'm expecting each channel to get its own vector, as not all channels use the same formula.

> The "bad" codegen is due to a * a_alpha in uint8 having no proper codegen anyway.
> I suspect you want to convert the 8-bit data to something else before doing any computation?

Yes, uint8_t channels should be cast to uint16_t before arithmetic operations to avoid overflow.

> To get further than that, we need the actual formula on the pixels/alpha channels to make a fair comparison.

Skia has implementations for both scalar and various SIMD architectures. load4()/store4() cast and interleave/deinterleave pixels. The BLEND_MODEs contain the actual blend operation/formula for each channel. Most functions are defined for both uint16_t (U16) and float (F) vectors.
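
To illustrate why 16-bit lanes are enough and how the product gets back into 8-bit range: the largest product of two bytes is 255 * 255 = 65025, which fits in uint16_t. Below is a small sketch comparing a rounding division by 255 with an add-and-shift approximation that avoids the division. The function names are hypothetical, and the choice of approximation is an assumption for illustration, not something taken from the linked code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Rounding division by 255; v is expected to be a product of two bytes.
inline std::uint16_t div255_exact(std::uint16_t v) {
    assert(v <= 255u * 255u);
    return static_cast<std::uint16_t>((v + 127u) / 255u);
}

// Add-and-shift approximation: no division, only an add and a shift per lane.
inline std::uint16_t div255_shift(std::uint16_t v) {
    assert(v <= 255u * 255u);  // v + 255 still fits in 16 bits
    return static_cast<std::uint16_t>((v + 255u) >> 8);
}

// Largest absolute difference between the two over all byte products.
int max_div255_error() {
    int worst = 0;
    for (int a = 0; a <= 255; ++a) {
        for (int b = 0; b <= 255; ++b) {
            const auto v = static_cast<std::uint16_t>(a * b);
            const int diff = int(div255_exact(v)) - int(div255_shift(v));
            worst = std::max(worst, std::abs(diff));
        }
    }
    return worst;
}
```

The two variants never differ by more than one unit on byte products, and the shift form maps 255 * 255 back to exactly 255.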

DenisYaroshevskiy · 2022-01-17T14:18:25Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

At a quick glance we failed to see a different formula between colors. Can you show us one?

feverzsj · 2022-01-17T15:06:40Z

feverzsj
Jan 17, 2022
Author

> At a quick glance we failed to see a different formula between colors. Can you show us one?

These blend modes are relatively rare, and Skia does not seem to implement them directly. One example is a tangent normal map, which applies a slightly different formula to blue.

However, any blending involving hue/saturation/lightness requires converting RGB to HSL/HSV, which in turn requires separate channel vectors. For example, a saturation blend blends the source pixel's saturation into the destination. Skia uses separate stages for these (set_sat/set_lum). To chain them together effectively, separate channels are required.
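
A rough scalar sketch of why separate channels help here: once R, G and B sit in their own arrays, a per-pixel quantity such as saturation reads from all three at the same index. The `Planes` struct and both function names are hypothetical, and saturation is taken as max(r, g, b) - min(r, g, b), which is an assumption about the definition rather than a quote from Skia.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One array per channel (SoA), already widened to 16 bits.
struct Planes {
    std::vector<std::uint16_t> r, g, b, a;
};

// Split interleaved RGBA8 (AoS) into four channel arrays.
Planes deinterleave(const std::vector<std::uint8_t>& rgba) {
    assert(rgba.size() % 4 == 0);
    const std::size_t n = rgba.size() / 4;
    Planes p;
    p.r.resize(n); p.g.resize(n); p.b.resize(n); p.a.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        p.r[i] = rgba[4 * i + 0];
        p.g[i] = rgba[4 * i + 1];
        p.b[i] = rgba[4 * i + 2];
        p.a[i] = rgba[4 * i + 3];
    }
    return p;
}

// Per-pixel saturation, assumed here to be max(r, g, b) - min(r, g, b).
std::vector<std::uint16_t> saturation(const Planes& p) {
    std::vector<std::uint16_t> s(p.r.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        const std::uint16_t hi = std::max({p.r[i], p.g[i], p.b[i]});
        const std::uint16_t lo = std::min({p.r[i], p.g[i], p.b[i]});
        s[i] = static_cast<std::uint16_t>(hi - lo);
    }
    return s;
}
```

With this layout the loop in `saturation` has the same shape for every pixel, which is what makes it map onto one vector per channel.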

jfalcou · 2022-01-17T15:42:33Z

jfalcou
Jan 17, 2022
Maintainer

OK, so I took one random blend from skia, updated the code with conversion.
https://godbolt.org/z/W8jvGsT6M

It looks OK. There is one bug I fixed locally that makes >> worse on AVX2. Once I push it to main, this is the expected asm for the innermost loop:

.L4:
        vpmovzxbw       ymm1, XMMWORD PTR [rcx+rax]
        vpmovzxbw       ymm7, XMMWORD PTR [rdx+rax]
        vpshufb ymm6, ymm1, ymm2
        vpshufb ymm0, ymm7, ymm2
        vpmullw ymm1, ymm1, ymm6
        vpsubw  ymm0, ymm3, ymm0
        vpmullw ymm0, ymm0, ymm7
        vpaddw  ymm1, ymm1, ymm3
        vpaddw  ymm0, ymm0, ymm1
        vpsrlw  ymm0, ymm0, 8
        vpblendvb       ymm0, ymm0, ymm6, ymm5
        vpand   ymm0, ymm4, ymm0
        vmovdqa xmm1, xmm0
        vextracti128    xmm0, ymm0, 0x1
        vpackuswb       xmm0, xmm1, xmm0
        vmovdqu XMMWORD PTR [r8+rax], xmm0
        add     rax, 16
        cmp     rdi, rax
        jne     .L4

I'll ping this issue as soon as the fix for operator>> is up.

DenisYaroshevskiy · 2022-01-17T15:57:26Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

As for doing complex operations in interleaved format:
Are you sure that deinterleaving and storing deinterleaved data won't be better?

We have very solid support for SoA, and the conversion, I believe, is autovectorized.

feverzsj · 2022-01-17T16:17:04Z

feverzsj
Jan 17, 2022
Author

> As for doing complex operations in interleaved format:
> Are you sure that deinterleaving and storing deinterleaved data won't be better?

Performance-wise, deinterleaved data would be faster, but since most existing libraries and frameworks only accept interleaved data, deinterleaved data is very inconvenient. So deinterleaving/interleaving on the fly would be the best of both worlds.
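
One way to picture "on the fly" is to keep the image interleaved in memory and only deinterleave a small tile at a time into temporary channel arrays. The sketch below is scalar and illustrative only; `kTile`, `Lane` and `process_in_tiles` are hypothetical names, and the tile size is not tuned.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTile = 16;  // hypothetical tile size, not tuned
using Lane = std::array<std::uint16_t, kTile>;

// Applies op(r, g, b, a, count) to one tile of separated channels at a time,
// while the image itself stays interleaved (RGBA8) in memory.
template <class Op>
void process_in_tiles(std::vector<std::uint8_t>& rgba, Op op) {
    assert(rgba.size() % 4 == 0);
    const std::size_t n = rgba.size() / 4;
    Lane ch[4] = {};
    for (std::size_t base = 0; base < n; base += kTile) {
        const std::size_t count = std::min(kTile, n - base);
        // Deinterleave and widen one tile.
        for (std::size_t j = 0; j < count; ++j)
            for (std::size_t c = 0; c < 4; ++c)
                ch[c][j] = rgba[4 * (base + j) + c];
        op(ch[0], ch[1], ch[2], ch[3], count);
        // Clamp, narrow and interleave the tile back in place.
        for (std::size_t j = 0; j < count; ++j)
            for (std::size_t c = 0; c < 4; ++c)
                rgba[4 * (base + j) + c] =
                    static_cast<std::uint8_t>(std::min<std::uint16_t>(ch[c][j], 255));
    }
}
```

The callers never see planar data, so existing code that expects interleaved buffers keeps working, at the price of the extra shuffling per tile.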

DenisYaroshevskiy · 2022-01-17T16:40:22Z

DenisYaroshevskiy
Jan 17, 2022
Collaborator

I'm not so sure about that: with interleaved data you pay a really big cost, at least in terms of parallelism (and that is assuming the shuffles are free, which they are not).
Anyway, we can probably provide a deinterleave shuffle in the foreseeable future, but as for something like a "deinterleave_view", I'm not so sure.
Do you know if the library works for you in other respects? We do not work with MSVC (clang-cl is a maybe), and there is no wasm support, at least for now. C++20 is a requirement.

feverzsj · 2022-01-17T17:17:34Z

feverzsj
Jan 17, 2022
Author

> I'm not so sure about that: with interleaved data you pay a really big cost, at least in terms of parallelism (and that is assuming the shuffles are free, which they are not).

As image data can be very large, RAM access will eventually become the bottleneck. The shuffles are actually not much of a concern.

> Anyway, we can probably provide a deinterleave shuffle in the foreseeable future, but as for something like a "deinterleave_view", I'm not so sure.

That will be good enough.

> Do you know if the library works for you in other respects? We do not work with MSVC (clang-cl is a maybe), and there is no wasm support, at least for now. C++20 is a requirement.

I've tried it on the latest MSVC: tons of errors and even ICEs, mostly because MSVC has a rather poor implementation of the C++ standard. If you really want to support MSVC, you should test on it from the very beginning. For now, it's kind of too late.

clang-cl may pass compilation if used with libc++ (MSVC's standard library is also quite error-prone for C++20).

For now, I'd mostly use eve::wide as a portable vector type, something like clang's ext_vector_type, and do the rather complicated things using intrinsics.

jfalcou · 2022-01-17T22:00:38Z

jfalcou
Jan 17, 2022
Maintainer

Let's move this to a discussion.

DenisYaroshevskiy · 2022-01-26T18:02:28Z

DenisYaroshevskiy
Jan 26, 2022
Collaborator

FYI: there is an implementation now, but I'm not too happy with the codegen: #1206

DenisYaroshevskiy · 2022-02-01T10:29:38Z

DenisYaroshevskiy
Feb 1, 2022
Collaborator

I think this is as good as it is going to get any time soon: https://godbolt.org/z/1rd1E6nE1
Feel free to experiment and report if you find better ways of doing it.
