Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BC1 and BC4 rounding behavior doesn't match DirectXTex #17

Open
RunDevelopment opened this issue Jun 10, 2024 · 9 comments
Open

BC1 and BC4 rounding behavior doesn't match DirectXTex #17

RunDevelopment opened this issue Jun 10, 2024 · 9 comments

Comments

@RunDevelopment
Copy link

Hi! I'm currently implementing a new DDS decoder for the image crate and wanted to use bcdec_rs for decoding BC1-7 images. Unfortunately, bcdec_rs does not decode BC1-5 images correctly. Due to rounding errors, colors can be off by up to one for both BC1 and BC4.

Example

While the error is only a single bit, it is noticeable. In BC1, the error affects G differently than R and B. This can lead to situations where R is one more than it should be and R and B are one less than they should be, resulting in a noticeable green tint. Here's an example BC1 image that should have the color RGB=47 47 47 everywhere (this is also what image programs like Paint.net and Gimp will show). Using bcdec_rs to decode this image will yield the color RGB=46 48 46 instead.

BC1_UNORM_SRGB-47.zip

I also want to point out that these off-by-one errors aren't rare. I magnified the error of the following image, so we can see it:

Image:
image

Error:
image

BC1

Before I explain the cause of the issue, I quickly want to point out that this bug is not with the source translation of bcdec. The original bcdec C library has the same bug. bcdec_rs simply faithfully reproduces the bug.

The bug itself is actually quite simple: when calculating color2 and color3, bcdec is interpolating the already rounded 8-bit values of color0 and color1 instead of the original 5/6-bit values.

Using the 8-bit values for color0/1 is incorrect, because the conversion from 5/6 bits to 8 bits adds a bit of rounding error (e.g. 30 (5 bit) converted to 8 bit is 246.774 exactly but rounded 247) that is then passed along to color2/3 which are rounded again causing even more rounding error.

The fix is to do the interpolation with the original values and then convert to 8 bit. In code, this can be done like this:

let c0 = u16::from_le_bytes(compressed_block[0..2].try_into().unwrap());
let c1 = u16::from_le_bytes(compressed_block[2..4].try_into().unwrap());

// separate 565 colors
let r0_5 = (c0 >> 11) & 0x1F;
let g0_6 = (c0 >> 5) & 0x3F;
let b0_5 = c0 & 0x1F;

let r1_5 = (c1 >> 11) & 0x1F;
let g1_6 = (c1 >> 5) & 0x3F;
let b1_5 = c1 & 0x1F;

// Expand 565 ref colors to 888
let r0 = (r0_5 * 527 + 23) >> 6;
let g0 = (g0_6 * 259 + 33) >> 6;
let b0 = (b0_5 * 527 + 23) >> 6;
ref_colors[0] = [r0 as u8, g0 as u8, b0 as u8, 255u8];

let r1 = (r1_5 * 527 + 23) >> 6;
let g1 = (g1_6 * 259 + 33) >> 6;
let b1 = (b1_5 * 527 + 23) >> 6;
ref_colors[1] = [r1 as u8, g1 as u8, b1 as u8, 255u8];

if c0 > c1 || only_opaque_mode {
    // Standard BC1 mode (also BC3 color block uses ONLY this mode)
    // color_2 = 2/3*color_0 + 1/3*color_1
    // color_3 = 1/3*color_0 + 2/3*color_1
    let r = 2 * r0_5 + r1_5;
    let g = 2 * g0_6 + g1_6;
    let b = 2 * b0_5 + b1_5;
    let r = (r * 351 + 61) >> 7;
    let g = (g as u32 * 2763 + 1039) >> 11;
    let b = (b * 351 + 61) >> 7;
    ref_colors[2] = [r as u8, g as u8, b as u8, 255u8];

    let r = r0_5 + 2 * r1_5;
    let g = g0_6 + 2 * g1_6;
    let b = b0_5 + 2 * b1_5;
    let r = (r * 351 + 61) >> 7;
    let g = (g as u32 * 2763 + 1039) >> 11;
    let b = (b * 351 + 61) >> 7;
    ref_colors[3] = [r as u8, g as u8, b as u8, 255u8];
} else {
    // Quite rare BC1A mode
    // color_2 = 1/2*color_0 + 1/2*color_1;
    // color_3 = 0;
    let r = r0_5 + r1_5;
    let g = g0_6 + g1_6;
    let b = b0_5 + b1_5;
    let r = (r * 1053 + 125) >> 8;
    let g = (g as u32 * 4145 + 1019) >> 11;
    let b = (b * 1053 + 125) >> 8;
    ref_colors[2] = [r as u8, g as u8, b as u8, 255u8];

    ref_colors[3] = [0u8; 4];
}

In case you're curious about the crazy conversions like (r * 351 + 61) >> 7: they use the same trick as bcdec for its 5/6-bit number to 8-bit number conversion. E.g. (r * 351 + 61) >> 7 is equivalent to (r as f64 / (3 * 63) * 255).round() for values 0 <= r <= 3*63. I got all of these constants using a brute force script that verified that it correctly maps the entire range of inputs.

BC4

Here the bug is simpler: the rounding is wrong. E.g.

alpha[2] = (6 * alpha[0] + alpha[1] + 1) / 7;

That + 1 should have been + 3.

Similarly:

alpha[2] = (4 * alpha[0] + alpha[1] + 1) / 5;

That + 1 should have been + 2.

Why is +1 wrong for the /7 values? E.g. if the interpolated alpha value turns out to be 5, then 5/7 should be rounded to 1. But (5+1) / 7 == 0 because integer division is floor division.

In general, if you have 2 unsigned integers x and n and want to find the rounded division of x/n, then it can be calculated as (x + (n>>1)) / n. This is why the small number we have to add to the interpolated value is 3 for /7 and 2 for /5.


Would you like me to make a PR?

@ScanMountGoat
Copy link
Owner

RenderDoc on my PC is showing the colors for that DDS as 46, 47, 46 in RGB. The preview app on my MacBook Air shows 47, 47, 47. I don't believe the older compression modes require bit accurate implementations based on the discussion in iOrange/bcdec#12. Your solution looks reasonable, but I'm not sure if it's worth breaking the test suite for a small difference in pixel values.

@ScanMountGoat
Copy link
Owner

Hi! I'm currently implementing a new DDS decoder for the image crate and wanted to use bcdec_rs for decoding BC1-7 images.

I would get in contact with the image-rs maintainers if you haven't already to see if they have any opinions on how they want to approach DDS.

@RunDevelopment
Copy link
Author

Thanks for the quick response!

As for correctness and bit accuracy, I outlined my reasoning for what I see as correct in the BC1 issue I opened on the bcdec repo: iOrange/bcdec#15. To quote it here:

(me) I define "correct" here are doing all calculations in float32 and then converting the decoded float32 image to uint8 at the very end. I'm using this definition, because that's how DirectXTex is implemented, and it's how Paint.net and GIMP decode DDS images. So this definition of correct is widely-used in modern software.

I hope this makes it clear why I see the current behavior of bcdec_rs as incorrect.

I would get in contact with the image-rs maintainers if you haven't already to see if they have any opinions on how they want to approach DDS.

I already made a PR draft: image-rs/image#2258. bcdec_rs was recommended to me by them.

@ScanMountGoat
Copy link
Owner

I hope this makes it clear why I see the current behavior of bcdec_rs as incorrect.

There are multiple implementations of the older compression formats. The specifications for GPU implementations I can find only require perfect bit accuracy for bc6h and bc7. A difference of 1/255 is allowed for bc1.
https://fgiesen.wordpress.com/2021/10/04/gpu-bcn-decoding/
https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#19.5%20Block%20Compression%20Formats

bcdec_rs was recommended to me by them.

I would recommend just using bcdec_rs in its entirety in image-rs. It doesn't seem worth it to maintain a completely separate decoder implementation for legacy formats.

I would like bcdec_rs to match the C implementation due to how the test suite is implemented. If your change doesn't compromise the benchmarking performance, I may consider patching the C code in the sys crate. This would also require additional tests to make sure nothing else has regressed in terms of the output.

@RunDevelopment
Copy link
Author

A difference of 1/255 is allowed for bc1.

I think this is a weird thing to say. I understand that the spec may allow for some error to allow optimizations in use cases where perfect color accuracy isn't necessary, but isn't bcdec_rs is a general-purpose DDS decoder? As I see it, a general-purpose decoder can't just say "a bit of error is fine" and then leave the users to deal with it. This just means that bcdec_rs can't be used when color accuracy is required.

This is also why I cannot use bcdec_rs in its current form for BC1-5 in my PR for the image crate. My use case for the new DDS decoder requires accurate colors. If it isn't accurate, I won't be able to use it for the very thing I built it for.

I would recommend just using bcdec_rs in its entirety in image-rs. It doesn't seem worth it to maintain a completely separate decoder implementation for legacy formats.

Aside from color accuracy, there's also the problem that bcdec_rs doesn't support signed BC4 and BC5. If you want to, we can upstream my code and integrate it into bcdec_rs. Would that be something you'd be interested in?

If your change doesn't compromise the benchmarking performance, I may consider patching the C code in the sys crate.

That would be great.

As for performance: there should be any perf difference for BC4 since we just changed a constant. BC1 should be a tiny bit slower since we'll need 1 or 2 instructions more to calculate each channel of color2/3. Whether those additional instructions meaningfully impact performance needs to be measured though.

@ScanMountGoat
Copy link
Owner

isn't bcdec_rs is a general-purpose DDS decoder

That's the goal of image_dds. Not all DDS files use BCN compression and some use cases need BCN decoding/encoding without DDS. bcdec_rs is just a safe Rust port of the original C implementation.

there's also the problem that bcdec_rs doesn't support signed BC4 and BC5

This should be a separate issue. The signed formats are almost never used from my experience.

This is also why I cannot use bcdec_rs in its current form for BC1-5 in my PR for the image crate

I'm all for adding bit fiddling to get slightly more precision but not without the proper testing infrastructure first. I've outlined some ideas I may explore later in #4.

The image crate is designed around standard 2D images. DDS is meant store GPU textures and contains a lot of features that aren't supported by the image crate like 3D textures, cube maps, array layers, mipmaps, etc. Adding better decoding support to image-rs would be great, but the design of image-rs limits its usefulness as a DDS converter.

@RunDevelopment
Copy link
Author

isn't bcdec_rs is a general-purpose DDS decoder

That's the goal of image_dds.

I worded that badly. I meant to express that "bcdec_rs is a general-purpose BC block decoder", with focus on "general-purpose."

[...] but not without the proper testing infrastructure first. I've outlined some ideas I may explore later in #4.

The author of the bcdec C library just committed the fix for BC1 that I suggested here. Does this allow you to side-step the testing issue by using your existing fuzzing infrastructure instead?

The image crate is designed around standard 2D images. DDS is meant store GPU textures and contains a lot of features that aren't supported by the image crate like 3D textures, cube maps, array layers, mipmaps, etc. Adding better decoding support to image-rs would be great, but the design of image-rs limits its usefulness as a DDS converter.

Certainly. Even for single-surface images, there are pixel formats that can't be supported, like the *_TYPELESS formats and the non-normalized (s)int formats.

However, I see DDS support for the image crate as a best-effort implementation. Yes, we can't support all features of DDS, but we can try to support the features that can be supported. After all, the DDS decoder doesn't have to support all DDS features to be useful.

@ScanMountGoat
Copy link
Owner

ScanMountGoat commented Jun 14, 2024

Does this allow you to side-step the testing issue by using your existing fuzzing infrastructure instead?

Yes. I'll update bcndecode-sys and bdec_rs with the new changes soon.

I've also made good progress on implementing #4 for testing the decoding process end to end in image_dds with special test images for BC1-BC5. You can generate the DDS test files on this repo with cargo run -p bcn_test. This also makes it much easier to test the decoding and rounding behavior of a particular software or GPU implementation.

@ScanMountGoat
Copy link
Owner

bcdec_rs 0.1.2 is now available with the new rounding behavior.

@ScanMountGoat ScanMountGoat changed the title BC1 and BC4 blocks are not rounded correctly BC1 and BC4 blocks rounded behavior doesn't match DirectXTex Sep 5, 2024
@ScanMountGoat ScanMountGoat changed the title BC1 and BC4 blocks rounded behavior doesn't match DirectXTex BC1 and BC4 rounding behavior doesn't match DirectXTex Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants