10 bit per channel proposal #357

Open
Myndex opened this issue Oct 24, 2023 · 13 comments

@Myndex
Member

Myndex commented Oct 24, 2023

10bit-png

Proposed: TENB (TeNB interim)

A proposal for a compression efficient 10bit variant of png

While it is possible to use the sBIT chunk to put a 10bit per channel image into a 16bit container.....

....More efficient coding of 10-bit (and 16-bit) words requires a different coding technique, but that is a totally different (and more complex) issue IMHO....

This problem spawned an idea.....

Not your granddad's DPX

When I stumbled onto this thread, my first thought was DPX's existing format of three 10-bit channels packed into four bytes, but I don't think that fits well into png.

The Alpha Wolf Shares with the Pack

But if there is a desire to conserve bandwidth, it occurred to me that the color type 6 8-bit RGBA png might easily be modified so that the two LSbs of each 10-bit channel are mapped onto the 6 LSbs of the alpha channel, while the 2 MSbs of the alpha could still be used as a one- or two-bit alpha.

A two-bit alpha could be combined with the tRNS chunk to provide 4 indexed transparency values (though that may cause compatibility issues); otherwise, 0b00 = 0% opaque, 0b01 = 33%, 0b10 = 66%, 0b11 = 100% opaque.
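A minimal sketch of the packing just described, with hypothetical helper names; the exact bit positions within the alpha side-car byte are an assumption, not fixed by the proposal:

```python
def pack_10bit_pixel(r10, g10, b10, a2):
    """Pack three 10-bit channels and a 2-bit alpha into 4 bytes.

    Assumed layout: each of the first three bytes holds the 8 MSbs of a
    channel; the fourth byte holds the 2-bit alpha in its 2 MSbs and the
    2 LSbs of R, G, B in its lower 6 bits.
    """
    side_car = (a2 << 6) | ((r10 & 0b11) << 4) | ((g10 & 0b11) << 2) | (b10 & 0b11)
    return bytes((r10 >> 2, g10 >> 2, b10 >> 2, side_car))

def unpack_10bit_pixel(quad):
    """Inverse of pack_10bit_pixel: 4 bytes -> (r10, g10, b10, a2)."""
    r_hi, g_hi, b_hi, side_car = quad
    r10 = (r_hi << 2) | ((side_car >> 4) & 0b11)
    g10 = (g_hi << 2) | ((side_car >> 2) & 0b11)
    b10 = (b_hi << 2) | (side_car & 0b11)
    return r10, g10, b10, side_car >> 6
```

A naïve decoder that ignores the side-car sees exactly the truncated-to-8 fallback described below.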

Fall Backwards (compatibility)?

The next question: is there a configuration where a decoder/viewer that is not capable of handling this segmented10bit format would just discard the bits in the alpha as if fully opaque? In that case the LSbs would be truncated, and while truncation is a poor way to downsample and does produce artifacts, the image would still be reasonably viewable.

The sBIT chunk provides a way to make this happen: mask the LSbs in the alpha while still exposing the 2 MSbs of transparency, so a 10-bit image could display with a reasonable fallback in a naïve/legacy viewer.

With sBIT set to 8 8 8 2, current decoders/viewers should display the truncated-to-8 image acceptably; with the alpha sBIT at 2 bits, only the two MSbs would be used, and the 6 LSb rgb bits would be hidden.
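For reference, a valid sBIT chunk with those values can be built in a few lines (a PNG chunk is a big-endian length, the 4-byte type, the data, then a CRC-32 over type + data):

```python
import struct
import zlib

def png_chunk(chunk_type, data):
    """Assemble one PNG chunk: length, type, data, CRC-32 over type+data."""
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", zlib.crc32(chunk_type + data)))

# For color type 6, sBIT data is four bytes: significant bits in R, G, B, A.
sbit = png_chunk(b"sBIT", bytes((8, 8, 8, 2)))
```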

Virtual Signaling

As per the graphic below, the IHDR chunk would indicate 8-bit, color type 6, for fallback compatibility. So how do we tell the decoder we're a segmented 10-bit image? We use a tEXt chunk with a single string, segmented10bit, to signal the format; this maintains backwards compatibility.

[Info graphic: 10-bit png file pixel structure (oct29-2023), showing the data arrangement and the chunks involved]

Advantages

  1. A 10bit png format with all the advantages of png, but a bit depth to match many current video and HDR formats.

  2. Should compress similarly to an 8-bit RGBA png. The first three bytes per pixel are not unlike an 8-bit image; the fourth LSb/alpha byte might not compress as well as a typical alpha, but overall this scheme should be substantially more efficient than 10-in-16.

  3. So long as the decoder supports sBIT, this should be backwards compatible, with the caveat that images with truncated LSbs may have artifacts.

  4. And finally, though tests need to be run, this seems like an efficient way to handle 10bit images as far as the compression and total data size is concerned.

    • IMO it has the potential for use in an image-sequence stream.

I started a repo to commence work on this if there is interest.

Thank you for reading.

Andrew Somers
Director of Research
Inclusive Reading Technologies, Inc.

@Myndex
Member Author

Myndex commented Oct 25, 2023

A couple thoughts:

  1. A reason for wanting backwards compatibility for viewing with legacy decoders is the usefulness at the OS/file-system level for seeing image content. It is perhaps more useful there than for backwards compatibility with user agents or content, where best practice would still be progressive enhancement: serving a proper 8-bit image unless a 10-bit viewer is available.

  2. Instead of using a tEXt chunk with a string ID, it is probably better to use a new custom chunk for the purpose. A custom chunk should be ignored by a legacy viewer, and thus still maintains backwards compatibility.

    • Proposed: TENB .
    • A custom chunk for signaling 10bit mode is expected to be more robust & also permits a terse ID, possibly with option flags for how the data is segmented into the alpha side-car.
  3. Tests need to be done, but it occurs to me that some images might do better by segmenting bits other than the LSbs over to the alpha side-car.

    • For some image types where the one or two MSbs are 0 (or even 00) for the entire image, the encoder could choose to segment off the one MSb and the one LSb, (or the two MSbs), to improve the 8bit fallback.
    • I.e. a dynamic choice per image as to which bits are best to segment off, with the flags as to which bits are segmented off held in the TENB chunk.
    • The thinking being that many 10bit images make no use of one or even two MSbs, but as this is only to optimize the quality of the fallback, it may be of limited utility, and the added complexity may not be warranted here.

@ProgramMax
Collaborator

Something I personally would like to add is both color models beyond RGB and non-color channels.
For example, it would be nice if a PNG could state that it carries normal-map data rather than RGB, instead of the app having to know it isn't actually RGB. This feels like it aligns with what you propose.

I don't know if others want to add support for this. We haven't discussed it yet.
It would likely be up for discussion in Fourth Edition. But anything that breaks current PNG decoders is out of scope. So we would probably need to have a discussion about whether this counts as breaking.

@Myndex
Member Author

Myndex commented Oct 27, 2023

Hi @ProgramMax

Sure, normals, motion vectors, 3 channel 10bit alphas, any of the independent lighting passes, zdepth...

So, in #3 of my second post, I indicate how the TENB (or TeNB) chunk could have flags for how the two bonus bits are segmented off, such as the one MSb and the one LSb. In the case of normals, motion vectors, zdepth, etc etc, then that would likely be the preferred scheme, as when those data types are put in an RGB container, 0 is mapped to the middle.

For an 8-bit container, that's 128, but for 10-bit it's 512. So for that data type we might prefer to take one top and one bottom bit; then, with legacy 8-bit decoders truncating, the center would land at 128, and it would look like any other normals map when viewed as the truncated 8-bit. Though if a CGI program were to read it in as an 8-bit normal, there may be unexpected results.
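The center arithmetic can be checked with a couple of one-liners (purely illustrative; which bits a legacy viewer actually sees depends on the segmentation flags chosen):

```python
# Dropping the two LSbs (the plain truncation fallback) maps the 10-bit
# "zero" center used by normal maps, 512, onto the 8-bit center, 128:
assert 512 >> 2 == 128

# With one MSb and one LSb segmented into the side-car instead, the
# legacy viewer sees bits 8..1 of each 10-bit value:
assert (300 >> 1) & 0xFF == 150
```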

But as far as an image viewer is concerned, it should be no different than using those channels for normals, motion vectors, or whatnot now. As I mentioned, I'm thinking sBIT is the way to make this concept backwards compatible for viewing.

Type ID Tho

Rereading your post, perhaps you're suggesting more about having an ID chunk? For that, tEXt should work, or eXIf? Though I suppose a dedicated type-ID chunk might lead the way to a more common interchange file standard for things like normals (which can differ between apps/engines/etc.).

The next question is, does any app/engine use 10-bit normals, or the others? I can't remember ever doing so... Even if the image data is 10-bit, it's not uncommon to pair it with an 8-bit alpha tiff, for instance.

10bit is popular for transport & streaming. It was, once upon a time, popular for intermediates in film work, though EXR (16-bit half float) is a better route for post and VFX work, and finishing for that matter. EXR uses float for image and alpha data, is inherently linear (gamma 1.0), and can handle n channels of data, including int data (you can mix 16-bit float, 32-bit float, 32-bit unsigned int, etc.).

On the other hand, 10 bits per channel is a significant improvement in sample size for a pixel, at minimal cost if you don't need float or a full alpha (clearly cheaper than 16-bit in a few ways).

@ProgramMax
Collaborator

Right. My thought is if someone opens a PNG which holds normals for a 3D model, the image data itself isn't terribly useful. They aren't seeing a real image in the usual sense. If we completely swapped the "red" and "blue" channels, would they care? Probably not. So we could completely change how that appears without it feeling broken suddenly.

My thought is to add a new chunk (not some magic text inside tEXt) that specifies what each channel of data represents. This chunk could be marked as required-to-view, and conforming image viewers would no longer show the data as if it were plain old RGB. I feel like this is the most correct approach. But it would mean an image which was previously viewable is no longer viewable. And maybe that counts as breaking the viewer? My vote is no, since the data wasn't meant to be viewed as RGB anyway.

If we're able to specify what each channel's meaning is, it would make sense to also specify each channel's data format (EG 10-bit int).

The details of how the data formats are packed are more along your original comment. What you propose makes a lot of sense to me for 10-bit int.

@Myndex
Member Author

Myndex commented Oct 27, 2023

Hi @ProgramMax

...the image data itself isn't terribly useful.

I find it useful... as someone who does 3D CGI, animation, and compositing... and I'd say the same for motion-vector files and in fact most data types that use image containers—and I'd reckon most of the other VFX artists I know would echo that... Just looking at it I can tell if it's a normals map, a motion vector, a bump, etc., and when dealing with a folder full of files, this is helpful.

I do agree if you are implying possible future viewers might be able to parse normals, motion vectors, occlusions, etc etc, and present them in a viewer that permitted applying that data and/or compositing it, at least as a proxy, onto some other image—that could be useful.

But I don't think it's necessary to mute the output per se; it isn't noise/raw data that displays as garbage... And even that is desirable at times, depending. Most of the stuff we stick into image containers, even if ostensibly "data", we put there because it is essentially a form of, or derivative of, image data, and seeing it tells us exactly what it's all about.

That said, I do like the idea of labeling each channel, especially for such data formats. And a specific chunk like that would be useful for the 16 bit RGBA also...

@ProgramMax
Collaborator

I think we agree. I should clarify.

It is useful for those purposes (normals, motion, etc). And it is useful for people in the know to see a weird-looking image and say "That is definitely normal data".

The guideline I've been using (could be wrong) for whether or not a chunk should be ancillary is: would a picture of an apple still convey as a picture of an apple without this chunk being understood? For example, perhaps a colorspace chunk is not understood by a given image viewer. But it's still a red apple. The image isn't conveyed 100% correctly; the red will be slightly off. But it is correct enough to be useful to the average user.

But for normals, motion, etc the whole apple & average user analogy breaks down. To them, the image data isn't useful. To them, there was never anything to see. There was no apple. So if a new chunk breaks existing viewers--but only for images that the average person found useless--does it matter?

You have a great point though that people in the know still get value out of seeing that image. It would put a burden on them to use an image viewer that either understands the new chunk or ignores PNG ancillary rules. That might not be a large ask for a person in the know. But it is still worth considering.

@Myndex
Member Author

Myndex commented Oct 28, 2023

Oh I see what you're saying...

But... should a new chunk break anything? Shouldn't any viewer just ignore chunks it doesn't understand? I would hope they would, if only for stability...

@Myndex
Member Author

Myndex commented Oct 28, 2023

...color models beyond RGB...

This had me thinking. Here are the thoughts: a mini float, and a way to put 12bit PQ into an 8bpc container.

Edit new:

Updated chart; changed direction here a bit. Working on a dynamic asymmetrical bias for the exponents; need to work out subnormal numbers, for instance. But by pencil tests, it seems like we're getting 11-12 bit performance... maybe...

[Image: MiniFloat bias chart, interim oct29-2023 v2]

Far from finished.

Meanwhile here's the updated map:

[Image: MiniFloat png file pixel structure, oct29-2023 v2]

The legacy content below is under revision due to some ideas that developed; an updated version will follow.

Mini Float png

Proposed: MFLT (MfLT interim)

A mini-float format wedged into an 8-bit-per-channel RGBA png. Each 8-bit
RGB channel holds a 6-bit significand plus the 2 LS exponent bits; the
remaining exponent bits go in what is normally the alpha channel.

In the 8 repurposed alpha bits, the MSb is a 1-bit alpha, next is a
signed/unsigned mode flag, then the 2 MSbs of each RGB channel's
exponent/sign bits.

Asymmetrical bias with per-pixel control of signed/unsigned mode;
default bias weighted to favor positive numbers.

4-bit exponent in unsigned mode, and 3-bit exponent in signed mode. The
mode flag in the alpha sets signed/unsigned for the pixel, applying to
all three RGB channels. In signed mode, each individual RGB channel can
be either positive or negative, in the range ±1.984375.

The top MSb of the exponent is either a 4th exponent bit for positive
floats, or is the sign bit.

  • Apply a bitwise OR between the flag and the MSb (flag|MSb) which
    will always show sign for the number, 1==(+) 0==(-), regardless of
    mode.

Bit segmenting takes the two MSbs of the exponent; the full mantissa
stays in position on each RGB channel.

Significand (Mantissa) is 7 bits (6 bits explicit, 1 implied).

Bias and Base can be arbitrary

Set in the MFLT chunk, along with optional scale.

  • Default bias of 8 and base 2, provides a range of -1.98 to +508.0
  • Signed mode range is ± 1.984375
  • Unsigned mode is 0 to 508.0
  • In signed mode, the top MSb is the sign bit.
  • In unsigned mode, the top MSb is given to the exponent & indicates
    the exponent's sign.
  • Signed or unsigned mode is per pixel, and can be dynamically
    switched.
    • caveat: values less than zero can not be in the same pixel
      with values greater than 1.98
    • This only applies to dynamic signed/unsigned mode.
    • Encoder can resolve edge-case conflicts by clamping negatives to 0
      or overs to 1.98
    • The MFLT chunk should have an image-wide lock for the
      sign/unsign flag.
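As a concrete sketch, the field extraction described above might look like the following. The bit ordering within each byte and the plain unsigned-mode decode are assumptions for illustration; signed mode and the top-MSb/4th-exponent-bit handling (which presumably produces the quoted 508 ceiling) are not modeled here.

```python
def unpack_mflt_pixel(r, g, b, util):
    """Split an RGBA byte quad into MFLT fields (assumed layout: each RGB
    byte = 2 LS exponent bits high + 6-bit significand low; utility byte =
    1-bit alpha (MSb), 1-bit signed/unsigned flag, then the 2 MS exponent
    bits of R, G, B)."""
    alpha = util >> 7
    signed_mode = (util >> 6) & 1
    exps_hi = ((util >> 4) & 0b11, (util >> 2) & 0b11, util & 0b11)
    channels = []
    for chan, exp_hi in zip((r, g, b), exps_hi):
        exp = (exp_hi << 2) | (chan >> 6)   # reassembled 4-bit exponent
        mantissa = chan & 0b111111          # 6 explicit bits (+1 implied)
        channels.append((exp, mantissa))
    return alpha, signed_mode, channels

def mflt_value(exp, mantissa, bias=8, base=2.0):
    """Unsigned-mode decode with the proposed defaults (bias 8, base 2):
    value = (1 + mantissa/64) * base**(exp - bias)."""
    return (1 + mantissa / 64) * base ** (exp - bias)
```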

An arbitrary base can be used

  • The base can be 10 for instance
  • For more granularity, the base "could" be something like square root
    of 2 (1.414...)
    • Though it might be more efficient to introduce a scale factor in
      the MFLT chunk
    • The one bit used as alpha might be better used to dynamically
      select/enable a scale factor.

Considerations

  1. There perhaps should be separate biases in the MFLT chunk, one for signed and one for unsigned, though this assumes an image may have a combination of signed and unsigned minifloat pixels, which may or may not be advisable.

  2. It also could be useful (or might be necessary) to have a separate base for signed and unsigned.

  3. For the purpose, it may be useful to support moving a bit from exponent to the mantissa, so the mantissa is 7+1 implied. Identified in the MFLT chunk.

  4. The one alpha bit perhaps should be referred to as a utility bit, and can be assigned in the MFLT chunk to act as one of:
    a) Alpha
    b) Per Pixel Scale
    c) Per Pixel Bias
    d) Per Pixel Base
    e) Per Pixel bit structure (6 bit or 7 bit mantissa)

  5. As for which bits go to the alpha: there are arguments for the MSb of the exponent, the LSb of the exponent, or the LSb of the mantissa.

Summary

  • Effective precision of an 11 bit float (4bit exponent, 7bit
    mantissa)
  • Maps to an 8 bit per channel png (RGBA) container
  • Expected to compress well using the standard png compression
    schemes.
  • Arbitrary base supported (image wide)
  • Scale factor supported (image wide)
  • Adjustable bias (image wide)
  • Precision is achieved by:
    • Asymmetric bias to maximize useful range for images.
    • Dynamic per pixel sign/unsign mode assumes that:
      • negative RGB data is less common, and
      • when present in a pixel, the other pixels are likely to be
        in the ±1.98 range,
      • encoder can resolve edge case conflicts by clamping
        negatives to 0 if a high value is present or vice versa,
        with minimal issues.

[Image: MiniFloat png file pixel structure, oct28-2023]

@Myndex
Member Author

Myndex commented Oct 28, 2023

YPQUV png

Proposed: YPUV (YpUV interim)

This converts 12bit per channel PQ RGB to YUV, where Y is PQ gamma at 12
bits, and the U and V are 10 bits each.

The advantage is maintaining the essential luminance resolution, but
compacting the full PQ signal into an 8bpc png container.

Coefficients are applied to the gamma encoded RGB tuples for
computational efficiency, as is common practice in broadcasting.

In the layout below, the V is inverted to -V, essentially putting blue
on top. In this way, the truncated U Y -V channels very roughly fit R G
B, for a semi-viewable image at the file system level.
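As a sketch of the transform on the gamma-encoded tuples, assuming BT.2020 luma coefficients (the proposal does not pin down a matrix), and with a hypothetical helper name:

```python
# BT.2020 luma coefficients (an assumption; any broadcast matrix could be used).
KR, KG, KB = 0.2627, 0.6780, 0.0593

def rgb_to_ypquv(rp, gp, bp):
    """PQ-encoded R'G'B' in [0, 1] -> (Y', U, V).

    Y' is the weighted sum of the gamma-encoded tuples; U and V are the
    usual scaled color differences B'-Y' and R'-Y', each in [-0.5, 0.5].
    """
    y = KR * rp + KG * gp + KB * bp
    u = (bp - y) / (2 * (1 - KB))
    v = (rp - y) / (2 * (1 - KR))
    return y, u, v
```

The -V inversion described above is then just a sign flip on `v` before quantizing.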

Advantages

  • A 12 bit PQ image in an 8bpc container.
  • Uses standard png compression
  • YPQ is created with the PQ gamma for computational
    simplicity
  • UV is used as it is a common, simple transform.

[Image: YpqUV png file pixel structure, oct29-2023]


YPQUV 422 png

Proposed: YPUV (YpUV interim)

This converts 12bit per channel PQ RGB to YUV, where Y is PQ gamma at 12
bits, and the U and V are 12 bits each, but only U or V is present on
any given pixel.

The advantage is maintaining the essential luminance resolution while
compacting the full PQ signal into an 8bpc png container, and maintaining
higher color depth than the previous method by trading off spatial
resolution instead.

Coefficients are applied to the gamma encoded RGB tuples for
computational efficiency, as is common practice in broadcasting.

Unlike the above version (12 10 10), this version will not be
particularly visible in legacy viewers, but it does retain a full 8 bit
alpha channel.

It should compress well, as vertically adjacent pixels will both be either U or V type; so although horizontally adjacent pixels alternate U and V, the prefilter should in theory select the vertically adjacent pixel for the deltas.

UVUVUV
UVUVUV

An alternate scheme is, horizontally, UUVVUUVV, offsetting each line by 1 pixel as follows:

The stagger should progress right:

UUVVUUVV
VUUVVUUV
VVUUVVUU
UVVUUVVU

This way a U (or V) will always have a U (or V) either above or to the left, and
should give the prefilter more options for which pixel to select for the deltas.

. C B . .
. A x . .

Among other things, if the prefilter selects the linear (A+B+C)/3 mode, two of the three pixels will always be the same type as the present pixel.

Find the delta for a V pixel x, using the adjacent A, B, C pixels.
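The staggered layout and the always-has-a-same-type-neighbor claim can be checked with a few lines; `uv_type` below is a hypothetical helper encoding the UUVV pattern with a one-pixel march right per line:

```python
def uv_type(row, col):
    """'U' or 'V' for a pixel in the staggered layout: UU/VV pairs along
    each line, with the whole pattern shifted one pixel right per line."""
    return "UUVV"[(col - row) % 4]

def has_same_type_neighbor(row, col):
    """True if the pixel immediately above or immediately to the left
    carries the same chroma component (valid at interior positions)."""
    t = uv_type(row, col)
    return uv_type(row - 1, col) == t or uv_type(row, col - 1) == t
```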

Advantages

  • A 12 bit PQ image in an 8 bpc container.
  • Uses standard png compression
  • U and V are distributed to pixels in a vertically aligned way that should compress well.
    • Alternately, UU and VV pairs alternate, with a march right per line
  • YPQ is created with the PQ gamma for computational simplicity
  • UV is used, as it is a common, simple transform.
  • U and V are each 12 bits signed (or offset?) and alternate each pixel, or a marching-offset pair.
  • The full 8 bit Alpha is maintained for the alpha version. The non alpha version works with 8bpc RGB container.

Unknown Issues

  • Not sure if this scheme will compress usefully, it seems it should but tests are needed.
  • No legacy safe fallback view as I propose for the 10bit RGB at the top of the thread.

[Image: YpqUV 422 png file pixel structure, oct29-2023]

Sampling U and V at half the spatial resolution is a common strategy, as the Y holds all the important spatial detail. This is in accordance with the human visual system's handling of hue/chroma at a third or less of the resolution of luminance.

Rather than UV, other color-difference modes could of course be used... but thinking efficiency and the simplest transforms, avoid any unnecessary math when converting from YUV to RGB.

The question is whether LZ77 decompression is fast enough for a streaming use case...

@randy408

randy408 commented Nov 1, 2023

Proposed: TENB (TeNB interim)

To test compression efficiency you could do the bit shuffling yourself, encode as 8-bit truecolor-alpha, and compare file size to the next closest thing (16-bit truecolor, if we ignore the 2-bit alpha). Just make sure the test images are actually 10-bit; if they're upscaled from 8-bit and the LSBs stuffed into the alpha channel are all the same, it's gonna compress better than it should.
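A rough harness along these lines, using plain zlib on the raw packings (so it ignores PNG's per-scanline prefilters) and a synthetic gradient as stand-in data; the helper names are hypothetical, and as noted, only real 10-bit test images give a meaningful answer:

```python
import zlib

def pack_tenb(samples10):
    """(r, g, b) 10-bit triples -> 4 bytes/pixel: the 8 MSbs of each
    channel plus a side-car byte carrying the 2 LSbs of each (alpha 0)."""
    out = bytearray()
    for r, g, b in samples10:
        out += bytes((r >> 2, g >> 2, b >> 2,
                      ((r & 3) << 4) | ((g & 3) << 2) | (b & 3)))
    return bytes(out)

def pack_16bit(samples10):
    """The 10-in-16 baseline: big-endian 16-bit words, 6 bytes/pixel."""
    out = bytearray()
    for r, g, b in samples10:
        for v in (r, g, b):
            out += (v << 6).to_bytes(2, "big")   # scale 10-bit -> 16-bit
    return bytes(out)

# Smooth synthetic gradient as stand-in image data.
pixels = [((i * 7) % 1024, (i * 5) % 1024, (i * 3) % 1024)
          for i in range(4096)]
tenb_size = len(zlib.compress(pack_tenb(pixels), 9))
wide_size = len(zlib.compress(pack_16bit(pixels), 9))
```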

@Myndex
Member Author

Myndex commented Nov 2, 2023

....Just make sure the test images are actually 10-bit...

Yes, I have a lot of 10-bit material, currently as DPX files. It should be straightforward to modify a png library for this testing, to see if there is value in proceeding... looking for a suitable JS png lib...

@jbowler

jbowler commented Feb 1, 2024

10 bit isn't particularly general. One of the core problems of PNG is that the channel encoders are not separated, so patterns in the channels get obfuscated in the higher (or is it lower?) level LZ77 compression. The general solution is a bit-agnostic encoder in each channel and a good model of the data being encoded. This, of course, sounds like LZW :-)

But not really; the problem is the data model (RGBA, linear, equal precision) not the compression technique. The productive approach is not specific ad hoc encodings but consideration of the underlying data model. I suggest it is CIELuv. YCbCr is an example of an encoding more appropriate to CIELuv than RGB.

PNG (or JPEG, or TIFF) isn't beholden to the specific encoding it uses. An optimal encoding does not require any relationship to the nominal encoding; for example RGB might be optimally encoded as CIELuv then decoded back into RGB without loss.

The problem is that PNG is beholden to the mindset of computer programmers and to their approach. The basic science is clear; humans have limited discrimination of colour but a remarkable ability to perceive luminance variations over a massive range (11D to 16D?) The colour part of that range is about 8D and within that range how many colors are there? (Anyone who says "all the colors of the rainbow" please step outside and stick your head in a bucket of water; trichromats only see seven colors in the rainbow and I can only see six of them).

The key is getting the right model first then developing the encoding.

PNG makes the approach simple: the IHDR contains a "color type" field and a "compression" field. Either or both can take private values; anything >=128. Just do it; this is a well-established approach. Use the "private" definitions to implement and use a better encoding/model, and standardization will follow. Don't standardize from the altar!

@Myndex
Member Author

Myndex commented Feb 2, 2024

Hi John @jbowler

...10 bit isn't particularly general...

I'm not sure what you mean. 10 bit per channel is the second most used bit depth after 8 bit for images and especially for streaming image sequences/video. 10bit per channel is very common, and required for higher gamuts and higher dynamic range methods.

I'm not certain about the remainder of the post: .png is a defined standard, with a defined method for data compression. The 10-bit png I proposed is designed specifically to work within the existing framework with minimal issues, and maximum data compression given the existing paradigm, without creating an entirely different "not .png" model.

The other potential encodings I should probably delete from this thread as they may be causing confusion.
