Separate concepts of image file format / compression vs encoding #3804

Closed
jleibs opened this issue Oct 11, 2023 · 5 comments

@jleibs
Member

jleibs commented Oct 11, 2023

⚠️ This is a fairly large issue that's going to require some thought and refactoring.

Status: Draft Being Written

Context

Our Images currently exist as a thin wrapper over TensorData.

As we started to add support for compressed images, we introduced a special case for JPEG data inside the TensorData type. This should be cleaned up in #3803.

However, support for new image encodings was added in #3541.

Because multi-plane / multi-resolution images are not well represented by dense tensors, we followed the pattern established for jpegs. We can continue to do this even through the cleanup of #3803, but ultimately this is not really a semantically correct representation.

Semantically:

  • A multi-plane image (e.g. NV12) should still have a deterministic mapping of pixel coordinates -> data values. While multi-plane images might involve looking up and combining multiple values, the offsets into the separate channels are still deterministic (see the sketch below this list).
  • A compressed image requires a specific decoding step (likely with caching) to convert it into an image that can be used by the viewer.
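To make the "deterministic offsets" point concrete, here is a minimal sketch (plain NumPy, not Rerun API) of looking up a single pixel in a tightly packed NV12 buffer; it assumes no row padding, which real buffers may have:

```python
import numpy as np

def nv12_pixel(buf: np.ndarray, width: int, height: int, x: int, y: int):
    """Return (Y, U, V) for pixel (x, y) without decoding the whole image."""
    y_val = buf[y * width + x]                   # luma plane: one byte per pixel
    uv_base = height * width                     # interleaved chroma plane starts after the luma plane
    uv_offset = (y // 2) * width + (x // 2) * 2  # chroma is 2x2-subsampled, U and V interleaved
    u_val = buf[uv_base + uv_offset]
    v_val = buf[uv_base + uv_offset + 1]
    return int(y_val), int(u_val), int(v_val)
```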

Proposal

The data representation of images and tensors ultimately needs to be split into 3 separate things:

  • Tensors - dense multi-dimensional arrays
  • Images - N planes of dense two-dimensional arrays
  • ImageAssets - BinaryBlobs with a specified format.

Ideally we should be able to view an image as a tensor, but even then that view should offer a choice of decoder: for example, converting to 3 planes of RGB vs. leaving it as up-sampled YCrCb.
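A rough sketch of what this three-way split might look like (names and fields are purely illustrative, not an existing or agreed-upon API):

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np

@dataclass
class Tensor:
    """Dense multi-dimensional array: fast, data-agnostic random access."""
    data: np.ndarray  # arbitrary rank, homogeneous dtype

@dataclass
class Image:
    """N planes of dense two-dimensional arrays (e.g. a Y plane plus an interleaved UV plane for NV12)."""
    planes: Sequence[np.ndarray]  # each plane is 2-D; shapes may differ per plane
    pixel_format: str             # e.g. "NV12", "RGB8"

@dataclass
class ImageAsset:
    """Encoded binary blob plus the format information needed to decode it."""
    blob: bytes
    media_type: str  # e.g. "image/jpeg", "image/png"
```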

@teh-cmc
Member

teh-cmc commented Oct 11, 2023

I took some notes in parallel, before reading those. Dumping them below.


N-D tensors vs. N-D ragged tensors vs. encoded blobs

Context

Types

Some definitions first so we're on the same page (a short logging sketch follows this list):

  • N-D (regular) tensor: a homogeneous multi-dimensional array with fixed and consistent dimensions for each axis.
    This corresponds to Rerun's TensorData datatype, which is the underlying datatype for both our Tensor and {Depth,Segmentation,Color}Image archetypes (the difference between all of those basically comes down to indicator components).
    Tensors are always guaranteed to provide data-agnostic, fast random-access and slice-and-dice capabilities.

  • N-D ragged tensor: a multi-dimensional array with varying dimensions along one or more axes.
    I use the term "ragged tensor" here as this is what TensorFlow calls them.
    There is no archetype for this in Rerun at this time, although if you squint hard enough you could say that we already have an example of a ragged tensor type in practice: this is how we support JPEGs (we'll come back to this).

  • Encoded blobs (images/tensors/meshes/etc): a blob of bytes whose semantics will depend on the associated media-type (MIME).
    Since the encoding can be anything (e.g. JPEG, PNG... or even binary glTF!), the underlying data might or might not be homogeneous (i.e. ragged).
    Similarly, some encodings might allow for cheap random-access, while others don't.
    Encoded blobs have to be handled on a case-by-case basis: what you are able to do with the data depends on the associated media-type (MIME).
    Our Asset3D archetype uses our Blob component to store its data, along with a media-type (gltf, glb, obj, ...).
    Similarly, our JPEG support is implemented using a blob, although for historical reasons this blob is merged into the definition of TensorData.
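For concreteness, a rough sketch of how these three categories map onto today's Python SDK (the call signatures are from memory and may differ slightly between releases):

```python
import numpy as np
import rerun as rr

rr.init("tensors_vs_blobs", spawn=True)

# Regular N-D tensor: homogeneous, fixed dimensions, cheap random access.
rr.log("tensor", rr.Tensor(np.random.rand(8, 16, 3)))

# Image archetypes: still TensorData underneath, just with an image-shaped layout.
rr.log("image", rr.Image(np.zeros((480, 640, 3), dtype=np.uint8)))

# Encoded blob + media type: what you can do with it depends entirely on the MIME type.
rr.log("mesh", rr.Asset3D(path="model.glb"))

# JPEG: conceptually also an encoded blob, but currently smuggled inside TensorData.
rr.log("photo", rr.Image(np.zeros((480, 640, 3), dtype=np.uint8)).compress(jpeg_quality=75))
```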

Views

We provide a generic tensor view that allows users to visualize any homogeneous N-D tensor as raw data (i.e. no particular semantics assumed).
The data can be sliced-and-diced through and colormapped as needed.
Anything backed by TensorData can be visualized using this view (well, ideally).
Well, anything except JPEG: this is fine though, since slicing-and-dicing through still encoded JPEG data isn't particularly helpful anyway.

Our 2D view provides an alternative visualization for regular tensors: they can be rendered as textured rectangles, provided they have a "renderable shape" (e.g. HxWxG, HxWxRGB, etc).
Things that don't fit into regular tensors (e.g. JPEG blobs) can also be rendered in 2D by decoding their data on the fly (and then caching it).

Finally, our 3D view knows how to turn Asset3D blobs into triangles based on their media types.

What's new

#3541 introduces NV12-encoded images, which are essentially ragged tensors, but other than that can still be accessed and interpreted directly (i.e. no compression or other complex encoding).
Since they are ragged, our Tensor view cannot work with them.
Our 2D view can of course display them as textured rectangles by decoding the data as needed, but it is a shame that we lose the ability to inspect the individual luminance/chroma channels in the process (the data is right there!).

Our 2D view should expose contextual settings that vary depending on the media type of the encoded data.
These settings should help recover most of the value lost from not being able to visualize the data with our regular tensor viewer.
E.g.:

  • YUV channel toggles for NV12 images (e.g. by upscaling the chroma planes by the subsampling factor so they can be visualized at the same dimensions as the final image; see the sketch after this list)
  • RGBA channel toggles for JPEGs (using the decoded data already present in the cache).
  • etc
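A minimal sketch of that chroma-upscaling idea (plain NumPy, assumed helper rather than anything in Rerun; again assumes a tightly packed NV12 buffer with no row padding):

```python
import numpy as np

def nv12_channels(buf: np.ndarray, width: int, height: int):
    """Return full-resolution Y, U, V planes from a tightly packed NV12 buffer."""
    y = buf[: height * width].reshape(height, width)
    uv = buf[height * width :].reshape(height // 2, width // 2, 2)
    # Nearest-neighbour upscale of the 2x2-subsampled chroma planes to full resolution.
    u = uv[..., 0].repeat(2, axis=0).repeat(2, axis=1)
    v = uv[..., 1].repeat(2, axis=0).repeat(2, axis=1)
    return y, u, v
```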

Proposal

  • Asset3D should be renamed Mesh3DEncoded.
  • We should introduce a new ImageEncoded archetype that is backed by the same Blob and MediaType components as Mesh3DEncoded.
  • TensorData should never contain anything but regular tensors.
  • JPEGs should be implemented using ImageEncoded (a hypothetical usage sketch follows this list).
  • The 2D view should provide contextual settings depending on the media-type of the ImageEncoded, to recoup some of the value lost due to not being able to slice-and-dice through the underlying data.
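Hypothetical usage under this proposal (ImageEncoded and its parameters are illustrative only; nothing here exists in the SDK yet):

```python
import rerun as rr

with open("photo.jpg", "rb") as f:
    jpeg_bytes = f.read()

# An encoded blob plus a media type, instead of JPEG bytes hidden inside TensorData.
rr.log("camera/image", rr.ImageEncoded(contents=jpeg_bytes, media_type="image/jpeg"))
```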

Questions

  • Should we somehow expose a way of creating a tensor view using the decoded JPEG data (which is once again a regular tensor at this point)?
    I think the answer to this is: wait for datatype conversions to be a thing.

@SeaOtocinclus
Contributor

@jleibs We can't compress grayscale data, is that expected?
[screenshot attached]

Just do the following, with img being a grayscale matrix:

rr.Image(img).compress(jpeg_quality=75)

@jleibs
Member Author

jleibs commented Jan 17, 2024

@SeaOtocinclus thanks for reporting. That doesn't seem expected. I'll take a look.

@pmoulon

pmoulon commented Jan 23, 2024

@jleibs Found out that the compress option does not seem to be accessible from the C++ API; ideally it should be, if we want the different APIs to be on par:
https://ref.rerun.io/docs/cpp/stable/image_8hpp_source.html

emilk closed this as not planned on Jul 10, 2024