Separate concepts of image file format / compression vs encoding #3804

Closed
jleibs opened this issue Oct 11, 2023 · 5 comments

@jleibs
Member

jleibs commented Oct 11, 2023

⚠️ This is a fairly large issue that's going to require some thought and refactoring.

Status: Draft Being Written

Context

Our Images currently exist as a thin wrapper over TensorData.

As we started to add support for compressed images, we introduced a special case for JPEG data inside the TensorData type. This should be cleaned up in #3803.

However, support for new image encodings was added in #3541.

Because multi-plane / multi-resolution images are not well represented by dense tensors, we followed the pattern established for jpegs. We can continue to do this even through the cleanup of #3803, but ultimately this is not really a semantically correct representation.

Semantically:

  • A multi-plane image (e.g. NV12) should still have a deterministic mapping of pixel coordinates -> data values. While multi-plane images might involve looking up and combining multiple values, the offsets into the separate channels are still deterministic (see the sketch below this list).
  • A compressed image requires a specific decoding step (likely with caching) to convert it into an image that can be used by the viewer.
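To make the "deterministic offsets" point concrete, here is a minimal sketch (plain NumPy, not Rerun API) of looking up a single pixel in a tightly packed NV12 buffer; it assumes no row padding, which real buffers may have:

```python
import numpy as np

def nv12_pixel(buf: np.ndarray, width: int, height: int, x: int, y: int):
    """Return (Y, U, V) for pixel (x, y) without decoding the whole image."""
    y_val = buf[y * width + x]                   # luma plane: one byte per pixel
    uv_base = height * width                     # interleaved chroma plane starts after the luma plane
    uv_offset = (y // 2) * width + (x // 2) * 2  # chroma is 2x2-subsampled, U and V interleaved
    u_val = buf[uv_base + uv_offset]
    v_val = buf[uv_base + uv_offset + 1]
    return int(y_val), int(u_val), int(v_val)
```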

Proposal

The data representation of images and tensors ultimately needs to be split into 3 separate things:

  • Tensors - dense multi-dimensional arrays
  • Images - N planes of dense two-dimensional arrays
  • ImageAssets - BinaryBlobs with a specified format.

Ideally we should be able to view an image as a tensor, but even then that view should offer a choice of decoder: for example, converting to 3 planes of RGB vs. leaving it as up-sampled YCrCb.
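A rough sketch of what this three-way split might look like (names and fields are purely illustrative, not an existing or agreed-upon API):

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np

@dataclass
class Tensor:
    """Dense multi-dimensional array: fast, data-agnostic random access."""
    data: np.ndarray  # arbitrary rank, homogeneous dtype

@dataclass
class Image:
    """N planes of dense two-dimensional arrays (e.g. a Y plane plus an interleaved UV plane for NV12)."""
    planes: Sequence[np.ndarray]  # each plane is 2-D; shapes may differ per plane
    pixel_format: str             # e.g. "NV12", "RGB8"

@dataclass
class ImageAsset:
    """Encoded binary blob plus the format information needed to decode it."""
    blob: bytes
    media_type: str  # e.g. "image/jpeg", "image/png"
```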

@teh-cmc
Member

teh-cmc commented Oct 11, 2023

I took some notes in parallel, before reading those. Dumping them below.


N-D tensors vs. N-D ragged tensors vs. encoded blobs

Context

Types

Some definitions first so we're on the same page (a short logging sketch follows this list):

  • N-D (regular) tensor: a homogeneous multi-dimensional array with fixed and consistent dimensions for each axis.
    This corresponds to Rerun's TensorData datatype, which is the underlying datatype for both our Tensor and {Depth,Segmentation,Color}Image archetypes (the difference between all of those basically comes down to indicator components).
    Tensors are always guaranteed to provide data-agnostic, fast random-access and slice-and-dice capabilities.

  • N-D ragged tensor: a multi-dimensional array with varying dimensions along one or more axes.
    I use the term "ragged tensor" here as this is what TensorFlow calls them.
    There is no archetype for this in Rerun at this time, although if you squint hard enough you could say that we already have an example of a ragged tensor type in practice: this is how we support JPEGs (we'll come back to this).

  • Encoded blobs (images/tensors/meshes/etc): a blob of bytes whose semantics will depend on the associated media-type (MIME).
    Since the encoding can be anything (e.g. JPEG, PNG... or even binary glTF!), the underlying data might or might not be homogeneous (i.e. ragged).
    Similarly, some encodings might allow for cheap random-access, while others don't.
    Encoded blobs have to be handled on a case-by-case basis: what you are able to do with the data depends on the associated media-type (MIME).
    Our Asset3D archetype uses our Blob component to store its data, along with a media-type (gltf, glb, obj, ...).
    Similarly, our JPEG support is implemented using a blob, although for historical reasons this blob is merged into the definition of TensorData.
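For concreteness, a rough sketch of how these three categories map onto today's Python SDK (the call signatures are from memory and may differ slightly between releases):

```python
import numpy as np
import rerun as rr

rr.init("tensors_vs_blobs", spawn=True)

# Regular N-D tensor: homogeneous, fixed dimensions, cheap random access.
rr.log("tensor", rr.Tensor(np.random.rand(8, 16, 3)))

# Image archetypes: still TensorData underneath, just with an image-shaped layout.
rr.log("image", rr.Image(np.zeros((480, 640, 3), dtype=np.uint8)))

# Encoded blob + media type: what you can do with it depends entirely on the MIME type.
rr.log("mesh", rr.Asset3D(path="model.glb"))

# JPEG: conceptually also an encoded blob, but currently smuggled inside TensorData.
rr.log("photo", rr.Image(np.zeros((480, 640, 3), dtype=np.uint8)).compress(jpeg_quality=75))
```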

Views

We provide a generic tensor view that allows users to visualize any homogeneous N-D tensor as raw data (i.e. no particular semantics assumed).
The data can be sliced-and-diced through and colormapped as needed.
Anything backed by TensorData can be visualized using this view (well, ideally).
Well, anything except JPEG: this is fine though, since slicing-and-dicing through still encoded JPEG data isn't particularly helpful anyway.

Our 2D view provides an alternative visualization for regular tensors: they can be rendered as textured rectangles, provided they have a "renderable shape" (e.g. HxWxG, HxWxRGB, etc).
Things that don't fit into regular tensors (e.g. JPEG blobs) can also be rendered in 2D by decoding their data on the fly (and then caching it).

Finally, our 3D view knows how to turn Asset3D blobs into triangles based on their media types.

What's new

#3541 introduces NV12-encoded images, which are essentially ragged tensors, but other than that can still be accessed and interpreted directly (i.e. no compression or other complex encoding).
Since they are ragged, our Tensor view cannot work with them.
Our 2D view can of course display them as textured rectangles by decoding the data as needed, but it is a shame that we lose the ability to inspect the individual luminance/chroma channels in the process (the data is right there!).

Our 2D view should expose contextual settings that vary depending on the media type of the encoded data.
These settings should help recover most of the value lost from not being able to visualize the data with our regular tensor viewer.
E.g.:

  • YUV channel toggles for NV12 images (e.g. by upscaling the chroma planes by the subsampling factor so they can be visualized at the same dimensions as the final image; see the sketch after this list)
  • RGBA channel toggles for JPEGs (using the decoded data already present in the cache).
  • etc
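A minimal sketch of that chroma-upscaling idea (plain NumPy, assumed helper rather than anything in Rerun; again assumes a tightly packed NV12 buffer with no row padding):

```python
import numpy as np

def nv12_channels(buf: np.ndarray, width: int, height: int):
    """Return full-resolution Y, U, V planes from a tightly packed NV12 buffer."""
    y = buf[: height * width].reshape(height, width)
    uv = buf[height * width :].reshape(height // 2, width // 2, 2)
    # Nearest-neighbour upscale of the 2x2-subsampled chroma planes to full resolution.
    u = uv[..., 0].repeat(2, axis=0).repeat(2, axis=1)
    v = uv[..., 1].repeat(2, axis=0).repeat(2, axis=1)
    return y, u, v
```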

Proposal

  • Asset3D should be renamed Mesh3DEncoded.
  • We should introduce a new ImageEncoded archetype that is backed by the same Blob and MediaType components as Mesh3DEncoded.
  • TensorData should never contain anything but regular tensors.
  • JPEGs should be implemented using ImageEncoded (a hypothetical usage sketch follows this list).
  • The 2D view should provide contextual settings depending on the media-type of the ImageEncoded, to recoup some of the value lost due to not being able to slice-and-dice through the underlying data.
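Hypothetical usage under this proposal (ImageEncoded and its parameters are illustrative only; nothing here exists in the SDK yet):

```python
import rerun as rr

with open("photo.jpg", "rb") as f:
    jpeg_bytes = f.read()

# An encoded blob plus a media type, instead of JPEG bytes hidden inside TensorData.
rr.log("camera/image", rr.ImageEncoded(contents=jpeg_bytes, media_type="image/jpeg"))
```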

Questions

  • Should we somehow expose a way of creating a tensor view using the decoded JPEG data (which is once again a regular tensor at this point)?
    I think the answer to this is: wait for datatype conversions to be a thing.

@SeaOtocinclus
Contributor

@jleibs We can't compress grayscale data, is that expected?
[screenshot attached]

Just do the following, with img being a grayscale matrix:

rr.Image(img).compress(jpeg_quality=75)

@jleibs
Member Author

jleibs commented Jan 17, 2024

@SeaOtocinclus thanks for reporting. That doesn't seem expected. I'll take a look.

@pmoulon

pmoulon commented Jan 23, 2024

@jleibs Found out that the compress option does not seem to be accessible from the C++ API; ideally it should be, if we want the different APIs to be on par:
https://ref.rerun.io/docs/cpp/stable/image_8hpp_source.html

emilk closed this as not planned on Jul 10, 2024