Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support NV12-encoded images #3541

Merged
merged 25 commits into from
Oct 16, 2023
Merged

Support NV12-encoded images #3541

merged 25 commits into from
Oct 16, 2023

Conversation

zrezke
Copy link
Contributor

@zrezke zrezke commented Sep 29, 2023

Added support for logging NV12 encoded images via the existing log_image_file api.

As promissed a while back :)

from rerun import ImageFormat
import rerun as rr

rr.log_image_file("NV12 image", img_bytes=bytes(frame), img_format=ImageFormat.NV12(width=1920, height=1080))

NV12 image example

The raw (encoded) image data is stored in an R8Uint texture. The contents of the texture get decoded in the rectangle_fs.wgsl via the decoder written in decodings.wgsl.

Other YUV image formats could be added easily, following the NV12 implementation with slight modifications to the decoder.

Checklist

@Wumpf Wumpf self-requested a review September 29, 2023 12:26
@zrezke
Copy link
Contributor Author

zrezke commented Sep 29, 2023

Example with example image:
nv12_example.zip

Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my, this is amazing. Thank you so much for following up and coming back with this <3

Overall looks really good already. The main two points that I think we need to address is what I wrote on gamma and finding a way to have format not embed a size.

It would also be really nice to ship with some example so we can test and/or demonstrate this. Don't have NV12 data at hand right now and would like to play around it myself. Do you know of a decent data source that we could download? (edit: oh, you posted one!) I suppose there should be a way to have opencv to convert image stream from a webcam to NV12 as a toy example (haven't checked yet)
Relatedly, did you get a chance to test it on the web? (I doubt this will cause any issues but any time we touch the shaders I'm not so sure)
If not I can look after that next week, no worries :)

It does highlight the issues we have with extensibility in this area but fixing that is all out of scope and I think you navigated around the existing limitations very well (:

In any case landing this PR will have to wait for 0.9 release which will happen mid next week - we have a lot of stuff changing in this release as you might have noticed and I'm hesitant to add more things that can go wrong, hope you don't mind too much.

@@ -73,4 +76,5 @@ union TensorBuffer (
F32: F32Buffer (transparent, order:1000),
F64: F64Buffer (transparent, order:1200),
JPEG: JPEGBuffer (transparent, order:1300),
NV12: NV12Buffer (transparent, order:1400),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one warrants some documentation on how the data is layed out. I wouldn't mind though if it forwards to wikipedia or the gist (https://gist.github.com/Jim-Bar/3cbba684a71d1a9d468a6711a6eddbeb) you added to the pr description for the exact details.

crates/re_types/src/datatypes/tensor_data_ext.rs Outdated Show resolved Hide resolved
crates/re_space_view_tensor/src/space_view_class.rs Outdated Show resolved Hide resolved
crates/re_renderer/src/renderer/rectangles.rs Outdated Show resolved Hide resolved
crates/re_renderer/src/renderer/rectangles.rs Outdated Show resolved Hide resolved

name = "NV12"

def __init__(self, width: int | None = None, height: int | None = None) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not put size information into the image format. Instead the expected shape for NV12 tensors should be documented. To allow the convenient automatic reshaping as down below in response to NV12 we can instead add a size hint parameter to a custom init method of Tensor and do the necessary reshaping & validation there in response. Let me know if you want help/clarification with this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Wumpf If I understand correctly this would look something like this:

In the ImageEncoded component class the init method would accept the following as contents:

        contents: bytes | IO[bytes] | npt.NDArray | rr.Tensor | None = None,

npt.NDArray to allow for np arrays (because bytes don't have shapes) and rr.Tensor to leverage an updated TensorExt init which accepts a size hint to allow for an automatic resize. Now I am not too sure how an automatic resize would look like on a tensor, since it's kind of a general purpose datatype: did you have a specific api in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at things a bit closer I realize that the ImageEncoded api is a little bit in flux actually (and log_image_file which you used in the PR description will be removed on 0.10) and I got the entries of the whole thing wrong. Also I didn't give it much thought that you wanted to pass in just bytes which totally makes sense here.
I'm actually not sure if we want ImageEncoded to mean only file formats (status quo) or also anything that isn't rgb (like used here now) in the long run 🤔 ...
Anyways, let's not get into a rabbit hole here and keep it simple, sticking with ImageEncoded.

What do you think of this (incomplete diff against main):

diff --git a/rerun_py/rerun_sdk/rerun/_image.py b/rerun_py/rerun_sdk/rerun/_image.py
index 65ca974eb..85738a992 100644
--- a/rerun_py/rerun_sdk/rerun/_image.py
+++ b/rerun_py/rerun_sdk/rerun/_image.py
@@ -3,11 +3,13 @@ from __future__ import annotations
 import io
 import pathlib
 from enum import Enum
-from typing import IO, Iterable
+from typing import IO, Iterable, Sequence
 
 import numpy as np
 from PIL import Image as PILImage
 
+from rerun.error_utils import _send_warning_or_raise
+
 from ._log import AsComponents, ComponentBatchLike
 from .archetypes import Image
 from .components import DrawOrderLike, TensorData
@@ -53,6 +55,7 @@ class ImageEncoded(AsComponents):
         contents: bytes | IO[bytes] | None = None,
         format: ImageFormat | None = None,
         draw_order: DrawOrderLike | None = None,
+        size_hint: Sequence[int] | None = None,
     ) -> None:
         """
         Create a new image with a given format.
@@ -72,6 +75,10 @@ class ImageEncoded(AsComponents):
             An optional floating point value that specifies the 2D drawing
             order. Objects with higher values are drawn on top of those with
             lower values.
+        size_hint:
+            An optional tuple/list of (width, height) that specifies the image size.
+            This is *required* for formats that do not implicitly specify their size
+            and is ignored for formats that do (all file formats).
         """
         if (path is None) == (contents is None):
             raise ValueError("Must provide exactly one of 'path' or 'contents'")
@@ -89,6 +96,11 @@ class ImageEncoded(AsComponents):
 
         if format is not None:
             formats = (str(format),)
+            if isinstance(format, NV12):
+                if size_hint is None:
+                    raise ValueError("Must provide size_hint for NV12 images")
+            elif size_hint is not None:
+                _send_warning_or_raise(f"size_hint is ignored for images with {format}")
         else:
             formats = None

Not so sure about size_hint as name. And admittedly this is still rather nasty since now Nv12 is the one and only format that needs it and everything else just ignores the size :/. But I think that's better to have some of the formats embed a size and some not (if we keep on this path there's gonna be more of them that follow the Nv12 pattern).
It's pretty clear that we'll have to refactor this again later, but would good to have some path forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea let's not complicate it too much. Hear me out: I think embedding the size hint along with the color_standard would make a lot of sense here. I think this will be a better UX, since you wouldn't be able to try and log an NV12 image without supplying it's size + I think it's also a nice place for specifying the color standard used... These are all NV12 specific right now, that's why I'd like to keep these fields hidden from the general ImageEncoded api...

class ColorStandard(Enum):
    Bt601 = "Bt601"
    Bt709 = "Bt709"


class NV12(ImageFormat):
    """NV12 format."""

    name = "NV12"
    size_hint: Tuple[int, int]
    color_standard: ColorStandard

    def __init__(self, size_hint: Tuple[int, int], color_standard: ColorStandard = ColorStandard.Bt601) -> None:
        """
        An NV12 encoded image.

        Parameters
        ----------
        size_hint:
            (height, width), the RGB size of the image
        color_standard:
            rr.ColorStandard (default: rr.ColorStandard.Bt601),
            the color standard used to decode the image.(ref. https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion)
        """
        self.size_hint = size_hint
        self.color_standard: color_standard

I am biased because I wrote this code, but I really like the way this comes out as an API, as long as you are using a linter.

This allows for:

rr.log("NV12", contents=byte_contents, format=rr.ImageFormat.NV12((1080, 1920), rr.ColorFormat.Bt601))

Instead of:

rr.log("NV12", contents=byte_contents, format=rr.ImageFormat.NV12, size_hint, color_format)

The latter doesn't look bad for NV12 either, but I think when you are logging let's say a JPEG and see the size_hint and color_format fields they may confuse you...

In addition, I think being able to pass a np.array to ImageEncoded in combination with bytes, could also be a very welcome future improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies for the reply delay!
Checked with some other folks on the team as well and concluded you're right, the ergonomics of this are way better :)
One thing thing though, it would be nice to put ColorStandard as a separate component on image similar to how DepthMeter is done on depth image. You'd in any case run into problems of how to pipe it through otherwise. (and yes we'd just warn for Bt601 on anything but NV12, since effectively we're assuming BT709/srgb everywhere and don't want to implement the conversion right now)

But I realize that this is another big change on top to throw at you and given how big the pr is already I'd even prefer if you just hardcode the primaries that you need (maybe with a note pointing here) and we'll ship that, adding ColorStandard later :)
Meaning we're landing this file here pretty much as is 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I think this means we're good to go then unless you have something more to add?

on my side I need yet to do a quick run on web and maybe add a "test" to the test_tensor.py but otherwise this can go to 0.10 :)

Copy link
Contributor Author

@zrezke zrezke Oct 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to hear that! Let's :shipit:. From my side, I could add a simple example that logs NV12 frames from a webcam using cv2 (frames are converted from BGR -> NV12 on host), as you thought of somewhere in this discussion:

#!/usr/bin/env python3
"""
Stream NV12 images from a webcam.

Run:
```sh
pip install -r examples/python/nv12/requirements.txt
python examples/python/nv12/main.py
```
"""
from __future__ import annotations

import argparse

import rerun as rr  # pip install rerun-sdk
import cv2
import numpy as np
import time


def bgr2nv12(bgr: np.ndarray) -> np.ndarray:
    yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420)
    uv_row_cnt = yuv.shape[0] // 3
    uv_plane = np.transpose(yuv[uv_row_cnt * 2 :].reshape(2, -1), [1, 0])
    yuv[uv_row_cnt * 2 :] = uv_plane.reshape(uv_row_cnt, -1)
    return yuv


def main() -> None:
    parser = argparse.ArgumentParser(description="Example of using the Rerun visualizer to display NV12 images.")
    rr.script_add_args(parser)
    parser.add_argument(
        "-t",
        "--timeout",
        type=float,
        default=5,
        help="Timeout in seconds, after which the script will stop streaming frames.",
    )
    args = parser.parse_args()

    rr.script_setup(args, "NV12 image example")

    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise RuntimeError("This example requires a webcam.")
    start_time = time.time()
    print(f"Started streaming NV12 images for {args.timeout} seconds.")
    while start_time + args.timeout > time.time():
        ret, frame = cap.read()
        if not ret:
            time.sleep(0.01)
            continue
        rr.log(
            "NV12",
            rr.ImageEncoded(
                contents=bytes(bgr2nv12(frame)),
                format=rr.ImageFormat.NV12((frame.shape[0], frame.shape[1])),
            ),
        )
        time.sleep(0.01)
    rr.script_teardown(args)


if __name__ == "__main__":
    main()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and I'd really aprreciate it if you tested for web, thanks!

crates/re_renderer/shader/decodings.wgsl Show resolved Hide resolved
crates/re_viewer_context/src/gpu_bridge/tensor_to_gpu.rs Outdated Show resolved Hide resolved
@Wumpf Wumpf added enhancement New feature or request 🔺 re_renderer affects re_renderer itself 📺 re_viewer affects re_viewer itself include in changelog labels Sep 29, 2023
@Wumpf Wumpf added this to the 0.10 Polish milestone Sep 29, 2023
@emilk
Copy link
Member

emilk commented Oct 2, 2023

If you merge in latest main we should have a working CI for external PRs like this one :)

zrezke and others added 4 commits October 2, 2023 09:02
…ix gamma correction, fix shader clamping errors. Added a reference for the YUV2sRGB conversion coefficients. Made the image_height_width_channels TensorBuffer matching exhaustive.
@Wumpf Wumpf self-requested a review October 9, 2023 07:51
@emilk emilk changed the title Support for NV12 encoded images! Support NV12-encoded images Oct 9, 2023
@Wumpf Wumpf self-requested a review October 15, 2023 17:29
@@ -266,6 +337,22 @@ tensor_type!(arrow2::types::f16, F16);
tensor_type!(f32, F32);
tensor_type!(f64, F64);

// Manual expension of tensor_type! macro for `u8` types. We need to do this, because u8 can store encoded data
impl<'a> TryFrom<&'a TensorData> for ::ndarray::ArrayViewD<'a, u8> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this covers how to go from TensorData -> ArrayView, but the other way around is not defined now, causing a bunch of tests to fail compiling

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm splitting the macro up to fix it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks!

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

Fixed remaining warnings and ArrayView conversion. Added a simple python test based on the capture script you posted (not using a capture tho, but just static rerun logo).
Confirmed working on the web just fine! :)
image

Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to go!
Oh looks like my push didn't work :O

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

@zrezke looks like I can't push to this PR. I think there's a setting somewhere, but I get confused about this feature all the time.
Can you cherry-pick in

thx!

@zrezke
Copy link
Contributor Author

zrezke commented Oct 16, 2023

Required a bit of force :), should be good now, thanks for the cleanup and fixes! 🤞

@zrezke
Copy link
Contributor Author

zrezke commented Oct 16, 2023

I added the example I posted above as examples/python/nv12. Didn't upload a picture to the readme tho, not sure how or if I can do that.

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

I think you can just kill the fromtatter line with the link that linkinator complained about 🤷

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

ah I see this thing is giving you a hard time, sorry about that 😅
There's a few shortcuts for the python stuff in the justfile:

just py-format
just py-lint

(the latter only shows issues, doesn't fix them yet)

And the other thing that probably bites you now is that codegen is now no longer automatic, so you need to manually run cargo codegen :/

edit: Ah the manual codegen thing hasn't landed yet. Anyways, it's a good habit already; things moved around there a bit...

@zrezke
Copy link
Contributor Author

zrezke commented Oct 16, 2023

Thanks for the commands, I also had to setup my venv with requirements-lint.txt... I installed llvm and flatbuffers compiler like it's specified in the ci_docker/Dockerfile but the cargo codegen doesn't seem to generate any diff on my machine

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

but the cargo codegen doesn't seem to generate any diff on my machine

having the same issue Oo. Looking into it!

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

ah ofc that ci step runs against the codegenerator on main... and things are moving fast..

Tried again to push to the PR to no avail. I think this is an option one needs to opt-in on creating the PR.

Fixes:

that should be all unless main changes again :/

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

yeaaaah that merge was also too old again as well. Somebody again merged stuff meanwhile to main that changes the codegen's output...
image

@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

... if everything else goes green I'll just merge this and fix the issue on main then.

@Wumpf Wumpf merged commit a363a6c into rerun-io:main Oct 16, 2023
27 of 28 checks passed
@Wumpf
Copy link
Member

Wumpf commented Oct 16, 2023

🎉

@zrezke
Copy link
Contributor Author

zrezke commented Oct 16, 2023

Awesome! 💯

Wumpf added a commit that referenced this pull request Oct 17, 2023
### What

Small mistake (maybe merge error) that slipped into #3541 


before:

![image](https://github.com/rerun-io/rerun/assets/1220815/0d3e94d7-6a8d-4171-9ddf-4924f61ce5a1)

After:
<img width="488" alt="image"
src="https://github.com/rerun-io/rerun/assets/1220815/00cb7c3f-46be-437d-b4ec-d67e7ee5819b">
(actually curious it says depth 4 here; that must be a bug on the
logging side though. Same behavior [as
before](https://demo.rerun.io/pr/3838/examples/structure_from_motion/)!)

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3885) (if
applicable)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/3885)
- [Docs
preview](https://rerun.io/preview/d99315e47a273c4dfef158a34fc806005c2f51e8/docs)
<!--DOCS-PREVIEW-->
- [Examples
preview](https://rerun.io/preview/d99315e47a273c4dfef158a34fc806005c2f51e8/examples)
<!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
@Wumpf Wumpf removed this from the Triage milestone Jan 16, 2024
@oxkitsune oxkitsune mentioned this pull request Jan 20, 2024
4 tasks
Wumpf added a commit that referenced this pull request Jan 25, 2024
<!--
Open the PR up as a draft until you feel it is ready for a proper
review.

Do not make PR:s from your own `main` branch, as that makes it difficult
for reviewers to add their own fixes.

Add any improvements to the branch as new commits to make it easier for
reviewers to follow the progress. All commits will be squashed to a
single commit once the PR is merged into `main`.

Make sure you mention any issues that this PR closes in the description,
as well as any other related issues.

To get an auto-generated PR description you can put "copilot:summary" or
"copilot:walkthrough" anywhere.
-->

### What
This PR Introduces support for YUV422/YUY2 encoded images based on
#3541.

<img width="1117" alt="Image showcasing RGB, NV12 and YUV422 encoded
images"
src="https://github.com/rerun-io/rerun/assets/14833076/5ead1748-f765-4d68-b349-b920a89a6512">

The raw (encoded) image data is stored in an R8Uint texture. The
contents of the texture get decoded in the `rectangle_fs.wgsl` via the
decoder written in `decodings.wgsl`.

Other
[YUV](https://gist.github.com/Jim-Bar/3cbba684a71d1a9d468a6711a6eddbeb)
image formats could be added easily, following the NV12 implementation
with slight modifications to the decoder.

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using newly built examples:
[app.rerun.io](https://app.rerun.io/pr/4877/index.html)
* Using examples from latest `main` build:
[app.rerun.io](https://app.rerun.io/pr/4877/index.html?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[app.rerun.io](https://app.rerun.io/pr/4877/index.html?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4877)
- [Docs
preview](https://rerun.io/preview/3f787b44ebb4a566224aba821024b27214245575/docs)
<!--DOCS-PREVIEW-->
- [Examples
preview](https://rerun.io/preview/3f787b44ebb4a566224aba821024b27214245575/examples)
<!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

---------

Co-authored-by: Harold <[email protected]>
Co-authored-by: Harold Ruiter <[email protected]>
Co-authored-by: Andreas Reich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request include in changelog 🔺 re_renderer affects re_renderer itself 📺 re_viewer affects re_viewer itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants