Support NV12-encoded images #3541
Conversation
Example with example image:
Oh my, this is amazing. Thank you so much for following up and coming back with this <3
Overall this looks really good already. The two main points that I think we need to address are what I wrote on gamma and finding a way to have the format not embed a size.
It would also be really nice to ship with some example so we can test and/or demonstrate this. I don't have NV12 data at hand right now and would like to play around with it myself. Do you know of a decent data source that we could download? (edit: oh, you posted one!) I suppose there should be a way to have OpenCV convert an image stream from a webcam to NV12 as a toy example (haven't checked yet).
Relatedly, did you get a chance to test it on the web? (I doubt this will cause any issues but any time we touch the shaders I'm not so sure)
If not I can look after that next week, no worries :)
It does highlight the issues we have with extensibility in this area but fixing that is all out of scope and I think you navigated around the existing limitations very well (:
In any case landing this PR will have to wait for 0.9 release which will happen mid next week - we have a lot of stuff changing in this release as you might have noticed and I'm hesitant to add more things that can go wrong, hope you don't mind too much.
```diff
@@ -73,4 +76,5 @@ union TensorBuffer (
     F32: F32Buffer (transparent, order:1000),
     F64: F64Buffer (transparent, order:1200),
     JPEG: JPEGBuffer (transparent, order:1300),
+    NV12: NV12Buffer (transparent, order:1400),
```
This one warrants some documentation on how the data is laid out. I wouldn't mind though if it forwards to Wikipedia or the gist (https://gist.github.com/Jim-Bar/3cbba684a71d1a9d468a6711a6eddbeb) you added to the PR description for the exact details.
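For reference, the NV12 layout boils down to simple offset arithmetic: a full-resolution Y plane followed by a single plane of interleaved UV samples at half resolution in both dimensions. A quick illustrative sketch (plain Python, not part of the rerun API):

```python
def nv12_plane_info(width: int, height: int) -> dict:
    """Byte offsets/sizes within an NV12 buffer (illustration only)."""
    y_size = width * height          # one luma byte per pixel
    uv_size = width * height // 2    # one U and one V byte per 2x2 pixel block
    return {
        "y_offset": 0,
        "y_size": y_size,
        "uv_offset": y_size,         # UV plane starts right after Y
        "uv_size": uv_size,
        "total": y_size + uv_size,   # i.e. width * height * 3 / 2
    }
```

So a 1920x1080 frame occupies exactly `1920 * 1080 * 3 // 2` bytes.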
rerun_py/rerun_sdk/rerun/_image.py (outdated)

```python
name = "NV12"

def __init__(self, width: int | None = None, height: int | None = None) -> None:
```
We should not put size information into the image format. Instead, the expected shape for NV12 tensors should be documented. To keep the convenient automatic reshaping (as done further down for NV12), we can instead add a size-hint parameter to a custom init method of `Tensor` and do the necessary reshaping & validation there. Let me know if you want help/clarification with this!
@Wumpf If I understand correctly, this would look something like this:
In the `ImageEncoded` component class, the init method would accept the following as contents:
`contents: bytes | IO[bytes] | npt.NDArray | rr.Tensor | None = None,`
`npt.NDArray` to allow for numpy arrays (because bytes don't have shapes), and `rr.Tensor` to leverage an updated `TensorExt` init which accepts a size hint to allow for an automatic reshape. Now I am not too sure what an automatic reshape would look like on a tensor, since it's kind of a general-purpose datatype: did you have a specific API in mind?
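To make the discussion concrete, the reshaping/validation such an init could perform on a flat NV12 byte buffer might look like this sketch (`validate_nv12` is a hypothetical helper, not part of the rerun SDK):

```python
def validate_nv12(buf: bytes, width: int, height: int) -> None:
    """Check that `buf` can hold an NV12 image of the given size.

    A size-hint-aware Tensor init could run this before reshaping the
    flat buffer to (height * 3 // 2, width).
    """
    # NV12 subsamples chroma 2x in both dimensions, so odd sizes are invalid.
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    expected = width * height * 3 // 2
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
```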
Looking at things a bit closer, I realize that the `ImageEncoded` API is actually a little bit in flux (and `log_image_file`, which you used in the PR description, will be removed in 0.10), and I got the entries of the whole thing wrong. Also, I didn't give it much thought that you wanted to pass in just bytes, which totally makes sense here.
I'm actually not sure if, in the long run, we want `ImageEncoded` to mean only file formats (status quo) or also anything that isn't RGB (like used here now) 🤔 ...
Anyways, let's not get into a rabbit hole here and keep it simple, sticking with `ImageEncoded`.
What do you think of this (incomplete diff against main):
```diff
diff --git a/rerun_py/rerun_sdk/rerun/_image.py b/rerun_py/rerun_sdk/rerun/_image.py
index 65ca974eb..85738a992 100644
--- a/rerun_py/rerun_sdk/rerun/_image.py
+++ b/rerun_py/rerun_sdk/rerun/_image.py
@@ -3,11 +3,13 @@ from __future__ import annotations
 import io
 import pathlib
 from enum import Enum
-from typing import IO, Iterable
+from typing import IO, Iterable, Sequence
 
 import numpy as np
 from PIL import Image as PILImage
 
+from rerun.error_utils import _send_warning_or_raise
+
 from ._log import AsComponents, ComponentBatchLike
 from .archetypes import Image
 from .components import DrawOrderLike, TensorData
@@ -53,6 +55,7 @@ class ImageEncoded(AsComponents):
         contents: bytes | IO[bytes] | None = None,
         format: ImageFormat | None = None,
         draw_order: DrawOrderLike | None = None,
+        size_hint: Sequence[int] | None = None,
     ) -> None:
         """
         Create a new image with a given format.
@@ -72,6 +75,10 @@ class ImageEncoded(AsComponents):
             An optional floating point value that specifies the 2D drawing
             order. Objects with higher values are drawn on top of those with
             lower values.
+        size_hint:
+            An optional tuple/list of (width, height) that specifies the image size.
+            This is *required* for formats that do not implicitly specify their size
+            and is ignored for formats that do (all file formats).
         """
         if (path is None) == (contents is None):
             raise ValueError("Must provide exactly one of 'path' or 'contents'")
@@ -89,6 +96,11 @@ class ImageEncoded(AsComponents):
 
         if format is not None:
             formats = (str(format),)
+            if isinstance(format, NV12):
+                if size_hint is None:
+                    raise ValueError("Must provide size_hint for NV12 images")
+            elif size_hint is not None:
+                _send_warning_or_raise(f"size_hint is ignored for images with {format}")
         else:
             formats = None
```
Not so sure about `size_hint` as a name. And admittedly this is still rather nasty, since NV12 is now the one and only format that needs it and everything else just ignores the size :/. But I think that's better than having some formats embed a size and some not (if we keep on this path, there are going to be more formats that follow the NV12 pattern).
It's pretty clear that we'll have to refactor this again later, but it would be good to have some path forward.
Yeah, let's not complicate it too much. Hear me out: I think embedding the size hint along with the color_standard would make a lot of sense here. I think this will be a better UX, since you wouldn't be able to try to log an NV12 image without supplying its size, plus I think it's also a nice place for specifying the color standard used... These are all NV12-specific right now, which is why I'd like to keep these fields hidden from the general `ImageEncoded` API...
```python
class ColorStandard(Enum):
    Bt601 = "Bt601"
    Bt709 = "Bt709"


class NV12(ImageFormat):
    """NV12 format."""

    name = "NV12"
    size_hint: Tuple[int, int]
    color_standard: ColorStandard

    def __init__(self, size_hint: Tuple[int, int], color_standard: ColorStandard = ColorStandard.Bt601) -> None:
        """
        An NV12 encoded image.

        Parameters
        ----------
        size_hint:
            (height, width), the RGB size of the image.
        color_standard:
            rr.ColorStandard (default: rr.ColorStandard.Bt601), the color standard
            used to decode the image (ref. https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion).
        """
        self.size_hint = size_hint
        self.color_standard = color_standard
```
I am biased because I wrote this code, but I really like the way this comes out as an API, as long as you are using a linter.
This allows for:
`rr.log("NV12", contents=byte_contents, format=rr.ImageFormat.NV12((1080, 1920), rr.ColorStandard.Bt601))`
Instead of:
`rr.log("NV12", contents=byte_contents, format=rr.ImageFormat.NV12, size_hint, color_standard)`
The latter doesn't look bad for NV12 either, but I think when you are logging, say, a JPEG and see the `size_hint` and `color_standard` fields, they may confuse you...
In addition, I think being able to pass a `np.array` to `ImageEncoded`, in combination with bytes, could also be a very welcome future improvement.
Apologies for the reply delay!
I checked with some other folks on the team as well and concluded you're right, the ergonomics of this are way better :)
One thing though: it would be nice to put `ColorStandard` as a separate component on the image, similar to how `DepthMeter` is done on depth images. You'd run into problems of how to pipe it through otherwise in any case. (And yes, we'd just warn for Bt601 on anything but NV12, since effectively we're assuming BT.709/sRGB everywhere and don't want to implement the conversion right now.)
But I realize that this is another big change to throw on top, and given how big the PR is already, I'd even prefer if you just hardcode the primaries that you need (maybe with a note pointing here) and we'll ship that, adding `ColorStandard` later :)
Meaning we're landing this file here pretty much as is 👍
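For a sense of what "hardcoding the primaries" means: a BT.601 limited-range YUV→RGB conversion for a single pixel can be sketched as below (coefficients per the Wikipedia article linked earlier; the actual conversion in this PR lives in the shader, so this is only an illustration):

```python
def yuv_to_rgb_bt601(y: int, u: int, v: int) -> tuple[int, int, int]:
    """Convert one BT.601 limited-range YUV sample to 8-bit RGB."""
    c = 1.164 * (y - 16)   # luma: scale 16..235 up to 0..255
    d = u - 128            # chroma samples are centered at 128
    e = v - 128

    def clamp(x: float) -> int:
        return max(0, min(255, round(x)))

    r = clamp(c + 1.596 * e)
    g = clamp(c - 0.392 * d - 0.813 * e)
    b = clamp(c + 2.017 * d)
    return r, g, b
```

Limited-range black (Y=16, U=V=128) maps to (0, 0, 0) and white (Y=235, U=V=128) to (255, 255, 255).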
Right, I think this means we're good to go then, unless you have something more to add?
On my side I still need to do a quick run on web and maybe add a test to test_tensor.py, but otherwise this can go to 0.10 :)
Glad to hear that! From my side, I could add a simple example that logs NV12 frames from a webcam using cv2 (frames are converted from BGR -> NV12 on the host), as you suggested earlier in this discussion:
````python
#!/usr/bin/env python3
"""
Stream NV12 images from a webcam.

Run:
```sh
pip install -r examples/python/nv12/requirements.txt
python examples/python/nv12/main.py
```
"""
from __future__ import annotations

import argparse
import time

import cv2
import numpy as np
import rerun as rr  # pip install rerun-sdk


def bgr2nv12(bgr: np.ndarray) -> np.ndarray:
    yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420)
    uv_row_cnt = yuv.shape[0] // 3
    uv_plane = np.transpose(yuv[uv_row_cnt * 2 :].reshape(2, -1), [1, 0])
    yuv[uv_row_cnt * 2 :] = uv_plane.reshape(uv_row_cnt, -1)
    return yuv


def main() -> None:
    parser = argparse.ArgumentParser(description="Example of using the Rerun visualizer to display NV12 images.")
    rr.script_add_args(parser)
    parser.add_argument(
        "-t",
        "--timeout",
        type=float,
        default=5,
        help="Timeout in seconds, after which the script will stop streaming frames.",
    )
    args = parser.parse_args()
    rr.script_setup(args, "NV12 image example")

    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise RuntimeError("This example requires a webcam.")

    start_time = time.time()
    print(f"Started streaming NV12 images for {args.timeout} seconds.")
    while start_time + args.timeout > time.time():
        ret, frame = cap.read()
        if not ret:
            time.sleep(0.01)
            continue
        rr.log(
            "NV12",
            rr.ImageEncoded(
                contents=bytes(bgr2nv12(frame)),
                format=rr.ImageFormat.NV12((frame.shape[0], frame.shape[1])),
            ),
        )
        time.sleep(0.01)

    rr.script_teardown(args)


if __name__ == "__main__":
    main()
````
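The interleaving step that `bgr2nv12` above performs can also be shown in isolation with plain numpy, without a webcam or cv2. This is a sketch that assumes the I420 layout `cv2.COLOR_BGR2YUV_I420` produces (a `(height * 3 // 2, width)` array: Y rows, then the U plane, then the V plane, with height divisible by 4 so each half-resolution chroma plane fills whole rows):

```python
import numpy as np


def i420_to_nv12(yuv: np.ndarray) -> np.ndarray:
    """Interleave the planar U and V samples of an I420 buffer into NV12."""
    h = yuv.shape[0] * 2 // 3  # full-resolution luma height
    nv12 = yuv.copy()
    u = yuv[h : h + h // 4].reshape(-1)  # U plane occupies h/4 full-width rows
    v = yuv[h + h // 4 :].reshape(-1)    # V plane occupies the remaining rows
    uv = np.empty(u.size + v.size, dtype=yuv.dtype)
    uv[0::2] = u  # NV12 stores U first...
    uv[1::2] = v  # ...then V, for each 2x2 pixel block
    nv12[h:] = uv.reshape(yuv.shape[0] - h, yuv.shape[1])
    return nv12
```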
Oh, and I'd really appreciate it if you tested for web, thanks!
If you merge in latest `main`
Co-authored-by: Andreas Reich <[email protected]>
…ix gamma correction, fix shader clamping errors. Added a reference for the YUV2sRGB conversion coefficients. Made the image_height_width_channels TensorBuffer matching exhaustive.
```diff
@@ -266,6 +337,22 @@ tensor_type!(arrow2::types::f16, F16);
 tensor_type!(f32, F32);
 tensor_type!(f64, F64);
 
+// Manual expansion of tensor_type! macro for `u8` types. We need to do this, because u8 can store encoded data
+impl<'a> TryFrom<&'a TensorData> for ::ndarray::ArrayViewD<'a, u8> {
```
This covers how to go from `TensorData` -> `ArrayView`, but the other way around is no longer defined, causing a bunch of tests to fail to compile.
I'm splitting the macro up to fix it :)
Ah, thanks!
good to go!
Oh looks like my push didn't work :O
Required a bit of force :), should be good now, thanks for the cleanup and fixes! 🤞
I added the example I posted above as
I think you can just kill the frontmatter line with the link that linkinator complained about 🤷
Ah, I see this thing is giving you a hard time, sorry about that 😅 `just py-format` and `just py-lint` (the latter only shows issues, doesn't fix them yet).
And the other thing that probably bites you now is that codegen is no longer automatic, so you need to run it manually.
edit: Ah, the manual codegen thing hasn't landed yet. Anyways, it's a good habit already; things moved around there a bit...
Thanks for the commands; I also had to set up my venv with requirements-lint.txt... I installed LLVM and the flatbuffers compiler like it's specified in the
Having the same issue Oo. Looking into it!
... if everything else goes green I'll just merge this and fix the issue on main then.
🎉
Awesome! 💯
### What

Small mistake (maybe merge error) that slipped into #3541.

Before:

![image](https://github.com/rerun-io/rerun/assets/1220815/0d3e94d7-6a8d-4171-9ddf-4924f61ce5a1)

After:

<img width="488" alt="image" src="https://github.com/rerun-io/rerun/assets/1220815/00cb7c3f-46be-437d-b4ec-d67e7ee5819b">

(Actually curious that it says depth 4 here; that must be a bug on the logging side though. Same behavior [as before](https://demo.rerun.io/pr/3838/examples/structure_from_motion/)!)

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3885) (if applicable)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/3885)
- [Docs preview](https://rerun.io/preview/d99315e47a273c4dfef158a34fc806005c2f51e8/docs) <!--DOCS-PREVIEW-->
- [Examples preview](https://rerun.io/preview/d99315e47a273c4dfef158a34fc806005c2f51e8/examples) <!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
### What

This PR introduces support for YUV422/YUY2 encoded images based on #3541.

<img width="1117" alt="Image showcasing RGB, NV12 and YUV422 encoded images" src="https://github.com/rerun-io/rerun/assets/14833076/5ead1748-f765-4d68-b349-b920a89a6512">

The raw (encoded) image data is stored in an R8Uint texture. The contents of the texture get decoded in `rectangle_fs.wgsl` via the decoder written in `decodings.wgsl`. Other [YUV](https://gist.github.com/Jim-Bar/3cbba684a71d1a9d468a6711a6eddbeb) image formats could be added easily, following the NV12 implementation with slight modifications to the decoder.
### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using newly built examples: [app.rerun.io](https://app.rerun.io/pr/4877/index.html)
  * Using examples from latest `main` build: [app.rerun.io](https://app.rerun.io/pr/4877/index.html?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [app.rerun.io](https://app.rerun.io/pr/4877/index.html?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4877)
- [Docs preview](https://rerun.io/preview/3f787b44ebb4a566224aba821024b27214245575/docs) <!--DOCS-PREVIEW-->
- [Examples preview](https://rerun.io/preview/3f787b44ebb4a566224aba821024b27214245575/examples) <!--EXAMPLES-PREVIEW-->
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

---------

Co-authored-by: Harold <[email protected]>
Co-authored-by: Harold Ruiter <[email protected]>
Co-authored-by: Andreas Reich <[email protected]>
Added support for logging NV12 encoded images via the existing `log_image_file` API. As promised a while back :)
The raw (encoded) image data is stored in an R8Uint texture. The contents of the texture get decoded in `rectangle_fs.wgsl` via the decoder written in `decodings.wgsl`. Other YUV image formats could be added easily, following the NV12 implementation with slight modifications to the decoder.
Checklist