Orientation metadata for VideoFrame #351

sandersdan · 2021-08-30T19:17:24Z

The first attempt at orientation metadata was paused due to a lack of agreement about the best way to represent it. This has been partly resolved now that a representation for color space (#47) has been selected.

Background:

VideoFrames have rotation and flip properties (together "orientation") that describe the transformation from the raw pixels produced from readTo() to the intended rendering.
Most implementations restrict this to the four 90 degree rotations and a flipY/mirrored flag, or equivalently the EXIF orientation tag which encodes a rotation and any flips in a single number. Some implementations don't consider flips at all (flips are rare except as a hardcoded parameter of texture upload).
We already decided to account for orientation in the CanvasImageSource representation (createImageBitmap(videoFrame) Semantics #159). Exposing these properties remains necessary for applications to correctly interpret readTo() and to configure new VideoFrame(BufferSource) frames. (In theory the implementation could reorient the readTo() data to avoid the problem, but Chrome's implementation does not currently do so.)
Separating rotation from flip metadata can create confusion because the operations are not commutative. EXIF orientation by comparison is unambiguous but not as widely familiar.
There is also ambiguity with respect to coded size, visibleRect, and display size.
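
To illustrate the non-commutativity point above, here is a minimal sketch that composes a 90 degree clockwise rotation and a horizontal mirror in both orders. The matrix convention (image coordinates, x right and y down) and all names are assumptions chosen for the example, not part of any proposal.

```typescript
// 2x2 matrices in image coordinates (x right, y down), row-major:
// [m0, m1, m2, m3] maps (x, y) to (m0*x + m1*y, m2*x + m3*y).
type Mat2 = [number, number, number, number];

// 90 degrees clockwise: (x, y) -> (-y, x).
const ROTATE_90_CW: Mat2 = [0, -1, 1, 0];
// Horizontal mirror: (x, y) -> (-x, y).
const FLIP_X: Mat2 = [-1, 0, 0, 1];

// Returns a * b, i.e. the transform that applies b first and then a.
function multiply(a: Mat2, b: Mat2): Mat2 {
  return [
    a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
    a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
  ];
}

// Rotate first, then mirror: (x, y) -> (y, x), a transpose.
const rotateThenFlip = multiply(FLIP_X, ROTATE_90_CW);
// Mirror first, then rotate: (x, y) -> (-y, -x), the opposite diagonal.
const flipThenRotate = multiply(ROTATE_90_CW, FLIP_X);
```

The two results differ, so any representation that stores rotation and flip separately also has to state the order in which they apply.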

Open questions:

Should we separate rotation from flip metadata? Provide both representations?
- I lean toward orientation only to minimize ambiguity, but recognize that this could introduce translation errors.
EXIF orientations do not have names, just numbers and descriptions, so if we use orientations we would need to create our own names.
- Internally Chrome names these like kOriginTopLeft, kOriginLeftTop, ..., which seems like a reasonable approach (perhaps we could use eg. origin = "left-top".) It is ambiguous about row vs column ordering but errors there are likely to be obvious.
How to define sizing.
- I think coded size must be the unrotated data, and it follows that visibleRect should be the same. CanvasImageSource uses only care about the display size, so I think that should be the rotated version. (Note: this would require a tweak to our definition of the cropping constructor.)
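
As a rough sketch of the sizing rule suggested above (coded size and visibleRect unrotated, display size rotated), the rendered dimensions could be derived as follows. The function and type names are hypothetical, and the sketch assumes square pixels so that the display size equals the visible size before rotation.

```typescript
type Rotation = 0 | 90 | 180 | 270;

interface Size {
  width: number;
  height: number;
}

// Quarter turns swap width and height; half turns and flips do not
// change the dimensions.
function renderedSize(visible: Size, rotation: Rotation): Size {
  const swapped = rotation === 90 || rotation === 270;
  return swapped
    ? { width: visible.height, height: visible.width }
    : { width: visible.width, height: visible.height };
}
```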


chcunningham · 2021-11-17T17:34:38Z

EC: Important gap to properly reimplement <video> and <img>

sandersdan · 2022-10-20T17:58:09Z

I'll prepare a PR that adds a new orientation property, with values like top-left, left-top, etc. in the style of EXIF orientations.

bc-lee · 2022-11-03T01:40:08Z

This feature is very useful for mediacapture-transform (MediaStreamTrack Insertable Media Processing using Streams) and bridges the feature gap between the native code and the browser.

sandersdan · 2022-11-16T18:09:17Z

> This feature is very useful for mediacapture-transform (MediaStreamTrack Insertable Media Processing using Streams)

Which parts in particular? Is this simply being able to pass the orientation through a VideoDecoderConfig, or do you also need to be able to configure or inspect the individual frames?

bc-lee · 2022-11-17T01:20:28Z

This is because mediacapture-transform allows us to process each video frame as a WebCodecs VideoFrame and send it to WebRTC inputs.

A simple example is as follows: example, code

If the VideoFrame has orientation metadata, mediacapture-transform can rotate the video at no cost (in terms of copying and assigning). I'm not sure about flip/mirroring. (WebRTC's VideoFrame has a rotation field, but nothing related to flip.)

sandersdan · 2023-04-20T18:58:05Z

I've just gotten back to this issue, and my opinion of the best path forward has changed (I now prefer making orientation an unexposed implementation detail). I'll outline both proposals here for discussion.

Proposal: Unexposed Implementation Detail

In this approach, frame orientation is considered to be an implementation detail. If the underlying frame resource has an orientation, the VideoFrame implementation will act as though the transformation has been applied. This means that codedWidth, codedHeight, and visibleRect are measured in coordinates matching the rendering orientation, consistent with displayWidth and displayHeight. copyTo() implementations would need to apply the transformations, so that the produced data is in the rendering orientation. Similarly, encoder implementations would need to apply the transformation so that the encoded frames are in the default orientation.
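
To give a sense of the work copyTo() would take on under this proposal, here is a minimal sketch that rotates a tightly packed RGBA buffer by 90 degrees clockwise. It is only an illustration: the function name is hypothetical, it assumes 4 bytes per pixel with no row padding, and a real implementation would more likely do this on the GPU or per plane.

```typescript
// Rotates a packed RGBA image 90 degrees clockwise.
// The result is `height` pixels wide and `width` pixels tall.
function rotateRgba90Clockwise(
  src: Uint8Array,
  width: number,
  height: number,
): Uint8Array {
  const BYTES_PER_PIXEL = 4;
  const dst = new Uint8Array(src.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Source pixel (x, y) lands at (height - 1 - y, x), so the
      // top-left source pixel ends up in the top-right corner.
      const dstX = height - 1 - y;
      const dstY = x;
      const s = (y * width + x) * BYTES_PER_PIXEL;
      const d = (dstY * height + dstX) * BYTES_PER_PIXEL;
      dst.set(src.subarray(s, s + BYTES_PER_PIXEL), d);
    }
  }
  return dst;
}
```

Every pixel is read and written once, which is the extra cost listed under the cons of this approach.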

Pros

Applications do not need to concern themselves with orientation metadata. They do not need to be aware that, for example, capture frames can be rotated, and the obvious code will just work.
There is consistency between frame metadata, copyTo() behavior, rendering, and encoding.

Cons

Applying orientation transformations can be slower than not applying them, which could be undesirable for applications that could operate correctly on oriented frames. (There are future extensions that could mitigate this.)
If we add a memory mapping API in the future, it will need its own orientation metadata.
The correct behavior of image-orientation: none on <canvas> is unclear.

Future Extensions

We can add orientation configuration to VideoDecoderConfig, so that applications can pass orientation metadata from the container and produce compatible VideoFrames. Logically this metadata would be interpreted as a transformation to apply to the decoded content, though the implementation is likely just tagging the physical frames.
We can add orientation configuration to new VideoFrame(), providing a way to change the orientation of frames in a way that may be low-cost for some implementations.
We can add orientation configuration to VideoEncoderConfig, allowing applications to request oriented encoding. If this orientation happens to match the physical frames, then no transformation step needs to be applied while encoding.

Proposal: Exposed Metadata

In this approach, frame orientation is exposed on VideoFrame, as orientation ("none" or "flipY") and rotation (0, 90, 180, or 270 in degrees clockwise). codedWidth, codedHeight, and visibleRect are in physical coordinates while displayWidth and displayHeight remain in the rendering orientation. copyTo() operates on the physical orientation, as does video encoding.

While the set of extensions is the same, it makes sense to prioritize them since average applications are likely to desire those features immediately.

Pros

It is much simpler to implement.
Performance is never impacted.
A future memory mapping API requires no new interpretation.
The behavior of image-orientation: none has an obvious choice.

Cons

Applications will need to be orientation-aware, or risk unexpected results when oriented frames are encountered.
There is inconsistency in the behavior of copyTo() compared to rendering.

dalecurtis · 2023-04-20T19:42:33Z

cc: @youennf @aboba @padenot

marcello3d · 2023-04-20T22:09:42Z

As an interested lurker, is the orientation information encoded in individual video chunks or only part of the container?

If it's the latter, how would the "Unexposed Implementation Detail" approach work in VideoDecoder? It seems like you would at least need the configuration option in your first "future extension" bullet.

sandersdan · 2023-04-21T17:03:43Z

> As an interested lurker, is the orientation information encoded in individual video chunks or only part of the container?

There is variability here between codecs, but typically container metadata is the primary source.

This is the first listed extension, adding orientation metadata to VideoDecoderConfig. I would be proposing the same format there, orientation ("none" or "flipY") and rotation (0, 90, 180, or 270).

marcello3d · 2023-04-21T21:56:46Z

So how would it work before that extension is added?

sandersdan · 2023-04-21T22:08:03Z

> So how would it work before that extension is added?

The same as right now, which is to say that you have to manage orientation metadata yourself and apply the rendering transformation yourself.

sandersdan · 2023-05-30T16:49:56Z

This was discussed at the Media WG meeting on May 30 (minutes). In general the "Unexposed Implementation Detail" approach was supported, and there was also support for adding some extensions immediately.

Some specific cases that came up for discussion:

When encoding for WebRTC, it would be preferable to preserve the capture orientation to reduce computation. It should be possible to configure the encoder to do that, and also to know what the orientation is for higher-level signaling.
When the orientation changes (eg. a phone is rotated), this looks like a change in size. If applications do not manually reconfigure encoding, then the current spec text requires encoders to scale the new frames to the old size.
When processing frames, orientation metadata would be lost. Some operations (eg. background blur) may be more efficient if the metadata were exposed instead. (Note: image-orientation: none preserves orientation for images drawn to <canvas>, but there is no matching metadata to predict its behavior.)

tiuvi · 2024-09-05T06:56:48Z

This has been open for a long time and orientation metadata is still needed. Is there any solution?
For example, I can detect that the width of the video matches the width of the frame, but from that I can only deduce that it is rotated. I need to know whether it is rotated 90º up or down.
I could use ffmpeg or mp4box, but both solutions use a lot of memory to load a video.

After the transformation

sandersdan · 2024-09-16T18:22:45Z

I have realized a flaw in the proposal above: it's not possible to hide the underlying orientation of 4:2:2 content, since there is no pixel format that describes 4:2:2 rotated by 90 degrees.

The alternatives seem to be exposing the underlying orientation or forcing implementations to convert rotated frames (eg. to 4:4:4).
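
A small sketch of why 4:2:2 is the problem case: a quarter turn swaps the horizontal and vertical chroma subsampling factors, and the swapped layout has to match some other format. The table below lists only three planar formats (I420, I422, I444) as an assumption for the example, and the helper name is hypothetical.

```typescript
interface ChromaLayout {
  name: string;
  horizontal: number; // chroma subsampling factor along x
  vertical: number;   // chroma subsampling factor along y
}

// Assumed subset of planar formats, for illustration only.
const PLANAR_FORMATS: ChromaLayout[] = [
  { name: "I420", horizontal: 2, vertical: 2 },
  { name: "I422", horizontal: 2, vertical: 1 },
  { name: "I444", horizontal: 1, vertical: 1 },
];

// Returns the format that describes `name` after a 90 or 270 degree
// rotation, or undefined if no listed format has the swapped layout.
function formatAfterQuarterTurn(name: string): string | undefined {
  const source = PLANAR_FORMATS.find((f) => f.name === name);
  if (source === undefined) return undefined;
  const match = PLANAR_FORMATS.find(
    (f) =>
      f.horizontal === source.vertical && f.vertical === source.horizontal,
  );
  return match === undefined ? undefined : match.name;
}
```

I420 and I444 are symmetric and map to themselves, while I422 would need a layout with full-width, half-height chroma, which is not in the list.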

sandersdan · 2024-09-18T20:56:02Z

Given the above issue with the existing proposal, I have prepared a new proposal similar to the "Exposed Metadata" option. The main difference is that I no longer use ImageOrientation-like orientation, and instead use a boolean flip, which is inspired by the CSS3 image-orientation syntax for explicit rotations.

Proposal: Exposed Metadata (Take 2)

In this approach, the transformation from the output of copyTo() to the intended rendering orientation is exposed as two attributes on VideoFrame:

rotation: 0, 90, 180, or 270 (in degrees clockwise)
flip: false or true (whether horizontally mirrored after rotation)

Where the output of copyTo() is interpreted to be in top-left (EXIF) orientation before the transformation is applied.
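
For applications that want to apply this transformation themselves, the two attributes can be turned into a 2D affine matrix. The sketch below assumes the six-value layout used by canvas setTransform(a, b, c, d, e, f), where x' = a*x + c*y + e and y' = b*x + d*y + f; the function names are hypothetical.

```typescript
type Rotation = 0 | 90 | 180 | 270;
type CanvasMatrix = [number, number, number, number, number, number];

// Maps coordinates of the unrotated width x height image to the rendered
// image: rotate clockwise first, then mirror horizontally if flip is set.
function orientationMatrix(
  width: number,
  height: number,
  rotation: Rotation,
  flip: boolean,
): CanvasMatrix {
  const rotated: Record<Rotation, CanvasMatrix> = {
    0: [1, 0, 0, 1, 0, 0],
    90: [0, 1, -1, 0, height, 0],
    180: [-1, 0, 0, -1, width, height],
    270: [0, -1, 1, 0, 0, width],
  };
  const [a, b, c, d, e, f] = rotated[rotation];
  if (!flip) return [a, b, c, d, e, f];
  // Mirroring negates x and shifts by the width of the rotated image.
  const renderedWidth = rotation === 90 || rotation === 270 ? height : width;
  return [0 - a, b, 0 - c, d, renderedWidth - e, f];
}

// Applies the matrix to a point, for checking where corners end up.
function transformPoint(m: CanvasMatrix, x: number, y: number): [number, number] {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}
```

This is only useful when drawing the raw copyTo() pixels, since rendering a VideoFrame directly is already expected to account for orientation.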

Alternatives

Rename flip to flipped, mirror, or mirrored. I went with the closest to the CSS spec.
Use EXIF orientations (top-left, left-bottom, ...). Probably not convenient to use.
Use DOMMatrix. Integrates with some canvas operations but otherwise less convenient to use.
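
For comparison with the EXIF alternative, the eight EXIF orientation values can be written in terms of rotation and flip as defined above (rotation in degrees clockwise, then a horizontal mirror). The table below is my own derivation from the EXIF row/column descriptions rather than anything taken from a spec, so it is worth double-checking before relying on it; the names are hypothetical.

```typescript
type Rotation = 0 | 90 | 180 | 270;

interface Orientation {
  name: string; // EXIF-style name: where row 0 and column 0 appear visually
  rotation: Rotation;
  flip: boolean;
}

// Keyed by EXIF orientation tag value (1 through 8).
const EXIF_TO_ORIENTATION: Record<number, Orientation> = {
  1: { name: "top-left", rotation: 0, flip: false },
  2: { name: "top-right", rotation: 0, flip: true },
  3: { name: "bottom-right", rotation: 180, flip: false },
  4: { name: "bottom-left", rotation: 180, flip: true },
  5: { name: "left-top", rotation: 90, flip: true },
  6: { name: "right-top", rotation: 90, flip: false },
  7: { name: "right-bottom", rotation: 270, flip: true },
  8: { name: "left-bottom", rotation: 270, flip: false },
};

// Returns undefined for values outside 1..8.
function orientationFromExif(tag: number): Orientation | undefined {
  return EXIF_TO_ORIENTATION[tag];
}
```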

VideoFrame Constructors

VideoFrame constructors gain rotation and flip options, which are applied to the source image. In the case that the source image is a VideoFrame, this means that leaving the orientation unchanged is always {rotation: 0, flip: false}.
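
One way to combine the options with the orientation of a source VideoFrame can be sketched as follows. This is my own working-out, assuming that each orientation means "rotate clockwise, then mirror horizontally" and that the new options are applied on top of the source's rendered image; the names are hypothetical.

```typescript
type Rotation = 0 | 90 | 180 | 270;

interface Orientation {
  rotation: Rotation;
  flip: boolean;
}

// Combines the orientation of a source frame (`base`) with constructor
// options (`extra`) that are applied to the already oriented image.
function combineOrientation(base: Orientation, extra: Orientation): Orientation {
  // Moving a rotation past a mirror reverses its direction, so the extra
  // rotation is subtracted when the base is flipped.
  const signed = base.flip
    ? base.rotation - extra.rotation
    : base.rotation + extra.rotation;
  const rotation = (((signed % 360) + 360) % 360) as Rotation;
  return { rotation, flip: base.flip !== extra.flip };
}
```

With this rule, `{rotation: 0, flip: false}` leaves any source orientation unchanged, matching the statement above.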

I would recommend using the same numeric handling as CSS specifies for image-orientation, which means accepting arbitrary (even negative) rotations and snapping them to the nearest right angle.
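
A minimal sketch of that numeric handling, assuming rotations are given in degrees and that exact ties (such as 45) round toward positive infinity; the tie-breaking rule and the function name are assumptions for the example.

```typescript
// Snaps an arbitrary rotation in degrees to 0, 90, 180, or 270.
function snapRotation(degrees: number): number {
  // Nearest multiple of 90, with ties rounded toward positive infinity.
  const aligned = Math.floor(degrees / 90 + 0.5) * 90;
  // Reduce to the range [0, 360), including for negative inputs.
  return ((aligned % 360) + 360) % 360;
}
```

For example, `snapRotation(-90)` yields 270 and `snapRotation(450)` yields 90.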

I would not require that implementations produce frames that have exactly the specified orientation values. Instead it would also be valid to apply the transformation to the underlying frame resource (the same as how we handle cropping).

Alternatives

We might also want to add imageOrientation, copied from ImageBitmapOptions, which can be used to discard the source image orientation if desired.

VideoDecoder

The same rotation and flip options are added to VideoDecoderConfig, and affect output frames.

VideoEncoder

The strictest option is to add rotation and flip options to VideoEncoderConfig, and then reject any VideoFrames that mismatch. This forces applications to reconfigure at orientation changes.
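
From the application side, the strict option could look roughly like the sketch below: compare each frame's orientation with the configured one and reconfigure before encoding when they differ. The types are stand-ins for the proposed options rather than real WebCodecs interfaces, and the helper only reports where a reconfiguration would be needed.

```typescript
interface Orientation {
  rotation: number;
  flip: boolean;
}

function sameOrientation(a: Orientation, b: Orientation): boolean {
  return a.rotation === b.rotation && a.flip === b.flip;
}

// Returns the indices of frames that would be rejected by an encoder
// configured with the previous orientation, i.e. the points where the
// application has to reconfigure before encoding.
function reconfigurationPoints(
  frames: Orientation[],
  initial: Orientation,
): number[] {
  const points: number[] = [];
  let configured = initial;
  frames.forEach((frame, index) => {
    if (!sameOrientation(configured, frame)) {
      points.push(index);
      configured = frame; // stands in for a reconfigure with new options
    }
  });
  return points;
}
```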

Alternatives

Encoders could ignore frame orientation entirely, as if they were configured with image-orientation: "none". (This is what Chrome does today.)

padenot · 2024-09-27T16:39:19Z

I'll note in passing that all browsers do things differently in this area: w3c/csswg-drafts#4666
https://mcc.id.au/2020/image-orientation/ (test-case).

It's possible to also disable the orientation on the thing "receiving" the frame, and it should be somewhat consistent.

alvestrand · 2024-09-27T19:27:39Z

Note that if frames end up being sent by WebRTC, WebRTC needs access to the metadata indicating rotation in order to set the CVO header extension correctly.

https://www.rfc-editor.org/rfc/rfc7742.html#section-4 is the place where support is required.

[webcodecs] Add rotation and flip init options on VideoFrame

This change implements `rotation` and `flip` options in `VideoFrameInit` and `VideoFrameBufferInit`, as described in w3c/webcodecs#351 (comment).
Bug: 40243431
Change-Id: I9eae4f4f101df7a285abd6575f7271b7589a512c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5939863
Reviewed-by: Eugene Zemtsov
Commit-Queue: Eugene Zemtsov
Cr-Commit-Position: refs/heads/main@{#1379353}
Automatic update from web-platform-tests (a=testonly): wpt-commits: a525424ff744bca3cb7e11c8aa19127c95a60cb0, wpt-pr: 49013

tangobravo mentioned this issue Oct 31, 2021: Video data rotation should be explicit w3c/mediacapture-transform#65 (Open)

chcunningham added the extension (interface changes that extend without breaking) and p1 labels Nov 17, 2021

sandersdan mentioned this issue May 9, 2022: [VideoDecoder] Frames are rotated when the video is in portrait mode / vertical-oriented #490 (Closed)

yjbanov mentioned this issue Mar 7, 2023: [canvaskit] read pixels back in Picture.toImage flutter/engine#40004 (Merged)

Djuffin added the TPAC2024 (for discussion at TPAC 2024) label Sep 19, 2024

sandersdan mentioned this issue Oct 14, 2024: Add orientation to VideoFrame #840 (Merged)

chromium-wpt-export-bot mentioned this issue Nov 6, 2024: [webcodecs] Add rotation and flip init options on VideoFrame web-platform-tests/wpt#49013 (Merged)
