Orientation metadata for VideoFrame #351

Open
sandersdan opened this issue Aug 30, 2021 · 17 comments
Labels
extension Interface changes that extend without breaking. p1 TPAC2024 For discussion at TPAC 2024

Comments

@sandersdan
Contributor

sandersdan commented Aug 30, 2021

The first attempt at orientation metadata was paused due to a lack of agreement about the best way to represent it. This has been partly resolved now that a representation for color space (#47) has been selected.

Background:

  • VideoFrames have rotation and flip properties (together "orientation") that describe the transformation from the raw pixels produced from readTo() to the intended rendering.
  • Most implementations restrict this to the four 90 degree rotations and a flipY/mirrored flag, or equivalently the EXIF orientation tag which encodes a rotation and any flips in a single number. Some implementations don't consider flips at all (flips are rare except as a hardcoded parameter of texture upload).
  • We already decided to account for orientation in the CanvasImageSource representation (createImageBitmap(videoFrame) Semantics #159). Exposing these properties remains necessary for applications to correctly interpret readTo() and to configure new VideoFrame(BufferSource) frames. (In theory the implementation could reorient the readTo() data to avoid the problem, but Chrome's implementation does not currently do so.)
  • Separating rotation from flip metadata can create confusion because the operations are not commutative. EXIF orientation by comparison is unambiguous but not as widely familiar.
  • There is also ambiguity with respect to coded size, visibleRect, and display size.
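
To make the non-commutativity point above concrete, here is a small TypeScript sketch. It is not taken from any implementation, and `rotate90cw` and `flipHorizontal` are hypothetical helper names. It applies a 90 degree clockwise rotation and a horizontal flip to a 2x2 grid in both orders and shows that the results differ.

```typescript
type Grid = number[][];

// Rotate a row-major grid 90 degrees clockwise.
function rotate90cw(g: Grid): Grid {
  const rows = g.length;
  const cols = g[0].length;
  const out: Grid = [];
  for (let c = 0; c < cols; c++) {
    const row: number[] = [];
    // The bottom row of the source becomes the left column of the result.
    for (let r = rows - 1; r >= 0; r--) {
      row.push(g[r][c]);
    }
    out.push(row);
  }
  return out;
}

// Mirror a grid left-to-right.
function flipHorizontal(g: Grid): Grid {
  return g.map((row) => row.slice().reverse());
}

const pixels: Grid = [
  [1, 2],
  [3, 4],
];

const rotateThenFlip = flipHorizontal(rotate90cw(pixels));
const flipThenRotate = rotate90cw(flipHorizontal(pixels));

console.log(JSON.stringify(rotateThenFlip)); // [[1,3],[2,4]]
console.log(JSON.stringify(flipThenRotate)); // [[4,2],[3,1]]
```

Any representation that stores rotation and flip separately therefore has to state which of the two is applied first.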

Open questions:

  • Should we separate rotation from flip metadata? Provide both representations?
    • I lean toward orientation only to minimize ambiguity, but recognize that this could introduce translation errors.
  • EXIF orientations do not have names, just numbers and descriptions, so if we use orientations we would need to create our own names.
    • Internally Chrome names these like kOriginTopLeft, kOriginLeftTop, ..., which seems like a reasonable approach (perhaps we could use e.g. origin = "left-top"). It is ambiguous about row vs. column ordering, but errors there are likely to be obvious.
  • How to define sizing.
    • I think coded size must be the unrotated data, and it follows that visibleRect should be the same. CanvasImageSource uses only care about the display size, so I think that should be the rotated version. (Note: this would require a tweak to our definition of the cropping constructor.)
@chcunningham
Collaborator

EC: Important gap to properly reimplement <video> and <img>

@sandersdan
Contributor Author

I'll prepare a PR that adds a new orientation property, with values like top-left, left-top, etc. in the style of EXIF orientations.

@bc-lee

bc-lee commented Nov 3, 2022

This feature is very useful for mediacapture-transform (MediaStreamTrack Insertable Media Processing using Streams) and bridges the feature gap between the native code and the browser.

@sandersdan
Contributor Author

> This feature is very useful for mediacapture-transform (MediaStreamTrack Insertable Media Processing using Streams)

Which parts in particular? Is this simply being able to pass the orientation through a VideoDecoderConfig, or do you also need to be able to configure or inspect the individual frames?

@bc-lee

bc-lee commented Nov 17, 2022

This is because mediacapture-transform allows us to process each video frame as WebCodec's VideoFrame and send them to WebRTC inputs.

A simple example is as follows: example, code

If the VideoFrame has orientation metadata, mediacapture-transform can rotate the video at no cost (in terms of copying and assigning). I'm not sure about flip/mirroring. (WebRTC's VideoFrame has a rotation field, but nothing related to flip.)

@sandersdan
Contributor Author

I've just gotten back to this issue, and my opinion of the best path forward has changed (I now prefer making orientation an unexposed implementation detail). I'll outline both proposals here for discussion.

Proposal: Unexposed Implementation Detail

In this approach, frame orientation is considered to be an implementation detail. If the underlying frame resource has an orientation, the VideoFrame implementation will act as though the transformation has been applied. This means that codedWidth, codedHeight, and visibleRect are measured in coordinates matching the rendering orientation, consistent with displayWidth and displayHeight. copyTo() implementations would need to apply the transformations, so that the produced data is in the rendering orientation. Similarly, encoder implementations would need to apply the transformation so that the encoded frames are in the default orientation.

Pros

  • Applications do not need to concern themselves with orientation metadata. They do not need to be aware that, for example, capture frames can be rotated, and the obvious code will just work.
  • There is consistency between frame metadata, copyTo() behavior, rendering, and encoding.

Cons

  • Applying orientation transformations can be slower than not applying them, which could be undesirable for applications that could operate correctly on oriented frames. (There are future extensions that could mitigate this.)
  • If we add a memory mapping API in the future, it will need its own orientation metadata.
  • The correct behavior of image-orientation: none on <canvas> is unclear.

Future Extensions

  • We can add orientation configuration to VideoDecoderConfig, so that applications can pass orientation metadata from the container and produce compatible VideoFrames. Logically this metadata would be interpreted as a transformation to apply to the decoded content, though the implementation is likely just tagging the physical frames.
  • We can add orientation configuration to new VideoFrame(), providing a way to change the orientation of frames in a way that may be low-cost for some implementations.
  • We can add orientation configuration to VideoEncoderConfig, allowing applications to request oriented encoding. If this orientation happens to match the physical frames, then no transformation step needs to be applied while encoding.

Proposal: Exposed Metadata

In this approach, frame orientation is exposed on VideoFrame, as orientation ("none" or "flipY") and rotation (0, 90, 180, or 270 in degrees clockwise). codedWidth, codedHeight, and visibleRect are in physical coordinates while displayWidth and displayHeight remain in the rendering orientation. copyTo() operates on the physical orientation, as does video encoding.

While the set of extensions is the same, it makes sense to prioritize them since average applications are likely to desire those features immediately.

Pros

  • It is much simpler to implement.
  • Performance is never impacted.
  • A future memory mapping API requires no new interpretation.
  • The behavior of image-orientation: none has an obvious choice.

Cons

  • Applications will need to be orientation-aware, or risk unexpected results when oriented frames are encountered.
  • There is inconsistency in the behavior of copyTo() compared to rendering.

@dalecurtis
Contributor

cc: @youennf @aboba @padenot

@marcello3d

As an interested lurker, is the orientation information encoded in individual video chunks or only part of the container?

If it's the latter, how would the "Unexposed Implementation Detail" approach work in VideoDecoder? It seems like you would at least need the configuration option in your first "future extension" bullet.

@sandersdan
Contributor Author

sandersdan commented Apr 21, 2023

> As an interested lurker, is the orientation information encoded in individual video chunks or only part of the container?

There is variability here between codecs, but typically container metadata is the primary source.

This is the first listed extension, adding orientation metadata to VideoDecoderConfig. I would be proposing the same format there, orientation ("none" or "flipY") and rotation (0, 90, 180, or 270).

@marcello3d

So how would it work before that extension is added?

@sandersdan
Contributor Author

> So how would it work before that extension is added?

The same as right now, which is to say that you have to manage orientation metadata yourself and apply the rendering transformation yourself.

@sandersdan
Contributor Author

sandersdan commented May 30, 2023

This was discussed at the Media WG meeting on May 30 (minutes). In general the "Unexposed Implementation Detail" approach was supported, and there was also support for adding some extensions immediately.

Some specific cases that came up for discussion:

  • When encoding for WebRTC, it would be preferable to preserve the capture orientation to reduce computation. It should be possible to configure the encoder to do that, and also to know what the orientation is for higher-level signaling.
  • When the orientation changes (e.g. a phone is rotated), this looks like a change in size. If applications do not manually reconfigure encoding, then the current spec text requires encoders to scale the new frames to the old size.
  • When processing frames, orientation metadata would be lost. Some operations (e.g. background blur) may be more efficient if the metadata were exposed instead. (Note: image-orientation: none preserves orientation for images drawn to <canvas>, but there is no matching metadata to predict its behavior.)

@tiuvi

tiuvi commented Sep 5, 2024

This has been open for a long time. Orientation metadata is needed; is there any solution?
For example, I can detect that the width of the video matches the width of the frame, but from that I can only deduce that it is rotated. I need to know whether it is rotated 90º up or down.
I could use ffmpeg or mp4box, but both solutions use a lot of memory to load a video.

image

After the transformation

image

@sandersdan
Contributor Author

sandersdan commented Sep 16, 2024

I have realized a flaw in the proposal above: it's not possible to hide the underlying orientation of 4:2:2 content, since there is no pixel format that describes 4:2:2 rotated by 90 degrees.

The alternatives seem to be exposing the underlying orientation or forcing implementations to convert rotated frames (e.g. to 4:4:4).
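
A rough illustration of why 4:2:2 is the problem case, written as a TypeScript sketch. The `Subsampling` type, the `KNOWN_FORMATS` table, and the helper names are hypothetical and cover only three planar formats. Chroma subsampling is modeled as a pair of divisors, and a quarter-turn rotation swaps them. For I420 and I444 the result is still a listed format, while rotated I422 would need full-width, half-height chroma, which none of the listed formats describe.

```typescript
// Chroma subsampling as divisors of the luma plane size.
interface Subsampling {
  horizontal: number; // chroma width  = ceil(lumaWidth / horizontal)
  vertical: number; // chroma height = ceil(lumaHeight / vertical)
}

// Subsampling of three planar YUV pixel formats (illustrative subset).
const KNOWN_FORMATS: Record<string, Subsampling> = {
  I420: { horizontal: 2, vertical: 2 },
  I422: { horizontal: 2, vertical: 1 },
  I444: { horizontal: 1, vertical: 1 },
};

// A 90 or 270 degree rotation swaps the two axes of every plane.
function rotateQuarterTurn(s: Subsampling): Subsampling {
  return { horizontal: s.vertical, vertical: s.horizontal };
}

// Returns the name of a listed format with the given subsampling, or null.
function findFormat(s: Subsampling): string | null {
  for (const name of Object.keys(KNOWN_FORMATS)) {
    const f = KNOWN_FORMATS[name];
    if (f.horizontal === s.horizontal && f.vertical === s.vertical) {
      return name;
    }
  }
  return null;
}

console.log(findFormat(rotateQuarterTurn(KNOWN_FORMATS["I420"]))); // I420
console.log(findFormat(rotateQuarterTurn(KNOWN_FORMATS["I444"]))); // I444
console.log(findFormat(rotateQuarterTurn(KNOWN_FORMATS["I422"]))); // null
```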

@sandersdan
Contributor Author

Given the above issue with the existing proposal, I have prepared a new proposal similar to the "Exposed Metadata" option. The main difference is that I no longer use ImageOrientation-like orientation, and instead use a boolean flip, which is inspired by the CSS3 image-orientation syntax for explicit rotations.

Proposal: Exposed Metadata (Take 2)

In this approach, the transformation from the output of copyTo() to the intended rendering orientation is exposed as two attributes on VideoFrame:

  • rotation: 0, 90, 180, or 270 (in degrees clockwise)
  • flip: false or true (whether horizontally mirrored after rotation)

Where the output of copyTo() is interpreted to be in top-left (EXIF) orientation before the transformation is applied.
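
As a sketch of how an application might interpret these two attributes, the TypeScript below maps a pixel position in the copyTo() buffer to its position in the rendered image. It assumes a top-left origin with y pointing down, applies the clockwise rotation first and the horizontal mirror second, and ignores cropping and display scaling. The `toRendered` and `orientedSize` helpers are hypothetical and not part of WebCodecs.

```typescript
type Rotation = 0 | 90 | 180 | 270;

interface Point {
  x: number;
  y: number;
}

interface Size {
  width: number;
  height: number;
}

// Size of the rendered image for a buffer of the given size.
function orientedSize(size: Size, rotation: Rotation): Size {
  return rotation === 90 || rotation === 270
    ? { width: size.height, height: size.width }
    : { width: size.width, height: size.height };
}

// Map a pixel position in the buffer to its position in the rendered image.
function toRendered(p: Point, size: Size, rotation: Rotation, flip: boolean): Point {
  const w = size.width;
  const h = size.height;
  let out: Point;
  if (rotation === 0) {
    out = { x: p.x, y: p.y };
  } else if (rotation === 90) {
    // Clockwise quarter turn: the top-left pixel moves to the top-right.
    out = { x: h - 1 - p.y, y: p.x };
  } else if (rotation === 180) {
    out = { x: w - 1 - p.x, y: h - 1 - p.y };
  } else {
    // 270 clockwise: the top-left pixel moves to the bottom-left.
    out = { x: p.y, y: w - 1 - p.x };
  }
  if (flip) {
    // Mirror horizontally within the already rotated image.
    out.x = orientedSize(size, rotation).width - 1 - out.x;
  }
  return out;
}

const bufferSize: Size = { width: 4, height: 2 };
console.log(orientedSize(bufferSize, 90)); // { width: 2, height: 4 }
console.log(toRendered({ x: 0, y: 0 }, bufferSize, 90, false)); // { x: 1, y: 0 }
console.log(toRendered({ x: 0, y: 0 }, bufferSize, 90, true)); // { x: 0, y: 0 }
```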

Alternatives

  • Rename flip to flipped, mirror, or mirrored. I went with the closest to the CSS spec.
  • Use EXIF orientations (top-left, left-bottom, ...). Probably not convenient to use.
  • Use DOMMatrix. Integrates with some canvas operations but otherwise less convenient to use.
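
For applications that already carry EXIF orientation values, a translation table is one possible bridge to the proposed attributes. The TypeScript sketch below is an illustration under the convention stated above (clockwise rotation first, then a horizontal mirror). The table and the `exifToRotationFlip` helper are hypothetical and not part of the proposal.

```typescript
interface Orientation {
  rotation: 0 | 90 | 180 | 270;
  flip: boolean;
}

// EXIF orientation value (1 to 8) expressed as rotation plus flip.
const EXIF_TO_ROTATION_FLIP: Record<number, Orientation> = {
  1: { rotation: 0, flip: false }, // top-left
  2: { rotation: 0, flip: true }, // top-right
  3: { rotation: 180, flip: false }, // bottom-right
  4: { rotation: 180, flip: true }, // bottom-left
  5: { rotation: 90, flip: true }, // left-top
  6: { rotation: 90, flip: false }, // right-top
  7: { rotation: 270, flip: true }, // right-bottom
  8: { rotation: 270, flip: false }, // left-bottom
};

// Returns undefined for values outside the 1 to 8 range.
function exifToRotationFlip(exif: number): Orientation | undefined {
  return EXIF_TO_ROTATION_FLIP[exif];
}

console.log(exifToRotationFlip(6)); // { rotation: 90, flip: false }
```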

VideoFrame Constructors

VideoFrame constructors gain rotation and flip options, which are applied to the source image. In the case that the source image is a VideoFrame, this means that leaving the orientation unchanged is always {rotation: 0, flip: false}.
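
One way to read "applied to the source image" is that the constructor options are composed on top of whatever orientation the source frame already carries. The TypeScript sketch below works through that composition under the assumption that each orientation means "rotate clockwise, then mirror horizontally". `composeOrientation` is a hypothetical helper, not spec text. The key step is that moving a rotation past a mirror reverses its direction.

```typescript
type Rotation = 0 | 90 | 180 | 270;

interface Orientation {
  rotation: Rotation;
  flip: boolean;
}

// Orientation equivalent to applying `applied` on top of `source`.
function composeOrientation(source: Orientation, applied: Orientation): Orientation {
  // If the source is mirrored, the applied rotation acts in the opposite
  // direction once it is expressed relative to the unmirrored pixels.
  const delta = source.flip ? 360 - applied.rotation : applied.rotation;
  const rotation = ((source.rotation + delta) % 360) as Rotation;
  // Two mirrors cancel each other out.
  const flip = source.flip !== applied.flip;
  return { rotation, flip };
}

// Applying the identity leaves the source orientation unchanged.
console.log(
  composeOrientation({ rotation: 90, flip: true }, { rotation: 0, flip: false })
); // { rotation: 90, flip: true }

// Rotating a mirrored frame by 90 undoes, rather than adds to, its rotation.
console.log(
  composeOrientation({ rotation: 90, flip: true }, { rotation: 90, flip: false })
); // { rotation: 0, flip: true }
```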

I would recommend using the same numeric handling as CSS specifies for image-orientation, which means accepting arbitrary (even negative) rotations and snapping them to the nearest right angle.
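
A minimal TypeScript sketch of this kind of normalization follows. `snapRotation` is a hypothetical name, and the handling of exact ties such as 45 simply follows `Math.round` here, which is an assumption rather than something taken from the CSS or WebCodecs specifications.

```typescript
// Snap an arbitrary angle in degrees to 0, 90, 180, or 270.
function snapRotation(degrees: number): number {
  // Round to the nearest whole number of quarter turns.
  const quarterTurns = Math.round(degrees / 90);
  // Wrap into the range 0..3, also for negative inputs.
  const wrapped = ((quarterTurns % 4) + 4) % 4;
  return wrapped * 90;
}

console.log(snapRotation(-90)); // 270
console.log(snapRotation(100)); // 90
console.log(snapRotation(450)); // 90
console.log(snapRotation(-10)); // 0
```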

I would not require that implementations produce frames that have exactly the specified orientation values. Instead it would also be valid to apply the transformation to the underlying frame resource (the same as how we handle cropping).

Alternatives

  • We might also want to add imageOrientation, copied from ImageBitmapOptions, which can be used to discard the source image orientation if desired.

VideoDecoder

The same rotation and flip options are added to VideoDecoderConfig, and affect output frames.

VideoEncoder

The strictest option is to add rotation and flip options to VideoEncoderConfig, and then reject any VideoFrames that mismatch. This forces applications to reconfigure at orientation changes.

Alternatives

  • Encoders could ignore frame orientation entirely, as if they were configured with image-orientation: "none". (This is what Chrome does today.)

@Djuffin Djuffin added the TPAC2024 For discussion at TPAC 2024 label Sep 19, 2024
@padenot
Collaborator

padenot commented Sep 27, 2024

I'll note in passing that all browsers do things differently in this area: w3c/csswg-drafts#4666
https://mcc.id.au/2020/image-orientation/ (test-case).

It's possible to also disable the orientation on the thing "receiving" the frame, and it should be somewhat consistent.

@alvestrand

Note that if frames end up being sent by WebRTC, WebRTC needs access to the metadata indicating rotation in order to set the CVO header extension correctly.

https://www.rfc-editor.org/rfc/rfc7742.html#section-4 is the place where support is required.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Nov 6, 2024
This change implements `rotation` and `flip` options in `VideoFrameInit`
and `VideoFrameBufferInit`, as described in
w3c/webcodecs#351 (comment).

Bug: 40243431
Change-Id: I9eae4f4f101df7a285abd6575f7271b7589a512c
aarongable pushed a commit to chromium/chromium that referenced this issue Nov 6, 2024
This change implements `rotation` and `flip` options in `VideoFrameInit`
and `VideoFrameBufferInit`, as described in
w3c/webcodecs#351 (comment).

Bug: 40243431
Change-Id: I9eae4f4f101df7a285abd6575f7271b7589a512c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5939863
Reviewed-by: Eugene Zemtsov
Commit-Queue: Eugene Zemtsov
Cr-Commit-Position: refs/heads/main@{#1379353}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Nov 7, 2024
This change implements `rotation` and `flip` options in `VideoFrameInit`
and `VideoFrameBufferInit`, as described in
w3c/webcodecs#351 (comment).

Bug: 40243431
Change-Id: I9eae4f4f101df7a285abd6575f7271b7589a512c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5939863
Reviewed-by: Eugene Zemtsov
Commit-Queue: Eugene Zemtsov
Cr-Commit-Position: refs/heads/main@{#1379353}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Nov 7, 2024
…ptions on VideoFrame, a=testonly

Automatic update from web-platform-tests
[webcodecs] Add rotation and flip init options on VideoFrame

This change implements `rotation` and `flip` options in `VideoFrameInit`
and `VideoFrameBufferInit`, as described in
w3c/webcodecs#351 (comment).

Bug: 40243431
Change-Id: I9eae4f4f101df7a285abd6575f7271b7589a512c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5939863
Reviewed-by: Eugene Zemtsov
Commit-Queue: Eugene Zemtsov
Cr-Commit-Position: refs/heads/main@{#1379353}

--

wpt-commits: a525424ff744bca3cb7e11c8aa19127c95a60cb0
wpt-pr: 49013