Add captureTime, receiveTime and rtpTimestamp to VideoFrameMetadata #813

Merged
merged 3 commits into from
Dec 10, 2024

@guidou guidou commented Jul 5, 2024

These fields are useful for WebRTC-based applications.
See issue #601

@guidou
Author

guidou commented Jul 5, 2024

cc @Djuffin, @padenot , @youennf, @aboba

@aboba
Collaborator

aboba commented Jul 8, 2024

Does this PR imply any behavior in WebCodecs API?

For example, on encoding is there an expectation that VideoFrame.captureTime is copied to EncodedVideoChunk.captureTime? Or on decoding is EncodedVideoChunk.receiveTime or EncodedVideoChunk.rtpMetadata to be copied to VideoFrame.receiveTime or VideoFrame.rtpMetadata?

If there are no changes in behavior (e.g. if the attributes don't affect the encode or decode process or some other aspect of WebCodecs) then the attributes could be defined in another specification where behavior is affected (e.g. mediacapture-transform?), and added to the VideoFrame Metadata Registry.

@guidou
Author

guidou commented Jul 8, 2024

This PR as currently written does not imply any behavior in the WebCodecs API, although I would expect the things you mentioned (e.g., forwarding them to/from EncodedVideoChunk) as potentially useful.

The idea for this PR is to provide information to applications so that they can do similar things to what they can do with requestVideoFrameCallback (e.g., better A/V sync and delay measurements). This doesn't require any other behavior changes in WebCodecs (at least for applications using mediacapture-transform + WebRTC).
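As a rough illustration of the delay measurement mentioned above, the sketch below computes the time elapsed since capture. The `FrameTimingMetadata` interface and the `captureDelayMs` helper are hypothetical names used only for this example; in a browser the values would presumably come from `VideoFrame.metadata()` and `performance.now()`, and every field is treated as optional.

```typescript
// Hypothetical shape mirroring the three fields this PR adds to the registry.
// Every field is optional, since a source may not provide it.
interface FrameTimingMetadata {
  captureTime?: number;  // milliseconds (assumed DOMHighResTimeStamp)
  receiveTime?: number;  // milliseconds (assumed DOMHighResTimeStamp)
  rtpTimestamp?: number; // unsigned 32-bit RTP timestamp
}

// Returns the delay between capture and `nowMs`, or undefined when
// the frame carries no captureTime.
function captureDelayMs(
  meta: FrameTimingMetadata,
  nowMs: number,
): number | undefined {
  if (meta.captureTime === undefined) {
    return undefined;
  }
  return nowMs - meta.captureTime;
}

// Stand-in values; a real application would read them from the frame.
const sampleMeta: FrameTimingMetadata = { captureTime: 1000, receiveTime: 1040 };
console.log(captureDelayMs(sampleMeta, 1100)); // 100
console.log(captureDelayMs({}, 1100));         // undefined
```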

@guidou
Author

guidou commented Jul 8, 2024

I think we can specify forwarding to EncodedVideoChunk in a separate PR since this one has value on its own without specifying further changes to WebCodecs.

@aboba aboba requested review from Djuffin and padenot July 8, 2024 19:03
@Djuffin
Contributor

Djuffin commented Jul 8, 2024

I used to be skeptical about these timestamps since they are not passed through the encoding-decoding cycle, but since we already have entries in VideoFrame Metadata Registry that don't do that, I think it's okay now.

And RTC software like Teams, Meet and FaceTime can really use it for A/V sync and latency estimation, even if they have to pass this information via separate channels. So LGTM
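One way an application could carry these values across an encode step through a "separate channel" is a side table keyed by the frame timestamp. This is only a sketch: `MetadataSideChannel` and `TimingMetadata` are hypothetical names, and it assumes the encoded chunk reports the same `timestamp` as the frame it was produced from.

```typescript
// Hypothetical container for the timing fields discussed in this PR.
interface TimingMetadata {
  captureTime?: number;
  receiveTime?: number;
  rtpTimestamp?: number;
}

// Stores metadata before encoding and returns it when the matching
// chunk comes out of the encoder. Keys are frame timestamps (microseconds).
class MetadataSideChannel {
  private pending = new Map<number, TimingMetadata>();

  // Call just before handing the frame to the encoder.
  stash(timestampUs: number, meta: TimingMetadata): void {
    this.pending.set(timestampUs, meta);
  }

  // Call from the encoder output callback; removes the entry so the
  // table does not grow without bound.
  take(timestampUs: number): TimingMetadata | undefined {
    const meta = this.pending.get(timestampUs);
    this.pending.delete(timestampUs);
    return meta;
  }
}

const channel = new MetadataSideChannel();
channel.stash(33333, { captureTime: 1200.5 });
console.log(channel.take(33333)); // the stashed metadata object
console.log(channel.take(33333)); // undefined, already consumed
```

The retrieved metadata can then be serialized alongside the encoded chunk in whatever wire format the application uses.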

@aboba
Collaborator

aboba commented Jul 8, 2024

I agree that this metadata is useful. The question is whether behavior is well specified, so that interop is possible. For example, there is the question of where the metadata originates:

  1. MAY/SHOULD/MUST the MediaStreamTrackProcessor method provide VideoFrame.captureTimestamp if the MST is obtained from a local capture?
  2. MAY/SHOULD/MUST the MediaStreamTrackProcessor method provide VideoFrame.receiveTimestamp and VideoFrame.rtpMetadata if the MST is obtained remotely via WebRTC-PC?

@Djuffin
Contributor

Djuffin commented Jul 8, 2024

I thought for all metadata entries the answer to these questions is MAY.

@aboba
Collaborator

aboba commented Jul 9, 2024

@Djuffin MAY might be ok for these metadata fields. However, is alignment of VideoFrame.timestamp and EncodedVideoChunk.timestamp optional for WebCodecs implementations?

@youennf
Contributor

youennf commented Jul 9, 2024

> I thought for all metadata entries the answer to these questions is MAY.

I agree from a WebCodecs POV.
But it is not sufficient from an interop point of view.
Probably each spec defining a MST video source should describe which metadata it generates, just like each spec defines which constraints are supported by a given source.
Putting the definition at the source ensures the same metadata is exposed via MSTP or via VideoFrame constructor (from a video element).

That would mean mediacapture-main and webrtc-pc here.
As for the mediacapture-transform VideoTrackGenerator, nothing seems needed, though we could add a note stating that metadata is preserved.

@guidou
Author

guidou commented Jul 9, 2024

FWIW, the requestVideoFrameCallback spec, where these fields are originally defined, says that captureTime applies to local cameras and remote frames (WebRTC), receiveTime to WebRTC frames, and rtpTimestamp to WebRTC frames. But I agree with @youennf that having each MST source spec indicate the metadata it generates is the best way to organize that.

In any case, we need to have entries for these fields in the VideoFrameMetadata registry.

@chrisn
Member

chrisn commented Jul 9, 2024

Media WG meets today, please add agenda label if you'd like to discuss.

@aboba aboba added the agenda Add to Media WG call agenda label Jul 9, 2024
@Djuffin
Contributor

Djuffin commented Jul 9, 2024

> @Djuffin MAY might be ok for these metadata fields. However, is VideoFrame.timestamp and EncodedVideoChunk.timestamp optional for WebCodecs implementations?

They're mandatory.

Is there some kind of deep connection here that I'm missing?

Collaborator

@aboba aboba left a comment

Changes to VideoFrame registry are ok, but should probably not reference RVFC. receiveTime and rtpMetadata could be defined in WebRTC-PC or WebRTC-Extensions and captureTime could be defined in Media Capture & Streams or Media Capture Extensions.

@chrisn
Member

chrisn commented Jul 11, 2024

Minutes from 9 July 2024 Media WG meeting. @aboba summarised the conclusion in #813 (review).

@Djuffin
Contributor

Djuffin commented Jul 15, 2024

Summary of WG discussion:
HTMLVideoElement.requestVideoFrameCallback is not the best spec to reference here, because it doesn't describe how and when these timestamps are set.
Corresponding changes need to be made in the MediaStreamTrackProcessor and Media Capture and Streams specs. Something along the lines of: "MediaStreamTrackProcessor sets capture timestamps for VideoFrames coming from camera..."

Later this PR should reference these specs.

@aboba
Collaborator

aboba commented Sep 6, 2024

"And RTC software like Teams, Meet and FaceTime can really use it for A/V sync and latency estimation, even if they have to pass this information via separate channels."

[BA] To do A/V sync, captureTime and receiveTime need to be provided for both audio and video.

Also, if they are to be usable for non-RTP transports, they need to be defined in a way that is independent of RTP/RTCP. For example, on the local peer, captureTime represents the capture time of the first byte according to the local wallclock. On a remote peer, captureTime is set by the receiver. For example, the local peer's captureTime can be serialized on the wire and then set on the receiver as is (i.e. not adjusted to the receiver wallclock). receiveTime is set on the receiver, based on the receiver's wallclock. (receiveTime - captureTime) can then be used to estimate the sender/receiver offset as well as jitter.
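The (receiveTime - captureTime) idea can be sketched as follows. The names `TimedFrame`, `transitMs` and `estimateJitterMs` are hypothetical, and the smoothing step borrows the interarrival jitter formula from RFC 3550 (J += (|D| - J) / 16) purely as an illustration; nothing in this thread mandates that particular estimator.

```typescript
// Hypothetical per-frame record; both values are in milliseconds.
// captureTime is on the sender wallclock, receiveTime on the receiver's.
interface TimedFrame {
  captureTime: number;
  receiveTime: number;
}

// Transit value: sender/receiver clock offset plus one-way delay.
function transitMs(frame: TimedFrame): number {
  return frame.receiveTime - frame.captureTime;
}

// Smoothed jitter over a sequence of frames, using the RFC 3550 style
// update J += (|D| - J) / 16, where D is the change in transit value
// between consecutive frames. The constant clock offset cancels in D.
function estimateJitterMs(frames: TimedFrame[]): number {
  let jitter = 0;
  let previousTransit: number | undefined;
  for (const frame of frames) {
    const transit = transitMs(frame);
    if (previousTransit !== undefined) {
      const delta = Math.abs(transit - previousTransit);
      jitter += (delta - jitter) / 16;
    }
    previousTransit = transit;
  }
  return jitter;
}

const frames: TimedFrame[] = [
  { captureTime: 0, receiveTime: 50 },
  { captureTime: 33, receiveTime: 99 },
];
console.log(transitMs(frames[0]!));    // 50
console.log(estimateJitterMs(frames)); // 1
```

Because captureTime and receiveTime come from different clocks, a single transit value mixes clock offset and network delay; only differences between transit values are free of the offset.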

@@ -61,6 +61,18 @@
<td>segments</td>
<td>[Human face segmentation](https://w3c.github.io/mediacapture-extensions/#human-face-segmentation)</td>
</tr>
<tr>
<td>captureTime</td>
<td>[Capture time](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-capturetime)</td>
Collaborator

@aboba aboba Sep 13, 2024

The RVFC text is too RTP centric to be used here. I'd copy over the text and make some changes. For a WebCodecs application, captureTime can be serialized on the wire and set by a WebCodecs application for frames received from a remote peer. Also, NTP timestamp format does not imply a global clock so change “estimated using clock synchronization” to “aligned to the sender wallclock”.

The plan is to

  1. Define what the concept "capture time" is in mediacapture-extensions, and define "remote capture time" over at webrtc-extensions (this would refer to the new mediacapture-extensions "capture time" concept plus text copied from rVFC to describe estimation).
  2. Refer to these new definitions here (local tracks: look here; remote webrtc tracks: look here) instead of the rVFC reference, but use DOMHighResTimestamp relative to local Performance.timeOrigin.

That would leave captureTime unset otherwise. A webcodecs app could use whichever out-of-band techniques to compute & set a valid local DOMHighResTimestamp. Does it need to be mentioned in the spec?
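A minimal sketch of what such an out-of-band computation might look like, assuming the sender transmits its capture time as milliseconds since the Unix epoch and the application has its own estimate of the clock offset between the two peers. The function name and all three parameters are hypothetical; only `performance.timeOrigin` is a real platform value, and it is passed in here so the example runs anywhere.

```typescript
// Converts a sender-side wallclock capture time into a timestamp relative
// to the receiver's time origin.
//   senderCaptureEpochMs:  capture time on the sender wallclock, in ms since
//                          the Unix epoch (assumed to arrive out of band)
//   senderMinusReceiverMs: estimated sender clock minus receiver clock, in ms
//                          (assumed to come from application-level sync)
//   timeOriginMs:          the receiver's performance.timeOrigin
function toLocalCaptureTime(
  senderCaptureEpochMs: number,
  senderMinusReceiverMs: number,
  timeOriginMs: number,
): number {
  // Step 1: express the capture instant on the receiver's wallclock.
  const captureOnReceiverClockMs = senderCaptureEpochMs - senderMinusReceiverMs;
  // Step 2: make it relative to the receiver's time origin.
  return captureOnReceiverClockMs - timeOriginMs;
}

// Stand-in numbers: the sender clock runs 200 ms ahead of the receiver.
console.log(toLocalCaptureTime(1_700_000_000_500, 200, 1_700_000_000_000)); // 300
```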

Author

The PR is now updated to reference mediacapture-extensions, where these concepts are now defined.

</tr>
<tr>
<td>receiveTime</td>
<td>[Receive time](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-receivetime)</td>
Collaborator

@aboba aboba Sep 13, 2024

Comment: RVFC text is also too WebRTC-centric. Can we allow this to be present in a WebCodecs application as well (set by the receiver)?

The plan is to describe this as "set for remote webrtc tracks" and use DOMHighResTimestamp relative to local Performance.timeOrigin. I agree it should be settable by a WebCodecs application.

</tr>
<tr>
<td>rtpTimestamp</td>
<td>[RTP timestamp](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-rtptimestamp)</td>
Collaborator

The RVFC text is ok here but I'd still probably copy it over rather than referencing it.

I planned to define what a "rtp timestamp" is over at webrtc-extensions and simply define here that it's present for "remote webrtc tracks" together with a reference.
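For context on how an application might consume rtpTimestamp, the sketch below turns the difference between two RTP timestamps into milliseconds. It assumes a 90 kHz clock, which is the usual rate for RTP video payloads, and treats the timestamp as an unsigned 32-bit counter that wraps around. `rtpDeltaMs` is a hypothetical helper, not something defined by any of the specs discussed here.

```typescript
// Common RTP clock rate for video payloads (an assumption for this sketch;
// the actual rate depends on the negotiated payload type).
const VIDEO_RTP_CLOCK_HZ = 90000;

// Elapsed media time in milliseconds between two RTP timestamps.
// The subtraction is done modulo 2^32 so that a wraparound between
// `earlier` and `later` still yields a small positive difference.
function rtpDeltaMs(
  earlier: number,
  later: number,
  clockHz: number = VIDEO_RTP_CLOCK_HZ,
): number {
  const ticks = (later - earlier) >>> 0; // unsigned 32-bit difference
  return (ticks * 1000) / clockHz;
}

console.log(rtpDeltaMs(0, 9000));          // 100
console.log(rtpDeltaMs(4294962796, 4500)); // 100, across the 32-bit wrap
```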

Author

Replaced with a reference to mediacapture-extensions.
I did not copy the text over, to follow the format used for human face segmentation.

@guidou
Author

guidou commented Nov 12, 2024

This PR has been updated to reference mediacapture-extensions where these concepts are now properly defined (similar to human face segmentation).

@guidou guidou requested a review from aboba November 18, 2024 09:41
@aboba
Collaborator

aboba commented Dec 5, 2024

@guidou Does this PR also resolve #599 ?

@guidou guidou changed the title from "Add captureTime, receiveTime and rtpMetadata to VideoFrameMetadata" to "Add captureTime, receiveTime and rtpTimestamp to VideoFrameMetadata" Dec 5, 2024
@Djuffin Djuffin merged commit 41636a6 into w3c:main Dec 10, 2024
2 checks passed
github-actions bot added a commit that referenced this pull request Dec 10, 2024
SHA: 41636a6
Reason: push, by Djuffin

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>