Add captureTime, receiveTime and rtpTimestamp to VideoFrameMetadata #813
Does this PR imply any behavior in WebCodecs API? For example, on encoding is there an expectation that If there are no changes in behavior (e.g. if the attributes don't affect the encode or decode process or some other aspect of WebCodecs) then the attributes could be defined in another specification where behavior is affected (e.g. mediacapture-transform?), and added to the VideoFrame Metadata Registry. |
This PR as currently written does not imply any behavior in the WebCodecs API, although I would expect the things you mentioned (e.g., forwarding them to/from EncodedVideoChunk) as potentially useful. The idea for this PR is to provide information to applications so that they can do similar things to what they can do with |
I think we can specify forwarding to EncodedVideoChunk in a separate PR since this one has value on its own without specifying further changes to WebCodecs. |
I used to be skeptical about these timestamps since they are not passed through the encoding-decoding cycle, but since we already have entries in the VideoFrame Metadata Registry that don't do that, I think it's okay now. And RTC software like Teams, Meet and FaceTime can really use them for A/V sync and latency estimation, even if they have to pass this information via separate channels. So LGTM
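As a rough illustration of the latency-estimation use case mentioned above, here is a minimal sketch. The `TimingMetadata` interface and the `captureToReceiveDelayMs` helper are hypothetical names introduced for this example, and the sketch assumes both timestamps are in milliseconds on the same local timeline, which this PR does not itself guarantee.

```typescript
// Hypothetical shape: only the three fields proposed in this PR, all optional.
interface TimingMetadata {
  captureTime?: number;  // ms; assumed to be on the same timeline as receiveTime
  receiveTime?: number;  // ms
  rtpTimestamp?: number; // RTP clock ticks
}

// Returns the capture-to-receive delay in ms, or undefined if either field is absent.
function captureToReceiveDelayMs(meta: TimingMetadata): number | undefined {
  if (meta.captureTime === undefined || meta.receiveTime === undefined) {
    return undefined;
  }
  return meta.receiveTime - meta.captureTime;
}
```

For a remote frame the capture time is an estimate, so the result is only as accurate as that estimate.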
I agree that this metadata is useful. The question is whether behavior is well specified, so that interop is possible. For example, there is the question of where the metadata originates:
|
I thought for all metadata entries the answer to these questions is MAY. |
@Djuffin MAY might be ok for these metadata fields. However, is alignment of |
I agree from a WebCodecs POV. That would mean mediacapture-main and webrtc-pc here. |
FWIW, the In any case, we need to have entries for these fields in the VideoFrameMetadata registry. |
Media WG meets today, please add agenda label if you'd like to discuss. |
They're mandatory. Is there some kind of deep connection here that I miss? |
Changes to VideoFrame registry are ok, but should probably not reference RVFC. receiveTime and rtpTimestamp could be defined in WebRTC-PC or WebRTC-Extensions and captureTime could be defined in Media Capture & Streams or Media Capture Extensions.
Minutes from 9 July 2024 Media WG meeting. @aboba summarised the conclusion in #813 (review). |
Summary of WG discussion: Later this PR should reference these specs. |
"And RTC software like Teams, Meet and FaceTime can really use it for A/V sync and latency estimation, even if they have to pass this information via separate channels." [BA] To do A/V sync, Also, if they are to be usable for non-RTP transports, they need to be defined in a way that is independent of RTP/RTCP. For example, on the local peer, |
Partly addresses w3c/webcodecs#813 (review).
@@ -61,6 +61,18 @@
<td>segments</td>
<td>[Human face segmentation](https://w3c.github.io/mediacapture-extensions/#human-face-segmentation)</td>
</tr>
<tr>
<td>captureTime</td>
<td>[Capture time](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-capturetime)</td>
The RVFC text is too RTP centric to be used here. I'd copy over the text and make some changes. For a WebCodecs application, captureTime can be serialized on the wire and set by a WebCodecs application for frames received from a remote peer. Also, NTP timestamp format does not imply a global clock so change “estimated using clock synchronization” to “aligned to the sender wallclock”.
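To make the point about NTP format concrete, here is a minimal sketch of converting a 64-bit NTP timestamp (split into its 32-bit seconds and 32-bit fraction halves) into milliseconds since the Unix epoch. The function name is hypothetical. The conversion only changes the epoch and units; the result is still on the sender's wallclock, not on any global clock.

```typescript
// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
const NTP_TO_UNIX_OFFSET_SECONDS = 2208988800;
const TWO_POW_32 = 4294967296;

// ntpSeconds: upper 32 bits of the NTP timestamp; ntpFraction: lower 32 bits.
// Returns milliseconds since the Unix epoch, on the sender's wallclock.
function ntpToUnixMs(ntpSeconds: number, ntpFraction: number): number {
  const fractionMs = (ntpFraction / TWO_POW_32) * 1000;
  return (ntpSeconds - NTP_TO_UNIX_OFFSET_SECONDS) * 1000 + fractionMs;
}
```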
The plan is to
- Define what the concept "capture time" is in mediacapture-extensions, and define "remote capture time" over at webrtc-extensions (this would refer to the new mediacapture-extensions "capture time" concept plus text copied from rVFC to describe estimation).
- Refer to these new definitions here (local tracks: look here; remote webrtc tracks: look here) instead of the rVFC reference, but use DOMHighResTimeStamp relative to local Performance.timeOrigin.
That would leave captureTime unset otherwise. A WebCodecs app could use whichever out-of-band techniques to compute and set a valid local DOMHighResTimeStamp. Does it need to be mentioned in the spec?
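One way an application might compute such a local value out of band is sketched below. This is an assumption-laden illustration, not something the PR specifies: the function and parameter names are hypothetical, and the sender-to-local clock offset is assumed to have been estimated by the application itself by some means of its own.

```typescript
// senderWallClockMs: capture time on the sender's wallclock (ms since the Unix epoch).
// senderToLocalOffsetMs: app-estimated value to add to a sender time to get local wallclock time.
// localTimeOriginMs: the local time origin in ms since the Unix epoch
//   (in a browser this would be performance.timeOrigin).
// Returns a timestamp in ms relative to the local time origin.
function toLocalHighResTimestamp(
  senderWallClockMs: number,
  senderToLocalOffsetMs: number,
  localTimeOriginMs: number
): number {
  return senderWallClockMs + senderToLocalOffsetMs - localTimeOriginMs;
}
```

Passing the time origin as a parameter keeps the helper independent of any browser global.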
The PR is now updated to reference mediacapture-extensions, where these concepts are now defined.
</tr>
<tr>
<td>receiveTime</td>
<td>[Receive time](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-receivetime)</td>
Comment: RVFC text is also too WebRTC-centric. Can we allow this to be present in a WebCodecs application as well (set by the receiver)?
The plan is to describe this as "set for remote webrtc tracks" and use DOMHighResTimeStamp relative to local Performance.timeOrigin. I agree it should be settable by a WebCodecs application.
</tr>
<tr>
<td>rtpTimestamp</td>
<td>[RTP timestamp](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-rtptimestamp)</td>
The RVFC text is ok here but I'd still probably copy it over rather than referencing it.
I planned to define what an "RTP timestamp" is over at webrtc-extensions and simply define here that it's present for "remote webrtc tracks" together with a reference.
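For readers unfamiliar with RTP timestamps, the sketch below shows how an application might turn two such values into an elapsed time. It is illustrative only: the helper name is hypothetical, the 90 kHz default is an assumption (it is the usual RTP clock rate for video payloads), and the two timestamps are assumed to be less than one full 32-bit wrap apart.

```typescript
const RTP_TIMESTAMP_MODULUS = 4294967296; // RTP timestamps are 32-bit and wrap around

// Elapsed ms from `earlier` to `later`, tolerating one wraparound of the 32-bit counter.
// clockRateHz is assumed to be 90000 unless the caller knows otherwise.
function rtpDeltaMs(earlier: number, later: number, clockRateHz: number = 90000): number {
  const deltaTicks =
    (later - earlier + RTP_TIMESTAMP_MODULUS) % RTP_TIMESTAMP_MODULUS;
  return (deltaTicks * 1000) / clockRateHz;
}
```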
Replaced with the reference to mediacapture-extensions.
Did not copy the text, to follow the format used for Human face segmentation.
This PR has been updated to reference mediacapture-extensions where these concepts are now properly defined (similar to human face segmentation). |
SHA: 41636a6 Reason: push, by Djuffin Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
These fields are useful for WebRTC-based applications.
See issue #601