-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XML parser #3412
XML parser #3412
Conversation
…ree and manage node as arrays
…ssue with ES6 syntax
…nodes as array process) on-the-fly
…d remove _asArray nodes)
@technogeek00 can you please check this PR on your side regarding all the changes from your PR #3451? |
…d remove _asArray nodes)
…d remove _asArray nodes)
@bbert I can take a look, will take me a few days to get to it. |
@technogeek00 Did you have time to check this PR and unit tests? |
Reviewing tonight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I apologize for the delay in my review, but hopefully this is insightful feedback.
I pulled this branch and attempted to run the player and the tests locally, both appeared broken as indicated, but upon further investigation there seems to be deeper problems with the library proposed. The points I would highlight for discussion:
- The ordering of sibling elements of different tag names is not persisted, this is critical to patching operations as the add/remove/replace operations operate on the consecutively modified document
- There appears to be little to no proper support for xml namespacing, the prefixes are directly preserved, but these are arbitrary in a document and need to be properly handled
As a note on the first point, this is technically only critical for the patch document, the non-modified format of the tXml library does appear to support parsing and simplification independently. If that version was taken directly the DashParser.js file could apply simplification to the MPD element to keep footprints small while the Patch could be left untouched to persist the ordering information for usage in application.
While I agree the parsing and internal structuring of the current library should be replaced with a more performant option, this particular library seems very purpose built for a specific use case and may not be fully conformant. If this library was used, I'd recommend reverting back to the original version, re-porting the attribute matcher framework, and then utilize the simplified version for mpds and lossless simplified version for patches instead of manually modifying the original parse process.
Outside the library we may need to take a close look at document usage cases to ensure single vs array element usage is done consistently and accurately with the updated parsed structure. I highlighted a couple bugs in this review, but I'd worry about areas that did not reference the _asArray
and still always expected it or others that always assumed standalone elements.
externals/tXml.js
Outdated
function tXml(S, options, attrMatchers, arrayChildNames) { | ||
"use strict"; | ||
options = options || {}; | ||
attrMatchers = attrMatchers || {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type mismatch, attrMatchers
used as array but defaulted to object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
externals/tXml.js
Outdated
|
||
* @return {tNode[]} | ||
*/ | ||
function tXml(S, options, attrMatchers, arrayChildNames) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the differences between this file and the main tXML attrMatches
and arrayChildNames
appear to be extensions, it may be better to integrate them through the libraries existing conventions of the options
object.
The attribute matchers are a nice addition, but generally why introduce a sometimes array type child function like arrayChildNames
instead of using the full tXML file and passing options.simplify: true
to have the all children are arrays simplified form without file modification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attrMatches
and attrMatches
can indeed be passed in the options object.
So do you suggest to have systematically all children being stored as arrays? This has the disadvantage to access some child nodes as an array while DASH spec indicates for some nodes you can have only (for example SegmentBase)
DashConstants.SUPPLEMENTAL_PROPERTY, | ||
DashConstants.METRICS, | ||
DashConstants.REPORTING | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing elements preventing patching from working:
PatchLocation
replace
add
remove
Example patch documents can be found in the DASH Schema repo:
Two other random notes discovered in my personal testing:
-
The parsing library does not handle namespaced elements properly which means future inclusion will need to build this out in the library. I highlight this because the Annex I query parameter work may soon be included from the DASH-IF Ad Insertion work which requires namespacing. Note that with XML namespaces the namespace identifier in documents are arbitrary and part of the node definition. Example namespaced node also in the DASH Schema repo example_I1.mpd
-
The parsing library does not correctly unescape XML characters, i.e.
&
=>&
does not occur rendering URLs invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed these ones. I did fix
src/dash/models/DashManifestModel.js
Outdated
@@ -1043,7 +1043,7 @@ function DashManifestModel() { | |||
if (manifest && manifest.hasOwnProperty(Constants.LOCATION)) { | |||
// for now, do not support multiple Locations - | |||
// just set Location to the first Location. | |||
manifest.Location = manifest.Location_asArray[0]; | |||
manifest.Location = manifest.Location[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unsafe re-assignment, it will cause the Location to be truncated on subsequent calls. With Location
always an array this line should be omitted and the return should be shifted to return manifest.Location[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, did replace asArray without paying attention to this.
I fixed it
src/dash/models/DashManifestModel.js
Outdated
@@ -1055,7 +1055,7 @@ function DashManifestModel() { | |||
function getPatchLocation(manifest) { | |||
if (manifest && manifest.hasOwnProperty(DashConstants.PATCH_LOCATION)) { | |||
// only include support for single patch location currently | |||
manifest.PatchLocation = manifest.PatchLocation_asArray[0]; | |||
manifest.PatchLocation = manifest.PatchLocation[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/dash/models/DashManifestModel.js
Outdated
} | ||
|
||
mimeTypeRegEx = (type !== Constants.TEXT) ? new RegExp(type) : new RegExp('(vtt|ttml)'); | ||
|
||
if (adaptation.Representation_asArray && adaptation.Representation_asArray.length && adaptation.Representation_asArray.length > 0) { | ||
let essentialProperties = getEssentialPropertiesForRepresentation(adaptation.Representation_asArray[0]); | ||
if (adaptation.Representation && adaptation.Representation.length && adaptation.Representation.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A theme I've observed across the review, there are a number of places that the dash.js code was guarding against the possibility of the element not being an array. With the forced array representation only an existence check is needed:
if(adaptation.Representation && adaptation.Representation.length > 0) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I did update these checks
@@ -106,7 +106,7 @@ class SimpleXPath { | |||
|
|||
// stop one early if this is the last element and an attribute | |||
if (level !== this.path.length - 1 || !name.startsWith('@')) { | |||
let children = parent[name + '_asArray'] || []; | |||
let children = parent[name] || []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change assumes that all children are now arrays, but only a limited set are based on the parsing logic. While some SimpleXPath tests are failing because the test expects need to be updated to use array notation: Period.AdaptationSet
=> Period[0].AdaptationSet[0]
.
The remainder still fail because of nodes that are not arrays, such as SegmentTemplate and SegmentTimeline. For those objects, this line will take the object direction and it will have fail the length check as it is a strict equality to zero causing the node search to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can fix it, for example by considering the list of node names that should be as array, as done in the parser.
But I would need some test streams to check. Do you have some?
@@ -26,7 +25,7 @@ function staticSegmentTimeline() { | |||
function staticSegmentTemplate() { | |||
let timeline = staticSegmentTimeline(); | |||
return { | |||
SegmentTimeline: timeline, // purposely omit the _asArray to ensure single node case captured | |||
SegmentTimeline: timeline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem on this exact line, but need to highlight in this file
Note that the PatchHelper is explicitly generating the patches with well formed and ordered __children
arrays because ordering of elements is significant for patch operations, they must be supplied sequentially not based on node type. Actual code reference: https://github.com/Dash-Industry-Forum/dash.js/blob/development/test/unit/helpers/PatchHelper.js#L87
The simplified output of the updated parsing library does not appear capable of representing this information.
Re-apply modifications to tXml parser library in order to: - add child node to their parent as arrays or object according to node names - process attributes on the fly - set node text value in "_text" property
…at can contain a number value (such as "id" attribute)
// if we did have a positional reference we need to purge from array set and restore X2JS proper semantics | ||
if (relativePosition != -1) { | ||
let targetArray = target[name + '_asArray']; | ||
if (typeof target[name] === Object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (typeof target[name] === Object) { | |
if (typeof target[name] === 'object') { |
typeof
returns a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in PR #4180
Closed, see #4180 |
The goal of this PR is to provide a new XML parser in order to speed up parsing time.
Parsing time may be criticial for long contents with SegmentTimelines.
What has been achieved in this PR:
Here are some numbers for the XML parsing processing time (in ms) for a 240mn content having one AdaptationSet with a SegmentTimeline without repeat pattern: