[core-http] Update XML serialization to support Atom XML operations#4916
[core-http] Update XML serialization to support Atom XML operations#4916ramya0820 merged 83 commits intoAzure:masterfrom
Conversation
common/config/rush/pnpm-lock.yaml
Outdated
| eslint-plugin-promise: 4.2.1 | ||
| events: 3.0.0 | ||
| express: 4.17.1 | ||
| fast-xml-parser: 3.12.20 |
There was a problem hiding this comment.
Is this a canonical implementation of an XML parser? Is this the default used in the community?
There was a problem hiding this comment.
The dependencies to review would be in package.json file.
This is a temporary artifact that gets generated on local and will be rebased on master. (After which this line would go away since it's not being used in implementation)
About the dependency in itself, I tried using it but it doesn't allow for XML manipulation in memory. So, not sure if it's a good choice for an alternative just yet.
bterlson
left a comment
There was a problem hiding this comment.
High level:
- Remove
eachutil - Remove unused utils
- Remove
anytypes - Remove dependency on
cryptojsin the browser (consult with Johnathan and Daniel on what they did with keyvault) - Remove
isUndefinedhelper (x === undefinedis better). In general the helpers seem less useful with modern JS so take a stab at removing as many as makes sense :) - Some specific comments were left, but I did not point out all the places where the feedback applies. Take these comments as general feedback.
- Replace vars with let/const.
|
Thanks @sadasant for helping with the WebCrypto stuff! Let me know if there's anything else you suggest around it. The changes we looked into are present in |
sdk/core/core-http/lib/credentials/sasServiceClientCredentials.ts
Outdated
Show resolved
Hide resolved
sdk/core/core-http/lib/credentials/sasServiceClientCredentials.ts
Outdated
Show resolved
Hide resolved
sdk/core/core-http/lib/credentials/sasServiceClientCredentials.ts
Outdated
Show resolved
Hide resolved
sdk/core/core-http/lib/credentials/sasServiceClientCredentials.ts
Outdated
Show resolved
Hide resolved
|
PR updated to not house the atomSerializationPolicy or any Atom XML specific things per discussions yesterday. |
AlexGhiondea
left a comment
There was a problem hiding this comment.
The changes overall look good, but I have some concerns around throwing new exceptions when we didn't throw them before.
I left a couple of comments, please take a look.
sdk/core/core-http/lib/util/utils.ts
Outdated
|
|
||
| /** | ||
| * @ignore | ||
| * Helper utility to clean up unintended characters that get appended by OS. |
There was a problem hiding this comment.
My understanding is that the OS won't automatically append the BOM to a file as it doesn't know the content of the file. It is up to the application to add those bytes or not.
In our scenario, who is going to put the BOM into the stream that contains the XML doc?
There was a problem hiding this comment.
This is the comment that was added by original authors in azure-sb
Based on https://www.w3.org/International/questions/qa-byte-order-mark.en, looks like this gets added commonly by editors and any app that's used to generate the text content.
There was a problem hiding this comment.
I understand the scenario where applications put this header. What I am trying to understand is how is the service bus SDK receiving XML objects that contain this BOM.
I don't mind adding this in if there is a scenario in which we know the BOM is being returned by the service -- what I am asking is for is that scenario.
There was a problem hiding this comment.
So there is no test scenario that was written earlier that surfaced this as being sent by the service.
As discussed earlier, our understanding is that the OS on which the code generating files (if any) would end up appending this.
So no, I haven't encountered a concrete scenario supporting this other than the fact that this check was added with a comment describing reason by authors in azure-sb
Notes from offline discussion yesterday afternoon for ref -
• Handling of unwanted BOM characters - This was being taken care of in azure-sb implementation and seems less understood by us overall.
https://en.wikipedia.org/wiki/Byte_order_mark
https://www.screenaware.com/en/blog/xml2js-sax-js-non-whitespace-before-first-tag
Based on research and testing things, looks like it's a general problem that users face and must have been added for a reason in azure-sb. The documentation and intent is pretty clear too.
In Node, while xml2js handles this internally for XML parsing, it doesn't do so for XML building.
In browser, both building and parsing require BOM handling.
sdk/core/core-http/lib/util/utils.ts
Outdated
| */ | ||
| export function removeBOM(str: any) { | ||
| if (typeof str == "string") { | ||
| if (str.charCodeAt(0) === 0xfeff || str.charCodeAt(0) === 0xffef) { |
There was a problem hiding this comment.
I believe that this checks for the BOM header when the encoding is expected to be UTF16.
Would the correct check be for the following sequence of bytes EF BB BF ?
https://www.w3.org/International/questions/qa-byte-order-mark.en
There was a problem hiding this comment.
Looks like this is for utf-8, utf-16 seems to have some other considerations added in a different section
There was a problem hiding this comment.
Here is another resource -- there seems to be a difference between UTF8 and UTF16.
https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
The UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF
https://en.wikipedia.org/wiki/Byte_order_mark#UTF-16
In UTF-16, a BOM (U+FEFF) may be placed as the first character of a file or character stream to indicate the endianness (byte order) of all the 16-bit code units of the file or stream. If an attempt is made to read this stream with the wrong endianness, the bytes will be swapped, thus delivering the character U+FFFE, which is defined by Unicode as a "non character" that should never appear in the text.
Can you share more information about the section you mention?
There was a problem hiding this comment.
In the earlier link at https://www.w3.org/International/questions/qa-byte-order-mark.en#bytheway
It says
If you use a UTF-16 encoding for your page (and we strongly recommend that you don't), there are some additional considerations.
So the directions mentioned in entirety of article seemed to refer to UTF-8
There was a problem hiding this comment.
In the same document, they have this:
With a binary editor capable of displaying the hexadecimal byte values in the file, the UTF-8 signature displays as EF BB BF.
At this point, this is irrelevant as we don't have the BOM code anymore, but this is something to keep in mind for the SB changes.
There was a problem hiding this comment.
Since SB doesn't directly feed any files to input, it's not going to be necessary.
But when it comes to dealing with output sent by service, such as for deserializationPolicy and maybe others where the response content from service is being parsed directly, would be good to have these in core-http I think.
Especially cause xml2js handles it in Node, but for in browser we are manually parsing and this isn't being handled implicitly.
So it seems to do with general text stream handling things, and, wonder if Storage handles the BOM characters explicitly in the SDK.
There was a problem hiding this comment.
About the actual characters to use, above is interesting - unclear why the utf-16 code was being added in azure-sb - it might be possible that this specific character was noted in output. (Or maybe it was before xml2js (or the service itself) was updated to implicitly handle the BOM characters)
| export function parseXML(str: string, opts?: { includeRoot?: boolean }): Promise<any> { | ||
| try { | ||
| if (str == undefined || str == "") { | ||
| throw new Error("Document is empty"); |
There was a problem hiding this comment.
This is a new error case -- what used to happen when you passed and empty XML string before?
I would prefer that we match the existing behavior so that we don't break existing scenarios.
There was a problem hiding this comment.
This is actually the same error that gets thrown by parser in Node - same for all errors in .browser.ts file
There was a problem hiding this comment.
The error message is not what I was wondering. If another SDK was using this browser code before and they passed either undefined or and empty document, what would they get back?
I am asking because changing that behavior (and throwing in that case) could be a breaking change.
There was a problem hiding this comment.
Based on usage in Storage SDK, this scenario doesn't seem to exist
It seems a very less likely possibility since the SDKs are all targeting the same Azure services. And for an agnostic thing such as empty/undefined document being disallowed in one SDK, it seems like the behavior should be same for others as well.
For instance, if parsing of undefined/empty documents is allowed in browser, it would be a bug if it doesn't work from the Node SDK?
Similarly, if it is not allowed in browser, then the tests and checks would have helped catch that type of bug early on.
But I think more investigation is needed on this, since maybe we should allow handling of empty/undefined documents if that is a valid scenario? Ideally, this discrepancy wouldn't be there if we are using the same parser for Node and browser, but for most part the behaviors would be expected to be same unless very explicitly required as otherwise I think..
cc @bterlson @daviwil
ramya-rao-a
left a comment
There was a problem hiding this comment.
I have 2 main concerns left
- I would have the consumer of the library worry about bom and not have the bom handling here in this library
- I would remove the new validation errors that are being thrown when in browser mode. These are made to match the errors coming from xml2js library when run in node which can change over time and we have no control over.
|
@ramya-rao-a @AlexGhiondea However, reduced the test coverage and removed BOM handling to unblock PR. We can continue discussing/learning more later.. |
sdk/core/core-http/lib/util/xml.ts
Outdated
| }); | ||
| /** | ||
| * Converts given XML string into JSON | ||
| * @param str String containg the XML content to be parsed into JSON |
There was a problem hiding this comment.
| * @param str String containg the XML content to be parsed into JSON | |
| * @param str String containing the XML content to be parsed into JSON |
sdk/core/core-http/test/xmlTests.ts
Outdated
| ); | ||
| }); | ||
|
|
||
| it.only("with element with child undefined element", async function() { |
There was a problem hiding this comment.
| it.only("with element with child undefined element", async function() { | |
| it("with element with child undefined element", async function() { |
|
/azp run js - client - ci |
|
Azure Pipelines successfully started running 1 pipeline(s). |



EDIT:
This PR is for changes in
core-httprequired for supporting Atom XML serialization required of Service Bus (and other SDKs in future such as Event Hubs). Code was mostly moved over from older JS onlyazure-sbimplementation and has been significantly updated to fit into TS best practices.Most of the initial work has been moved to Service Bus #4922 as per latest discussion.
Also includes