encoding/xml: Encoder encodes double quotes incorrectly. #12400

Closed
dragonfax opened this issue Aug 30, 2015 · 5 comments

Comments

@dragonfax

xml.Encoder encodes a double quote in a text node into the entity &#34;. Technically this is a valid entity, but the standard is to encode a double quote into &quot; instead. This is the standard when not using a DTD. https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

I only noticed this because it sends riak-cs haywire when used with the aws-sdk-go client (an AWS S3 client written in Go).

Here is an example.
https://gist.github.com/dragonfax/ca3ee45a0acf97820f58
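
For readers who cannot open the gist, the reported behavior can be reproduced with a minimal sketch like the one below. It is not the gist's code; the helper name `escape` is made up for illustration, and it uses `xml.EscapeText` from the standard library to show how quotes in character data are written.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escape is a hypothetical helper: it returns s escaped the way
// encoding/xml escapes character data.
func escape(s string) string {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	// Both quote characters come out as numeric character references.
	fmt.Println(escape(`"it's"`)) // &#34;it&#39;s&#34;
}
```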

@mikioh changed the title from "xml.Encoder encodes double quotes incorrectly." to "encoding/xml: Encoder encodes double quotes incorrectly." on Aug 30, 2015
@ianlancetaylor added this to the Unplanned milestone on Aug 30, 2015
@nodirt
Contributor

nodirt commented Oct 14, 2015

Same applies to ' (encoded as &#39;): https://play.golang.org/p/tIu_6xahLG
I'd like to work on this.

@nodirt
Contributor

nodirt commented Oct 14, 2015

Except, this may be considered backwards incompatible. @adg?

@nodirt
Contributor

nodirt commented Oct 14, 2015

Apparently, this behavior is intended because &#34; and &#39; are shorter than &quot; and &apos; respectively:

go/src/encoding/xml/xml.go

Lines 1833 to 1834 in bf21643

esc_quot = []byte("&#34;") // shorter than "&quot;"
esc_apos = []byte("&#39;") // shorter than "&apos;"

@adg (or another core gopher) to make the final decision (I can't close bugs)

@adg
Contributor

adg commented Oct 15, 2015

I don't see anywhere in the spec that says &quot; must be used and not &#34;. In fact, all I could find was this sentence which suggests that it's fine to use the numeric entity instead:

Entity and character references may both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be used to escape < and & when they occur in character data.

I'd be inclined not to change the existing behavior because—as @nodirt says—this may break (or at least unexpectedly change) existing programs.
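
As a rough check of the spec passage quoted above, the sketch below decodes the same text written with named entities and with numeric character references and compares the results. The helper `decodeText` and the element name `v` are made up for illustration; a conforming parser should treat both spellings identically.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// decodeText is a hypothetical helper: it parses a single element and
// returns its character data.
func decodeText(doc string) (string, error) {
	var v struct {
		Text string `xml:",chardata"`
	}
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	return v.Text, nil
}

func main() {
	named, err := decodeText(`<v>&quot;&apos;</v>`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	numeric, err := decodeText(`<v>&#34;&#39;</v>`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// Both spellings decode to the same two characters: " and '.
	fmt.Println(named == numeric) // true
}
```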

@rakyll
Contributor

rakyll commented Oct 15, 2015

&quot; is just a predefined &#34;. Given the fact that &#34; is intentional, there is no reason we should fix this bug if there are no other major practical reasons.

@rakyll closed this as completed on Oct 15, 2015
@golang locked and limited conversation to collaborators on Oct 17, 2016