Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonpb: Add option to disable HTML escaping #409

Closed
wants to merge 1 commit into from

Conversation

pauln
Copy link

@pauln pauln commented Aug 13, 2017

Adds a new SkipEscapeHTML option to jsonpb.Marshaler, which disables HTML escaping by swapping out json.Marshal in favour of a JSON Encoder with SetEscapeHTML(false) called on it.

Closes #407.

@googlebot
Copy link
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@pauln
Copy link
Author

pauln commented Aug 13, 2017

I've signed the CLA.

Failing test is Go 1.6, due to lack of Encoder.SetEscapeHTML and Encoder.SetIndent.

@googlebot
Copy link
Collaborator

CLAs look good, thanks!

@adamryman
Copy link

@cybrcodr, @dsnet if the tests are versioned go version could this be merged?

@pauln
Copy link
Author

pauln commented Oct 5, 2017

@cybrcodr @dsnet I see that some recent commits have caused this PR to conflict. I'm happy to work on resolving that, if you can provide some guidance on the preferred way to deal with the lack of Encoder.SetEscapeHTML and Encoder.SetIndent on Go <= 1.6.

@dsnet dsnet added the jsonpb label Dec 5, 2017
@dsnet dsnet requested a review from cybrcodr December 5, 2017 23:44
@dsnet
Copy link
Member

dsnet commented Dec 5, 2017

I'm not fond of every encoding/json option being in jsonpb, but since support Indent was already added, there is precedence for mirror the options for encoder/json.Marshaler.

I'll leave the review to @cybrcodr

@cybrcodr
Copy link
Contributor

cybrcodr commented Dec 8, 2017

I share the same sentiment that these options should not belong in jsonpb. These can be done via a post-processing step instead. Even though the library already has Indent, I would rather not extend this type of options.

Here's a simple example of transformation of stripping these escape sequence characters from a JSON string -- https://play.golang.org/p/AdmagoBdkz. I think one can write a faster and more efficient transformation w/o using regexp.

@cybrcodr cybrcodr closed this Dec 8, 2017
@dsnet
Copy link
Member

dsnet commented Dec 8, 2017

Alternatively, you could round-trip through the stdlib encoder to act as a JSON-to-JSON transformer:

func rewriteJSON(in []byte) ([]byte, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(in, &m); err != nil {
		return in, err
	}
	var b bytes.Buffer
	e := json.NewEncoder(&b)
	e.SetEscapeHTML(false)
	if err := e.Encode(m); err != nil {
		return in, err
	}
	return b.Bytes(), nil
}

@veqryn
Copy link

veqryn commented Sep 11, 2018

@dsnet @cybrcodr Can I politely disagree with this decision and this pr/issue be re-opened?

Post-processing can be very expensive, and it isn't always easy to find a place to do the post-processing.

HTML escaping in json is a gotchya that has broken clients and downstream processes for us before.

I don't believe it is too much to ask to allow us the ability to turn it off.

There aren't any other options that json.Encoder has, so I can't image this list of options growing any further.
But still, if there are concerns about adding more options to the Marshaler struct, then there are other ways of doing this PR. For example, an interface or function could be used for the marshalling function, that way people could replace it with whatever they need.

@dsnet
Copy link
Member

dsnet commented Sep 11, 2018

Hi @veqryn.

HTML escaping in json is a gotchya that has broken clients and downstream processes for us before.

That seems to be a bug in the downstream client if they're not able to handle RFC 7195 compliant output. I believe it is a mistake if bugs in downstream code require upstream code to make changes. It encourages buggy behavior to persist.

I don't believe it is too much to ask to allow us the ability to turn it off. There aren't any other options that json.Encoder has, so I can't image this list of options growing any further.

Each new feature request often seems like it isn't too much to ask, but they do pile up. As a result, it can become difficult to add other features that are actually more desired since they may interact poorly with the current set of features.

That is exactly what happened to encoding/json. I'm one of the maintainers of the standard library package as well. It's API has unfortunately become over-burdened, clunky, non-orthogonal in some areas as a result of feature addition without enough future foresight. Maintaining that package has taught me the importance of being very careful about what features to adopt.

@veqryn
Copy link

veqryn commented Sep 11, 2018

I realize and agree with with everything you said, but I would still really like to see the ability to turn off html-escaping.
If you had an interface for the json.Marshal function, you could get rid of the Indent option. That would keep things at the same number of fields, but with more flexibility.

@dsnet
Copy link
Member

dsnet commented Sep 11, 2018

If you had an interface for the json.Marshal function, you could get rid of the Indent option.

I'm not sure how that would work. Did you mean to expose json.Encoder so that can directly set whatever options that the encoder supports?

Assuming that's the case, you are correct that this provides the flexibility that you want in this specific case, but it also incurs detriments:

  • Exposing json.Encoder tightly couples the public API of jsonpb with encoding/json† and also tightly couples the implementation of jsonpb to encoding/json. We are actually working on v2 of Go protobufs, and the new implementation actually doesn't use encoding/json. To reduce the maintenance burden, we would want to be able to implement jsonpb v1 in terms of jsonpb v2. If we exposed json.Encoder, then the v2 implementation is now on the hook for implementing every option that encoding/json provides.
  • If there were ever a v2 of encoding/json, then we've already coupled ourselves to to the v1 version and can't easily switch without a major version change on our end (since our API would now change as well).
  • As a general principle, it is not a good idea to have your public API depend on other packages unless there is very strong benefit to it.

Overall, I understand the desire to have an override, but I'm not seeing a compelling reason to add it now:

  • If the use-case was for nicer human-readable output, then JSON-JSON transformation should suffice. There's a performance hit, but it's for humans anyways.
  • If the use-case is for buggy client, then it seems wrong to further encourage incorrect behavior.
  • I know encoding/json has a SetEscapeHTML option, but that was too narrow of an option. You could imagine that a more general option of SetStringEncoder that allowed the user to specify how strings were serialized in JSON (with various pre-defined encoders). However, that option can't be added now since it interacts poorly with the existing SetEscapeHTML option.

† I'm aware that we already exposed via jsonpb.UnmarshalNext, but that was a mistake and should not be precedence for more exposure.

@veqryn
Copy link

veqryn commented Sep 12, 2018

Actually I was not thinking of exposing json.Encoder.

Right the code calls: b, err := json.Marshal(s)

Let the user put their own function JSON Marshalling there, so long as it matches the same signature: func(interface{}) ([]byte, error)

@dsnet
Copy link
Member

dsnet commented Sep 12, 2018

What is the input to func(interface{}) ([]byte, error)? We should not make assumptions about the implementation (even if the current code does it a certain way). The v2 implementation currently does not use encoding/json, and so does not call json.Marshal, and thus does not have a data structure that matches the interface{} that the function would presumably expect.

@veqryn
Copy link

veqryn commented Sep 12, 2018

I guess I'd have to see the v2 version to see where the best place or way to add this feature would be.

I assume based on what you have said so far that you are trying to keep API compatibility between the v1 and v2 implementations of jsonpb?

If not then I would also argue there is nothing wrong with doing it one way in v1 and a different way in v2.

@dsnet
Copy link
Member

dsnet commented Sep 12, 2018

API compatibility between the v1 and v2 implementations of jsonpb?

Not API compatibility (the whole purpose of v2 is to break API compatibility), but feature compatibility. That is, the ability to implement v1 in terms of the v2 API. Otherwise, we are on the hook forever to maintain the v1 code.

@veqryn
Copy link

veqryn commented Sep 12, 2018

So in that case, I'd agree that putting something into v1 that doesn't play well with the upcoming v2 is bad idea. What does get put in, should probably go into v2 first and then get backported in a nice way to v1. Without seeing what v2 looks like, I believe all I can do at this point is say that I would like the ability to not escape HTML in the generated JSON.

Maybe that takes the form of a SetStringEncoder, maybe it is something functional, maybe it is an interface, maybe it is a flag like json.Encoder has, or maybe it is something else that integrates well with the v2 version but is forward-looking and maintainable.

I'd be happy to help out or make a PR, if you have some idea or direction of what you'd accept.

@dsnet
Copy link
Member

dsnet commented Sep 12, 2018

should probably go into v2 first and then get backported in a nice way to v1

Yep. Thank you for being understanding.

You can keep track of all the v2 work here: https://github.com/golang/protobuf/tree/api-v2
The code reviews are start here: https://go-review.googlesource.com/q/project:protobuf+status:open

antoniomo added a commit to antoniomo/protobuf that referenced this pull request Jun 1, 2020
@golang golang locked and limited conversation to collaborators Jun 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

jsonpb: No way to disable HTML escaping
6 participants