-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/protojson: DiscardUnknown doesn't work for unknown enum strings #1208
Comments
\cc @cybrcodr, do you happen to remember what the behavior for equivalent functionality in C++ is? |
I didn't check C++, but I'm going to guess that it doesn't handle this case since there's not an obvious behavior to perform in the case of repeated fields. |
Sorry for the late response. I just checked using libprotoc 3.13.0, and with JsonParseOptions.ignore_unknown_fields = true, C++ lib does ignore unknown enum string value. For a singular field, it "ignores" the field, treating it as not set. For repeated fields, it "ignores" unknown string value by dropping it and hence changing the size of the repeated field. When I added the DiscardUnknown option, I opted only for unknown fields. For "undefined/unknown" integer enum values, C++ will treat it as an unknown field in proto2, but recognize it in proto3. Go does accept integer values that are not defined regardless of the DiscardUnknown option. I could be wrong on this, but I think treatment of integer enum values varies across different languages. For unknown enum string values, it may seem like this may be common across language implementations though and hence I'm open to adding support for it. Slight related, C++ also has the option for case_insensitive_enum_parsing which has the following warning:
While this is not the same as the original request, just want to voice out ahead that I don't think this is a feature we want to support. |
Wow. That is surprising behavior, but okay. |
Yeah, I think it’s kind of hard to say what appropriate behavior should be here. Arguably, if you add an enum and a service/client gets that value and doesn’t know which integer value that corresponds to… I mean, setting it to the default value is probably not a good idea, unless everyone has followed good procedure and defined their default/zero-value to be ”INVALID VALUE”. Imagine perhaps, something using I understand my example is contrived, but it’s just supposed to be an example of how treating the value as an unknown field, or a default value could cause semantics bugs in code. P.S.: I think if a valid optional support were to be added, it should be explicitly to turn this behavior on, and not conflating unknown fields with unknown enum values. Users of the code should know exactly what behaviors they are enabling/disabling. |
Decided to reread this issue since I bumped into the problem again this morning. Upon re-reading, I realized that the conversion has centered around repeated enum -- which is not actually my situation. I'm mostly looking for parity with C++ (and similar semantics as proto-encoding). It sound like there's support for that, so I'll do a PR today or tomorrow. |
@cybrcodr mailed out a change for this already. I agree that something should be changed, but it's not clear what should be changed. It's known that C++'s approach is problematic, and sometimes even considered a bug. It's not clear that we should emulate them in this exact behavior. |
The Language Guide (proto2 and proto3) have no mention on how to handle this for JSON. The section on JSON options simply mention about Ignore unknown fields: Proto3 JSON parser should reject unknown fields by default but may provide an option to ignore unknown fields in parsing. And hence we added DiscardUnknown option to ignore unknown fields, but not unknown enum values. Btw, I just realized that the expectation on the original post here is to have the unknown value be treated as the default enum value. That is not what the C++ implementation does as I mentioned above, i.e. it does not replace the value with the default value, it really ignores the value by not using them. This may be OK for proto3 singular field since 0 is treated as the default enum value, but it is not the same in proto2, where there is no default enum value. And for repeated (or map) field, it drops the value. We have to account for consistency on how to deal with unknown enum values, and these appear in singular, repeated and map fields. protocolbuffers/protobuf#6355 is a similar issue raised in the main protobuf project. Perhaps you may want to follow up on that issue on that project first or open another issue if you think it's unrelated. |
Thanks @cybrcodr Sorry about the confusion re 'ignored field' and 'default value'. So, where are we leaving this? Try to get by-off from the main protobuf project? or go for parity with existing behavior as in #1208 (comment) |
Yes, getting approval from the main protobuf project is always the right approach for getting consistency across the proto specification and across different languages. |
Ok, I commented over there. I'm a bit worried that the issue is stale (mid-2019)...but at least 2019 was a better year that 2020. Thanks all! |
Hi folks, I've opened a CL to fix this. My team has engaged with the protobuf team to get this worked out across languages. The CL is in WIP until we have a concrete decision. Just wanted to indicate that there is some movement on this issue. |
Hi @noahdietz, Thanks for engaging the Protobuf team on this. I'd appreciate it if your team or the Protobuf team can update the Protobuf Language Guide with regards to what the expected behavior should be across all languages for this issue. I'm guessing the discussion was done just within the teams and not via https://groups.google.com/g/protobuf or some other channels? |
Definitely! I'll pass that along :)
Yeah, this is being communicated via direct contact. |
Sorry, I should be more clear on this. I'd like to request that the Protobuf Language Guide be updated with regards to this because the current behavior when deserializing enum strings already vary across language implementations. Changing it based on some internal discussion won't provide for a valid reference especially if other users have different expectations. Thanks! |
Hi! I'm on the same team as Noah and I've been the one engaging with the protobuf team on this issue, motivated by the immediate use case we're running into. I totally agree that the Language Guide should be updated once a decision is reached. One of the motivations driving the protobuf team's feedback is achieving more consistency across languages, so that is good news. I would expect there will a formal approval once our current, somewhat informal, discussions settle on an approach. Also happy to forward more info internally. |
It would be a lot better if the discussion is being done externally, preferably on the Protobuf issue tracker. It'll at least allow for contributors and users of different implementations to provide feedback, or at the very least understand the rationale behind a decision. We do have active non-Googler contributors in the Go Protobuf open source project, and I'm sure they can provide valuable feedback. A solution for one immediate use case may present issues in other use cases. It'll also make reviewing the CL a lot easier if the reviewers have insight to the discussion. Given the lack of external discussion, I'll go ahead and present my feedback on the proposed solution as being implemented in https://golang.org/cl/350469 here. Even if this was only implemented with the DiscardUnknown option enabled, a JSON value with an unknown integer enum would result in a different deserialized value from a JSON value with an unknown string enum. This can be problematic. https://developers.google.com/protocol-buffers/docs/proto3#enum
While the current parser can accept integer value outside the specified range, it intentionally rejects unknown string value in order to avoid producing a different enum value. A possible solution would be to present a different option for this special treatment of enum values outside of specified range regardless of whether the value is an integer or a string. This would keep the DiscardUnknown option as discarding unknown fields only. While one may think we should simply just implement this in the Go implementation, we have learned through experience that even optional behavior should be specified in the Protobuf spec/guide for consistency as this does present issues if not. |
Those are all good points. The current discussions are motivated by internal use cases, which is why they started internally. I would hope once we have good proposal(s), we can move the discussion to a public forum (but that may not be entirely under my control). As to the cross-language consistency, that is a major point that all parties seem to be agreeing on: the intent was not to submit the Go changes until we have a cross-language consistent plan (which may not even wind up being this one, after all the discussions are done). |
BTW, new conformance test cases were added specifically around unknown enum values in the JSON format: protocolbuffers/protobuf#9534 |
This updates all generated code to match the contents of the latest v22.0 release of Protobuf. This involved a couple of changes to the script that does the sync'ing: 1. The new Protobuf version no longer includes autoconf configuration and instead requires using bazel to build things. 2. The new Protobuf release does not have an artifact named "protobuf-all-${VERSION}.tar.gz", but the one named "protobuf-${VERSION}.tar.gz" has all of the sources and was sufficient for the regenerate.bash script to complete. This change does NOT regenerate the protos related to benchmarks. The Protobuf repo no longer includes benchmarks. The CL removing them says they are superceded by google/fleetbench: protocolbuffers/protobuf@83c499d But that project's proto benchmark files are very different: https://github.com/google/fleetbench/tree/main/fleetbench/proto So I commented out those steps in the generation code since the benchmarks will need some work to reconcile with fleetbench. This code adds known failing tests for conformance. New test cases were added in protocolbuffers/protobuf#9534, but the Go protojson package does not behave according to the new tests. There is an existing issue in GitHub about this: golang/protobuf#1208 Change-Id: Iad796ec7889bc2a74b58db5224facf850cd1a1cd Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/469255 Reviewed-by: Michael Stapelberg <[email protected]> Reviewed-by: Damien Neil <[email protected]>
These three of the new conformance tests in the main protobuf repo fail with Go's protojson implementation: I suspect updating protojson so these test cases pass will be enough to resolve and close this issue. |
What version of protobuf and what language are you using?
Version: 1.24.0 (golang)
What did you do?
We're using protojson to encode our internal protobufs for indexing in ElasticSearch. For interop with other users, we encode enums as strings. At some point, we added a new enum value, which has caused older golang consumers to fail with the following error:
invalid value for enum type: "CLASS_RESULTS_PRE_CLINICAL"
What did you expect to see?
We set
DiscardUnknown
inUnmarshalOptions
, so I would have expected the value to be parsed as the default enum value.What did you see instead?
invalid value for enum type: "CLASS_RESULTS_PRE_CLINICAL"
The text was updated successfully, but these errors were encountered: