encoding/json: add Decoder.DisallowDuplicateFields #48298

@dsnet

The presence of duplicate fields in JSON input is almost always a bug on the sender's side, and the behavior across various implementations is highly inconsistent. It's too late to change the current package to always reject duplicate fields, but we can provide an option to enforce stricter checks. As such, I propose adding a Decoder.DisallowDuplicateFields option.
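For illustration, here is a rough sketch of where such an option would sit in calling code. The `config` type and `decodeStrict` helper are hypothetical, and the proposed call is commented out because it is only a proposal here; I'm assuming it would take no arguments, like DisallowUnknownFields.

```go
package main

import (
	"encoding/json"
	"strings"
)

// config is a hypothetical target type used only for this sketch.
type config struct {
	Name string `json:"name"`
}

// decodeStrict decodes input with the strictness options that exist today.
func decodeStrict(input string) (config, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.DisallowUnknownFields() // existing option: reject names with no matching field

	// Proposed in this issue; assumed to mirror DisallowUnknownFields.
	// dec.DisallowDuplicateFields()

	var c config
	err := dec.Decode(&c)
	return c, err
}
```

With only the existing option enabled, an input such as `{"name":"a","name":"b"}` is still accepted without error.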


Background

Per RFC 8259, section 4, the handling of duplicate names is left as undefined behavior. Rejecting such inputs is within the realm of valid behavior. Tim Bray, the editor of RFC 8259, recommends that implementations go beyond RFC 8259 and target compliance with RFC 7493 instead. RFC 7493 is a fully compatible subset of RFC 8259 that makes strict decisions about behavior that RFC 8259 leaves undefined (including the rejection of duplicate names).

The lack of duplicate name rejection has correctness implications: a roundtrip unmarshal/marshal does not result in semantically equivalent JSON, and users see surprising behavior when they accidentally send JSON objects with duplicate names. In such a case, the current behavior is somewhat inconsistent and difficult to explain.
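To make the inconsistency concrete, here is a small sketch that feeds the same input to a struct and to a map. The `inner` and `outer` types and the helper names are hypothetical and exist only for this illustration.

```go
package main

import "encoding/json"

// inner and outer are hypothetical types used only for this illustration.
type inner struct {
	A int `json:"a"`
	B int `json:"b"`
}

type outer struct {
	X inner `json:"x"`
}

// dupInput repeats the name "x" with two different object values.
const dupInput = `{"x":{"a":1},"x":{"b":2}}`

// decodeStruct unmarshals dupInput into a Go struct.
func decodeStruct() (outer, error) {
	var v outer
	err := json.Unmarshal([]byte(dupInput), &v)
	return v, err
}

// decodeMap unmarshals dupInput into a Go map.
func decodeMap() (map[string]map[string]int, error) {
	var m map[string]map[string]int
	err := json.Unmarshal([]byte(dupInput), &m)
	return m, err
}
```

With the current package, the struct ends up with the two values merged (`X.A == 1` and `X.B == 2`), while the map keeps only the last value (`m["x"]` holds just `"b": 2`). Neither call returns an error, and marshaling the struct back produces a single `"x"` member that appeared nowhere in the input.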

The lack of duplicate name rejection may also have security implications: it is difficult for a security tool to validate the semantic meaning of a JSON object when that meaning is inherently undefined in the presence of duplicate names.


Implementation

A naive implementation can remember all seen names in a Go map. A more clever implementation can take advantage of the fact that we are almost always unmarshaling into a Go map or Go struct. In the case of a Go map, we can use the Go map itself as a means to detect duplicate names. In the case of a Go struct, we can convert a JSON name into an index (i.e., the field index in the Go struct), and then use an efficient bitmap to detect whether we saw the name before.
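As a rough sketch of the naive approach, the check can be approximated outside the package today by walking the token stream and keeping one set of seen names per open object. `checkDuplicateNames` is a hypothetical helper, not part of encoding/json, and it is not how the decoder itself would implement the option.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// checkDuplicateNames is a hypothetical helper. It walks the JSON token
// stream and returns an error for the first name repeated within one object.
func checkDuplicateNames(input string) error {
	// frame tracks one open container (object or array).
	type frame struct {
		isObject   bool
		seen       map[string]bool // names seen so far (objects only)
		expectName bool            // next string token is a name, not a value
	}
	var stack []*frame
	top := func() *frame {
		if len(stack) == 0 {
			return nil
		}
		return stack[len(stack)-1]
	}

	dec := json.NewDecoder(strings.NewReader(input))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case json.Delim:
			switch t {
			case '{':
				stack = append(stack, &frame{isObject: true, seen: map[string]bool{}, expectName: true})
				continue
			case '[':
				stack = append(stack, &frame{})
				continue
			case '}', ']':
				// The closed container counts as a completed value below.
				stack = stack[:len(stack)-1]
			}
		case string:
			if f := top(); f != nil && f.isObject && f.expectName {
				if f.seen[t] {
					return fmt.Errorf("duplicate name %q", t)
				}
				f.seen[t] = true
				f.expectName = false
				continue
			}
		}
		// A complete value was consumed; the enclosing object expects a name next.
		if f := top(); f != nil && f.isObject {
			f.expectName = true
		}
	}
}
```

Names are tracked per object, so `{"a":{"x":1},"b":{"x":2}}` passes while `{"a":1,"a":2}` is rejected. This costs one map per object, which is the overhead the struct and map strategies above are meant to avoid.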

In the common case, enabling checks for duplicate names would cause no performance slowdown.
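The struct case can be sketched as follows. `fieldSet`, `insert`, and `firstDuplicate` are hypothetical names, the name-to-index table is assumed to be built once per struct type, and this version only handles structs with at most 64 fields.

```go
package main

// fieldSet is a bitmap over struct field indices.
// Assumption for this sketch: indices are below 64.
type fieldSet uint64

// insert marks field index i as seen and reports whether it was new.
func (s *fieldSet) insert(i uint) bool {
	mask := fieldSet(1) << i
	if *s&mask != 0 {
		return false // bit already set: this field was seen before
	}
	*s |= mask
	return true
}

// firstDuplicate scans names in the order they appear in a JSON object.
// index maps a JSON name to its struct field index (hypothetical table).
// It returns the first name whose field index was already seen.
func firstDuplicate(index map[string]uint, names []string) (string, bool) {
	var seen fieldSet
	for _, name := range names {
		i, ok := index[name]
		if !ok {
			continue // unknown names are not tracked by the bitmap here
		}
		if !seen.insert(i) {
			return name, true
		}
	}
	return "", false
}
```

Each name costs one table lookup, which the decoder already performs to find the target field, plus a single bit test, with no allocation per object. Names that do not map to a field would still need a fallback such as a map of seen names.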


Aside: I'm not fond of the name Fields since JSON terminology calls this either a "name" or "member" (per RFC 8259, section 4). However, it is consistent with the existing DisallowUnknownFields option.

cc @bradfitz @crawshaw @mvdan

Metadata

    Status

    Accepted
