Support automatic conversion between text and binary string representations #476

Merged

benluddy
Contributor

@benluddy benluddy commented Jan 25, 2024

Description

This is a proof-of-concept solution for #449. It includes a test that roundtrips a struct value with fields of types []byte and [5]byte to interface{} and back to the original struct type, via both CBOR (using the features here) and via encoding/json.

    json_test.go:58: original: cbor_test.S{Bytes:[]uint8{0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64}, Arr:[5]uint8{0x68, 0x65, 0x6c, 0x6c, 0x6f}}
    json_test.go:64: original to json: {"bytes":"aGVsbG8gd29ybGQ=","arr":[104,101,108,108,111]}
    json_test.go:74: original to cbor: {'bytes': 22('hello world'), 'arr': [104, 101, 108, 108, 111]}
    json_test.go:81: json to interface{}: map[string]interface {}{"arr":[]interface {}{104, 101, 108, 108, 111}, "bytes":"aGVsbG8gd29ybGQ="}
    json_test.go:88: cbor to interface{}: map[string]interface {}{"arr":[]interface {}{0x68, 0x65, 0x6c, 0x6c, 0x6f}, "bytes":"aGVsbG8gd29ybGQ="}
    json_test.go:94: interface{} to json: {"arr":[104,101,108,108,111],"bytes":"aGVsbG8gd29ybGQ="}
    json_test.go:104: interface{} to cbor: {'bytes': 'aGVsbG8gd29ybGQ=', 'arr': [104, 101, 108, 108, 111]}
    json_test.go:108: native-to-interface{} via cbor differed from native-to-interface{} via json
    json_test.go:116: json to native: cbor_test.S{Bytes:[]uint8{0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64}, Arr:[5]uint8{0x68, 0x65, 0x6c, 0x6c, 0x6f}}
    json_test.go:126: cbor to native: cbor_test.S{Bytes:[]uint8{0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64}, Arr:[5]uint8{0x68, 0x65, 0x6c, 0x6c, 0x6f}}
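
The JSON half of that roundtrip can be reproduced with the standard library alone. The following is a minimal sketch, not the test from this PR: the struct `S` and its field tags are inferred from the log output above, and `roundtripJSON` is a hypothetical helper name. It shows why the `[]byte` field becomes a base64 string while the `[5]byte` field becomes an array of numbers once the value passes through `interface{}`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// S mirrors the shape seen in the log output: one byte slice and one
// fixed-size byte array. The field tags are inferred, not copied from the PR.
type S struct {
	Bytes []byte  `json:"bytes"`
	Arr   [5]byte `json:"arr"`
}

// roundtripJSON (hypothetical helper) encodes orig to JSON, decodes the
// result into interface{}, re-encodes that generic value, and finally
// decodes it back into S.
func roundtripJSON(orig S) (string, interface{}, S, error) {
	enc, err := json.Marshal(orig)
	if err != nil {
		return "", nil, S{}, err
	}

	// Decoding into interface{} loses the Go types: the byte slice is now a
	// base64 string and the array is a []interface{} of float64 values.
	var generic interface{}
	if err := json.Unmarshal(enc, &generic); err != nil {
		return "", nil, S{}, err
	}

	reenc, err := json.Marshal(generic)
	if err != nil {
		return "", nil, S{}, err
	}

	// Decoding into S restores the original value, because encoding/json
	// base64-decodes a JSON string when the destination is a []byte.
	var back S
	if err := json.Unmarshal(reenc, &back); err != nil {
		return "", nil, S{}, err
	}
	return string(enc), generic, back, nil
}

func main() {
	orig := S{Bytes: []byte("hello world"), Arr: [5]byte{'h', 'e', 'l', 'l', 'o'}}
	enc, generic, back, err := roundtripJSON(orig)
	if err != nil {
		panic(err)
	}
	fmt.Println(enc)
	fmt.Printf("%#v\n", generic)
	fmt.Printf("%#v\n", back)
}
```

The CBOR options discussed in this PR aim to give the same intermediate `interface{}` shape for byte slices, so that a value can take either path and decode back to the same struct.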

PR Was Proposed and Welcomed in Currently Open Issue

Checklist (for code PR only, ignore for docs PR)

  • Include unit tests that cover the new code
  • Pass all unit tests
  • Pass all 18 ci linters (golint, gosec, staticcheck, etc.)
  • Sign each commit with your real name and email.
    Last line of each commit message should be in this format:
    Signed-off-by: Firstname Lastname [email protected]
  • Certify the Developer's Certificate of Origin 1.1
    (see next section).

Certify the Developer's Certificate of Origin 1.1

  • By marking this item as completed, I certify
    the Developer Certificate of Origin 1.1.
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@benluddy benluddy force-pushed the stdlib-json-byteslice-compatibility branch 2 times, most recently from b06ee3f to 139da95 Compare January 25, 2024 23:48
@benluddy benluddy force-pushed the stdlib-json-byteslice-compatibility branch from 139da95 to 0663f95 Compare February 1, 2024 16:43
@fxamacker
Owner

Thanks @benluddy for the detailed write-up in #449 and this POC!

The POC covers all use cases you mentioned and the roundtrip tests illustrating those use cases are great!

I see that TextConversionMode has 3 options, including the TextConversionEncodeToText option.

Since tag numbers 21-23 are defined in Section 3.4.5.2 of RFC 8949, what do you think about moving the TextConversionEncodeToText option into its own separate mode?

For example, we can add a decoding mode with 2 options:

  • by default, ignore tag numbers 21-23 and only decode tag content for backward compatibility
  • decode tag numbers 21-23 as well as tag content (as handled by POC)

This new decoding mode for tags 21-23 would match the new encoding mode ByteSliceMode added by the POC.

It can also help simplify TextConversionMode since the new mode can be used without being tied to TextConversions.

Thoughts?
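
To make the two proposed options concrete, here is a rough sketch using only the Go standard library. It is not the POC's code: `decodeTaggedBytes` and the `honorTag` flag are hypothetical names, and the input is assumed to be an already-parsed tag number plus byte string content. The tag-to-encoding mapping follows the expected-later-encoding tags named above (21 for base64url, 22 for base64, 23 for base16).

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// Expected-later-encoding tag numbers (RFC 8949, Section 3.4.5.2).
const (
	tagBase64URL = 21
	tagBase64    = 22
	tagBase16    = 23
)

// decodeTaggedBytes (hypothetical) models decoding a tagged byte string
// into an empty interface value.
//
// With honorTag false, the tag number is ignored and only the content is
// returned, which is the backward-compatible behavior. With honorTag true,
// the content is converted to text using the encoding the tag asks for.
func decodeTaggedBytes(tag uint64, content []byte, honorTag bool) interface{} {
	if !honorTag {
		return content
	}
	switch tag {
	case tagBase64URL:
		// base64url without padding.
		return base64.RawURLEncoding.EncodeToString(content)
	case tagBase64:
		// Classic base64 with padding.
		return base64.StdEncoding.EncodeToString(content)
	case tagBase16:
		// Go's hex package emits lowercase digits; the casing a consumer
		// expects is an assumption this sketch does not settle.
		return hex.EncodeToString(content)
	default:
		// Any other tag number is outside the scope of this sketch.
		return content
	}
}

func main() {
	content := []byte("hello world")
	fmt.Printf("%#v\n", decodeTaggedBytes(tagBase64, content, false))
	fmt.Printf("%#v\n", decodeTaggedBytes(tagBase64, content, true))
	fmt.Printf("%#v\n", decodeTaggedBytes(tagBase64URL, content, true))
	fmt.Printf("%#v\n", decodeTaggedBytes(tagBase16, content, true))
}
```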

@benluddy
Contributor Author

I see that TextConversionMode has 3 options, including the TextConversionEncodeToText option.

Since tag numbers 21-23 are defined in Section 3.4.5.2 of RFC 8949, what do you think about moving the TextConversionEncodeToText option into its own separate mode?

For example, we can add a decoding mode with 2 options:

* by default, ignore tag numbers 21-23 and only decode tag content for backward compatibility

* decode tag numbers 21-23 as well as tag content (as handled by POC)

This new decoding mode for tags 21-23 would match the new encoding mode ByteSliceMode added by the POC.

It can also help simplify TextConversionMode since the new mode can be used without being tied to TextConversions.

Thoughts?

Yes, that sounds reasonable. I waffled a bit on the shape of the decode options before ending up with the current iteration. It was clear that the exact text conversion behavior depends on the type of the destination Go value, but less clear whether or not it was useful to make those types configurable. The example in the roundtrip test (string: encode, []byte: decode) is sufficient for my use case.

Would you prefer to see the TextConversions function replaced entirely by two decode options: one for decoding into strings and one for byte slices?

@fxamacker
Owner

Would you prefer to see the TextConversions function replaced entirely by two decode options: one for decoding into strings and one for byte slices?

@benluddy Yes, that would be great! 👍 Thanks for suggesting it.

@benluddy benluddy force-pushed the stdlib-json-byteslice-compatibility branch from 0663f95 to 6178d1c Compare April 16, 2024 21:13
@benluddy
Contributor Author

Updated based on the discussion from last month. It should be fairly close to its final shape now. I'll follow up soon to fill out the test coverage.

@benluddy benluddy force-pushed the stdlib-json-byteslice-compatibility branch 2 times, most recently from e164d09 to 7474624 Compare April 17, 2024 20:07
@benluddy benluddy marked this pull request as ready for review April 17, 2024 20:09
@benluddy
Contributor Author

OK, I think this is ready now.

While I was filling in the test coverage gaps, I realized that only one new decode option was required. The existing option ByteStringToString has modes "forbidden" (error on decoding byte string into string) and "allowed" (no error on decoding byte string into string). Rather than add a new option to control the behavior of decoding expected-later-encoding-tagged byte strings into strings -- which would itself require ByteStringToStringAllowed to be useful -- I added a third mode to ByteStringToString that extends ByteStringToStringAllowed by automatically applying the text encoding.

Please take another look!
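
For readers skimming the thread, the three behaviors described above can be modeled roughly as follows. This is an illustrative sketch only: the `mode` type, its constants, and `byteStringToString` are hypothetical stand-ins rather than the library's identifiers, and only a byte string enclosed in tag 22 (base64) is considered.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// mode is a hypothetical stand-in for the three ByteStringToString modes.
type mode int

const (
	forbidden           mode = iota // error when a byte string targets a string
	allowed                         // copy the bytes into the string unchanged
	allowedWithEncoding             // like allowed, but apply the tag's text encoding
)

var errByteStringToString = errors.New("cannot decode byte string into string")

// byteStringToString models decoding the content of a tag 22 byte string
// into a Go string under each mode.
func byteStringToString(m mode, content []byte) (string, error) {
	switch m {
	case allowed:
		return string(content), nil
	case allowedWithEncoding:
		// Tag 22 asks for base64 text, so the string receives the encoded
		// form instead of the raw bytes.
		return base64.StdEncoding.EncodeToString(content), nil
	default:
		return "", errByteStringToString
	}
}

func main() {
	content := []byte("hello world")
	for _, m := range []mode{forbidden, allowed, allowedWithEncoding} {
		s, err := byteStringToString(m, content)
		fmt.Printf("mode %d: %q, err=%v\n", m, s, err)
	}
}
```

Folding the behavior into a third mode avoids a separate option that would only have an effect when the "allowed" mode is also set.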

Owner

@fxamacker fxamacker left a comment

@benluddy Thanks for updating this PR!

After wrapping up some other PR reviews this weekend, I only had time left to review encoding changes and related tests in this PR.

I left a couple of comments, e.g. we should encode tags 21-23 before encoding other user-registered tags.

@benluddy benluddy force-pushed the stdlib-json-byteslice-compatibility branch from 7474624 to e922c80 Compare April 29, 2024 16:52
These options improve interoperability with programs that use JSON to encode and decode objects to
and from both struct types and empty interface values.

Signed-off-by: Ben Luddy <[email protected]>
@benluddy benluddy force-pushed the stdlib-json-byteslice-compatibility branch from e922c80 to 83e9c2b Compare April 29, 2024 16:53
@benluddy
Contributor Author

Hi @fxamacker, I've pushed changes to address your feedback on the encode side. Thanks for the review!

@benluddy benluddy requested a review from fxamacker April 29, 2024 16:56
Owner

@fxamacker fxamacker left a comment

@benluddy Thanks for updating this PR and great discussions! LGTM! 👍

@fxamacker fxamacker merged commit ca79194 into fxamacker:master May 5, 2024
17 checks passed