ObjectStr Text is incompatible with pre-2.0 MessagePack spec #36

Open
l29ah opened this issue Jul 24, 2020 · 7 comments

Comments

@l29ah
Contributor

l29ah commented Jul 24, 2020

The pre-2.0 spec allows non-Unicode content in strings, while Text can't hold arbitrary bytes.

What would be a good way to augment msgpack-types without forking the library or breaking code compatibility by s/Text/ByteString/? Python's msgpack library uses the use_bin_type and raw options to handle the old format, but I don't see how to do something similar in Haskell.
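A minimal sketch of the mismatch, assuming only the `text` and `bytestring` packages. The byte values are the first few bytes of the "r" payload from the report in this thread; `payload` is a name chosen here for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- First bytes of the "r" payload from the report: "]zaxC\140\DELD\153".
-- 0x8C is a UTF-8 continuation byte with no lead byte, so this is not UTF-8.
payload :: B.ByteString
payload = B.pack [0x5D, 0x7A, 0x61, 0x78, 0x43, 0x8C, 0x7F, 0x44, 0x99]

main :: IO ()
main =
  case decodeUtf8' payload of
    -- Strict decoding reports the invalid input instead of producing a Text.
    Left err -> putStrLn ("not valid UTF-8: " ++ show err)
    Right t  -> putStrLn ("decoded: " ++ T.unpack t)
```

Any decoder that insists on producing a `Text` through strict UTF-8 decoding has to reject such a payload, which is consistent with the parse failure described in this issue.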

@iphydf
Member

iphydf commented Jul 24, 2020

By "pre-2.0", do you mean it is compatible with the post-2.0 spec?

Text accepts arbitrary Unicode code points, which includes all code points from 0 to 255. We could use that to encode arbitrary bytes. This is what we do in JSON::XS (and JSON::PP) for Perl.
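A rough sketch of that byte-to-code-point mapping, assuming the `text` and `bytestring` packages. The helper names `bytesToText` and `textToBytes` are hypothetical and not part of msgpack-types.

```haskell
import Data.Char (ord)
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- Map each byte to the code point with the same number (0..255).
bytesToText :: B.ByteString -> T.Text
bytesToText = decodeLatin1

-- Inverse mapping; Nothing if the Text contains a code point above 255.
textToBytes :: T.Text -> Maybe B.ByteString
textToBytes t
  | T.all ((<= 255) . ord) t =
      Just (B.pack (map (fromIntegral . ord) (T.unpack t)))
  | otherwise = Nothing

main :: IO ()
main = do
  let raw = B.pack [0x5D, 0x8C, 0x7F, 0x00, 0xF6]
  -- Every byte string survives the round trip through Text.
  print (textToBytes (bytesToText raw) == Just raw)  -- prints True
```

The trade-off is that a payload which really is UTF-8 would show up as one character per byte, so both sides need to agree on this convention.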

l29ah added a commit to l29ah/hs-msgpack-types that referenced this issue Jul 24, 2020
l29ah added a commit to l29ah/hs-msgpack-binary that referenced this issue Jul 24, 2020
@l29ah
Contributor Author

l29ah commented Jul 24, 2020

It is compatible, except that 2.0 says ObjectStr is UTF-8, while earlier versions did not restrict its content.

Well, now I observe parsing failures with unmodified msgpack-binary and msgpack-types when reading strings that contain arbitrary bytes:
expected:

ObjectMap [(ObjectStr "i",ObjectWord 2),(ObjectStr "r",ObjectStr "]zaxC\140\DELD\153\vK\NUL$\246\170W\DC3\203\172\147W\236HKo\249\205\DC1\169\156E\202")]

actual:

hyborg: ParseError {unconsumed = "\161i\STX\161r\218\NUL ]zaxC\140\DELD\153\vK\NUL$\246\170W\DC3\203\172\147W\236HKo\249\205\DC1\169\156E\202", offset = 1, content = "Data.Binary.Get(Alternative).empty"}
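For illustration, the unconsumed input above contains two string headers: `\161` (0xA1, a 1-byte fixstr) and `\218\NUL ` (0xDA 0x00 0x20, a 16-bit length of 32). The sketch below reads such values as raw bytes with `Data.Binary.Get` and never attempts UTF-8 decoding. `getRawStr` is a hypothetical helper, not a function from msgpack-binary, and the sample input uses a shortened 3-byte payload.

```haskell
import Data.Binary.Get (Get, getByteString, getWord16be, getWord32be, getWord8, runGet)
import Data.Bits ((.&.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: read one str/raw value and return its payload bytes
-- unchanged, without any UTF-8 validation.
getRawStr :: Get B.ByteString
getRawStr = do
  tag <- getWord8
  len <- case tag of
    t | t .&. 0xE0 == 0xA0 -> pure (fromIntegral (t .&. 0x1F)) -- fixstr / fixraw
    0xD9 -> fromIntegral <$> getWord8     -- str 8 (new spec only)
    0xDA -> fromIntegral <$> getWord16be  -- str 16 / raw 16
    0xDB -> fromIntegral <$> getWord32be  -- str 32 / raw 32
    _    -> fail ("not a str header: " ++ show tag)
  getByteString len

main :: IO ()
main = do
  -- fixstr "i", then a 16-bit-length string holding three non-UTF-8 bytes.
  let input = BL.pack [0xA1, 0x69, 0xDA, 0x00, 0x03, 0x8C, 0x7F, 0x99]
      (k, v) = runGet ((,) <$> getRawStr <*> getRawStr) input
  print (B.unpack k, B.unpack v)  -- prints ([105],[140,127,153])
```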

@kirelagin

One option would be to use a //ROUNDTRIP encoding, as GHC does for things like filenames. The only problem is that I don’t know if there is an easy way to use this kind of TextEncoding to decode Text.
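A sketch of what the //ROUNDTRIP route could look like with `String`, assuming GHC's `GHC.Foreign` and `GHC.IO.Encoding` modules from `base`. The helper names `decodeRoundtrip` and `encodeRoundtrip` are made up for this example. To my knowledge, current GHC escapes undecodable bytes as lone surrogate code points, and `Text` replaces surrogates with U+FFFD when packing, so this sketch stays in `String` and does not solve the `Text` question.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (mkTextEncoding)

-- Decode bytes to a String; invalid UTF-8 bytes are escaped, not rejected.
decodeRoundtrip :: B.ByteString -> IO String
decodeRoundtrip bs = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  B.useAsCStringLen bs (GHC.peekCStringLen enc)

-- Encode a String back to bytes, turning the escapes back into raw bytes.
encodeRoundtrip :: String -> IO B.ByteString
encodeRoundtrip s = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  GHC.withCStringLen enc s B.packCStringLen

main :: IO ()
main = do
  let raw = B.pack [0x5D, 0x8C, 0x7F, 0x44, 0x99]
  s <- decodeRoundtrip raw
  back <- encodeRoundtrip s
  -- Compare the re-encoded bytes with the original input.
  print (back == raw)
```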

Another option would be to say that this library only supports MessagePack >= 2.0 and, honestly, I think this one makes the most sense.

@settings settings bot removed the triage label Nov 26, 2021
@epoberezkin

Possibly, the unused config could be extended to support safe utf8 decoding. A separate issue?

@iphydf
Member

iphydf commented Jan 29, 2024

Yes, it makes sense to use the config for that. I'm mostly in favour of supporting only 2.0 (and higher, if a higher version ever happens). What semantics would you suggest for safe UTF-8 decoding?

@l29ah
Contributor Author

l29ah commented Jan 29, 2024

> Possibly, the unused config could be extended to support safe utf8 decoding. A separate issue?

I don't think you can supply any configuration to a Get instance. Either moving completely to a type that doesn't require UTF-8, or making distinct 1.0-compatible modules, would make more sense.

For now I went the former way:
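As a rough illustration of the first option, the sketch below keeps the wire bytes in a `ByteString` and leaves the choice of strict or lenient UTF-8 decoding to the caller. `RawStr`, `strictText` and `lenientText` are hypothetical names and are not taken from the forks linked in this comment; only the `text` and `bytestring` packages are assumed.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (UnicodeException, lenientDecode)

-- Hypothetical stand-in for a str payload that keeps the wire bytes as-is.
newtype RawStr = RawStr B.ByteString
  deriving (Eq, Show)

-- Strict view: fails on anything that is not valid UTF-8.
strictText :: RawStr -> Either UnicodeException T.Text
strictText (RawStr bs) = decodeUtf8' bs

-- Lenient view: invalid input is replaced with U+FFFD, so this is lossy.
lenientText :: RawStr -> T.Text
lenientText (RawStr bs) = decodeUtf8With lenientDecode bs

main :: IO ()
main = do
  let ok  = RawStr (B.pack [0x69])
      bad = RawStr (B.pack [0x5D, 0x8C, 0x7F])
  print (strictText ok)
  putStrLn (either (const "invalid UTF-8") T.unpack (strictText bad))
  print (lenientText bad)
```

With this shape no configuration has to reach the `Get` parser, since decoding to `Text` happens after parsing.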
https://github.com/l29ah/hs-msgpack-types
https://github.com/l29ah/hs-msgpack-binary

@epoberezkin

> I'm mostly in favour of supporting only 2.0

+1
