force bytes #190

wbolster · 2016-03-20T20:13:43Z

is it possible to force an api to return bytes instead of the messy mix of str (unicode in py2) and bytes, depending on whether the value happened to be valid utf-8?

apache hbase has a bytes-only data model, and happybase, the python library for it, uses thrift to communicate with hbase. however, this results in mixed str/bytes responses:

>>> c = happybase.Connection()
>>> t = c.table('mytable')
>>> t.put(b'\x01\x02', {'cf2:foo\xee\xee\xee': '\xff\xff\xff'}
>>> t.put('foo', {'cf1:bar': u'baz'})
>>> list(t.scan())
[(u'\x01\x02', {'cf2:foo\xee\xee\xee': '\xff\xff\xff'}),
 (u'foo', {u'cf1:bar': u'baz'})]

(this example code was run with python2)

The text was updated successfully, but these errors were encountered:

lxyu · 2016-03-24T06:31:08Z

You can now add decode_response=False to protocol factory to force return bytes.

wbolster · 2016-03-24T10:23:47Z

awesome, thanks.

I believe this improves [1] which addresses a bytes/str inconsistency issue reported some time ago[2]. The goal of this patch is to have an option to always deseriaize binary fields as bytes and string fields as str (on Python 2, respectively: str and unicode). It achieves it by adding a third option to previously purely boolean decode_response parameter (I admit I'm not a fan of mixing types like this but I'm doing this strictly for backwards compatibility and I'd argue the "auto" behavior should become the default in the future and decode_response should be removed altogether). I'm not intimately familiar with Thrift specification so adding a separate ttype like this may be stupid - let me know if that is so. I've renamed and reorganized parts of the code in order for the names to be reasonably truthful and to avoid too much code redundancy connected to the fact that I've added more tests. Ideally some of the test code would be shared so reduce the redundancy even further but I feel that it's out of scope here. (Similarly the actual non-test code - I'm not entirely comfortable with the redundancy this commit introduces but there's quite a bit of redundancy in the code already so improving this is likely a whole separate task). [1] Thriftpy@00c6553 [2] Thriftpy#190

I believe this improves [1] which addresses a bytes/str inconsistency issue reported some time ago[2]. The goal of this patch is to have an option to always deseriaize binary fields as bytes and string fields as str (on Python 2, respectively: str and unicode). It achieves it by adding a third option to previously purely boolean decode_response parameter (I admit I'm not a fan of mixing types like this but I'm doing this strictly for backwards compatibility and I'd argue the "auto" behavior should become the default in the future and decode_response should be removed altogether). I'm not intimately familiar with Thrift specification so adding a separate ttype like this may be stupid - let me know if that is so. I've renamed and reorganized parts of the code in order for the names to be reasonably truthful and to avoid too much code redundancy connected to the fact that I've added more tests. Ideally some of the test code would be shared so reduce the redundancy even further but I feel that it's out of scope here. (Similarly the actual non-test code - I'm not entirely comfortable with the redundancy this commit introduces but there's quite a bit of redundancy in the code already so improving this is likely a whole separate task. Also extracting the logic to a separate function would negatively impact the performance /possibly not important/ but it could also make the code less readable, I'm conflicted here). [1] Thriftpy@00c6553 [2] Thriftpy#190

I believe this improves [1] which addresses a bytes/str inconsistency issue reported some time ago[2]. The goal of this patch is to have an option to always deseriaize binary fields as bytes and string fields as str (on Python 2, respectively: str and unicode). It achieves it by adding a third option to previously purely boolean decode_response parameter (I admit I'm not a fan of mixing types like this but I'm doing this strictly for backwards compatibility and I'd argue the "auto" behavior should become the default in the future and decode_response should be removed altogether). I'm not intimately familiar with Thrift specification so adding a separate ttype like this may be stupid - let me know if that is so. I've renamed and reorganized parts of the code in order for the names to be reasonably truthful and to avoid too much code redundancy connected to the fact that I've added more tests. Ideally some of the test code would be shared so reduce the redundancy even further but I feel that it's out of scope here. Similarly there's some almost identical code in Python and Cython protocol implementations but I lack expertise regarding possible ways to mitigate that. [1] Thriftpy@00c6553 [2] Thriftpy#190

I believe this improves [1] which addresses a bytes/str inconsistency issue reported some time ago[2]. The goal of this patch is to have an option to always deseriaize binary fields as bytes and string fields as str (on Python 2, respectively: str and unicode). It achieves it by adding a third option to previously purely boolean decode_response parameter (I admit I'm not a fan of mixing types like this but I'm doing this strictly for backwards compatibility and I'd argue the "auto" behavior should become the default in the future and decode_response should be removed altogether). Internally the goal of this change is achieved by keeping the ttype of binary fields as STRING but putting a non-None value in their "container spec" which introduces a bit of inconsistency (string fields are TType.STRING but binary fields are (TType.STRING, BINARY)) but I believe we can live with it. I've renamed and reorganized parts of the code in order for the names to be reasonably truthful and to avoid too much code redundancy connected to the fact that I've added more tests. Ideally some of the test code would be shared so reduce the redundancy even further but I feel that it's out of scope here. Similarly there's some almost identical code in Python and Cython protocol implementations but I lack expertise regarding possible ways to mitigate that. [1] Thriftpy@00c6553 [2] Thriftpy#190

I believe this improves [1] which addresses a bytes/str inconsistency issue reported some time ago[2]. The goal of this patch is to have an option to always deseriaize binary fields as bytes and string fields as str (on Python 2, respectively: str and unicode). It achieves it by adding a third option to previously purely boolean decode_response parameter (I admit I'm not a fan of mixing types like this but I'm doing this strictly for backwards compatibility and I'd argue the "auto" behavior should become the default in the future and decode_response should be removed altogether). Internally the goal of this change is achieved by keeping the ttype of binary fields as STRING but putting a non-None value in their "container spec" which introduces a bit of inconsistency (string fields are TType.STRING but binary fields are (TType.STRING, BINARY)) but I believe we can live with it. I've renamed and reorganized parts of the code in order for the names to be reasonably truthful and to avoid too much code redundancy connected to the fact that I've added more tests. Ideally some of the test code would be shared so reduce the redundancy even further but I feel that it's out of scope here. Similarly there's some almost identical code in Python and Cython protocol implementations but I lack expertise regarding possible ways to mitigate that. Casting of values in the parser is fixed as a bonus, the following code never worked because TType.STRING and TType.BINARY had the same value: if t == TType.STRING: return _cast_string if t == TType.BINARY: return _cast_binary [1] Thriftpy@00c6553 [2] Thriftpy#190

wbolster mentioned this issue Mar 20, 2016

Port to thriftpy python-happybase/happybase#114

Merged

lxyu closed this as completed in 00c6553 Mar 24, 2016

wbolster mentioned this issue Mar 28, 2016

decode_response flag only works partially #193

Closed

wbolster mentioned this issue Apr 29, 2016

How to disable thriftpy's auto decoding? python-happybase/happybase#122

Closed

dan-blanchard pushed a commit to dan-blanchard/thriftpy that referenced this issue May 6, 2016

add decode_response option for protocol, fix Thriftpy#190

35a9889

jstasiak mentioned this issue May 14, 2016

Treat binary and string fields differently #202

Open

pyup-bot mentioned this issue Aug 18, 2016

Initial Update scieloorg/opac_proc#10

Closed

pyup-bot mentioned this issue Aug 26, 2016

Pin thriftpy to latest version 0.3.9 scieloorg/scielo-manager#1314

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

force bytes #190

force bytes #190

wbolster commented Mar 20, 2016

lxyu commented Mar 24, 2016

wbolster commented Mar 24, 2016

force bytes #190

force bytes #190

Comments

wbolster commented Mar 20, 2016

lxyu commented Mar 24, 2016

wbolster commented Mar 24, 2016