Skip to content
This repository has been archived by the owner on Dec 10, 2018. It is now read-only.

force bytes #190

Closed
wbolster opened this issue Mar 20, 2016 · 2 comments
Closed

force bytes #190

wbolster opened this issue Mar 20, 2016 · 2 comments

Comments

@wbolster
Copy link
Contributor

is it possible to force an api to return bytes instead of the messy mix of str (unicode in py2) and bytes, depending on whether the value happened to be valid utf-8?

apache hbase has a bytes-only data model, and happybase, the python library for it, uses thrift to communicate with hbase. however, this results in mixed str/bytes responses:

>>> c = happybase.Connection()
>>> t = c.table('mytable')
>>> t.put(b'\x01\x02', {'cf2:foo\xee\xee\xee': '\xff\xff\xff'}
>>> t.put('foo', {'cf1:bar': u'baz'})
>>> list(t.scan())
[(u'\x01\x02', {'cf2:foo\xee\xee\xee': '\xff\xff\xff'}),
 (u'foo', {u'cf1:bar': u'baz'})]

(this example code was run with python2)

@lxyu
Copy link
Contributor

lxyu commented Mar 24, 2016

You can now add decode_response=False to protocol factory to force return bytes.

@wbolster
Copy link
Contributor Author

awesome, thanks.

dan-blanchard pushed a commit to dan-blanchard/thriftpy that referenced this issue May 6, 2016
jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. (Similarly the actual non-test code - I'm not
entirely comfortable with the redundancy this commit introduces but
there's quite a bit of redundancy in the code already so improving this
is likely a whole separate task).

[1] Thriftpy@00c6553
[2] Thriftpy#190
jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. (Similarly the actual non-test code - I'm not
entirely comfortable with the redundancy this commit introduces but
there's quite a bit of redundancy in the code already so improving this
is likely a whole separate task).

[1] Thriftpy@00c6553
[2] Thriftpy#190
jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. (Similarly the actual non-test code - I'm not
entirely comfortable with the redundancy this commit introduces but
there's quite a bit of redundancy in the code already so improving this
is likely a whole separate task. Also extracting the logic to a separate
function would negatively impact the performance /possibly not
important/ but it could also make the code less readable, I'm conflicted
here).

[1] Thriftpy@00c6553
[2] Thriftpy#190
jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 14, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

I'm not intimately familiar with Thrift specification so adding a
separate ttype like this may be stupid - let me know if that is so.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

[1] Thriftpy@00c6553
[2] Thriftpy#190
jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 17, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

Internally the goal of this change is achieved by keeping the ttype of
binary fields as STRING but putting a non-None value in their
"container spec" which introduces a bit of inconsistency (string fields
are TType.STRING but binary fields are (TType.STRING, BINARY)) but
I believe we can live with it.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

[1] Thriftpy@00c6553
[2] Thriftpy#190
jstasiak added a commit to jstasiak/thriftpy that referenced this issue May 17, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

Internally the goal of this change is achieved by keeping the ttype of
binary fields as STRING but putting a non-None value in their
"container spec" which introduces a bit of inconsistency (string fields
are TType.STRING but binary fields are (TType.STRING, BINARY)) but
I believe we can live with it.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

Casting of values in the parser is fixed as a bonus, the following code
never worked because TType.STRING and TType.BINARY had the same value:

     if t == TType.STRING:
         return _cast_string
     if t == TType.BINARY:
         return _cast_binary

[1] Thriftpy@00c6553
[2] Thriftpy#190
jstasiak added a commit to jstasiak/thriftpy that referenced this issue Sep 25, 2016
I believe this improves [1] which addresses a bytes/str inconsistency issue
reported some time ago[2]. The goal of this patch is to have an option
to always deseriaize binary fields as bytes and string fields as str (on
Python 2, respectively: str and unicode). It achieves it by adding a
third option to previously purely boolean decode_response parameter (I
admit I'm not a fan of mixing types like this but I'm doing this
strictly for backwards compatibility and I'd argue the "auto" behavior
should become the default in the future and decode_response should be
removed altogether).

Internally the goal of this change is achieved by keeping the ttype of
binary fields as STRING but putting a non-None value in their
"container spec" which introduces a bit of inconsistency (string fields
are TType.STRING but binary fields are (TType.STRING, BINARY)) but
I believe we can live with it.

I've renamed and reorganized parts of the code in order for the names to
be reasonably truthful and to avoid too much code redundancy connected
to the fact that I've added more tests. Ideally some of the test code
would be shared so reduce the redundancy even further but I feel that
it's out of scope here. Similarly there's some almost identical code in
Python and Cython protocol implementations but I lack expertise
regarding possible ways to mitigate that.

Casting of values in the parser is fixed as a bonus, the following code
never worked because TType.STRING and TType.BINARY had the same value:

     if t == TType.STRING:
         return _cast_string
     if t == TType.BINARY:
         return _cast_binary

[1] Thriftpy@00c6553
[2] Thriftpy#190
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants