Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TAsyncProtocolBase and TAsyncTransportBase #108

Merged
merged 10 commits into from
Dec 11, 2019

Conversation

aiudirog
Copy link
Collaborator

@aiudirog aiudirog commented Dec 7, 2019

As promised, I'm implementing proper interfaces and cleaning up some parts of the code. I'll let you know when this is ready.:)

Comment on lines 43 to 52
def _read(self, sz):
"""
Internal read method which can read up to `sz` bytes but doesn't
need to return them all.
"""
raise NotImplementedError

def read(self, sz):
"""Get exactly `sz` bytes from the underlying connection."""
return readall(self._read, sz)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original TTransportBase seems to declare that read() will always return sz bytes. However not every transport on its own does this. TFramedTransport is a good example because it always assumes it will be wrapped by TBufferedTransport and would therefore time out if it tried to read everything it was asked to. Might it be best to have the protocol call readall() instead? Then we would remove _read() and make read() an abstract method.

If we decide not to make that change, I'll update this doc-string to say that if read() isn't going to return all the requested bytes, the transport must be wrapped by one that will.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think read must return sz bytes, that is the reason I modified the TAsyncBufferedTransport

Copy link
Member

@ethe ethe Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If transport can not find a way to return enough length data, it should raise an END_OF_FILE exception.

Copy link
Member

@ethe ethe Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea about readall method, could you please tell me which cases are this method used in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The socket should always be able to furnish the data requested by the protocol, otherwise EOF would get raised. However transports are designed to wrap each other to form layers and, internally, calls to read() between wrapped transports can't always return sz bytes.

Like we discussed previously, TSocket has to be wrapped by TBufferedTransport in order to be used as a transport despite it sporting the same API. However, because of the nature of TBufferedTransport and it requesting as much data as possible from the sub-transport to fill its buffer, the read method of the sub-transport can't always return sz bytes. This is the case with TFramedTransport which, despite inheriting the readall version of read() from TTransportBase, does not implement _read() (to the dismay of most linters:P) and instead overrides read() so that it doesn't have to return all sz bytes. This allows it to be wrapped by TBufferedTransport without running into timeout errors when it can't furnish enough data to fill the buffer but it also makes the wrapping required to ensure all the data is returned.

This leads to a weird dynamic where a transport has to make one of two choices:

  1. Have its read() return exactly sz bytes and not support being wrapped by TBufferedTransport
  2. Have its read() return whatever is in the underlying transport and require being wrapped by TBufferedTransport

Basically, the current implementation requires that the outermost transport has to return exactly sz bytes from its read() and that inner transport just returns what it can get its hands on and will be called again if it didn't get enough. The means that transports like TFramedTransport can't be put on the outer layer and anyone developing a custom transport needs to know the rules above, which aren't obvious from looking at the base class.

I propose we could make this easier one of two ways:

  • Have the protocol be in charge of calling readall() instead of TBufferedTransport.
    • This would require quite a few changes in places and the more I look at it, the more I don't like it.
  • Have two public methods for reading: read() which returns exactly sz bytes and read1() which returns up to sz bytes.
    • This matches the standard Python Buffer API from io.IOBase
      • I don't particularly like the name read1, but I chose it since it matches the standard API. We could always change it to something like read_up_to.
    • read1() would be used for intra-transport reads like TBufferedTransport to TFramedTransport and read() would be used by the protocol to guarentee all of the data gets read.
    • Implementation would be quite easy: the existing _read() method would become read1() and serve the same purpose, only publicly. TTransportBase.read() would stay exactly as is and be inherited by every transport so they would only have to define a read1().

Either change would allow transports to wrap each other arbitrarily, making it easier to develop new, more complex transports in the future.

I'll mock up a commit of the second one and send it over. We can always drop it.

Copy link
Collaborator Author

@aiudirog aiudirog Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aiudirog@d2bcc63

It's not 100% perfect (I don't like the sockets breaking the read1 vs read idea, but they have to for compatibility with Cython code) but it passes the test cases and makes transport implementation more consistent.

Edit: This change took me like 10 minutes to make, so if you don't like it it's not a lot of time lost.

Copy link
Member

@ethe ethe Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have got your point. TFramedTransport does not need to return data exactly equal to the required size, but TBufferedTransport does. However, I think it is better to keep this behavior as before, leaving some comments to the read method to describe this behavior is fine I think.

Copy link
Member

@ethe ethe Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like the design of read and read1 or read_up_to, but it seems like replacing read with readall would break the compatibility

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving some comments to the read method to describe this behavior is fine I think.

Sounds good. As long as it's clearly documented it shouldn't be a problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to see how Java did it and they actually have a read() and readall(): https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/transport/TTransport.java#L60

I'll update the documentation for now though.

Coverage tests seem to cause the loop itself to close before running the tear down.

Signed-off-by: aiudirog <[email protected]>
@codecov
Copy link

codecov bot commented Dec 7, 2019

Codecov Report

Merging #108 into master will decrease coverage by 0.25%.
The diff coverage is 76.2%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
- Coverage   80.21%   79.95%   -0.26%     
==========================================
  Files          39       43       +4     
  Lines        3811     3887      +76     
==========================================
+ Hits         3057     3108      +51     
- Misses        754      779      +25
Impacted Files Coverage Δ
thriftpy2/contrib/aio/protocol/binary.py 57.8% <100%> (+0.24%) ⬆️
thriftpy2/transport/__init__.py 79.16% <100%> (-6.84%) ⬇️
thriftpy2/contrib/aio/transport/framed.py 92.85% <100%> (ø) ⬆️
thriftpy2/transport/framed/__init__.py 88.63% <100%> (ø) ⬆️
thriftpy2/transport/buffered/__init__.py 95.12% <100%> (ø) ⬆️
thriftpy2/contrib/aio/protocol/__init__.py 100% <100%> (ø) ⬆️
thriftpy2/protocol/__init__.py 86.66% <100%> (+0.95%) ⬆️
thriftpy2/transport/memory/__init__.py 88.23% <100%> (ø) ⬆️
thriftpy2/contrib/aio/transport/buffered.py 94.73% <100%> (+0.73%) ⬆️
thriftpy2/protocol/binary.py 90.31% <100%> (+0.03%) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c398e2b...d50374a. Read the comment docs.

@aiudirog aiudirog requested a review from ethe December 7, 2019 17:42
@aiudirog
Copy link
Collaborator Author

aiudirog commented Dec 7, 2019

@ethe I'm all set for now. Let me know what you think and if there is anything else you want to add.

@ethe ethe changed the title Housekeeping Add TAsyncProtocolBase Dec 10, 2019
@ethe ethe changed the title Add TAsyncProtocolBase Add TAsyncProtocolBase and TAsyncTransportBase Dec 10, 2019
@aiudirog
Copy link
Collaborator Author

I updated the docstring and included a link to our discussion. Since you approved the previous changes, I'll go ahead and merge it in. Thanks!

@aiudirog aiudirog merged commit d6826ef into Thriftpy:master Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants