Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Cython in HTTP and fix TCyBufferedTransport early flush issue #129

Merged
merged 4 commits into from
Mar 16, 2020

Conversation

aiudirog
Copy link
Collaborator

@aiudirog aiudirog commented Mar 16, 2020

TL;DR
This PR solves two issues:

  • TCyBufferedTransport flusing the underlying transport before the message is complete
  • HTTP client only supporting pure Python TBufferedTransport

While developing an alternative HTTP client to the one in thriftpy2.http using HTTPX (so I can use the same GSSAPI authentication on both sync and async) I ran into the issue described here:

Also, using TCyBufferedTransportFactory will let THttpClient write a broken string to server, which making server freezed in transport.readall() method.

After some debugging, I discovered that this is because TCyBufferedTransport flushes its underlying transport when its buffer size will be exceeded by a write. For streaming transports, like TSocket, this isn't an issue. However, for HTTP (and other message based transports) the flush method creates a request and sends that to the server. The request is supposed to be a complete Thrift message, however the early flushing behavior of TCyBufferedTransport causes an incomplete message to be sent.

My proposed solution is to not flush the underlying transport until TCyBufferedTransport's flush() or c_flush() methods are called directly. I did this by adding c_dump_wbuf() which checks if there is data to write and, if so will, writes the data into the underlying transport and then clears the write buffer. Now, when the write buffer is too full, this method is used to empty the buffer into the underlying transport without flushing it. c_flush() can then be simplified to dump the write buffer and then flush the underlying transport.


For anyone else who comes across the other issues listed in the comments of http.py:

  • Cannot convert TBufferedTransport to thriftpy2.transport.cybase.CyTransportBase is caused by TCyBinaryProtocol declaring the transport as a CyTransportBase for performance. Wrapping your custom transport in TBufferedTransport (which would end up being TCyBufferedTransport) solves this issue because it doesn't care about the type of the wrapped transport.
  • too small buffer allocated by TCyBufferedTransport is caused by passing a buffer size less than 1024 to TCyBufferedTransport. The HTTP client is passing this value explicitly to make sure only one call was every made to self.rfile.read(size) because an over-read like the buffered transports normally do can cause the read socket to timeout. I'm not sure why this occurs, since normal sockets typically just return what they have at the moment up to size, but I fought with it for about an hour before deciding to work around it.

I was able to fix these two issues and enable arbitrary server transport support by simply pre-reading the full content into a BytesIO (which is what was internally happening anyways with the buf_size hack) and passing that to whatever transport factory the user chooses.

Fix issues related to cython support in http

Signed-off-by: Roger Aiudi <[email protected]>
@aiudirog aiudirog requested a review from ethe March 16, 2020 02:52
@aiudirog
Copy link
Collaborator Author

I tested 3.6->3.8 before pushing, looks like I need to be more thorough. I'll figure out what's up with 2.7 and 3.5 before tomorrow.

@ethe
Copy link
Member

ethe commented Mar 16, 2020

I find that you modify the behavior of TCyBufferedTransport.c_write, would it also change the behavior of other servers rather than HTTP only? TCyBufferedTransport would flush after each size reaches the limitation, but now it only flushes at the end.

@aiudirog
Copy link
Collaborator Author

aiudirog commented Mar 16, 2020

I changed it to just not flush the underlying transport, which is how the regular TBufferedTransport works already. I went through every transport and only TCyBufferedTransport calls flush on the wrapped transport inside its write, so I would say that it is safe to say that it is in the wrong and anything relying on that behavior shouldn't have been, especially with flush being a top level call.

The only tests which are failing are the HTTP ones and that is for a different reason altogether I'm debugging now.

Edit: IMO, the Cython implementations should only improve speed and should do their absolute best to not affect functionality.

@ethe
Copy link
Member

ethe commented Mar 16, 2020

I agree.

@ethe
Copy link
Member

ethe commented Mar 16, 2020

I checked the pure Python version of TBufferedTransport, you are right.

@aiudirog
Copy link
Collaborator Author

aiudirog commented Mar 16, 2020

Okay, I found the bug: TCyBinaryProtocol calls flush() in write_message_end() but no other protocol does. TClient._send() flushes the transport directly after calling write_message_end() which means there is a double flush when using TCyBinaryProtocol compared to the other protocol implementations. So, the HTTP client sends an empty request because it sends a request every flush.

I see this as two bugs:

  1. The HTTP client should ignore a double flush when it doesn't have data
  2. The binary protocol shouldn't be flushing in the message end since no other protocol does

Thoughts?

@ethe
Copy link
Member

ethe commented Mar 16, 2020

I think we should correct the TCyBinaryProtocol as other protocols.

@aiudirog
Copy link
Collaborator Author

Awesome, I'll push my commit and let's see if it works. I tested on 2.7 this time, so fingers crossed.

Ignore empty flush in THTTPClient

Signed-off-by: Roger Aiudi <[email protected]>
Copy link
Member

@ethe ethe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI passed, it looks good to me.

@aiudirog
Copy link
Collaborator Author

FYI, I noticed that the coverage reports aren't running, even in master. Might be something to check out.

@codecov
Copy link

codecov bot commented Mar 16, 2020

Codecov Report

Merging #129 into master will decrease coverage by 0.01%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #129      +/-   ##
==========================================
- Coverage   83.11%   83.09%   -0.02%     
==========================================
  Files          43       43              
  Lines        3908     3911       +3     
==========================================
+ Hits         3248     3250       +2     
- Misses        660      661       +1
Impacted Files Coverage Δ
thriftpy2/http.py 87% <92.3%> (-0.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ff04dc...427ac21. Read the comment docs.

@aiudirog
Copy link
Collaborator Author

:) I'll merge it in then, thanks for being available so quickly on a Monday morning.

@ethe
Copy link
Member

ethe commented Mar 16, 2020

There is a duplicate job in CI, code coverage works fine in https://travis-ci.com/github/Thriftpy/thriftpy2/jobs/298409113. I will try to remove another one.

@aiudirog
Copy link
Collaborator Author

Ah got it, no problem then.

@aiudirog aiudirog merged commit 7a293e6 into Thriftpy:master Mar 16, 2020
@aiudirog
Copy link
Collaborator Author

@ethe Do you think you can do a release within the next week or so?

By the way, here is the client I'm developing: https://github.com/aiudirog/ThriftPy2-HTTPX-Client. I decided not to send it as a PR to contrib since it's 3.6+ only and adds additional dependencies.

@ethe
Copy link
Member

ethe commented Mar 17, 2020

OK, I will launch a release this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants