alts: Fix TsiSocket doWrite on outstanding handshake bytes handling #16488
antoniovicente merged 13 commits into envoyproxy:main
Conversation
Signed-off-by: yihuaz <yihuaz@google.com>
8f2cad7 to
4dd39ad
Compare
antoniovicente
left a comment
Thanks for the quick debug and fix.
// Check if we need to flush outstanding handshake bytes.
if (write_buffer_contains_handshake_bytes_) {
ASSERT(raw_write_buffer_.length() > 0);
if (raw_write_buffer_.length() > 0 && prev_bytes_to_drain_ == 0) {
It would be helpful to add tests to cover the case where doHandshakeNextDone adds bytes and how this version behaves better than the case where we only set "write_buffer_contains_handshake_bytes_ = true" in one special case.
Signed-off-by: yihuaz <yihuaz@google.com>
1f00a25 to
c62dfa2
Compare
Regarding test, we have
The existing test didn't catch the regression introduced by #15962, so I think we have some gaps in test coverage that we should address in this PR.
Network::IoResult result = raw_buffer_socket_->doWrite(raw_write_buffer_, false);
if (handshake_complete_ && raw_write_buffer_.length() > 0) {
write_buffer_contains_handshake_bytes_ = true;
if (handshake_complete_) {
An error could be detected on doWrite which could cause the connection to be closed. I think that this raiseEvent could then cause a crash if the connection is already closed.
Similarly, the call to doWrite after raiseEvent(Connected) needs to check the status of the connection. If the raiseEvent(Connected) closes the connection the doWrite may run into some trouble. IIRC close on raiseEvent(Connected) does happen in the case of SSL sockets.
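To make the concern above concrete, here is a minimal sketch of the "re-check the connection after a callback" pattern. It is a toy model, not Envoy code: `ToyConnection`, `ToyState`, `on_connected`, and `connectThenWrite` are hypothetical names invented for illustration, and the real TsiSocket/connection interaction is more involved.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and do not exist in Envoy.
enum class ToyState { Open, Closed };

struct ToyConnection {
  ToyState state = ToyState::Open;
  std::vector<std::string> written;
  // Stand-in for a Connected event callback; it may close the connection.
  std::function<void(ToyConnection&)> on_connected;

  void close() { state = ToyState::Closed; }

  // Raises the "connected" callback, then writes only if still open.
  // Returns true when the payload was written.
  bool connectThenWrite(const std::string& payload) {
    // An earlier write error may already have closed the connection.
    if (state != ToyState::Open) {
      return false;
    }
    if (on_connected) {
      on_connected(*this);
    }
    // The callback may have closed the connection, so check again.
    if (state != ToyState::Open) {
      return false;
    }
    written.push_back(payload);
    return true;
  }
};
```

With a callback that calls `close()`, `connectThenWrite` returns false and nothing is appended to `written`; without a callback the payload is written as usual.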
Which doWrite do you mean in "Similarly, the call to doWrite after raiseEvent(Connected) needs to check the status of the connection"? We do not call doWrite after raiseEvent in this PR.
Signed-off-by: yihuaz <yihuaz@google.com>
d667d33 to
ec665cd
Compare
The reason why we did not catch the regression is
I recommend an integration test that uses the TSI socket from the test client. A write on this socket before the handshake is complete may be able to repro the failure scenario that we saw on our internal extensions when the TSI socket was updated.
Nice suggestion! Let me play around with the integration test to see if we can reproduce it.
// There should be no handshake bytes in raw_write_buffer_.
ASSERT(!(raw_write_buffer_.length() > 0 && prev_bytes_to_drain_ == 0));
while (true) {
uint64_t bytes_to_drain_this_iteration =
See https://github.com/envoyproxy/envoy/pull/16514/files
Could you merge upstream/main and consider changing the "std::min<size_t>(" to "std::min<uint64_t>("
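For readers following along, a small sketch of what the suggested template argument changes. The thread does not state the motivation; one plausible reason is that `std::min<size_t>` converts a 64-bit buffer length to `size_t` before comparing, which would truncate on a target where `size_t` is 32 bits. The helper name and parameters below are hypothetical and only mirror the shape of the quoted line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the Envoy implementation: picks how many bytes
// to drain in one iteration from a 64-bit buffer length and a size_t cap.
inline uint64_t bytesToDrainThisIteration(uint64_t buffer_length, size_t frame_limit) {
  // With an explicit uint64_t template argument both operands are compared
  // as 64-bit values, so buffer_length is never narrowed to size_t.
  return std::min<uint64_t>(buffer_length, frame_limit);
}
```

For example, `bytesToDrainThisIteration(10, 4)` yields 4 and `bytesToDrainThisIteration(3, 4)` yields 3.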
Sure thing. Done.
Signed-off-by: yihuaz <yihuaz@google.com>
2c71305 to
713d4c4
Compare
@antoniovicente, I tried to add some data to the client's buffer before starting an ALTS connection, and it seems one place where we can add a write() API is before this API returns in makeRawHttpConnection() in http_integration.cc. Then I got stuck on how to add a partial HTTP request via the above write API, because an HTTP request is encoded via a RequestEncoder, which does not provide an API that returns the final string to be sent to the peer. In other words, it does not provide an API that we can use to manipulate the HTTP request. Ideally, if there are APIs that can send/receive a raw string over the ALTS connection, we can easily use them to test the above regression. Could you please provide some guidance on it?
BaseIntegrationTest::sendRawHttpAndWaitForResponse seems like a possible starting point. The RawConnectionDriver does a write when onEvent(Network::ConnectionEvent::Connected) is delivered by the transport.
Signed-off-by: yihuaz <yihuaz@google.com>
e905808 to
b3917a1
Compare
Ah it makes sense. It could have taken me a while to figure this out. +1 for adding the check to ensure codecs are not created when using raw connections.
/retest
I'm not sure why tests don't seem to be running. Could you merge upstream/main?
Signed-off-by: yihuaz <yihuaz@google.com>
9580721 to
58e626a
Compare
Signed-off-by: yihuaz <yihuaz@google.com>
48d72ea to
d3aa25f
Compare
Please look at the TSAN/ASAN CI results. There seems to be a test-only bug in AltsIntegrationTestValidPeer.RouterRequestAndResponseWithBodyRawHttp which results in use-after-free.
The use-after-free error happened when invoking
@antoniovicente Do you know the version of gRPC envoy currently depends on? I wonder if it includes grpc/grpc#25621. If it does, we can verify the correctness of I can not think of a better way of verifying So if the current gRPC envoy depends on does not include grpc/grpc#25621, I prefer to add a TODO and add the aforementioned test once the PR gets included. We should also remove the use of PLMK what you think.
You can see the current grpc dependency here. We're currently building against a grpc library that is 6 months old. I'm trying to get the dependency updated.
SG. PLMK when the update is complete, and I will update the test.
grpc version PR: #16687
The PR to update grpc deps just landed, please
Signed-off-by: yihuaz <yihuaz@google.com>
8658f41 to
81d9cc6
Compare
Signed-off-by: yihuaz <yihuaz@google.com>
8a823cb to
eecee24
Compare
/retest
/retest since this PR doesn't change docs inputs.
…nvoyproxy#16488) Signed-off-by: yihuaz <yihuaz@google.com>
TsiSocket doWrite() incorrectly handles outstanding handshake bytes produced after handshake is marked as complete. This PR fixes the issue.
Details: when the handshake is complete in doHandshakeNextDone(), it calls raiseEvent(Network::ConnectionEvent::Connected) which triggers TsiSocket::doWrite(). The doWrite() call then enters repeatProtectAndWrite() with outstanding handshake bytes in raw_write_buffer_, which violates the invariant.
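The sequence described above can be sketched with a toy model. This is not the Envoy implementation: `ToyTsiSocket` and its members are hypothetical, strings stand in for buffers, and frame protection is reduced to wrapping the payload in a marker. It only illustrates flushing leftover handshake bytes before the protect-and-write step so that the invariant holds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy model only; names are hypothetical and strings stand in for buffers.
struct ToyTsiSocket {
  std::string raw_write_buffer;      // bytes waiting for the raw socket
  std::string wire;                  // bytes the raw socket has accepted
  uint64_t prev_bytes_to_drain = 0;  // nonzero only mid-frame in this model
  bool handshake_complete = false;

  // Stand-in for the raw socket write: accepts everything at once.
  void flushRaw() {
    wire += raw_write_buffer;
    raw_write_buffer.clear();
  }

  // Stand-in for repeatProtectAndWrite(): no stray handshake bytes allowed.
  void protectAndWrite(const std::string& app_data) {
    assert(!(raw_write_buffer.length() > 0 && prev_bytes_to_drain == 0));
    raw_write_buffer += "[frame:" + app_data + "]";
    flushRaw();
  }

  void doWrite(const std::string& app_data) {
    if (!handshake_complete) {
      return;  // simplification: application data is ignored before completion
    }
    // Flush outstanding handshake bytes before protecting application data.
    if (raw_write_buffer.length() > 0 && prev_bytes_to_drain == 0) {
      flushRaw();
    }
    protectAndWrite(app_data);
  }

  // Stand-in for the handshake finishing with bytes still queued, followed
  // by a Connected callback that writes immediately.
  void finishHandshake(const std::string& last_handshake_bytes,
                       const std::string& pending_app_data) {
    raw_write_buffer += last_handshake_bytes;
    handshake_complete = true;
    doWrite(pending_app_data);
  }
};
```

In this model, calling `finishHandshake("HS", "hello")` leaves `wire` holding the handshake bytes followed by the protected frame. Without the flush in `doWrite()`, the assertion in `protectAndWrite()` would fire, which is the toy analogue of the violated invariant.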