Skip to content

CronetBidirectionalStream #2164

Merged
RyanTheOptimist merged 36 commits intoenvoyproxy:mainfrom
carloseltuerto:cronvoy063
May 10, 2022
Merged

CronetBidirectionalStream #2164
RyanTheOptimist merged 36 commits intoenvoyproxy:mainfrom
carloseltuerto:cronvoy063

Conversation

@carloseltuerto
Copy link
Contributor

@carloseltuerto carloseltuerto commented Apr 18, 2022

This is the initial PR to address #1642. The BidirectionalStreamTest has been ported "as is". Caveats:

  • The QUIC mode where the request header is guaranteed to be sent with the first flushed ByteBuffers needs to be implemented in EM first. BidirectionalStreamQuicTest was not ported.
  • There are some @Ignore tests:
    • Testing specific error codes
    • Causing EM to crash (sequence of destroying the Engine and creating a new one immediately)
    • FinalIntel are flaky (not always set)
    • FinalIntel not available at all when cancelling before receiving the response headers
    • ReceivedByteCount if off and inconsistent with EM.

The implementation does not use synchronized. The logic is 100% Compare-And-Swap based.

This is the test that was executed to hopefully catch all race conditions:

bazel test --runs_per_test=500 --define=signal_trace=disabled test/java/org/chromium/net:bidirectional_stream_test

Description: Initial CronetBidirectionalStream implementation
Risk Level: none: only touches Cronvoy
Testing: CI, and multi concurrent runs (many time 500 runs)
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Copy link
Contributor

@StefanoDuo StefanoDuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started going through the code and I've a bunch of questions. @carloseltuerto feel free to either reply here or ping me tomorrow.

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Copy link
Contributor

@StefanoDuo StefanoDuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at CancelProofEnvoyStream, CronetUrlRequest* and the tests since they are orthogonal to the bidi stream implementation.

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Copy link
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level comment. There are thousands of lines of code added in the PR. Are there any ways we could consider breaking it up into smaller pieces?

In any case, this is awesome!

@carloseltuerto
Copy link
Contributor Author

carloseltuerto commented May 3, 2022

High-level comment. There are thousands of lines of code added in the PR. Are there any ways we could consider breaking it up into smaller pieces?

In any case, this is awesome!

This PR is a deal with the devil: you have code avoiding as much as possible "task switching"/"synchronisation", however, an implementation with blocking semantics (java "synchronized") would probably be easier (still tricky).

And there is a class that could be in an isolated PR: CronetBidirectionalState (and the test: CronetBidirectionalStateTest). However, my appreciation of the situation is this would be a pointless exercise. By just looking at that class and its test, it is impossible to infer what how it is used: the caller is needed too to get a sense of what is going on. And there is one class that does not need much scrutiny: BidirectionalStreamTest - it is an almost exact copy of the original.

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Copy link
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thoughtful response! It looks like CancelProofEnvoyStream could be extracted from the PR as a stand-alone change (even if it wouldn't be used yet). I would definitely consider doing that.

@carloseltuerto
Copy link
Contributor Author

carloseltuerto commented May 5, 2022

Ryan wrote:

Yeah, I google'd "compare and set" and I think I understand that primitive (after swapping back in some CS knowledge from years ago :>) and how it is used in this PR.
But CAS is merely a primitive. The reason it's OK to call cancel() here is because once the CANCEL_BIT is set, mState in unable to be incremented. It can only be decremented. So once mState = CANCEL_BIT it will never change. (Because that's the contract that the other methods agree to). Is that assessment correct?
As for infinite looping... This method will busy loop until any in-flight operations finish, right? As such if any of those operations need access to the current thread, they will deadlock, I think? Presumably, something prevents that from happening. Can you expand on what that is? Also it looks like setStream() finished an operation that gets started elsewhere. Is it possible that cancel() is called before setStream() but for some reason setStream() is blocked on the same thread that cancel() is called from? I think you've thought about this enough that this is all probably true, but it would be good to have this all written down.

One of the great virtue of Compare-And-Swap is that it can't deadlock. There is always one Thread that will succeed - the others may/will loop, and so forth. IBM calls that loop around the CS mnemonic a "spin lock" - they were the first to come up with a dedicated mnemonic for that. (I was a student in 1987 - I wrote a small Assembler for IMB360 as a project)

@RyanTheOptimist
Copy link
Contributor

Ryan wrote:

Yeah, I google'd "compare and set" and I think I understand that primitive (after swapping back in some CS knowledge from years ago :>) and how it is used in this PR. But CAS is merely a primitive. The reason it's OK to call cancel() here is because once the CANCEL_BIT is set, mState in unable to be incremented. It can only be decremented. So once mState = CANCEL_BIT it will never change. (Because that's the contract that the other methods agree to). Is that assessment correct? As for infinite looping... This method will busy loop until any in-flight operations finish, right? As such if any of those operations need access to the current thread, they will deadlock, I think? Presumably, something prevents that from happening. Can you expand on what that is? Also it looks like setStream() finished an operation that gets started elsewhere. Is it possible that cancel() is called before setStream() but for some reason setStream() is blocked on the same thread that cancel() is called from? I think you've thought about this enough that this is all probably true, but it would be good to have this all written down.

One of the great virtue of Compare-And-Swap is that it can't deadlock. There is always one Thread that will succeed - the others may/will loop, and so forth. IBM calls that loop around the CS mnemonic a "spin lock" - they are the first to come up with a dedicated mnemonic for that. (I was a student in 1987 - I wrote a small Assembler for IMB360 as a project)

Ugh, github somehow moved this discussion from the comment in the code to the main conversation page. Man, I hate github.

Can you explain what would happen if cancel() is called and setStream() is never called?

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
@alyssawilk alyssawilk removed their assignment May 9, 2022
- Add more documentation
- Rearrange and rename State and Action to make them more consistant

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
RyanTheOptimist
RyanTheOptimist previously approved these changes May 9, 2022
Copy link
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing all this, Charles. I'm slightly apprehensive about the maintainability / complexity of the state machine, but we and always address that later if it turns out to be problematic. LGTM

@alyssawilk care to take a look?

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
@alyssawilk
Copy link
Contributor

@RyanTheOptimist I strongly suspect my pass isn't going to add much value to the review you've already done. I'd be inclined to merge as-is and as you say we can revisit complexity later if we need to. SGTY or you actively prefer I do a pass?

@RyanTheOptimist
Copy link
Contributor

@RyanTheOptimist I strongly suspect my pass isn't going to add much value to the review you've already done. I'd be inclined to merge as-is and as you say we can revisit complexity later if we need to. SGTY or you actively prefer I do a pass?

Works for me! I'll go ahead and merge now.

@RyanTheOptimist RyanTheOptimist merged commit 3a59691 into envoyproxy:main May 10, 2022
@carloseltuerto carloseltuerto deleted the cronvoy063 branch May 10, 2022 17:49
jpsim added a commit that referenced this pull request May 10, 2022
* main:
  build: remove dist/ (#2184)
  Fix Envoy Mobile bug where writing prevents the read loop from running, (#2221)
  Add comments to CronetBidirectionalStream (#2266)
  CronetBidirectionalStream  (#2164)
  ci: update build image (#2261)
  Bump Lyft Support Rotation (#2260)

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim added a commit that referenced this pull request May 10, 2022
* main:
  build: remove dist/ (#2184)
  Fix Envoy Mobile bug where writing prevents the read loop from running, (#2221)
  Add comments to CronetBidirectionalStream (#2266)
  CronetBidirectionalStream  (#2164)
  ci: update build image (#2261)
  Bump Lyft Support Rotation (#2260)

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim added a commit that referenced this pull request May 12, 2022
* main:
  Add assert when failing to get_env (#2253)
  Update Kotlin standard libraries to 1.6.21 (#2256)
  iOS: Change release artifacts to use xcframeworks (#2217)
  build: remove dist/ (#2184)
  Fix Envoy Mobile bug where writing prevents the read loop from running, (#2221)
  Add comments to CronetBidirectionalStream (#2266)
  CronetBidirectionalStream  (#2164)

Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants