Skip to content

quic: test fast-fail if secrets not loaded when create quic connection#18705

Merged
ggreenway merged 4 commits intoenvoyproxy:mainfrom
YaoZengzeng:null
Nov 4, 2021
Merged

quic: test fast-fail if secrets not loaded when create quic connection#18705
ggreenway merged 4 commits intoenvoyproxy:mainfrom
YaoZengzeng:null

Conversation

@YaoZengzeng
Copy link
Copy Markdown
Member

@YaoZengzeng YaoZengzeng commented Oct 21, 2021

Commit Message: quic: test fast-fail if secrets not loaded when create quic connection
Additional Description: N/A
Risk Level: Low
Testing: N/A
Docs Changes: N/A

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

Do you have any context for this change?
Please make sure we have a new unit/integration test.
It looks like there's a build problem at the moment which should have been fixed by #18708. I recommend syncing and trying again?

Also, please fill out the required fields in the PR description. For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

/wait

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

/wait

@YaoZengzeng
Copy link
Copy Markdown
Member Author

YaoZengzeng commented Oct 22, 2021

@RyanTheOptimist because the function createQuicNetworkConnection() (https://github.com/envoyproxy/envoy/blob/main/source/common/quic/client_connection_factory_impl.cc#L65) would return nullptr, so need this fix to avoid invalid reference.

@danzh2010
Copy link
Copy Markdown
Contributor

@RyanTheOptimist because the function createQuicNetworkConnection() (https://github.com/envoyproxy/envoy/blob/main/source/common/quic/client_connection_factory_impl.cc#L65) would return nullptr, so need this fix to avoid invalid reference.

It looks like quic upstream would reference nullptr if using SDS without this fix. Could you please add a regression test for this fix?

@YaoZengzeng
Copy link
Copy Markdown
Member Author

@RyanTheOptimist because the function createQuicNetworkConnection() (https://github.com/envoyproxy/envoy/blob/main/source/common/quic/client_connection_factory_impl.cc#L65) would return nullptr, so need this fix to avoid invalid reference.

It looks like quic upstream would reference nullptr if using SDS without this fix. Could you please add a regression test for this fix?

I'll do it :)

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

/wait

@YaoZengzeng
Copy link
Copy Markdown
Member Author

It seems there are fast-fail to check if secret is loaded (https://github.com/envoyproxy/envoy/blob/main/source/common/http/http3/conn_pool.cc#L88), so if the secrets will not change to empty again, and the createQuicNetworkConnection will not return nullptr.

I wonder if it's necessary to double check in createQuicNetworkConnection. If do it, it is necessary to confirm the return value of createQuicNetworkConnection out of defense.

Although it's necessary to add a test for fast-fail if secrets not loaded, and I'd like to do it:)

@YaoZengzeng YaoZengzeng changed the title quic: return nullptr if create quic network connection failed quic: test fast-fail if secrets not loaded when create quic connection Oct 28, 2021
@YaoZengzeng YaoZengzeng force-pushed the null branch 4 times, most recently from e148a6a to 3f4bf3a Compare October 28, 2021 10:38
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this object leak, or does the dispatcher end up returning it at some point? If the latter, please add a short comment which says that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please use ConnectionPool::InstancePtr instead of auto (assuming that's the right type).

@YaoZengzeng YaoZengzeng force-pushed the null branch 3 times, most recently from b2d2a25 to 73d3b2f Compare October 29, 2021 03:16
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
@YaoZengzeng
Copy link
Copy Markdown
Member Author

@RyanTheOptimist updated. Errors in CI don't appear to be related to this PR.

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

It seems there are fast-fail to check if secret is loaded (https://github.com/envoyproxy/envoy/blob/main/source/common/http/http3/conn_pool.cc#L88), so if the secrets will not change to empty again, and the createQuicNetworkConnection will not return nullptr.

I wonder if it's necessary to double check in createQuicNetworkConnection. If do it, it is necessary to confirm the return value of createQuicNetworkConnection out of defense.

Although it's necessary to add a test for fast-fail if secrets not loaded, and I'd like to do it:)

So to clarify, is the new test testing this existing return nullptr (on line 95) or the new code you added? If it's testing the existing code, can we get a test to cover the new code?

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

@RyanTheOptimist updated. Errors in CI don't appear to be related to this PR.

I had two comments from yesterday that don't seem to be address. Have they been address?

A note for the future: force pushes breaks the reviewing flow, so please avoid doing so while iterating on the PR. We end up squashing the commits on merge anyways, so it leaving the commits up doesn't matter.

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18705 (comment) was created by @RyanTheOptimist.

see: more, trace.

@YaoZengzeng
Copy link
Copy Markdown
Member Author

@RyanTheOptimist updated. Errors in CI don't appear to be related to this PR.

I had two comments from yesterday that don't seem to be address. Have they been address?

A note for the future: force pushes breaks the reviewing flow, so please avoid doing so while iterating on the PR. We end up squashing the commits on merge anyways, so it leaving the commits up doesn't matter.

Both have been fixed, add comment for suspected leak and change auto to specific type.

I used to keep the commit log clean :) But thanks for the advice, I'll follow it next time.

@YaoZengzeng
Copy link
Copy Markdown
Member Author

It seems there are fast-fail to check if secret is loaded (https://github.com/envoyproxy/envoy/blob/main/source/common/http/http3/conn_pool.cc#L88), so if the secrets will not change to empty again, and the createQuicNetworkConnection will not return nullptr.
I wonder if it's necessary to double check in createQuicNetworkConnection. If do it, it is necessary to confirm the return value of createQuicNetworkConnection out of defense.
Although it's necessary to add a test for fast-fail if secrets not loaded, and I'd like to do it:)

So to clarify, is the new test testing this existing return nullptr (on line 95) or the new code you added? If it's testing the existing code, can we get a test to cover the new code?

New testing code is for the existing return nullptr. And I think the new code which I added is just out of defence, I don't think if the secrets have been loaded before createQuicNetworkConnection, it will become empty in that function.

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

It seems there are fast-fail to check if secret is loaded (https://github.com/envoyproxy/envoy/blob/main/source/common/http/http3/conn_pool.cc#L88), so if the secrets will not change to empty again, and the createQuicNetworkConnection will not return nullptr.
I wonder if it's necessary to double check in createQuicNetworkConnection. If do it, it is necessary to confirm the return value of createQuicNetworkConnection out of defense.
Although it's necessary to add a test for fast-fail if secrets not loaded, and I'd like to do it:)

So to clarify, is the new test testing this existing return nullptr (on line 95) or the new code you added? If it's testing the existing code, can we get a test to cover the new code?

New testing code is for the existing return nullptr. And I think the new code which I added is just out of defence, I don't think if the secrets have been loaded before createQuicNetworkConnection, it will become empty in that function.

Interesting. I wonder what would happen if you duplicated your new tests, but changed:
EXPECT_CALL(*config, isReady()).WillRepeatedly(Return(false));
So that the first call returns true (and hence the old return nullptr is not triggered) but second call returns false and hence hits the new return nullptr. What do you think

Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
@YaoZengzeng
Copy link
Copy Markdown
Member Author

@RyanTheOptimist Updated. Actually we can't manipulate isReady(), because it's triggered by secrets change. But we can't precisely make the secret change after fast-fail but before createQuicNetworkConnection.

So I create a mock QuicClientTransportSocketFactory and to achive the goal by changing the return value of method ssl_context :)

@YaoZengzeng
Copy link
Copy Markdown
Member Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18705 (comment) was created by @YaoZengzeng.

see: more, trace.

RyanTheOptimist
RyanTheOptimist previously approved these changes Nov 1, 2021
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. @danzh2010 what do you think?

danzh2010
danzh2010 previously approved these changes Nov 1, 2021
@RyanTheOptimist
Copy link
Copy Markdown
Contributor

/approve-from envoyproxy/senior-maintainers

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

/assign-from envoyproxy/senior-maintainers

@repokitteh-read-only
Copy link
Copy Markdown

envoyproxy/senior-maintainers assignee is @ggreenway

🐱

Caused by: a #18705 (comment) was created by @RyanTheOptimist.

see: more, trace.

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait-any

source_address, quic_stat_names, scope);
if (data.connection_ == nullptr) {
ENVOY_LOG_TO_LOGGER(
Envoy::Logger::Registry::getLog(Envoy::Logger::Id::pool), warn,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is warn going to be too much output in the log? I presume if this happens once, it'll probably happen for every connection attempt. Maybe use ENVOY_LOG_PERIODIC_TO_LOGGER or ENVOY_LOG_EVERY_POW_2_TO_LOGGER?

Is there any failure metric that increases in this case?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway updated to ENVOY_LOG_EVERY_POW_2_TO_LOGGER.

As for failure metric, there seems to be no metric to indicate the failure of quic connection creation. From my research, even for HTTP 1/2, there is only a metric bind_errors_ which is used to indicate the bind error when failed to create client network connection.

Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
@ggreenway ggreenway merged commit 361fd53 into envoyproxy:main Nov 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants