Skip to content

Conversation

@aaron-congo
Copy link
Contributor

The topology query is not monitored by the EFM plugin. If the user has no socket timeout and the topology query is executed against a down instance, it could execute indefinitely. This PR temporarily sets a socket timeout for the topology query if none was set by the user.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

try {
networkTimeout = conn.getNetworkTimeout();
// The topology query is not monitored by the EFM plugin, so it needs a socket timeout
if (networkTimeout == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we override the user's socket timeout here even if it is set to a non-zero value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should apply default 5000ms socket timeout just in case the current one is set to indefinite wait (it doesn't matter if this indefinite was a user explicit setting or just lack of setting)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they've set their socket timeout to accommodate their longest running queries it could be awhile before a problem is detected with the topology query. The topology query itself should be pretty quick to return. However, I lean towards keeping the code as is, I'd prefer to abide by the settings they gave us as much as possible. Just thought I'd bring it up to see if you agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergiyvamz won't the value returned by getNetworkTimeout be 0 regardless of if the user explicitly set it to zero or just didn't set it at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it is. But it makes sense to verify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and the default value is 0 with all drivers

networkTimeout = conn.getNetworkTimeout();
// The topology query is not monitored by the EFM plugin, so it needs a socket timeout
if (networkTimeout == 0) {
conn.setNetworkTimeout(NETWORK_TIMEOUT_EXECUTOR, defaultTopologyQueryTimeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it synchronous? We shouldn't need to check executor to check it timeout is applied?

public static final CacheMap<String, Boolean> primaryClusterIdCache = new CacheMap<>();

private static final int defaultTopologyQueryTimeoutMs = 5000;
private static final int setNetworkTimeoutMs = 10000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a reasonable timeout value for waiting for setNetworkTimeout to complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting timeout is a call to set a socket setting. It doesn't require any network activity like sending/receiving data. I'd expect that it takes few millis to set it but 10s is more than enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I lower it to a different value then?


private void setNetworkTimeout(final Connection conn, final int timeoutMs) throws SQLException {
Future<Boolean> future = NETWORK_TIMEOUT_COMPLETION_SERVICE.submit(() -> {
conn.setNetworkTimeout(NETWORK_TIMEOUT_EXECUTOR, timeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I doing this right? My manual test is working so it isn't hanging anymore but it seems strange that I'm passing the executor to the CompletionService constructor and using it inside the submitted task too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runnable#run and Callable#call do not allow you to set a timeout for the task. Is it okay to use them without a timeout here?

@sergiyvamz sergiyvamz merged commit 5aa64f6 into aws:main May 11, 2023
@aaron-congo aaron-congo deleted the topology-query-timeout branch May 24, 2023 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants