Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,14 @@
import java.util.Properties;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Logger;
import org.checkerframework.checker.nullness.qual.NonNull;
Expand Down Expand Up @@ -75,6 +82,9 @@ public class AuroraHostListProvider implements DynamicHostListProvider {
+ "This pattern is required to be specified for IP address or custom domain connections to AWS RDS "
+ "clusters. Otherwise, if unspecified, the pattern will be automatically created for AWS RDS clusters.");

private final Executor NETWORK_TIMEOUT_EXECUTOR = Executors.newSingleThreadExecutor();
private final CompletionService NETWORK_TIMEOUT_COMPLETION_SERVICE =
new ExecutorCompletionService(NETWORK_TIMEOUT_EXECUTOR);
private final HostListProviderService hostListProviderService;
private final String originalUrl;
private RdsUrlType rdsUrlType;
Expand All @@ -93,6 +103,8 @@ public class AuroraHostListProvider implements DynamicHostListProvider {
public static final CacheMap<String, String> suggestedPrimaryClusterIdCache = new CacheMap<>();
public static final CacheMap<String, Boolean> primaryClusterIdCache = new CacheMap<>();

private static final int defaultTopologyQueryTimeoutMs = 5000;
private static final int setNetworkTimeoutMs = 10000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a reasonable timeout value for waiting for setNetworkTimeout to complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting timeout is a call to set a socket setting. It doesn't require any network activity like sending/receiving data. I'd expect that it takes few millis to set it but 10s is more than enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I lower it to a different value then?

private final ReentrantLock lock = new ReentrantLock();
protected String clusterId;
protected HostSpec clusterInstanceTemplate;
Expand Down Expand Up @@ -333,11 +345,42 @@ protected List<HostSpec> queryForTopology(final Connection conn) throws SQLExcep
this.topologyAwareDialect = (TopologyAwareDatabaseCluster) this.hostListProviderService.getDialect();
}

int networkTimeout = -1;
try {
networkTimeout = conn.getNetworkTimeout();
// The topology query is not monitored by the EFM plugin, so it needs a socket timeout
if (networkTimeout == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we override the user's socket timeout here even if it is set to a non-zero value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should apply default 5000ms socket timeout just in case the current one is set to indefinite wait (it doesn't matter if this indefinite was a user explicit setting or just lack of setting)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they've set their socket timeout to accommodate their longest running queries it could be awhile before a problem is detected with the topology query. The topology query itself should be pretty quick to return. However, I lean towards keeping the code as is, I'd prefer to abide by the settings they gave us as much as possible. Just thought I'd bring it up to see if you agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergiyvamz won't the value returned by getNetworkTimeout be 0 regardless of if the user explicitly set it to zero or just didn't set it at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it is. But it makes sense to verify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and the default value is 0 with all drivers

setNetworkTimeout(conn, defaultTopologyQueryTimeoutMs);
}
} catch (SQLException e) {
LOGGER.warning(() -> Messages.get("AuroraHostListProvider.errorGettingNetworkTimeout",
new Object[] {e.getMessage()}));
}

try (final Statement stmt = conn.createStatement();
final ResultSet resultSet = stmt.executeQuery(this.topologyAwareDialect.getTopologyQuery())) {
return processQueryResults(resultSet);
} catch (final SQLSyntaxErrorException e) {
throw new SQLException(Messages.get("AuroraHostListProvider.invalidQuery"), e);
} finally {
if (networkTimeout == 0 && !conn.isClosed()) {
setNetworkTimeout(conn, networkTimeout);
}
}
}

private void setNetworkTimeout(final Connection conn, final int timeoutMs) throws SQLException {
Future<Boolean> future = NETWORK_TIMEOUT_COMPLETION_SERVICE.submit(() -> {
conn.setNetworkTimeout(NETWORK_TIMEOUT_EXECUTOR, timeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I doing this right? My manual test is working so it isn't hanging anymore but it seems strange that I'm passing the executor to the CompletionService constructor and using it inside the submitted task too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runnable#run and Callable#call do not allow you to set a timeout for the task. Is it okay to use them without a timeout here?

return true;
});

try {
future.get(setNetworkTimeoutMs, TimeUnit.MILLISECONDS);
} catch (InterruptedException | ExecutionException | TimeoutException e) {
throw new SQLException(
Messages.get("AuroraHostListProvider.errorSettingNetworkTimeout",
new Object[]{e.getMessage()}));
}
}

Expand Down
2 changes: 2 additions & 0 deletions wrapper/src/main/resources/messages.properties
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ AuroraHostListConnectionPlugin.providerAlreadySet=Another dynamic host list prov
# Aurora Host List Provider
AuroraHostListProvider.clusterInstanceHostPatternRequired=The ''clusterInstanceHostPattern'' configuration property is required when an IP address or custom domain is used to connect to a cluster that provides topology information. If you would instead like to connect without failover functionality, set the 'enableClusterAwareFailover' configuration property to false.
AuroraHostListProvider.clusterInstanceHostPatternNotSupportedForRDSProxy=An RDS Proxy url can''t be used as the 'clusterInstanceHostPattern' configuration setting.
AuroraHostListProvider.errorSettingNetworkTimeout=An error occurred while adjusting the network timeout for the topology query: ''{0}''
AuroraHostListProvider.invalidPattern=Invalid value for the 'clusterInstanceHostPattern' configuration setting - the host pattern must contain a '?' character as a placeholder for the DB instance identifiers of the instances in the cluster.
AuroraHostListProvider.invalidTopology=The topology query returned an invalid topology - no writer instance detected.
AuroraHostListProvider.suggestedClusterId=ClusterId ''{0}'' is suggested for url ''{1}''.
Expand All @@ -28,6 +29,7 @@ AuroraHostListProvider.invalidQuery=Error obtaining host list. Provided database
AuroraHostListProvider.invalidDialect=Expecting a dialect that supports a cluster topology.
AuroraHostListProvider.invalidDialectForGetHostRole=An Aurora dialect is required to analyze a host's role. The detected dialect was ''{0}''
AuroraHostListProvider.errorGettingHostRole=An error occurred while obtaining the connected host's role. This could occur if the connection is broken or if you are not connected to an Aurora database.
AuroraHostListProvider.errorGettingNetworkTimeout=An error occurred while getting the connection network timeout: {0}

# AWS Credentials Manager
AwsCredentialsManager.nullProvider=The configured AwsCredentialsProvider was null. If you have configured the AwsCredentialsManager to use a custom AwsCredentialsProviderHandler, please ensure the handler does not return null.
Expand Down