-
Notifications
You must be signed in to change notification settings - Fork 73
Fix bug where topology query could execute indefinitely #416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
2f86c2b
3f24ad4
c7639e0
91401c3
acf6d28
d3bc28c
daac7d4
ef33e4b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,7 +30,14 @@ | |
| import java.util.Properties; | ||
| import java.util.Set; | ||
| import java.util.UUID; | ||
| import java.util.concurrent.CompletionService; | ||
| import java.util.concurrent.ExecutionException; | ||
| import java.util.concurrent.Executor; | ||
| import java.util.concurrent.ExecutorCompletionService; | ||
| import java.util.concurrent.Executors; | ||
| import java.util.concurrent.Future; | ||
| import java.util.concurrent.TimeUnit; | ||
| import java.util.concurrent.TimeoutException; | ||
| import java.util.concurrent.locks.ReentrantLock; | ||
| import java.util.logging.Logger; | ||
| import org.checkerframework.checker.nullness.qual.NonNull; | ||
|
|
@@ -75,6 +82,9 @@ public class AuroraHostListProvider implements DynamicHostListProvider { | |
| + "This pattern is required to be specified for IP address or custom domain connections to AWS RDS " | ||
| + "clusters. Otherwise, if unspecified, the pattern will be automatically created for AWS RDS clusters."); | ||
|
|
||
| private final Executor NETWORK_TIMEOUT_EXECUTOR = Executors.newSingleThreadExecutor(); | ||
| private final CompletionService NETWORK_TIMEOUT_COMPLETION_SERVICE = | ||
| new ExecutorCompletionService(NETWORK_TIMEOUT_EXECUTOR); | ||
| private final HostListProviderService hostListProviderService; | ||
| private final String originalUrl; | ||
| private RdsUrlType rdsUrlType; | ||
|
|
@@ -93,6 +103,8 @@ public class AuroraHostListProvider implements DynamicHostListProvider { | |
| public static final CacheMap<String, String> suggestedPrimaryClusterIdCache = new CacheMap<>(); | ||
| public static final CacheMap<String, Boolean> primaryClusterIdCache = new CacheMap<>(); | ||
|
|
||
| private static final int defaultTopologyQueryTimeoutMs = 5000; | ||
| private static final int setNetworkTimeoutMs = 10000; | ||
| private final ReentrantLock lock = new ReentrantLock(); | ||
| protected String clusterId; | ||
| protected HostSpec clusterInstanceTemplate; | ||
|
|
@@ -333,11 +345,42 @@ protected List<HostSpec> queryForTopology(final Connection conn) throws SQLExcep | |
| this.topologyAwareDialect = (TopologyAwareDatabaseCluster) this.hostListProviderService.getDialect(); | ||
| } | ||
|
|
||
| int networkTimeout = -1; | ||
| try { | ||
| networkTimeout = conn.getNetworkTimeout(); | ||
| // The topology query is not monitored by the EFM plugin, so it needs a socket timeout | ||
| if (networkTimeout == 0) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we override the user's socket timeout here even if it is set to a non-zero value?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the motivation for that?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should apply default 5000ms socket timeout just in case the current one is set to indefinite wait (it doesn't matter if this indefinite was a user explicit setting or just lack of setting)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If they've set their socket timeout to accommodate their longest running queries it could be awhile before a problem is detected with the topology query. The topology query itself should be pretty quick to return. However, I lean towards keeping the code as is, I'd prefer to abide by the settings they gave us as much as possible. Just thought I'd bring it up to see if you agree
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sergiyvamz won't the value returned by getNetworkTimeout be 0 regardless of if the user explicitly set it to zero or just didn't set it at all?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume it is. But it makes sense to verify.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just checked and the default value is 0 with all drivers |
||
| setNetworkTimeout(conn, defaultTopologyQueryTimeoutMs); | ||
| } | ||
| } catch (SQLException e) { | ||
| LOGGER.warning(() -> Messages.get("AuroraHostListProvider.errorGettingNetworkTimeout", | ||
| new Object[] {e.getMessage()})); | ||
| } | ||
|
|
||
| try (final Statement stmt = conn.createStatement(); | ||
| final ResultSet resultSet = stmt.executeQuery(this.topologyAwareDialect.getTopologyQuery())) { | ||
| return processQueryResults(resultSet); | ||
| } catch (final SQLSyntaxErrorException e) { | ||
| throw new SQLException(Messages.get("AuroraHostListProvider.invalidQuery"), e); | ||
| } finally { | ||
| if (networkTimeout == 0 && !conn.isClosed()) { | ||
| setNetworkTimeout(conn, networkTimeout); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| private void setNetworkTimeout(final Connection conn, final int timeoutMs) throws SQLException { | ||
| Future<Boolean> future = NETWORK_TIMEOUT_COMPLETION_SERVICE.submit(() -> { | ||
| conn.setNetworkTimeout(NETWORK_TIMEOUT_EXECUTOR, timeoutMs); | ||
|
||
| return true; | ||
| }); | ||
|
|
||
| try { | ||
| future.get(setNetworkTimeoutMs, TimeUnit.MILLISECONDS); | ||
| } catch (InterruptedException | ExecutionException | TimeoutException e) { | ||
| throw new SQLException( | ||
| Messages.get("AuroraHostListProvider.errorSettingNetworkTimeout", | ||
| new Object[]{e.getMessage()})); | ||
| } | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a reasonable timeout value for waiting for setNetworkTimeout to complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting timeout is a call to set a socket setting. It doesn't require any network activity like sending/receiving data. I'd expect that it takes few millis to set it but 10s is more than enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I lower it to a different value then?