-
Notifications
You must be signed in to change notification settings - Fork 72
Fix bug where topology query could execute indefinitely #416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| try { | ||
| networkTimeout = conn.getNetworkTimeout(); | ||
| // The topology query is not monitored by the EFM plugin, so it needs a socket timeout | ||
| if (networkTimeout == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we override the user's socket timeout here even if it is set to a non-zero value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should apply default 5000ms socket timeout just in case the current one is set to indefinite wait (it doesn't matter if this indefinite was a user explicit setting or just lack of setting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they've set their socket timeout to accommodate their longest running queries it could be awhile before a problem is detected with the topology query. The topology query itself should be pretty quick to return. However, I lean towards keeping the code as is, I'd prefer to abide by the settings they gave us as much as possible. Just thought I'd bring it up to see if you agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sergiyvamz won't the value returned by getNetworkTimeout be 0 regardless of if the user explicitly set it to zero or just didn't set it at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it is. But it makes sense to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked and the default value is 0 with all drivers
| networkTimeout = conn.getNetworkTimeout(); | ||
| // The topology query is not monitored by the EFM plugin, so it needs a socket timeout | ||
| if (networkTimeout == 0) { | ||
| conn.setNetworkTimeout(NETWORK_TIMEOUT_EXECUTOR, defaultTopologyQueryTimeoutMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it synchronous? We shouldn't need to check executor to check it timeout is applied?
| public static final CacheMap<String, Boolean> primaryClusterIdCache = new CacheMap<>(); | ||
|
|
||
| private static final int defaultTopologyQueryTimeoutMs = 5000; | ||
| private static final int setNetworkTimeoutMs = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a reasonable timeout value for waiting for setNetworkTimeout to complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting timeout is a call to set a socket setting. It doesn't require any network activity like sending/receiving data. I'd expect that it takes few millis to set it but 10s is more than enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I lower it to a different value then?
|
|
||
| private void setNetworkTimeout(final Connection conn, final int timeoutMs) throws SQLException { | ||
| Future<Boolean> future = NETWORK_TIMEOUT_COMPLETION_SERVICE.submit(() -> { | ||
| conn.setNetworkTimeout(NETWORK_TIMEOUT_EXECUTOR, timeoutMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I doing this right? My manual test is working so it isn't hanging anymore but it seems strange that I'm passing the executor to the CompletionService constructor and using it inside the submitted task too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks a bit over-engineered. Why not to use something similar to this?
https://github.com/spring-projects/spring-framework/blob/main/spring-core/src/main/java/org/springframework/core/task/SyncTaskExecutor.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runnable#run and Callable#call do not allow you to set a timeout for the task. Is it okay to use them without a timeout here?
The topology query is not monitored by the EFM plugin. If the user has no socket timeout and the topology query is executed against a down instance, it could execute indefinitely. This PR temporarily sets a socket timeout for the topology query if none was set by the user.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.