Cannot get over failover issues #1610
Unanswered
samiabuzeineh-wk asked this question in Q&A
Replies: 1 comment 1 reply
Hi @samiabuzeineh-wk, thank you for reaching out.
Hello,
I have been dealing with AWS support for over a month at this point, to no avail, and they redirected me here. During a disaster recovery test we noticed that we hang onto old connections for almost 10 minutes post-failover, even though we are using the AWS Advanced JDBC Wrapper, which claims it should handle failover for us in seconds. This means we continue to send write requests to demoted (now reader) instances for almost 10 minutes after a failover.
The AWS team recommended we target a single cluster endpoint and let the readOnly flag on the connection dictate whether or not we hit read instances. They also linked me an article where that approach was described.
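For concreteness, a minimal sketch of what I understood that recommendation to mean is below. The hostname, database name, and plugin list are placeholders rather than our actual configuration; readWriteSplitting is the wrapper plugin that is supposed to honor the readOnly hint.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReadOnlyRoutingSketch {

    // Build a wrapper JDBC URL for a single cluster endpoint with an
    // explicit plugin list. Host, port, and db are placeholders.
    static String buildWrapperUrl(String host, int port, String db, String plugins) {
        return "jdbc:aws-wrapper:mysql://" + host + ":" + port + "/" + db
                + "?wrapperPlugins=" + plugins;
    }

    // Hypothetical usage: one cluster endpoint; setReadOnly(true) asks the
    // readWriteSplitting plugin to route this connection to a reader.
    static void runReadQuery(String user, String pass) throws SQLException {
        String url = buildWrapperUrl(
                "my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                3306, "mydb", "readWriteSplitting,failover");
        try (Connection conn = DriverManager.getConnection(url, user, pass)) {
            conn.setReadOnly(true); // routing hint for the wrapper
            // ... run SELECT queries here ...
        }
    }

    public static void main(String[] args) {
        // Offline demonstration of the URL shape only.
        System.out.println(buildWrapperUrl(
                "my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                3306, "mydb", "readWriteSplitting,failover"));
    }
}
```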
While this did fix our failover issues, I unfortunately found that it caused all of our queries to hit our primary instance; the driver didn't appear to handle the routing for us as intended. I verified that by running the following Java code to log some output:
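The verification code itself was elided above, but a sketch of the kind of check described (matching the log line quoted below) might look like the following. The @@hostname, @@global.read_only, and @@global.innodb_read_only variable names are assumptions for Aurora MySQL, and the Connection is supplied by the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ReadOnlyCheckSketch {

    // Format the status line in the same shape as the quoted output.
    static String formatStatus(String host, boolean dbRo, boolean innoRo, boolean jdbcRo) {
        return "host=" + host + ", dbReadOnly=" + dbRo
                + ", innoDbReadOnly=" + innoRo + ", jdbcReadOnly=" + jdbcRo;
    }

    // Hypothetical check: compare server-side read-only flags against the
    // JDBC-level readOnly flag on the same connection.
    static String checkConnection(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT @@hostname, @@global.read_only, @@global.innodb_read_only")) {
            rs.next();
            return formatStatus(rs.getString(1), rs.getBoolean(2),
                    rs.getBoolean(3), conn.isReadOnly());
        }
    }

    public static void main(String[] args) {
        // Offline demonstration of the output format only.
        System.out.println(formatStatus("ip-172-17-3-215", false, false, true));
    }
}
```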
I was receiving results like the following:

host=ip-172-17-3-215, dbReadOnly=false, innoDbReadOnly=false, jdbcReadOnly=true

The fact that jdbcReadOnly is true indicates that the readOnly flag is set to true at the proper location in our application code. However, dbReadOnly and innoDbReadOnly both come back as false, indicating that we are not actually hitting read instances as intended. Allowing all queries to go to the primary instance could lead to high CPU utilization and potential outages that we cannot afford.

Lastly, after jumping on a Chime meeting with an AWS representative, he recommended we use the AwsWrapperDataSource rather than the Amazon JDBC driver. So I did a big refactor in our codebase to use the AwsWrapperDataSource instead of a HikariDataSource that had no references to it, using this as a reference. I also removed usages of the Amazon driver, as they recommended, and put back the individual read endpoint as well as the primary endpoint, as they also recommended. Finally, I made sure the connection timeout in the Hikari properties was very low to get rid of any DNS caching and keep a small TTL:
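The actual configuration is elided above; as a rough sketch, the wrapper and pool settings in question might be expressed like this. Property names such as wrapperPlugins, failoverMode, and failoverTimeoutMs come from the wrapper's documentation, but the specific values here are illustrative placeholders, not our production settings or recommendations.

```java
import java.util.Properties;

public class FailoverConfigSketch {

    // Sketch of wrapper/HikariCP settings relevant to failover behavior.
    static Properties buildProperties() {
        Properties props = new Properties();
        // aws-advanced-jdbc-wrapper settings (names per the wrapper docs):
        props.setProperty("wrapperPlugins", "failover");
        props.setProperty("failoverMode", "strict-writer");
        props.setProperty("failoverTimeoutMs", "30000");
        // HikariCP settings: keep pooled connections short-lived so that
        // stale post-failover connections are retired quickly.
        props.setProperty("connectionTimeout", "3000");
        props.setProperty("maxLifetime", "60000");
        return props;
    }

    public static void main(String[] args) {
        buildProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```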
I then triggered a failover and saw us hanging onto old stale connections yet again:
They then directed me to post about it here for ideas. Is there anything you all could help us with to fix this issue we've been battling with for over a month now? Any insights would be greatly appreciated.
Thanks,
Sami