Cannot get over failover issues #1610

samiabuzeineh-wk · 2025-11-24T19:54:17Z

samiabuzeineh-wk
Nov 24, 2025

Hello,

I have been dealing with AWS support for over a month at this point to no avail, and they redirected me here. We have noticed during a disaster recovery test that we still hang onto old connections post failover for almost 10 minutes even though we are using the AWS advanced JDBC wrapper which claims it should handle it for us in seconds. What this means is we continue to send write requests to downgraded reader instances for almost 10 minutes post-failover.

The AWS team recommended we try and target one cluster endpoint and allow the readOnly flag on the connection to dictate whether or not we should hit read instances:

The AWS JDBC Driver for MySQL/PostgreSQL is specifically designed for Aurora clusters, capable of automatically discovering cluster topology, managing failovers, and intelligently routing read/write operations based on transaction context. This simplifies application configuration, as only a single DataSource pointing to the Cluster Endpoint is needed, with the driver handling the internal routing.

They linked me this article as well where that was stated.

While this did fix our failover issues, I unfortunately found that this was causing all of our queries to hit our primary instances and the driver didn't appear to handle the routing for us like we intended. I verified that by running the following java code to log some output:

try (
            Connection conn = DataSourceUtils.getConnection(dataSource);
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT @@aurora_server_id, @@hostname, @@read_only, @@innodb_read_only")
        ) {
            if (rs.next()) {
                log.info(
                    "host=%s, dbReadOnly=%s, innoDbReadOnly=%s, jdbcReadOnly=%s".formatted(
                            rs.getString("@@hostname"),
                            rs.getBoolean("@@read_only"),
                            rs.getBoolean("@@innodb_read_only"),
                            conn.isReadOnly()
                        )
                );
            }

I was receiving results like the following:

host=ip-172-17-3-215, dbReadOnly=false, innoDbReadOnly=false, jdbcReadOnly=true

The fact that jdbcReadOnly is true indicates that the readOnly flag is set to true in our application code at it's proper location. However, the dbReadOnly and innoDbReadOnly both come back as false indicating that we are not actually hitting read instances like we intend to. Allowing all queries to go to the primary instance could lead to high CPU utilization and potential outages that we cannot afford.

Lastly, after jumping on a Chime meeting with an AWS representative, he recommended we use the AwsWrapperDataSource and not the Amazon JDBC driver. So I did a big refactor in our codebase to utilize the AwsWrapperDataSource now instead of a HikariDataSource that had no references to the AwsWrapperDataSource using this as a reference. I also got rid of usages of the Amazon Driver as they recommended as well, and I put back in the individual read endpoint as well as the primary endpoint as they also recommended. Lastly, I made sure we set the connection timeout on the hikari properties to be very low to get rid of any DNS caching and have a small TTL:

db.hikari.workspaces.connection-timeout=5000
db.hikari.workspaces.idle-timeout=10000

I then triggered a failover and saw us hanging onto old stale connections yet again:

Caused by: java.sql.SQLException: Running in read-only mode
 at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:121)
 at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:114)
 at com.mysql.cj.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:612)
 at com.mysql.cj.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:320)
 at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:383)
 at software.amazon.jdbc.plugin.DefaultConnectionPlugin.execute(DefaultConnectionPlugin.java:126)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$execute$5(ConnectionPluginManager.java:306)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$null$0(ConnectionPluginManager.java:237)
 at software.amazon.jdbc.ConnectionPluginManager.executeWithTelemetry(ConnectionPluginManager.java:209)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$makePluginChainFunc$1(ConnectionPluginManager.java:237)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$null$2(ConnectionPluginManager.java:247)
 at software.amazon.jdbc.plugin.efm2.HostMonitoringConnectionPlugin.execute(HostMonitoringConnectionPlugin.java:182)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$execute$5(ConnectionPluginManager.java:306)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$null$3(ConnectionPluginManager.java:245)
 at software.amazon.jdbc.ConnectionPluginManager.executeWithTelemetry(ConnectionPluginManager.java:209)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$makePluginChainFunc$4(ConnectionPluginManager.java:244)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$null$2(ConnectionPluginManager.java:247)
 at software.amazon.jdbc.plugin.failover.FailoverConnectionPlugin.execute(FailoverConnectionPlugin.java:288)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$execute$5(ConnectionPluginManager.java:306)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$null$3(ConnectionPluginManager.java:245)
 at software.amazon.jdbc.ConnectionPluginManager.executeWithTelemetry(ConnectionPluginManager.java:209)
 at software.amazon.jdbc.ConnectionPluginManager.lambda$makePluginChainFunc$4(ConnectionPluginManager.java:244)
 at software.amazon.jdbc.ConnectionPluginManager.executeWithSubscribedPlugins(ConnectionPluginManager.java:194)
 at software.amazon.jdbc.ConnectionPluginManager.execute(ConnectionPluginManager.java:303)
 at software.amazon.jdbc.util.WrapperUtils.executeWithPlugins(WrapperUtils.java:301)
 at software.amazon.jdbc.wrapper.PreparedStatementWrapper.execute(PreparedStatementWrapper.java:186)
 at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
 at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
 at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:219)
 at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:458)
 at org.jooq.impl.AbstractDMLQuery.execute(AbstractDMLQuery.java:1099)
 at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:348)
 ... 21 common frames omitted

They then directed me to post about it here for ideas. Is there anything you all could help us with to fix this issue we've been battling with for over a month now? Any insights would be greatly appreciated.

Thanks,
Sami

karenc-bq · 2025-11-25T19:08:44Z

karenc-bq
Nov 25, 2025
Maintainer

Hi @samiabuzeineh-wk, thank you for reaching out.
Are you able to share your connection configuration and any driver logs with us?

1 reply

samiabuzeineh-wk Nov 25, 2025
Author

Hi @karenc-bq , I don't have any driver logs but I will use your link to try and enable them to get you something soon.

As for the connection configuration, this is how we are doing it:

WorkspacesDatabaseConfiguration.java
DataSourceBuilder.java

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cannot get over failover issues #1610

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Cannot get over failover issues #1610

Uh oh!

samiabuzeineh-wk Nov 24, 2025

Replies: 1 comment · 1 reply

Uh oh!

karenc-bq Nov 25, 2025 Maintainer

Uh oh!

samiabuzeineh-wk Nov 25, 2025 Author

samiabuzeineh-wk
Nov 24, 2025

Replies: 1 comment 1 reply

karenc-bq
Nov 25, 2025
Maintainer

samiabuzeineh-wk Nov 25, 2025
Author