Issue using microsoft.data.sqlclient RetryLogicProvider #2549
Replies: 17 comments
-
Thank you for reporting an issue with RetryLogicProvider. I noticed that the repro provided does not compile due to a couple of issues:
However, in the error message you reported it seems that the actual error code is getting lost and only showing error code 0. Since error code 0 is not in the transient error list to retry on, the retry logic is not getting triggered. Could you update the repro so we can use that to investigate further, please? |
Beta Was this translation helpful? Give feedback.
-
I used a private connection and database, I wasn't clear in that fact, but the databases I used are basic database service tier with nothing special in it. It would be possible to use your own for testing. As for insertQuery, it's the same, it will just depend on the table you use for testing, but it's a simple SQL Insert element : "INSERT INTO [dbo].[BigTable] ([Id],[FirstName],[LastName],[Email],[Salary],[City],[Country],[ModifiedDate]) VALUES({0},'Sophia','Miller','c67b4b0dfadb41d0b036',16415.78,'New York City','China','2024-05-23 11:15:29.8000000')" I'm sorry, but I don't think I can give my credentials to my database for you as it can contain sensitive information. |
Beta Was this translation helpful? Give feedback.
-
I also just tried to setup TransientErrors like that to ensure I had as much error numbers as possible and the same exception happens instead of having the retry logic happening : |
Beta Was this translation helpful? Give feedback.
-
Could you try adding this line of code as the first line to see if the PoC will throw an actual retriable error code instead of error code 0: AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows", true); |
Beta Was this translation helpful? Give feedback.
-
This is the exception error I get when I add the line of code you gave me : Message : Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. |
Beta Was this translation helpful? Give feedback.
-
Could you comment out the TransientErrors in the var options = new SqlRetryLogicOption() block please just to see if the default transient error list in SqlConfigurableRetryFactory will catch the error to retry on. |
Beta Was this translation helpful? Give feedback.
-
I just tried it with the transientError list commented, the default transient error list in [SqlConfigurableRetryFactory] doesn't pick this error for some reason. it will throw the exception instead of retrying even with the default list. |
Beta Was this translation helpful? Give feedback.
-
Verified if I could give the connection string for the connection and the Query. I added them in the main issue comment code part. You should be able to test it on your side as well now. |
Beta Was this translation helpful? Give feedback.
-
Oh, I am so sorry I did not expect you to provide private connection string. Since this is a public forum, you may want to remove any private information in your connection string. However, I want to request one more information. An event source trace log would really help to troubleshoot this issue. More information on how to capture event source trace log can be found here. |
Beta Was this translation helpful? Give feedback.
-
SqlError.ctor | ERR | Info Number 0, Error State 0, Error Class 20, Error Message 'A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught)', Procedure '', Line Number 0, Batch Index -1 |
Beta Was this translation helpful? Give feedback.
-
Thank you for providing the event source trace log. We will investigate and inform you of our findings. |
Beta Was this translation helpful? Give feedback.
-
I think your issue is here. You are trying to open so many connections and after that performing the reader. That will cause the issue. Try extending your for loop till the end of SqlDataReader operation. This does not seem like a SqlClient issue. |
Beta Was this translation helpful? Give feedback.
-
A severed connection is not a transient error that the driver can retry. Keep in mind, connecting and executing are independent retriable things to the driver. If a connection is broken, a statement can't be retried against it. In-use connections have too many unknowns to the driver for safe, transparent reconnect and retry of statements when a connection is dropped mid-use. For example: Was there an open transaction (perhaps done in T-SQL in a procedure)? Were partial results received (how would the driver restart at a specific point)? Where some records updated but not others (again, stored procedures could have done things the driver isn't aware of)? |
Beta Was this translation helpful? Give feedback.
-
IT's possible I end up with the wrong exception which is why it won't retry. In my case i was trying to reproduce this error with Microsoft.data.sqlclient instead of the old system.data.sqlclient : System.Data.SqlClient.SqlException System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. I went and checked in the documentation of Microsoft to see what would cause this issue and it seems this error happens when maintenance is done on the database and the connection is not available for a bit (normally it would take about 5 seconds to be available again, but at that time the exception has already been thrown). What I want to test and ensure with this Proof of concept is that the connection can be retried if ever a maintenance is happening while the connection is reading or writing. I know this is not exactly the same error as what i showed earlier, but I was under the impression with the documentation on how to test transient errors with a database we could use PowerShell command "Invoke-AzSqlDatabaseFailover". I'm open to any suggestion if you have an idea on how else I could try and simulate a maintenance of a database or an elastic pool for my POC. |
Beta Was this translation helpful? Give feedback.
-
It sounds like the built-in retry logic won't be sufficient for what you are trying to do. If database maintenance results in a broken connection during read/write operations, your application needs to decide what set of actions constitutes an atomic operation that can be safely retried. The built-in retry logic can only retry connection open for specific errors OR statement execution for specific errors. It can't retry both connection open and statement execution as part of one statement execution operation for specific errors. |
Beta Was this translation helpful? Give feedback.
-
Thank you for all your help.
We have two scenarios in which we have most of our issues:
We are going to go forward with the implementation, but we just want to be sure and properly document the use cases where the retry will work vs when the connection will be severed. |
Beta Was this translation helpful? Give feedback.
-
Depending on the state of each connection, some may be automatically restored by the driver. Idle connections (no open transactions or result sets, idle for over 30 seconds)) should be transparently restored as long as connectRetryCount > 0. Connections that aren't idle and are broken will have to be handled by the application.
4014 isn't in the list of transient errors and won't trigger a retry, by default. I'm not familiar with that error, but since it's a fatal error, the connection will have to be closed and restored/retried by the application, as appropriate, when it happens. |
Beta Was this translation helpful? Give feedback.
-
Hi, I'm using azure portal SQL databases in my development for use in cloud infrastructure. Those databases are prone to transient errors. That is why I decided to try and use the new Retry logic feature in the microsoft.data.sqlclient to better handle those cases.
I set up a POC to try and make the feature work in the case the database has to do maintenance. On Microsoft documentation and troubleshooting, apparently using this PowerShell command "Invoke-AzSqlDatabaseFailover" can simulate the maintenance of a database.
Unfortunately, when I'm trying the retry feature when the connection is in use and getting the data from the database and I end up severing the connection in the middle of the reading process, I end up catching an exception like this one (A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)) instead of having the retry being called.
Here is my POC code :
Beta Was this translation helpful? Give feedback.
All reactions