-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk copy: fails with combination sendStringParametersAsUnicode=false #1784
Comments
Hello @JDBC-0, Thanks for reaching out. I don't suppose you are able to provide the stack trace in English? We will investigate regardless but those would help greatly expediate the process. |
I wrote a minimal test to reproduce this: A) create the table: ` /****** Object: Table [dbo].[test_MH_JDBC] Script Date: 04.04.2022 11:01:06 ******/ SET QUOTED_IDENTIFIER ON CREATE TABLE [dbo].[test_MH_JDBC]( B) and here is the Java application : ` import java.sql.Connection; /**
}//Test The result is:
(Roughly translated: bcp-Client received an invalid column lngth for column ID 1) Our environment:
|
... and already found the problem: if I remove the URL parameter sendStringParametersAsUnicode=false; then the error does not occur! But we have to set sendStringParametersAsUnicode=false because we still have all the older datatypes (VARCHAR and not NVARCHAR) and the standard Windows encoding / collation on the database server (Cp1252, Latin1_General_CI_AS). I also tested it with the same table but NVARCHAR datatypes: same error occurs. We use sendStringParametersAsUnicode=false since jTDS driver years ago and the reason might have been to "fix" implicit charatcer conversions between the Java UTF encoding (we use -Dfile.encoding=UTF-8 for misc reasons). I have no idea if this parameter can be omitted (or set to 'true' with SQL Server 2019 and Java 11 ...? I read here, that the new JDBC 4 method setNString() is available ... the above test code does not show the error anymore if I use setNString() instead of setString() : ps.setNString(1, "TST"); This is not quite correct as the table column is still VARCHAR and not NVARCHAR. But I don't know if this general change of using setNString() instead of setString() has other side effects in our application. (And it would mean changing thousands of code lines.) But without using the useBulkCopyForBatchInsert=true parameter, the code runs fine, so it has something to do with the combination of bulk copy feature and character encoding / collation: A) sendStringParametersAsUnicode=false;useBulkCopyForBatchInsert=false ... works One reason might be, that bulk copy ignores the parameter value for sendStringParametersAsUnicode ? |
I played with the driver specific API to use com.microsoft.sqlserver.jdbc.SQLServerBulkCopy (but we want to stay driver independent, so we just used the standard JDBC API) and this kind of usage seems to work (i.e. no error message and data is inserted): `
` As you can see, it's total overkill to first create an artificial statement with a database roundtrip to have a ResultSet to add data to. Just wanted to show that with teh same JDBC URL parameters, it works using the driver's concrete SQLServerBulkCopy object instead using the standard JDBC PreparedStatement (which is preferable). |
So, with the workaround to create a separate connection for such bulk inserts with the parameters ...;sendStringParametersAsUnicode=true;useBulkCopyForBatchInsert=true;... it works. BUT: it is slower ... considerably slower! Für table data with about 10.000 rows, it's almost identical. But for larger data like multiple 100.000 rows, it is about 10% slower than the connection without bulk copy enabled!!! |
Hi @JDBC-0, |
Hi @JDBC-0, After quite some debugging. The issue currently stands as follows. When trying to use setString with sendStringParametersAsUnicode=false, the driver will attempt to insert the given data as varchar which is one byte per character. However this string data is in multibyte character format and will get mangled during between setString and executeBatch. Using either sendStringParameterAsUnicode=true OR setNString will instead insert data as nvarchar which is 2 bytes per character and has enough space to correctly add the string data. As such unless you can ascertain that your data will be 1 byte per character, you should use either sendStringParametersAsUnicode=true or setNString. |
Of course we can't change all our code from setString to setNString just for this broken bulk feature. I am not really sure, what is really doing - we just made the experience with jTDS that we had to set this parameter to false for all strings to work. I guess this has something to do with the coalation of the database but how is it related? Are there collations that require sendStringParameterAsUnicode=false? Our tables use VARCHAR and not NVARCHAR, so I don't understand why bulk copy has such a behaviour reading your explanation. |
I am unsure if there are any encodings that cannot be sent as Unicode. I can't immediately think of why a single byte charset would not be able to be sent with 2 bytes per character but it is not my field of expertise. Regardless I have a proposed fix to this issue in #1816, just have to run tests and discuss this with the team before finalizing on the solution. Could you try and see if the fix works for you @JDBC-0? |
Just an update in case you are still following this issue @JDBC-0. The PR is currently still in progress. It does fix your issue however the team has discussed it and we need quite a bit more testing to confirm it does not disturb other behavior. We will likely not have time to finish this in our current sprint and it will likely be revisited in a couple of weeks. |
Thnks for the feedback, this is good news. |
hi @JDBC-0 Error: java.sql.BatchUpdateException: The given value of type VARCHAR(3) from the data source cannot be converted to type varchar(1) of the specified target column Text. Also I am unsure what the issue is? If you want to send unicode characters you will need to NOT specify |
Looking at this issue more it involves quite a bit more work than originally thought. We will have to budget more time to investigate how |
Thanks for the feedback. We try to test if we can ommit this Parameter sendStringParametersAsUnicode now as we use it for years but now have a new JDBC driver (this mssql-jdbc, former: jTDS) and a new SQL Server (now: 2019, former: 2008). But our database ist still old with collation Latin1_General_CI_AS and we use Java 11 (of course UTF-8, Unicode) to use the database. So, I guess, it won't work as the Unice-Characters of Java are incompatible with the database collation. (For the test code: perhaps Github swalloed / changed some characters when pasting the code. It's jast a simple VARCHAR column - e.g. VARCHAR(3). The comment above with the test code seemed to have swallowed the "(3)" in the table declaration.) |
thanks, I have marked this as an enhancement to specify encoding. This will be considered along with other feature requests and bug fixes when we do planning for the next semester. |
I noticed a comment in our configuration file saying "set to false to avoid performance hits due to UTF-8 conversions". The official driver documentation also says: "For optimal performance with the CHAR, VARCHAR, and LONGVARCHAR JDBC data types, an application should set the sendStringParametersAsUnicode property to "false" and use the setString, ...." As we use the old VARCHAR datatype in database and setString() JDBC methods, I guess setting sendStringParametersAsUnicode = true will hurt performance as Java is UTF-8 and the VARCHAR type is nin-UTF-8 ? |
When setting Just an update on this issue: this is still an issue which requires a thorough understanding of |
Any news here? After we finally upgraded die driver 12.8.1 we still try to use useBulkCopyForBatchInsert=true but either get "errorCode=4022, message=Massenladedaten wurden erwartet, aber nicht gesendet. Der Batch wird beendet., SQLState=S0001, cause=null" or "Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen." fiddling around with teh URL parameters. |
Driver version
10.2.0
SQL Server version
Microsoft SQL Server 2019 (RTM-GDR) (KB4583458) - 15.0.2080.9 (X64) Nov 6 2020 16:50:01 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Windows Server 2019 Standard 10.0 (Build 17763: ) (Hypervisor)
Client Operating System
Microsoft Windows [Version 10.0.19044.1566]
JAVA/JVM version
11.0.4
Table schema
MyColumnWith3Chars VARCHAR(3), ....
The statement is plain vanilla
"INSERT INTO amt.dbo.MyFancyTable VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
Problem description
prepared INSERT staments fail with "invalid column length from the bcp client" for no reason.
Expected behavior
Debugged source values with 1 single string (for 1 single VARCHAR column) always having 3 charatcers. Destination table has VARCHAR(3). => should work (and is working if bulk copy is not used when we use our normal table with unsupported DATE/DATETIME columns)
Actual behavior
The error below occurs.
Error message/stack trace
Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen. errorCode=0, message=Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen., SQLState=null, cause=null
java.sql.BatchUpdateException: Vom bcp-Client wurde eine ungültige Spaltenlänge für die Spalten-ID 1 empfangen.
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2073)
Any other details that can be helpful
If I increase the table column to VARCHAR(11) the error vanishes - but nothing is saved anyway. I debugged the values and trible checked with a collegue that all values exactly have 3 characters - not a single one has more or less. No idea what else to try. Thinking about coalation, we have connection parameter "sendStringParametersAsUnicode=false" set, so there even should not be a duplication to 6 bytes for 3 characters.
Again: if we revert back the datatypes of our date/time columns, everything (including this first column with VARCHAR(3) works without error and is saved. (As then bulk copy does not kick in, I guess, because of the unsupported DATE/DATETIME columns.)
JDBC trace logs
I enabled driver logging and found this:
164803674 INFO r.ccyCodeRef_I: >>>EUR<<<
164803674 FINER ENTRY 1 EUR [com.microsoft.sqlserver.jdbc.Statement]
164803674 FINER RETURN [com.microsoft.sqlserver.jdbc.Statement]
My output shows "EUR" ... I have no idea if "ENTRY 1 EUR" might be a wrong value as it has exactly 11 characters (what a coincidence ...).
The text was updated successfully, but these errors were encountered: