Add configurable insert_batch_size JDBC session property #8434
hashhar merged 1 commit into trinodb:master
8709a6e to bd1633b
kokosing left a comment:
Please squash the first 3 commits together. Also, a good rule is to put cleanup commits before the actual change: that makes them easier to extract into a separate PR, and it lowers the chance of conflicts when you address review comments.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSinkProvider.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestBaseJdbcConfig.java (outdated, resolved)
hashhar left a comment:
Looks good overall.
Some minor comments.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestBaseJdbcConfig.java (outdated, resolved)
bd1633b to 9ba6664
hashhar left a comment:
Looks good modulo a couple of nitpicks.
A suggestion about one of the tests.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSinkProvider.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSink.java (outdated, resolved)
9ba6664 to 9b8256c
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataConfig.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcMetadataSessionProperties.java (outdated, resolved)
2fc6b7f to a626059
@hashhar I don't know whether it adds value, since we are not able to determine how many batches were executed. @sergey-melnychuk Maybe let's add an insert test with small and large batch sizes (e.g. insert 100 rows with batch sizes of 1, 50, 64, 100 and 10000) to see how …
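A rough sketch of what such a test could assert, assuming the sink calls executeBatch() each time the number of queued rows reaches the batch size and once more at the end for any leftover rows (an assumed flushing policy, not verified against the PR). Under that assumption, inserting 100 rows takes ceil(100 / batchSize) flushes. BatchCountSketch and expectedFlushes are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: predicts how many executeBatch() calls a sink makes,
// ASSUMING it flushes whenever the buffer reaches batchSize and once more for leftovers.
final class BatchCountSketch {
    private BatchCountSketch() {}

    static int expectedFlushes(int rows, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1: " + batchSize);
        }
        // Integer ceiling division: full batches plus one partial batch if rows don't divide evenly
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> flushes = new LinkedHashMap<>();
        for (int batchSize : new int[] {1, 50, 64, 100, 10000}) {
            flushes.put(batchSize, expectedFlushes(100, batchSize));
        }
        System.out.println(flushes); // {1=100, 50=2, 64=2, 100=1, 10000=1}
    }
}
```

This only predicts the number of batches; as noted above, the connector itself does not report how many batches actually ran, so a real test would have to observe that on the driver or database side.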
@hashhar @wendigo WDYT about a slightly refactored … ? Such a test can be extended to other methods of …
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSink.java (outdated, resolved)
675baa9 to 265354f
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSinkProvider.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSink.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestJdbcPageSink.java (outdated, resolved)
Thanks @kokosing, I will move the refactoring and the unit test out to a separate PR and then add an e2e test similar to the existing ones. IMO such a refactoring still improves SRP (single responsibility) and separation of concerns; the lack of them seems to make testing of …
265354f to a626059
Leaving the refactoring commit out of this PR; now preparing the test using …
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java (outdated, resolved)
04aaab2 to 51825fd
Added a simple e2e test that inserts …
hashhar left a comment:
LGTM modulo one comment.
Looks good to go otherwise.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java (outdated, resolved)
51825fd to e0ca6f7
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java (outdated, resolved)
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java (outdated, resolved)
2a88fd0 to d9d232f
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java (outdated, resolved)
547c28c to 716b587
LGTM, please squash the commits into a single one, @sergey-melnychuk.
Co-authored-by: Ashhar Hasan <hashhar_dev@outlook.com>
716b587 to 6e1cb08
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java (resolved)
Looking at the code comments (there are none), it's not clear why we want this to be configurable. As in #3775 (comment), we should understand why this needs to be configurable before making it so.
    return insertBatchSize;
}

@Config("insert.batch-size")
Is it applicable to CREATE TABLE AS as well?
Yes, the PageSink is used there too. Maybe call it write.batch-size?
// between performance and pushdown capabilities
private int domainCompactionThreshold = 32;

private int insertBatchSize = 1000;
This doesn't belong in JdbcMetadataConfig, since it's not consumed in the JdbcMetadata layer.
I think this is mostly because there were no session properties contributed by BaseJdbcConfig.
Maybe this should change? WDYT?
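One way to picture the split being discussed: the catalog config value (insert.batch-size, default 1000 in the diff) only supplies a default, and a session property for the same setting overrides it per query, so both could live with whichever class owns write settings rather than in JdbcMetadataConfig. A minimal sketch, assuming a plain Map stands in for the connector session and that values must be at least 1; WriteConfig, WriteSessionProperties and getWriteBatchSize are hypothetical names, not Trino code (the real SPI uses PropertyMetadata and ConnectorSession):

```java
import java.util.Map;

// Hypothetical session-property helper; a plain Map stands in for the connector session.
final class WriteSessionProperties {
    static final String WRITE_BATCH_SIZE = "write_batch_size";

    private WriteSessionProperties() {}

    static int getWriteBatchSize(Map<String, Object> session, WriteConfig config) {
        Object value = session.get(WRITE_BATCH_SIZE);
        // No session override: fall back to the catalog-level config default
        int batchSize = (value == null) ? config.getWriteBatchSize() : (Integer) value;
        if (batchSize < 1) {
            throw new IllegalArgumentException(WRITE_BATCH_SIZE + " must be greater than 0: " + batchSize);
        }
        return batchSize;
    }

    public static void main(String[] args) {
        WriteConfig config = new WriteConfig();
        Map<String, Object> noOverride = Map.of();
        Map<String, Object> override = Map.of(WRITE_BATCH_SIZE, 42);
        System.out.println(getWriteBatchSize(noOverride, config)); // 1000
        System.out.println(getWriteBatchSize(override, config));   // 42
    }
}

// Hypothetical config holder (plain Java, no Airlift @Config binding): supplies only the default.
final class WriteConfig {
    private int writeBatchSize = 1000; // default value taken from the reviewed diff

    int getWriteBatchSize() {
        return writeBatchSize;
    }

    WriteConfig setWriteBatchSize(int writeBatchSize) {
        this.writeBatchSize = writeBatchSize;
        return this;
    }
}
```

With this shape, the metadata layer never needs to see the setting; only the component that writes rows reads it.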
Thank you for the discussion. I'm currently working with Trino 465 (OSS) and using the JDBC connector to insert data into SQL Server.
I've observed that even after setting write_batch_size = 5000, the inserts are still performed row-by-row in SQL Server Profiler (sp_execute per row), which impacts performance significantly for large datasets.
From my analysis:
- The property insert_batch_size was introduced in PR "Add configurable insert_batch_size JDBC session property" #8434 to control batch size.
- However, in open source Trino, this property is not exposed as a session property (SET SESSION insert_batch_size = ...).
- Instead, write_batch_size is used, but it seems to be ignored or not applied effectively in some cases.
My question:
Is there a known reason why write_batch_size doesn't lead to true batched inserts (e.g., multi-row VALUES or bulk operations) when using the JDBC connector?
I'd appreciate any insights or guidance on how to achieve better batching behavior.
Thanks for your time and contributions to the project!
Hi! The property was renamed to write_batch_size, so it can be configured with SET SESSION <catalog_name>.write_batch_size = ...
write_batch_size is a generic JDBC-connector property, but it doesn't guarantee that write operations are performed in bulk; that depends on how the JDBC driver is implemented. In the case of SQL Server, sp_execute could be invoked per row. Have you tried setting the bulkInsertCopy property for SQL Server? See https://learn.microsoft.com/en-us/sql/connect/jdbc/using-bulk-copy-with-the-jdbc-driver?view=sql-server-ver17#sqlserverbulkcopyoptions
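To illustrate what the batch size can and cannot control: in a JDBC page sink along these lines, the setting only decides how many rows are queued with PreparedStatement.addBatch() before executeBatch() is called. How the driver turns each executeBatch() into server calls (one bulk request, or one sp_execute per row) happens below the connector. A minimal sketch, assuming a hypothetical BatchTarget interface in place of a real PreparedStatement so it runs without a database; BatchingWriter and RecordingTarget are likewise hypothetical and this is not Trino's JdbcPageSink:

```java
import java.util.ArrayList;
import java.util.List;

final class BatchingWriterDemo {
    public static void main(String[] args) {
        RecordingTarget target = new RecordingTarget();
        BatchingWriter writer = new BatchingWriter(target, 64);
        for (int i = 0; i < 100; i++) {
            writer.appendRow(List.of(i));
        }
        writer.finish();
        System.out.println(target.batchSizes); // [64, 36]
    }
}

// Hypothetical stand-in for the two java.sql.PreparedStatement batch calls.
interface BatchTarget {
    void addBatch(List<?> row);

    void executeBatch();
}

// Queues rows and flushes every maxBatchSize rows, plus once at the end for leftovers.
final class BatchingWriter {
    private final BatchTarget target;
    private final int maxBatchSize;
    private int pending;

    BatchingWriter(BatchTarget target, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1: " + maxBatchSize);
        }
        this.target = target;
        this.maxBatchSize = maxBatchSize;
    }

    void appendRow(List<?> row) {
        target.addBatch(row);
        pending++;
        if (pending >= maxBatchSize) {
            flush();
        }
    }

    void finish() {
        if (pending > 0) {
            flush();
        }
    }

    private void flush() {
        // One executeBatch() per full batch; what the driver sends over the wire is up to the driver
        target.executeBatch();
        pending = 0;
    }
}

// Test double that records how many rows each executeBatch() call carried.
final class RecordingTarget implements BatchTarget {
    final List<Integer> batchSizes = new ArrayList<>();
    private int queued;

    @Override
    public void addBatch(List<?> row) {
        queued++;
    }

    @Override
    public void executeBatch() {
        batchSizes.add(queued);
        queued = 0;
    }
}
```

Even when the connector sends 5000-row batches this way, a driver that executes each queued parameter set separately will still show per-row sp_execute calls in the profiler.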
Hi @Praveen2112,
Thank you for the clarification!
I understand that write_batch_size is a JDBC-specific property and doesn't guarantee true bulk inserts — it only controls batch size for parameterized statements.
However, I'm trying to achieve real bulk insert performance (like BULK INSERT or bcp) when inserting 1.5M rows from Trino to MSSQL.
My goal is to see in SQL Server Profiler:
BULK INSERT [trino_test].[aaaaaa].[orders]
FROM 'virtual_stream'
WITH (TABLOCK, BATCHSIZE = 1000)
Could you please advise what configuration changes are required in Trino 465 (Open Source) to enable this behavior?
I've already confirmed that:
- dbs-ai-sqldev_trino-test.bulk_copy_for_write = true is set in the catalog
- retry_policy = 'NONE' is set in the session
Despite this, I still observe row-by-row inserts via sp_execute in SQL Profiler.
Is there any additional configuration or known limitation in Open Source Trino that prevents the use of SQLServerBulkCopy API?
public static final String AGGREGATION_PUSHDOWN_ENABLED = "aggregation_pushdown_enabled";
public static final String TOPN_PUSHDOWN_ENABLED = "topn_pushdown_enabled";
public static final String DOMAIN_COMPACTION_THRESHOLD = "domain_compaction_threshold";
public static final String INSERT_BATCH_SIZE = "insert_batch_size";
Same here: it shouldn't be in this class, since it's not consumed by the metadata layer.
Example:
set session postgresql.insert_batch_size = 42;