Tune JDBC fetch-size automatically based on column count by findepi · Pull Request #16644 · trinodb/trino

findepi · 2023-03-21T09:45:50Z

PostgreSQL, Redshift and Oracle connectors had hard-coded fetch-size value of 1000. The value was found not to be optimal when server is far (high latency) or when number of columns selected is low. This commit improves in the latter case by picking fetch size automatically based on number of columns projected. After the change, the fetch size will be automatically picked in the range 1000 to 100,000.

Fixes #16153
Alternative to #16269

hashhar

the idea is nice however I don't know whether larger fetch sizes cause issues for wide tables, have you observed something?

I would've expected a different heurestic: the wider table or more rows we pull the higher the fetch count.

One concern I have with this is that this will impact memory estimation since the fetch size is no longer a constant value.

hashhar · 2023-03-21T09:49:36Z

plugin/trino-oracle/src/main/java/io/trino/plugin/oracle/OracleClient.java

        PreparedStatement statement = connection.prepareStatement(sql);
-        statement.setFetchSize(1000);
+        // This is a heuristic, not exact science. A better formula can perhaps be found with measurements.
+        // Column count is not known for non-SELECT queries. Not setting fetch size for these.


not setting fetch size can mean a fetch size of 1 in some drivers. IMO in case we don't know column size we should default to older value of 1k.

I had this originally, but note that column count is not known only for queries like DELETE (not SELECT queries).
All the queries reading data know their projected column count.
Thus I had a choice: do the change defensively, as if I didn't know when column count may be missing. Or write the code "the way it would be written today".

findepi · 2023-03-21T09:56:44Z

the idea is nice however I don't know whether larger fetch sizes cause issues for wide tables, have you observed something?

I would've expected a different heurestic: the wider table or more rows we pull the higher the fetch count.

I would expect this can cause memory pressure issues.

Note that

i didn't run any measurements on my own. This is just a heuristic, just like the previous value of 1k was just a heuristic.
the change doesn't discern data types. Ideally we could make the choice based on the width of columns pulled from statistics.

One concern I have with this is that this will impact memory estimation since the fetch size is no longer a constant value.

in what context do we do memory estimation, taking into account rows prefetched by the JDBC driver?
in any case, the estimations are still possible, since there is a min and max value for the fetch size (1k and 100k respectively).

hashhar · 2023-03-21T10:12:22Z

in what context do we do memory estimation

We don't do it today. Now that I re-read my comment actually even today the prefetch is not constant. For wider tables we prefetch more compared to narrow tables. So actually your change is probably better in this regard.

This looks like a good starting point. We can iterate over time if someone finds issues here. Do you think it'd be useful to have a killswitch for sometime?

findepi · 2023-03-21T13:28:49Z

This looks like a good starting point. We can iterate over time if someone finds issues here.

yes, that's the idea

Do you think it'd be useful to have a killswitch for sometime?

Sure, i can add one

findepi · 2023-03-21T21:11:25Z

CI #15430, #16652

ebyhr · 2023-03-21T23:14:04Z

plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlClient.java

Perhaps, we can avoid overriding the fetch size when the size is specified with defaultRowFetchSize connection property in addition to this change? It will allow users configure the value in their side. We can retrieve the value with PgConnection#getDefaultFetchSize.

Sounds like this is orthogonal, i.e. the existing code was setting the fetch size unconditionally, and this one sets fetch size unconditionally, just with a different value.

However, i don't think we should go into that direction at all.
if our intent is to let users configure the fetch size, we should have an explicit toggle rather than inspect fetch size provided in the JDBC URL string. Note however that giving users' control is good, but having our code be smarter is even better, and those things are at odds. i DO think we should desist desire to throw toggle at a problem.

chenjian2664 · 2023-03-21T23:55:15Z

I think the idea is good as we consider more factors while fetching the values.
But the question became how can we get the "correct" formula, eg, https://github.com/trinodb/trino/pull/16644/files/6cc400e7dd00946dc9270a9f680032843f7d1191#diff-e7f2966e1e2e070e8643b5563c7ba62f7dc0652f76fbf446ef389b75ef4883fbR423 why set 100_000, again is it allow control by user?
Why don't allow set by user like #16269 , and this is what already done while writing the values, see https://github.com/trinodb/trino/blob/master/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcPageSink.java#L128

hashhar · 2023-03-22T05:41:22Z

plugin/trino-postgresql/src/main/java/io/trino/plugin/postgresql/PostgreSqlConfig.java

can this instead live in JdbcMetadataConfig?

no, it's applicable to only some connectors (postgresql, oracle, redshift)

hashhar

LGTM % comments from me and Yuya

findepi · 2023-03-22T10:01:25Z

Why don't allow set by user like #16269 ,

see #16269 (comment)

tl;dr: few users would benefit from a config toggle; many users may benefit from auto-adjusted value
i aim at the bigger benefit

But the question became how can we get the "correct" formula

@chenjian2664 i suppose in next iteration someone will do some experiments and propose a better formula.

findepi · 2023-03-22T10:26:21Z

There is a conflict with #16379 (just merged)
also, the formula used min(..., 1000) but it was supposed to be max(..., 1000). Will change & rebase.

findepi · 2023-03-22T10:42:29Z

Conflicts with #16616 (just merged), will rebase.

PostgreSQL, Redshift and Oracle connectors had hard-coded fetch-size value of 1000. The value was found not to be optimal when server is far (high latency) or when number of columns selected is low. This commit improves in the latter case by picking fetch size automatically based on number of columns projected. After the change, the fetch size will be automatically picked in the range 1000 to 100,000.

findepi · 2023-03-22T13:42:11Z

CI ignite #16671

findepi · 2023-03-22T16:49:17Z

CI #16652 (again)

findepi requested review from chenjian2664, ebyhr, hashhar, jklamer, vlad-lyutenko and wendigo March 21, 2023 09:45

cla-bot bot added the cla-signed label Mar 21, 2023

This was referenced Mar 21, 2023

Tune JDBC fetch-size automatically based on column count #16643

Closed

Added jdbc fetch size config for pg/oracle/redshift #16269

Closed

findepi force-pushed the findepi/auto-fetch-size-v2 branch from 9b3093b to ae7226a Compare March 21, 2023 09:51

hashhar reviewed Mar 21, 2023

View reviewed changes

findepi force-pushed the findepi/auto-fetch-size-v2 branch from ae7226a to 2f01b25 Compare March 21, 2023 13:40

findepi force-pushed the findepi/auto-fetch-size-v2 branch from 2f01b25 to 6cc400e Compare March 21, 2023 21:11

findepi requested a review from hashhar March 21, 2023 21:11

ebyhr approved these changes Mar 21, 2023

View reviewed changes

hashhar reviewed Mar 22, 2023

View reviewed changes

hashhar approved these changes Mar 22, 2023

View reviewed changes

findepi force-pushed the findepi/auto-fetch-size-v2 branch from 6cc400e to bebfcd3 Compare March 22, 2023 10:24

findepi force-pushed the findepi/auto-fetch-size-v2 branch from bebfcd3 to 5b3bcca Compare March 22, 2023 10:28

hashhar approved these changes Mar 22, 2023

View reviewed changes

findepi mentioned this pull request Mar 22, 2023

Remove kill-switch to disable automatic fetch-size selection #16670

Merged

findepi force-pushed the findepi/auto-fetch-size-v2 branch from 5b3bcca to 039eeb9 Compare March 22, 2023 10:43

findepi merged commit 9c4ef0e into master Mar 22, 2023

findepi deleted the findepi/auto-fetch-size-v2 branch March 22, 2023 16:49

github-actions bot added this to the 411 milestone Mar 22, 2023

colebow mentioned this pull request Mar 29, 2023

Add Trino 411 release notes #16552

Merged

ebyhr mentioned this pull request Jul 16, 2024

Add oracle, redshift, postgresql jdbc fetch size #22670

Merged

Conversation

findepi commented Mar 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hashhar left a comment

Choose a reason for hiding this comment

Uh oh!

hashhar Mar 21, 2023

Choose a reason for hiding this comment

Uh oh!

findepi Mar 21, 2023

Choose a reason for hiding this comment

Uh oh!

findepi commented Mar 21, 2023

Uh oh!

hashhar commented Mar 21, 2023

Uh oh!

findepi commented Mar 21, 2023

Uh oh!

findepi commented Mar 21, 2023

Uh oh!

ebyhr Mar 21, 2023

Choose a reason for hiding this comment

Uh oh!

findepi Mar 22, 2023

Choose a reason for hiding this comment

Uh oh!

chenjian2664 commented Mar 21, 2023

Uh oh!

hashhar Mar 22, 2023

Choose a reason for hiding this comment

Uh oh!

findepi Mar 22, 2023

Choose a reason for hiding this comment

Uh oh!

hashhar left a comment

Choose a reason for hiding this comment

Uh oh!

findepi commented Mar 22, 2023

Uh oh!

findepi commented Mar 22, 2023

Uh oh!

findepi commented Mar 22, 2023

Uh oh!

findepi commented Mar 22, 2023

Uh oh!

findepi commented Mar 22, 2023

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

findepi commented Mar 21, 2023 •

edited

Loading