Tune JDBC fetch-size automatically based on column count#16644
hashhar
left a comment
The idea is nice; however, I don't know whether larger fetch sizes cause issues for wide tables. Have you observed something?
I would've expected a different heuristic: the wider the table or the more rows we pull, the higher the fetch count.
One concern I have is that this will impact memory estimation, since the fetch size is no longer a constant value.
PreparedStatement statement = connection.prepareStatement(sql);
statement.setFetchSize(1000);
// This is a heuristic, not exact science. A better formula can perhaps be found with measurements.
// Column count is not known for non-SELECT queries. Not setting fetch size for these.
Not setting the fetch size can mean a fetch size of 1 in some drivers. IMO, when we don't know the column count we should default to the older value of 1k.
I had this originally, but note that column count is not known only for queries like DELETE (not SELECT queries).
All the queries reading data know their projected column count.
Thus I had a choice: do the change defensively, as if I didn't know when column count may be missing. Or write the code "the way it would be written today".
I would expect this can cause memory pressure issues. Note that
In what context do we do memory estimation that takes into account rows prefetched by the JDBC driver?
We don't do it today. Now that I re-read my comment, even today the prefetch is not constant: for wider tables we prefetch more data than for narrow tables. So your change is probably better in this regard. This looks like a good starting point. We can iterate over time if someone finds issues here. Do you think it'd be useful to have a kill switch for some time?
yes, that's the idea
Sure, I can add one.
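A rough sketch of what such a kill switch could look like, for illustration only. The flag name `automaticFetchSizeEnabled`, the class name and the `100_000 / columnCount` formula are all hypothetical; the thread only says that a kill switch will be added and that the old value was 1000.

```java
import java.util.OptionalInt;

// Hypothetical sketch: a boolean kill switch that restores the old constant fetch size.
final class FetchSizeKillSwitch {
    // The previously hard-coded value mentioned in this PR.
    static final int LEGACY_FETCH_SIZE = 1_000;

    private FetchSizeKillSwitch() {}

    static OptionalInt chooseFetchSize(boolean automaticFetchSizeEnabled, OptionalInt columnCount) {
        if (!automaticFetchSizeEnabled) {
            // Kill switch engaged: behave exactly as before the change.
            return OptionalInt.of(LEGACY_FETCH_SIZE);
        }
        if (!columnCount.isPresent()) {
            // Column count unknown (non-SELECT queries): leave the driver default untouched.
            return OptionalInt.empty();
        }
        // Assumed formula: fewer columns allow a larger fetch size, never below the legacy value.
        int columns = Math.max(columnCount.getAsInt(), 1);
        return OptionalInt.of(Math.max(100_000 / columns, LEGACY_FETCH_SIZE));
    }
}
```

With the flag off, every query gets 1000 again, which makes it easy to rule the heuristic in or out when someone reports a regression.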
Perhaps, in addition to this change, we can avoid overriding the fetch size when it is specified with the defaultRowFetchSize connection property? That would allow users to configure the value on their side. We can retrieve the value with PgConnection#getDefaultFetchSize.
Sounds like this is orthogonal, i.e. the existing code was setting the fetch size unconditionally, and this one sets fetch size unconditionally, just with a different value.
However, I don't think we should go in that direction at all.
If our intent is to let users configure the fetch size, we should have an explicit toggle rather than inspect the fetch size provided in the JDBC URL string. Note, however, that giving users control is good, but having our code be smarter is even better, and those things are at odds. I DO think we should resist the desire to throw a toggle at a problem.
I think the idea is good as we consider more factors while fetching the values.
can this instead live in JdbcMetadataConfig?
no, it's applicable to only some connectors (postgresql, oracle, redshift)
hashhar
left a comment
LGTM % comments from me and Yuya
See #16269 (comment). tl;dr: few users would benefit from a config toggle; many users may benefit from an auto-adjusted value.
@chenjian2664 I suppose in the next iteration someone will do some experiments and propose a better formula.
There is a conflict with #16379 (just merged).
Conflicts with #16616 (just merged), will rebase.
The PostgreSQL, Redshift and Oracle connectors had a hard-coded fetch size of 1000. That value was found to be suboptimal when the server is far away (high latency) or when the number of selected columns is low. This commit improves the latter case by picking the fetch size automatically based on the number of projected columns. After the change, the fetch size is picked automatically in the range 1000 to 100,000.
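For illustration, a minimal sketch of a column-count heuristic that produces values in the stated 1000 to 100,000 range. The exact formula is an assumption (a fixed budget of 100,000 divided by the column count, floored at 1000), and the class and method names are hypothetical; the commit message only specifies the range and that the column count drives the choice.

```java
import java.util.OptionalInt;

// Hypothetical sketch of a fetch-size heuristic; not necessarily the formula used in this PR.
final class FetchSizeHeuristic {
    static final int MIN_FETCH_SIZE = 1_000;   // the previously hard-coded value
    static final int MAX_FETCH_SIZE = 100_000; // upper bound stated in the commit message

    private FetchSizeHeuristic() {}

    // Returns empty when the column count is unknown (e.g. non-SELECT queries),
    // in which case the caller would not call Statement.setFetchSize at all.
    static OptionalInt fetchSize(OptionalInt columnCount) {
        if (!columnCount.isPresent()) {
            return OptionalInt.empty();
        }
        // Guard against a zero or negative count so the division is always safe.
        int columns = Math.max(columnCount.getAsInt(), 1);
        // Assumed formula: narrow projections fetch more rows per round trip.
        return OptionalInt.of(Math.max(MAX_FETCH_SIZE / columns, MIN_FETCH_SIZE));
    }
}
```

Under this assumed formula a single-column projection fetches 100,000 rows per round trip, a 10-column projection fetches 10,000, and anything with 100 or more columns falls back to 1000. A caller would apply the result with something like `fetchSize(columnCount).ifPresent(statement::setFetchSize)`, where `statement` is a `java.sql.PreparedStatement`.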
CI
CI #16652 (again)
Fixes #16153
Alternative to #16269