@atanasenko
Member

Description

Missing table metadata can be configured to not get cached (cacheMissing), but that doesn't apply to schema and table name lists.

In certain cases it might be beneficial to shorten the list cache ttl for them to reflect the added schemas/tables faster, while specific table caches will automatically reload tables that were missing before.
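To illustrate the effect described above, here is a minimal sketch, not the actual `CachingJdbcClient` code. It assumes a hypothetical `TtlCache` driven by a manually advanced clock and shows how a shorter list TTL makes a newly created table appear in listings sooner. All class, method, schema, and table names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class SeparateTtlSketch
{
    // Hypothetical minimal TTL cache; "now" is passed in explicitly (milliseconds)
    static final class TtlCache<K, V>
    {
        private record Entry<V>(V value, long loadedAt) {}

        private final long ttlMillis;
        private final Map<K, Entry<V>> entries = new HashMap<>();

        TtlCache(long ttlMillis)
        {
            this.ttlMillis = ttlMillis;
        }

        V get(K key, long now, Supplier<V> loader)
        {
            Entry<V> entry = entries.get(key);
            // Reload when the entry is absent or older than the TTL
            if (entry == null || now - entry.loadedAt() >= ttlMillis) {
                entry = new Entry<>(loader.get(), now);
                entries.put(key, entry);
            }
            return entry.value();
        }
    }

    // Returns what a table-name listing sees at t=0s, t=30s and t=61s
    // when a table is created remotely right after the first listing
    static List<List<String>> listingsOverTime(long listTtlMillis)
    {
        List<String> remoteTables = new ArrayList<>(List.of("orders"));
        TtlCache<String, List<String>> tableNames = new TtlCache<>(listTtlMillis);
        Supplier<List<String>> loader = () -> List.copyOf(remoteTables);

        List<List<String>> seen = new ArrayList<>();
        seen.add(tableNames.get("sales", 0, loader));
        remoteTables.add("customers"); // created after the first listing
        seen.add(tableNames.get("sales", 30_000, loader));
        seen.add(tableNames.get("sales", 61_000, loader));
        return seen;
    }

    public static void main(String[] args)
    {
        // List TTL of 1 minute: the new table is still invisible at 30s
        System.out.println(listingsOverTime(60_000));
        // List TTL of 10 seconds: the new table is already visible at 30s
        System.out.println(listingsOverTime(10_000));
    }
}
```

The per-table cache is left out of the sketch on purpose: with missing tables not cached, a lookup of the new table goes to the remote database anyway, so only the list TTL decides when the table shows up in listings.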

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@hashhar
Member

hashhar commented Feb 13, 2023

This is making configuration very complex over time. What is the actual problem we are trying to solve? Do we want to notice newly created objects faster? Or is the goal something else?

@hashhar
Member

hashhar commented Feb 13, 2023

Missing table metadata can be configured to not get cached (cacheMissing), but that doesn't apply to schema and table name lists.

To me it sounds like this is the solution we should implement - it would solve the problem in a simpler way and also make the config apply consistently.

@skrzypo987
Member

To me it sounds like this is the solution we should implement

How should that work? The list of objects is not a 0/1 situation. A table either exists or it doesn't. A list is a list: it doesn't come into existence when a table is created, it just changes.
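The distinction can be sketched as follows, under the assumption that `cacheMissing` means "do not store an empty lookup result". The `TableLookupCache` class and the string handles are hypothetical and only illustrate the idea. A single-table lookup has a natural "missing" value (an empty `Optional`) that can be skipped; a cached list has no such value, because a stale list is still a perfectly valid list.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class CacheMissingSketch
{
    // Hypothetical per-table lookup cache; entries never expire here,
    // to isolate the effect of the cacheMissing flag
    static final class TableLookupCache
    {
        private final Set<String> remoteTables;
        private final boolean cacheMissing;
        private final Map<String, Optional<String>> cache = new HashMap<>();

        TableLookupCache(Set<String> remoteTables, boolean cacheMissing)
        {
            this.remoteTables = remoteTables;
            this.cacheMissing = cacheMissing;
        }

        Optional<String> getTableHandle(String name)
        {
            Optional<String> cached = cache.get(name);
            if (cached != null) {
                return cached;
            }
            Optional<String> loaded = remoteTables.contains(name)
                    ? Optional.of("handle:" + name)
                    : Optional.empty();
            // An empty result is stored only when cacheMissing is enabled
            if (loaded.isPresent() || cacheMissing) {
                cache.put(name, loaded);
            }
            return loaded;
        }
    }

    // Looks a table up before and after it is created remotely and
    // reports whether the second lookup finds it
    static boolean visibleAfterCreate(boolean cacheMissing)
    {
        Set<String> remoteTables = new HashSet<>();
        TableLookupCache cache = new TableLookupCache(remoteTables, cacheMissing);
        cache.getTableHandle("new_table"); // table does not exist yet
        remoteTables.add("new_table");
        return cache.getTableHandle("new_table").isPresent();
    }

    public static void main(String[] args)
    {
        System.out.println(visibleAfterCreate(false)); // missing result was not cached
        System.out.println(visibleAfterCreate(true));  // missing result was cached
    }
}
```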

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch 2 times, most recently from 66c39ac to 0b35c0e Compare February 15, 2023 13:51
Member

@skrzypo987 skrzypo987 left a comment

Some tests are still failing.
Other than that, looks good.

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch 2 times, most recently from 434709d to 677d14f Compare February 16, 2023 15:12
Comment on lines 115 to 123
Member

Can we remove this constructor and use the other one in all places?

Member Author

Those options are now non-optional and will have to be equal to metadataCachingTtl in such cases. I think it's easier to have an overloaded constructor to indicate that.

Member

Can we split this into two methods, one for tables and the other for schemas?

Member Author

The test ensures that the schema and table name cache TTLs are different; it can't do that if the tests are separate.

Member

The test method is quite big, and it asserts multiple operations, such as with and without cache. Alternatively, we could have a schema name cache test, a table name cache test, and a test method with the combined configuration.

Member Author

Those calls are interdependent. First of all, the test ensures that the schema and table caches are different, and then it tests the evolution of the cache over time. I can split it into separate 'stages', which should make it more readable.

Member Author

Split it into multiple smaller sections and extracted them to a separate assertions class.

Member

Can we use a DataProvider in this case?

Member Author

I'm not sure two boolean values warrant a DataProvider. It's also not really data: these are two modes of operation, and they should be expressed explicitly.

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from 677d14f to a04744f Compare February 17, 2023 15:18
@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from a04744f to 0f7254c Compare February 17, 2023 16:05
@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from 0f7254c to edcd11f Compare February 20, 2023 13:55
Member

Since cacheMissing works only at the table level, do we need to mix it with the schema-related tests? We could configure some default value for the schema cache TTL and create a JDBC client. WDYT?

Member Author

I'm testing a use case where a newly added table is reflected in the metadata lists faster than the general table cache TTL would allow.

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch 3 times, most recently from f25eeb0 to 0648a59 Compare February 23, 2023 12:21
Comment on lines 1158 to 1162
Member

If we are asserting only the schema and table names, do we need this assertion related to tableHandleByName?

Member Author

It's needed to make sure that metadata caches have separate ttl from individual table caches.

@findepi findepi requested a review from sopel39 February 27, 2023 10:57
@findepi
Member

findepi commented Feb 27, 2023

@atanasenko please work with @sopel39 to ensure this is consistent with recent Hive connector metadata caching changes design

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from 0648a59 to d69f36b Compare February 27, 2023 11:11
@atanasenko
Member Author

atanasenko commented Feb 27, 2023

Hive connector metadata caching changes design

@findepi those changes in #15811 seem to be about extracting a separate TTL for stats, whereas here I'm separating out only part of the metadata cache (the schema and table name lists); the individual table metadata caches are untouched.

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from d69f36b to 1b25265 Compare February 27, 2023 13:25
Member

@Praveen2112 Praveen2112 left a comment

LGTM.

@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from 1b25265 to bb45631 Compare March 2, 2023 10:32
Member

@hashhar hashhar left a comment

skimmed, will do another pass

Member

Does it make sense to cap this to be lower than metadata cache ttl? Listing operations will see a table but there would be no metadata cached for it and the table might've been dropped already.

Member Author

Lists and specific metadata objects are loaded at different times too, so this scenario can still happen.

Member Author

@hashhar Theoretically, there could be situations where users want to cache the lists of schemas/tables for a longer time (a very large number of entries that do not change over time), but still let individual table data get reloaded.
This is a generic config after all.

Member

That's a fair point.

assertTableNamesCache(cachingJdbcClient)
.misses(1)
.loads(1)
.afterRunning(() -> {
Contributor

I know it's preexisting, but should (or can) the lambda take a client parameter that it receives from assertTableNamesCache(cachingJdbcClient)?
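One possible shape for that suggestion, sketched with made-up names rather than the actual test helper in this PR: the fluent assertion keeps the client it was created for and hands it to the action through a `Consumer`, so the lambda does not need to capture the client from the enclosing scope. `FakeClient`, `LoadAssertion`, and `assertLoads` are all hypothetical.

```java
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

public class CacheAssertionSketch
{
    // Hypothetical stand-in for a caching client that counts cache loads
    static final class FakeClient
    {
        long loads;

        void listTables()
        {
            loads++;
        }
    }

    static final class LoadAssertion<C>
    {
        private final C client;
        private final ToLongFunction<C> loadCount;
        private long expectedLoads;

        LoadAssertion(C client, ToLongFunction<C> loadCount)
        {
            this.client = client;
            this.loadCount = loadCount;
        }

        LoadAssertion<C> loads(long expected)
        {
            this.expectedLoads = expected;
            return this;
        }

        // The action receives the same client the assertion was created for
        void afterRunning(Consumer<C> action)
        {
            long before = loadCount.applyAsLong(client);
            action.accept(client);
            long delta = loadCount.applyAsLong(client) - before;
            if (delta != expectedLoads) {
                throw new AssertionError("expected " + expectedLoads + " loads but saw " + delta);
            }
        }
    }

    static <C> LoadAssertion<C> assertLoads(C client, ToLongFunction<C> loadCount)
    {
        return new LoadAssertion<>(client, loadCount);
    }

    public static void main(String[] args)
    {
        FakeClient client = new FakeClient();
        assertLoads(client, c -> c.loads)
                .loads(1)
                .afterRunning(c -> c.listTables());
    }
}
```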

Missing table metadata can be configured to not get cached
(`cacheMissing`), but that doesn't apply to schema and table name lists.

In certain cases it might be beneficial to shorten the list cache ttl
for them to reflect the added schemas/tables faster, while specific
table caches will automatically reload tables that were missing before.
@atanasenko atanasenko force-pushed the at/jdbc-metadata-cache-ttl branch from bb45631 to 215ed86 Compare March 2, 2023 16:05
Member

@hashhar hashhar left a comment

LGTM % whether we want to cap the list cache TTL to be lower than metadata cache ttl

Member

@hashhar hashhar left a comment

LGTM from me.

@kokosing kokosing merged commit 5d15ca4 into trinodb:master Mar 3, 2023
@github-actions github-actions bot added this to the 409 milestone Mar 3, 2023
@colebow
Member

colebow commented Mar 3, 2023

Does this need release notes? cc @kokosing

@kokosing
Member

kokosing commented Mar 3, 2023

Yes, thank you for asking.

JDBC connectors
* Introduce `metadata.schemas.cache-ttl` and `metadata.tables.cache-ttl` configuration properties to control how long metadata information about tables and schemas should be kept in cache memory.

CC: @atanasenko
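
For illustration, a catalog file using the two new properties might look like the fragment below. This is a sketch under assumptions: the connector name, connection URL, and all durations are placeholders, and `metadata.cache-ttl` and `metadata.cache-missing` are assumed to be the existing property names behind the general metadata cache TTL and the `cacheMissing` option discussed in this PR. Check the JDBC connector documentation before relying on them.

```properties
# Hypothetical catalog file; connector and URL are placeholders
connector.name=postgresql
connection-url=jdbc:postgresql://example.net:5432/database

# Per-table metadata cache (property names assumed, see note above)
metadata.cache-ttl=10m
metadata.cache-missing=false

# Properties introduced by this PR: shorter TTLs for the name lists,
# so newly added schemas and tables show up in listings sooner
metadata.schemas.cache-ttl=1m
metadata.tables.cache-ttl=1m
```

As discussed in the review, the list TTLs are not capped by the general TTL, so values longer than `metadata.cache-ttl` are possible as well.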

@atanasenko atanasenko deleted the at/jdbc-metadata-cache-ttl branch March 5, 2023 10:06

8 participants