Skip to content

Use safe cache for Delta Lake Transaction Log#11562

Merged
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/use_new_cache_delta_lake
Mar 23, 2022
Merged

Use safe cache for Delta Lake Transaction Log#11562
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/use_new_cache_delta_lake

Conversation

@homar
Copy link
Copy Markdown
Member

@homar homar commented Mar 18, 2022

Description

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 18, 2022
@homar homar force-pushed the homar/use_new_cache_delta_lake branch from 63faa0a to a4eac5d Compare March 18, 2022 11:07
@homar homar requested a review from findepi March 18, 2022 12:49
Copy link
Copy Markdown
Contributor

@findinpath findinpath Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we get to this point the snapshot could already be existing in the tableSnapshots cache.

A possible alternative would be to have a way to safely know whether the snapshot is already cached and act accordingly.

        AtomicBoolean isTableSnapshotAlreadyCached = new AtomicBoolean(true);
        TableSnapshot cachedTableSnapshot = tableSnapshots.get(location, () ->
        {
            isTableSnapshotAlreadyCached.set(false);
            return TableSnapshot.load(
                    table,
                    fileSystem,
                    tableLocation,
                    parquetReaderOptions,
                    checkpointRowStatisticsWritingEnabled);
        });

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we get to this point the snapshot could already be existing in the tableSnapshots cache.

And if this is the case we will get it from there instead of loading it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AtomicBoolean isTableSnapshotAlreadyCached = new AtomicBoolean(false);

I assume you made a mistake and initial value would be true, right ?
And the point of this part is to change the implementation in such a way we call cache.get only once? Do I understand correctly?

Copy link
Copy Markdown
Contributor

@findinpath findinpath Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you made a mistake and initial value would be true, right ?

Yes, I've corrected the snipped.

we call cache.get only once

Yes, that's my point.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that but I am not sure that will be cleaner than the current one. I find them both similarly ugly to follow ;)
@findepi WDYT ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will take another look once #11562 (review) gets applied

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parquetReaderOptions and checkpointRowStatisticsWritingEnabled are used only for TableSnapshot.load within the TransactionLogAccess class.

I recommend extracting the logic of the static method to a class which can be injected into TransactionLogAccess . This would give us the opportunity to thoroughly test the fix that you are applying in this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't understand this comment. Which static method are you talking about?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

io.trino.plugin.deltalake.transactionlog.TableSnapshot#load

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean io.trino.plugin.deltalake.transactionlog.TableSnapshot.loadSnapshot ? It is not static this is why I am confused.
Also if you extract this to another class what about io.trino.plugin.deltalake.transactionlog.TableSnapshot.getActiveFiles ? Would you also try to extract it ?

Copy link
Copy Markdown
Contributor

@findinpath findinpath Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spotted these settings a bit out of place while reading the PR.

Obviously, it is not mandatory to extract this logic for being able to test the race conditions.
Ideally, this PR should provide appropriate tests to ensure that the class correctly with race conditions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this kind of tests would be good but I am not sure how to implement them. Running some simple operations hundred of times and waiting for failure/success seems like a flaky test to me.
I could maybe simulate something with heavy mocking but I have an impression like this kid of tests are not really used here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could maybe simulate something with heavy mocking

This is what I had in mind as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic: I was trying to understand where the method getActiveFiles is being used and noticed that the tableSnapshot parameter is always produced by a previous call to TransactionLogAccess as well.

transactionLogAccess.getMetadataEntry depends also on a tableSnapshot previously obtained from TransactionLogAccess as well.

It would be worth considering exposing a method which retrieves both the active metadata & files:

   // pseudocode
   MetadataAndAddFileEntryList getActiveMetadataAndAddFileEntryList(ConnectorSession session)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense but maybe not in scope of this PR. Please create a separate ticket for that,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@homar homar force-pushed the homar/use_new_cache_delta_lake branch from a4eac5d to 2376fd4 Compare March 21, 2022 12:13
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to unwrap the ExecutionException.
(also, watch out for UncheckedExecutionException and handle the same way)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in tests this should be throws Exception

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in this PR, you can just undo these changes, see the other comment)

@homar homar force-pushed the homar/use_new_cache_delta_lake branch 2 times, most recently from cd6161b to d6909d0 Compare March 22, 2022 09:43
@homar
Copy link
Copy Markdown
Member Author

homar commented Mar 22, 2022

@findepi comments addressed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restoring this is not that simple. I would need to check if data was loaded from the cache or file system. I would need to add atomic boolean etc (in similar fashion to what Marius suggested). Do you think it makes sense to add all this for a simple warn to be logged ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it belongs to

if (cachedTable.getVersion() > tableSnapshot.getVersion()) {
                return loadActiveFiles(tableSnapshot, session, fileSystem)

block? Please double check me on this

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, thanks

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UncheckedEE too

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right!

@homar homar force-pushed the homar/use_new_cache_delta_lake branch from d6909d0 to a2aed4a Compare March 22, 2022 10:35
@findepi findepi merged commit 7553295 into trinodb:master Mar 23, 2022
@github-actions github-actions bot added this to the 375 milestone Mar 23, 2022
@findepi findepi mentioned this pull request Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants