Skip to content

Raptor read and write with delta delete functionality#13756

Merged
highker merged 5 commits intoprestodb:masterfrom
kewang1024:Raptor_read_write_with_delta_delete
Dec 18, 2019
Merged

Raptor read and write with delta delete functionality#13756
highker merged 5 commits intoprestodb:masterfrom
kewang1024:Raptor_read_write_with_delta_delete

Conversation

@kewang1024
Copy link
Collaborator

@kewang1024 kewang1024 commented Nov 26, 2019

== RELEASE NOTES ==

Raptor Changes
* Add the ability to read and write with delta deletes.

@kewang1024
Copy link
Collaborator Author

The second PR from #13248

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch 2 times, most recently from 54f964b to 0246db9 Compare November 26, 2019 19:41
@kewang1024 kewang1024 changed the title Add tableSupportsDeltaDelete and deltaShardUuid in RaptorSplit[WIP] Add tableSupportsDeltaDelete and deltaShardUuid in RaptorSplit Nov 26, 2019
@kewang1024 kewang1024 changed the title Add tableSupportsDeltaDelete and deltaShardUuid in RaptorSplit Raptor read and write with delta delete functionality Nov 26, 2019
@kewang1024 kewang1024 requested a review from highker November 26, 2019 19:51
Copy link
Contributor

@jessesleeping jessesleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed "Enhance RaptorSplit and enable delta read".

As reviewed before, the general direction and main logic LGTM. Comments on minor changes.

Copy link
Contributor

@jessesleeping jessesleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed "Add delta delete functionality on worker". Generally looks good. Comments are around code refactoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to DeltaShardRewriter, InplaceShardRewriter seems to be more coupled with OrcStorageManager as it's originally part of OrcStorageManager. I suggest we can keep InplaceShardRewriter an embedded class of OrcStorageManageror an anonymous class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@highker what do you think of this suggestion?

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some high-level comments

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch 3 times, most recently from 50703ba to 17c87fd Compare November 28, 2019 22:40
@kewang1024
Copy link
Collaborator Author

kewang1024 commented Nov 28, 2019

All known requests for change have been addressed. Put up a temp commit for refactor of OrcPageSource. Will merge it after review.

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch from 4b9a761 to 17c87fd Compare November 30, 2019 04:51
@highker
Copy link

highker commented Dec 2, 2019

rebase?

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch from 17c87fd to eb40f8f Compare December 2, 2019 22:33
Copy link
Contributor

@jessesleeping jessesleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed "Commit protocol for rewrite delete, delta delete". LGTM with minor comments.

Copy link
Contributor

@jessesleeping jessesleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed " Export statistics of delta delete". LGTM with minor refactoring suggestions.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrcPageSource should not have fundamental surgery.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right design. This file should remain (almost) unchanged. The only critical change to be made is on recordReader. That should be replaced by another page source (which inherits ConnectorPageSource) to represent the source data. Let me know if you want to chat in person.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we chat in person today?

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch 5 times, most recently from b659d83 to 4d54611 Compare December 10, 2019 00:08
@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch from 6be56fd to df5dcc8 Compare December 11, 2019 02:52
@kewang1024 kewang1024 requested a review from highker December 11, 2019 07:29
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch 4 times, most recently from 71718ce to 954a8dd Compare December 12, 2019 06:26
@kewang1024 kewang1024 requested a review from highker December 12, 2019 06:26
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early comments; still reviewing

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used by compaction, shall we move this with the compaction PR? Also, if this interface is only for delta delete, why do we need flag tableSupportsDeltaDelete? I think that can be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used by compaction, shall we move this with the compaction PR? Also, if this interface is only for delta delete, why do we need flag tableSupportsDeltaDelete? I think that can be removed.

Actually this is a legacy problem.
We write this replaceShardUuids as a function to completely replace the old replaceShardUuids, that's why we need tableSupportsDeltaDelete in this new function.
But during the process, we found ourselves insecure of touching the old logical path, so we restore the old replaceShardUuids and keep these two together.
And we build tests around this new method and everything.
We will remove the old method once we feel the feature is stable. I've already left comments above
// TODO: Will merge these new function with old function once new feature is stable

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a try is needed to avoid file descriptor leak. Check other use cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a try is needed to avoid file descriptor leak. Check other use cases.

I think as long as we close OrcDataSource dataSource, we wouldn't have file descriptor leak.

So in above, we use
try (OrcDataSource dataSource = openShard(fileSystem, deltaShardUuid.get(), defaultReaderAttributes))
which will close the datasource once it encounters some exceptions

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will miss closing all the stream readers. So use try to make sure it is closed properly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will miss closing all the stream readers. So use try to make sure it is closed properly.

But I do see from other examples that they used OrcBatchRecordReader with OrcDataSource dataSource as well, and they only close dataSource
eg1:
OrcStorageManager
public ConnectorPageSource getPageSource
it uses try and catch and it only closes the dataSourcecloseQuietly(dataSource);

eg2:
OrcBatchPageSourceFactory
createOrcPageSource
orcDataSource.close();

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because readers will be closed by ConnectorPageSource. In your case, no one can close the readers.

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch 2 times, most recently from 8ac846d to 9ba31f5 Compare December 14, 2019 07:21
@kewang1024 kewang1024 requested a review from highker December 14, 2019 07:23
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will miss closing all the stream readers. So use try to make sure it is closed properly.

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch from 9ba31f5 to 7bb94b0 Compare December 16, 2019 20:02
@kewang1024 kewang1024 requested a review from highker December 16, 2019 22:43
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement try for OrcBatchRecordReader recordReader = reader.createBatchRecordReader(. Otherwise LGTM

@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch from 7bb94b0 to c8375a3 Compare December 17, 2019 20:23
Enable the ability to send tableSupportsDeltaDelete and deltaShardUuid
from Coordinator to Workers. Enable delta read functionality.
Add delta delete functionality on coordinator
@kewang1024 kewang1024 force-pushed the Raptor_read_write_with_delta_delete branch from c8375a3 to 6ca3e6d Compare December 17, 2019 23:32
@kewang1024 kewang1024 requested a review from highker December 18, 2019 00:27
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@highker highker merged commit 7841028 into prestodb:master Dec 18, 2019
@aweisberg aweisberg mentioned this pull request Jan 17, 2020
7 tasks
@caithagoras caithagoras mentioned this pull request Jan 22, 2020
6 tasks
@kewang1024 kewang1024 deleted the Raptor_read_write_with_delta_delete branch March 12, 2023 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants