Add parquet encryption functionality into presto #17881

Merged
zhenxiao merged 1 commit into prestodb:master from shangxinli:column_indexes_dev_new_4_rebase.new.new
Jun 16, 2022

@shangxinli shangxinli commented Jun 14, 2022

Co-authored-by: ggershinsky <ggershinsky@users.noreply.github.com>

Summary: This PR ports the parquet-mr decryption functionality to Presto. The main parquet-mr commit for encryption/decryption is apache/parquet-java@65b95fb, followed by several other fixes. This change ports decryption only; encryption is not included.

Test plan

This feature was tested in the Uber environment and has been running in production there for more than two years.

== RELEASE NOTES ==

General Changes
* Add decryption functionality to Presto. Presto can now read Parquet files that are encrypted according to [Parquet Modular Encryption](https://github.com/apache/parquet-format/blob/master/Encryption.md).  

Hive Changes
* No flag is introduced. Presto-Hive now loads the DecryptionPropertiesFactory (implemented in parquet-mr), uses it to obtain the file decryptor, and passes that decryptor to presto-parquet.  
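
The flow described in the Hive note above can be sketched roughly as follows. This is not the code from this PR: `PropertiesFactory`, `FileDecryptor`, `StaticKeyFactory`, and the `crypto.factory.class` key are hypothetical stand-ins for the parquet-mr and Presto types, used only to illustrate the "load a factory named in the configuration, ask it for a per-file decryptor, hand that to the reader" shape. When no factory is configured, no decryptor is created, which is consistent with no new flag being needed.

```java
import java.util.Map;
import java.util.Optional;

final class DecryptionWiringSketch {
    // Hypothetical stand-in for parquet-mr's DecryptionPropertiesFactory.
    interface PropertiesFactory {
        Optional<String> keyFor(String filePath);
    }

    // Hypothetical stand-in for the file decryptor handed to presto-parquet.
    static final class FileDecryptor {
        final String key;

        FileDecryptor(String key) {
            this.key = key;
        }
    }

    // Example factory that a deployment could name in its configuration.
    public static final class StaticKeyFactory implements PropertiesFactory {
        @Override
        public Optional<String> keyFor(String filePath) {
            return filePath.endsWith(".encrypted.parquet")
                    ? Optional.of("demo-key")
                    : Optional.empty();
        }
    }

    // Load the factory class named in the configuration, if any.
    static Optional<PropertiesFactory> loadFactory(Map<String, String> conf)
            throws ReflectiveOperationException {
        String className = conf.get("crypto.factory.class"); // hypothetical key
        if (className == null) {
            return Optional.empty(); // decryption not configured
        }
        Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
        return Optional.of((PropertiesFactory) instance);
    }

    // Hive side: build a decryptor for one file; the reader receives the Optional.
    static Optional<FileDecryptor> createDecryptor(Map<String, String> conf, String filePath)
            throws ReflectiveOperationException {
        return loadFactory(conf)
                .flatMap(factory -> factory.keyFor(filePath))
                .map(FileDecryptor::new);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf =
                Map.of("crypto.factory.class", StaticKeyFactory.class.getName());
        System.out.println(createDecryptor(conf, "part-0.encrypted.parquet").isPresent());
        System.out.println(createDecryptor(conf, "part-0.parquet").isPresent());
    }
}
```

Returning an `Optional` keeps the reader path for unencrypted files unchanged: the reader only takes the decryption branch when a decryptor is present.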

@shangxinli shangxinli requested a review from a team as a code owner June 14, 2022 14:41
@shangxinli shangxinli requested a review from presto-oss June 14, 2022 14:41
Co-authored-by: ggershinsky <ggershinsky@users.noreply.github.com>

Summary: This is to port parquet-mr decryption apache/parquet-java@65b95fb
@shangxinli shangxinli force-pushed the column_indexes_dev_new_4_rebase.new.new branch from 057eee3 to 7bf6a2b Compare June 14, 2022 19:35

shangxinli commented Jun 14, 2022

This PR replaces the old PR #17791. The conflict with HudiParquetPageSource was resolved manually; all other changes were applied with 'git cherry-pick' without any conflict.

The path to this PR is: #17479 -> #17728 -> #17791 -> #17881 (this PR). To see all the review comments, please read the comments on each of those PRs.


@zhenxiao zhenxiao left a comment

nice work, @shangxinli

@zhenxiao

@kewang1024 @beinan this PR looks good.
following your approvals in #17791, I am willing to merge this PR soon. let me know if you have any new comments about this PR

@zhenxiao

based on approvals in #17791, I am merging this PR

@zhenxiao zhenxiao merged commit 71dc62a into prestodb:master Jun 16, 2022
@shangxinli shangxinli deleted the column_indexes_dev_new_4_rebase.new.new branch June 17, 2022 17:58
@highker highker mentioned this pull request Jul 6, 2022
7 tasks