[HUDI-2370] Supports data encryption #3614
Hi @liujinhui1994 Can you share the use case or some documentation for this feature? |
|
Hi Guys, Thank you |
pom.xml
  <confluent.version>5.3.4</confluent.version>
  <glassfish.version>2.17</glassfish.version>
- <parquet.version>1.10.1</parquet.version>
+ <parquet.version>1.12.0</parquet.version>
The main thing is the parquet upgrade. Would this be compatible with Spark 2.x? I guess not; I think this needs Spark 3.2.
https://github.com/apache/spark/blob/branch-3.2/pom.xml
This will also upgrade parquet-avro, which we use everywhere, so just thinking about that.
Thanks, vc, I am testing...
@liujinhui1994 : may I know what the status is on this?
Currently working on the last problem I found. Some classes cannot be found during runtime. After this is processed, I think it should be fine. @nsivabalan
|
https://issues.apache.org/jira/browse/HUDI-2370 @yanghua Here are some discussions |
|
@hudi-bot run azure |
|
@hudi-bot run azure |
|
@liujinhui1994 : how did your testing go? Can you update with your findings? @vinothchandar : apart from the parquet version upgrade, I also see that we are enabling vectorized reading by default in this patch. Just wanted to remind you in case we need to watch out for something. Also, should we do the parquet upgrade in a separate patch, so that we can do some testing around different query types, engines, etc. to certify the upgrade? |
|
I tested and verified last week. After upgrading the parquet version, many unit tests and integration tests failed. I am still looking for a solution. |
|
@hudi-bot run azure |
|
Of course, we first upgrade to parquet 1.12.0. |
|
@hudi-bot run azure |
|
@hudi-bot run azure |
|
@liujinhui1994 thanks for making this feature. As discussed, some suggestions on the next steps: We'd like to see this get into 0.10, but only without major parquet version upgrades. Understand you're using Kindly share with us any compatibility issues from your testing. cc @vinothchandar @nsivabalan |
|
@liujinhui1994 @vinothchandar @nsivabalan as discussed, removing this feature from 0.10. Let's continue the collaboration on next release. thanks |
xushiyan left a comment
@liujinhui1994 some high-level feedback:
- Can you put some more info in the PR description to explain the high-level functionality we want to add here?
- Parquet 1.12 is only used when hudi is built with spark 3.2. The encryption feature needs to be guarded by checking the intended spark version, so we need to find a way to make the functionality available only when people are using spark 3.2+.
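One possible shape for the version guard suggested above, as a minimal sketch only. `SparkVersionGuard` and `supportsParquetEncryption` are hypothetical names that do not come from this PR, and the 3.2 threshold is taken from the review comment rather than verified against the build. The caller is assumed to pass in the Spark version string it detected at runtime.

```java
/**
 * Hypothetical helper (not part of the PR): decides whether the encryption
 * feature may be enabled, based on the Spark version string supplied by the caller.
 */
final class SparkVersionGuard {
  // Assumption from the review comment: parquet 1.12 is only on the classpath with spark 3.2+.
  private static final int MIN_MAJOR = 3;
  private static final int MIN_MINOR = 2;

  private SparkVersionGuard() {
  }

  /** Returns true only for versions that parse as major.minor >= 3.2. */
  static boolean supportsParquetEncryption(String sparkVersion) {
    if (sparkVersion == null) {
      return false;
    }
    String[] parts = sparkVersion.trim().split("\\.");
    if (parts.length < 2) {
      return false;
    }
    try {
      int major = Integer.parseInt(parts[0]);
      // Tolerate suffixes such as "2-SNAPSHOT" by keeping only the leading digits.
      int minor = Integer.parseInt(leadingDigits(parts[1]));
      return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
    } catch (NumberFormatException e) {
      // Unparseable versions are treated as unsupported, which keeps encryption off.
      return false;
    }
  }

  private static String leadingDigits(String s) {
    int end = 0;
    while (end < s.length() && Character.isDigit(s.charAt(end))) {
      end++;
    }
    return s.substring(0, end);
  }

  public static void main(String[] args) {
    for (String v : new String[] {"2.4.4", "3.1.2", "3.2.0", "3.3.1"}) {
      System.out.println(v + " -> " + supportsParquetEncryption(v));
    }
  }
}
```

Failing closed on an unknown version means a misdetected environment writes unencrypted files instead of hitting a missing class at runtime. Whether that is the right default for a security feature is a design choice the PR would need to settle.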
...ent/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/SystemUtils.java
hudi-integ-test/src/test/java/org/apache/hudi/integ/SystemUtils.java
Point 1: already added |
|
@liujinhui1994 per our discussions, some points here
|
ok |
|
@liujinhui1994 let's do #4804 instead. thanks |
ok |
What is the purpose of the pull request
Data security is becoming increasingly important, so encryption support in hudi would be very welcome.
When querying, the client must pass the relevant key or obtain query permission through the client's encryption interface. If that fails, no result is returned.
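For context, Parquet 1.12 modular encryption is typically driven by Hadoop configuration properties, and the sketch below shows how the writer-side and reader-side settings differ. The property names and the `PropertiesDrivenCryptoFactory` class are those documented for Parquet columnar encryption. The helper class `ParquetEncryptionProps`, the key IDs, the column names, and the KMS client class name are hypothetical. How this PR actually wires the settings into hudi is not shown in the thread, so this is an assumption-laden illustration rather than the PR's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that assembles Parquet encryption properties as plain strings. */
final class ParquetEncryptionProps {
  static final String CRYPTO_FACTORY_KEY = "parquet.crypto.factory.class";
  static final String CRYPTO_FACTORY_CLASS =
      "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory";
  static final String KMS_CLIENT_KEY = "parquet.encryption.kms.client.class";
  static final String FOOTER_KEY = "parquet.encryption.footer.key";
  static final String COLUMN_KEYS = "parquet.encryption.column.keys";

  private ParquetEncryptionProps() {
  }

  /** Reader side: only the crypto factory and a KMS client are needed to unwrap keys. */
  static Map<String, String> readerProps(String kmsClientClass) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put(CRYPTO_FACTORY_KEY, CRYPTO_FACTORY_CLASS);
    props.put(KMS_CLIENT_KEY, kmsClientClass);
    return props;
  }

  /** Writer side: additionally names the footer key and which key protects which columns. */
  static Map<String, String> writerProps(String kmsClientClass, String footerKeyId,
                                         Map<String, List<String>> columnsByKeyId) {
    Map<String, String> props = readerProps(kmsClientClass);
    props.put(FOOTER_KEY, footerKeyId);
    // Expected format: "keyId1:colA,colB;keyId2:colC"
    StringJoiner joined = new StringJoiner(";");
    for (Map.Entry<String, List<String>> e : columnsByKeyId.entrySet()) {
      joined.add(e.getKey() + ":" + String.join(",", e.getValue()));
    }
    props.put(COLUMN_KEYS, joined.toString());
    return props;
  }

  public static void main(String[] args) {
    Map<String, List<String>> columns = new LinkedHashMap<>();
    columns.put("keyA", Arrays.asList("ssn", "email")); // hypothetical key ID and columns
    // "com.example.MyKmsClient" is a placeholder for a real KmsClient implementation.
    writerProps("com.example.MyKmsClient", "footerKey", columns)
        .forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

A reader without access to the master keys through its KMS client cannot decrypt the protected columns, which matches the behaviour described above where a failed key lookup returns no result.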
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.