[SPARK-34029][SQL][TESTS] Add OrcEncryptionSuite and FakeKeyProvider #31603
This PR aims to add a basis for a columnar encryption test framework by adding `OrcEncryptionSuite` and `FakeKeyProvider`. Please note that we will improve this further in both Apache Spark and Apache ORC in the Apache Spark 3.2.0 timeframe.

Apache ORC 1.6 supports columnar encryption.

No. This is for a test case.

Pass the newly added test suite.

Closes #31065 from dongjoon-hyun/SPARK-34029.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cc @maropu and @HyukjinKwon

Kubernetes integration test starting

Test build #135317 has finished for PR 31603 at commit

Kubernetes integration test status failure

Could you review this, please, @viirya?
```scala
        |)
        |""".stripMargin)
    sql("INSERT INTO encrypted VALUES('123456789', '[email protected]', 'Dongjoon Hyun')")
    checkAnswer(sql("SELECT * FROM encrypted"), df)
```
Out of curiosity, when inserting into or reading the table, is it allowed to specify security options different from the ones given in CREATE TABLE?
Yes, it's possible, but they should be consistent with the original ones. Otherwise, it will read encrypted values, as in line 96.
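The read-side behaviour discussed above can be pictured with a small toy model. This is a hypothetical sketch, not ORC's reader API: `StoredColumn`, `ToyReader`, and the key name `pii` are made up here. It assumes ORC's design of writing an unencrypted, masked copy of each encrypted column alongside the encrypted data, so that a reader whose settings cannot resolve the column's key gets the masked copy instead of the original values.

```scala
// Hypothetical toy model, not ORC's API. Each column records the key it was
// encrypted with (if any), the original values (standing in for successfully
// decrypted data) and the unencrypted masked copy stored next to them.
final case class StoredColumn(
    name: String,
    keyName: Option[String], // None => the column is not encrypted
    original: Seq[Option[String]],
    masked: Seq[Option[String]])

object ToyReader {
  // Returns the original values when the column is unencrypted or the reader
  // can access its key; otherwise falls back to the masked copy.
  def read(col: StoredColumn, accessibleKeys: Set[String]): Seq[Option[String]] =
    col.keyName match {
      case None                                  => col.original
      case Some(k) if accessibleKeys.contains(k) => col.original
      case Some(_)                               => col.masked
    }
}
```

In this model a reader configured with the right key sees `123456789` for an encrypted `ssn` column, while a reader with mismatched settings sees only the masked copy (all nulls in this illustration); unencrypted columns such as `name` read the same either way.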
viirya left a comment
Compared with the previous PR, which was merged before, this one just adds the build_and_test.yml change.
LGTM
Thank you so much, @viirya!
I don't have a strong opinion on this. This adds a 30-minute GitHub Actions (GA) job to test a single case, and I would prefer to avoid that given the GA resource issues we are currently facing globally. Do we know why the test needs to run in a separate forked JVM to pass? I know it's tricky to investigate such a case, but it might be worth taking a look to avoid adding another GA job. Otherwise, I would name the tag something like DedicatedJVM and add more tests with this tag the next time we face such an issue.
Hi, @HyukjinKwon. This is a security feature, and Apache ORC's CryptoUtils creates the key provider from its singleton Hadoop shims factory. If other, non-secured ORC code creates it first without a proper configuration, it gets a NullKeyProvider, and that affects all subsequent test cases in the same JVM.
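Why a dedicated JVM helps can be seen in a simplified model of the failure mode described in this reply. This is a hypothetical sketch: `ProviderCache`, `JvmWideProviderCache`, `ConfiguredKeyProvider`, and the path `fake://keys` are illustrative names, not ORC's `CryptoUtils` or Hadoop shims API. The point it shows is that a JVM-wide cache keeps whichever provider was created first, so a non-secured caller that runs first pins the null provider for every later caller.

```scala
// Hypothetical model, not ORC's API: illustrates a JVM-wide singleton that
// caches the first key provider it creates.
sealed trait KeyProvider { def description: String }
case object NullKeyProvider extends KeyProvider { val description = "null" }
final case class ConfiguredKeyProvider(path: String) extends KeyProvider {
  val description = s"configured($path)"
}

// Creates a provider on first use and returns that same provider forever after,
// ignoring the configuration passed on later calls.
class ProviderCache {
  private var cached: Option[KeyProvider] = None

  def keyProvider(configuredPath: Option[String]): KeyProvider = synchronized {
    cached.getOrElse {
      // No configuration => null provider; otherwise a configured one.
      val created = configuredPath.fold[KeyProvider](NullKeyProvider)(ConfiguredKeyProvider(_))
      cached = Some(created)
      created
    }
  }
}

// One instance per JVM, like a static singleton factory.
object JvmWideProviderCache extends ProviderCache
```

If a non-secured suite calls `JvmWideProviderCache.keyProvider(None)` first, a later encryption suite in the same JVM still receives `NullKeyProvider`. A fresh `ProviderCache`, standing in for a dedicated JVM, honours the configuration.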
Tag

BTW, I considered putting all security tests into this

BTW, I'll create a new JIRA issue for

Sure, thanks @dongjoon-hyun for addressing my comment!
What changes were proposed in this pull request?
This is a retry of #31065. Last time, the newly added test cases passed in Jenkins and when run individually, but the change was reverted because they failed when GitHub Actions ran with `SERIAL_SBT_TESTS=1`.

In this PR, the `SecurityTest` tag is used to isolate `KeyProvider`.

This PR aims to add a basis for a columnar encryption test framework by adding `OrcEncryptionSuite` and `FakeKeyProvider`.

Please note that we will improve this further in both Apache Spark and Apache ORC in the Apache Spark 3.2.0 timeframe.
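The general idea behind a test-only key provider (keys held locally and derived deterministically, so an encryption test needs no real key management service) can be sketched in plain Scala. This is a hypothetical illustration under that assumption: `InMemoryTestKeys`, `CellCipher`, and the key name `pii` are made up here, and the sketch uses only the JDK's `javax.crypto`, not Hadoop's `KeyProvider` API or the `FakeKeyProvider` added by this PR.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.{MessageDigest, SecureRandom}
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Hypothetical in-memory key store for tests: each key is the first 16 bytes
// of SHA-256(name), so every run derives the same AES-128 key for a name.
final class InMemoryTestKeys(names: Set[String]) {
  private val keys: Map[String, SecretKeySpec] = names.iterator.map { n =>
    val bytes = MessageDigest.getInstance("SHA-256").digest(n.getBytes(UTF_8)).take(16)
    n -> new SecretKeySpec(bytes, "AES")
  }.toMap

  def get(name: String): Option[SecretKeySpec] = keys.get(name)
}

object CellCipher {
  private val IvLen = 12
  private val TagBits = 128
  private val random = new SecureRandom()

  // Encrypts one value with AES-GCM using a fresh random IV, prepended to the output.
  def encrypt(key: SecretKeySpec, plain: String): Array[Byte] = {
    val iv = new Array[Byte](IvLen)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TagBits, iv))
    iv ++ cipher.doFinal(plain.getBytes(UTF_8))
  }

  // Splits off the IV and decrypts; throws if the key or the data is wrong.
  def decrypt(key: SecretKeySpec, data: Array[Byte]): String = {
    val (iv, body) = data.splitAt(IvLen)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TagBits, iv))
    new String(cipher.doFinal(body), UTF_8)
  }
}
```

Deriving keys from names keeps tests reproducible across runs, while the random per-value IV avoids the IV reuse that AES-GCM forbids.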
Why are the changes needed?
Apache ORC 1.6 supports columnar encryption.
Does this PR introduce any user-facing change?
No. This is for a test case.
How was this patch tested?
Pass the newly added test suite.