Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Feb 20, 2021

What changes were proposed in this pull request?

This is a retry of #31065 . Last time, the newly add test cases passed in Jenkins and individually, but it's reverted because they fail when GitHub Action runs with SERIAL_SBT_TESTS=1.

In this PR, SecurityTest tag is used to isolate KeyProvider.

This PR aims to add a basis for columnar encryption test framework by add OrcEncryptionSuite and FakeKeyProvider.

Please note that we will improve more in both Apache Spark and Apache ORC in Apache Spark 3.2.0 timeframe.

Why are the changes needed?

Apache ORC 1.6 supports columnar encryption.

Does this PR introduce any user-facing change?

No. This is for a test case.

How was this patch tested?

Pass the newly added test suite.

This PR aims to add a basis for columnar encryption test framework by add `OrcEncryptionSuite` and `FakeKeyProvider`.

Please note that we will improve more in both Apache Spark and Apache ORC in Apache Spark 3.2.0 timeframe.

Apache ORC 1.6 supports columnar encryption.

No. This is for a test case.

Pass the newly added test suite.

Closes #31065 from dongjoon-hyun/SPARK-34029.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member Author

cc @maropu and @HyukjinKwon

@SparkQA
Copy link

SparkQA commented Feb 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39897/

@SparkQA
Copy link

SparkQA commented Feb 21, 2021

Test build #135317 has finished for PR 31603 at commit 203c212.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39897/

@dongjoon-hyun
Copy link
Member Author

Could you review this please, @viirya ?

|)
|""".stripMargin)
sql("INSERT INTO encrypted VALUES('123456789', '[email protected]', 'Dongjoon Hyun')")
checkAnswer(sql("SELECT * FROM encrypted"), df)
Copy link
Member

@viirya viirya Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, when inserting/reading the table, is it allowed to specify different security options other than the ones in create table?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's possible but it should comply with the original ones. Otherwise, it will read encrypted values like line 96.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just adds build_and_test.yml change, compared with the previous one which was merged before).

lgtm

@dongjoon-hyun
Copy link
Member Author

Thank you so much, @viirya !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-34486-RETRY branch February 21, 2021 23:05
@HyukjinKwon
Copy link
Member

HyukjinKwon commented Feb 22, 2021

I don't have a strong opinion on this. This added 30min GA job to test one case and I would prefer to avoid adding this given that we face GA resource issues globally going on.

Do we know the reason why it needs to run the test in a separate forked JVM to pass? I know its tricky to investigate such case but it might be worthwhile taking a look to avoid adding another GA job.

Otherwise I would name the tag to something like DedicatedJVM and add some more tests with this tag when we face such issue next time.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Feb 22, 2021

Hi, @HyukjinKwon . This is a security feature and Apache ORC CryptoUtils creates the key provider from its Singleton Hadoop Shims Factory. If the other non-secured ORC code creates this without a proper configuration first, it will get NullKeyProvider and affects all the subsequent test cases.

Do we know the reason why it needs to run the test in a separate forked JVM to pass?

Tag DedicatedJVM sounds better to me, too. I'll make a follow-up.

DedicatedJVM

@dongjoon-hyun
Copy link
Member Author

BTW, I considered to put all security tests into this SecurityTest tag, but DedicatedJVM sounds useful because we are ignoring all dedicated JVM tests currently in GitHub Action.

@dongjoon-hyun
Copy link
Member Author

BTW, I'll create a new JIRA issue for DedicatedJVMTest.

@HyukjinKwon
Copy link
Member

Sure thanks @dongjoon-hyun for addressing my comment!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants