Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Sep 5, 2019

What changes were proposed in this pull request?

This patch proposes to add new tests to test the username of HiveClient to prevent changing the semantic unintentionally. The owner of Hive table has been changed back-and-forth, principal -> username -> principal, and looks like the change is not intentional. (Please refer SPARK-28996 for more details.) This patch intends to prevent this.

This patch also renames previous HiveClientSuite(s) to HivePartitionFilteringSuite(s) as it was commented as TODO, as well as previous tests are too narrowed to test only partition filtering.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Newly added UTs.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Sep 5, 2019

Looks like renaming is not recognized though git status showed it correctly. (Shouldn't I reuse the name of HiveClientSuite?) I'll try to play with this if someone concerns about the excessive lines of change.

@SparkQA
Copy link

SparkQA commented Sep 5, 2019

Test build #110182 has finished for PR 25696 at commit 8c1efd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -0,0 +1,341 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a renaming, though Github doesn't seem to recognize it.

@@ -0,0 +1,29 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: This is just a renaming, though Github doesn't seem to recognize it.

@HeartSaVioR
Copy link
Contributor Author

cc. @cloud-fan @gatorsmile @dongjoon-hyun

I'm also asking to review #23952 in pair, as I feel #23952 is the right way but that should have test. I'm OK to rebase this to reflect #23952 if we would like to merge #23952 before this.

@SparkQA
Copy link

SparkQA commented Sep 6, 2019

Test build #110215 has finished for PR 25696 at commit 0bdc71a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HiveClientUserNameSuite(version: String)
  • class HiveClientUserNameSuites extends Suite with HiveClientVersions

@squito
Copy link
Contributor

squito commented Sep 9, 2019

shouldn't this and #23952 be put together and merged together as one PR?

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Sep 9, 2019

There're some reasons to make this be separated PR:

  1. I suggested adding UT in [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952 before, and the author didn't follow the suggestion so I had to put my own effort to deal with this. If I needed to raise the PR against author's fork, I should also persuade to merge it to reflect the PR, which seems to be unnecessary effort (and in my personal experience it has been progressed slowly on coordinating with other contributors... I've been also waiting for other contributor in [WIP][CORE][SPARK-28867] InMemoryStore checkpoint to speed up replay log file in HistoryServer #25577 to coordinate). If this patch got merged before, [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952 will make new tests be broken so [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952 is forced to apply the change. If this patch will be planned to be merged after [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952, I'm open to apply the change.
  2. This patch can be independent with [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952, as the new tests are passing with current behavior. This patch is still valuable without [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952 as we should have to have tests to prevent changing behavior unintentionally. We didn't notice the behavioral change in latest change of SPARK-22846 - we thought it fixed NPE but it also changed the behavior. This is a non-trivial issue.
  3. Minor, but authorship issue. IMHO I'm not sure who is the right main author of merged PR, as I expect there's similar amount of effort being put between [SPARK-26929][SQL]fix table owner use user instead of principal when create table through spark-sql or beeline #23952 and this.

"[email protected]", ugi, Array.empty)
proxyUgi.doAs(new PrivilegedExceptionAction[Unit]() {
override def run(): Unit = {
assert(proxyUgi.getUserName === getUserNameFromHiveClient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems wrong for that the bug is about. proxyUgi.getUserName will return [email protected], and isn't the bug about it needing to be only proxyprincipal to match Hive's behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before SPARK-26929 it was "[email protected]" - as test passed. So setting HADOOP_USER_NAME and proxy user being set provided different results. I couldn't add test for HADOOP_USER_NAME as artificially setting system env. is hard.

Anyway SPARK-26929 removes the case and makes tests here failing (intended). I'll reflect the change.

@HeartSaVioR HeartSaVioR changed the title [SPARK-28996][SQL] Add tests regarding username of HiveClient [SPARK-28996][SQL][TESTS] Add tests regarding username of HiveClient Sep 16, 2019
@SparkQA
Copy link

SparkQA commented Sep 16, 2019

Test build #110663 has finished for PR 25696 at commit dfca82b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

Reflected the change of SPARK-26929 and test passed. Could we have a next round of review?

}

private val userName = UserGroupInformation.getCurrentUser.getShortUserName
override val userName = UserGroupInformation.getCurrentUser.getShortUserName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change user name at runtime? If we can then it should be def instead of val.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but we have been defining userName as val from Spark 2.2.0, starting from this commit 344f38b

Changing it to def is safer anyway (can handle both cases - whether user name is changed or not) so please let me know if we would like to make change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, that would be better to be handled via separate PR if necessary, to isolate two different things "adding test" and "fixing logic".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, we can investigate it in a separated pr

import org.apache.spark.util.Utils

class HiveClientUserNameSuite(version: String)
extends HiveVersionSuite(version) with BeforeAndAfterAll {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BeforeAndAfterAll is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding! Updated.

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Sep 17, 2019

Test build #110719 has finished for PR 25696 at commit ef220ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HiveClientUserNameSuite(version: String) extends HiveVersionSuite(version)

@cloud-fan cloud-fan closed this in c862835 Sep 17, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@HeartSaVioR
Copy link
Contributor Author

Thanks all for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-28996 branch September 17, 2019 06:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants