@yhuai
Contributor

yhuai commented Apr 20, 2016

What changes were proposed in this pull request?

This PR has two main changes.

  1. Move Hive-specific methods from HiveContext to HiveSessionState, which helps the work of removing HiveContext.
  2. Create a SparkSession class, which will later be the entry point for Spark SQL users.

How was this patch tested?

Existing tests

This PR is trying to fix test failures of #12485.

Andrew Or added 30 commits April 18, 2016 10:45
This requires changing all the downstream places that take in
HiveContext and replacing that with (SQLContext, HiveSessionState).
Now both shared state and session state are tracked in SparkSession
and we use reflection to instantiate them. After this commit
SQLContext and HiveContext are just wrappers for SparkSession.
Previously we still tried to load HiveContext even if the user
explicitly specified an "in-memory" catalog implementation. Now
it will load a SQLContext in this case.
It was failing because we were passing in a subclass of
SparkContext into SparkSession, and the reflection was using
the wrong class to get the constructor. This is now fixed with
ClassTags.
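
To illustrate the failure described above (a minimal sketch, not the code from this PR): `getConstructor` matches parameter types exactly, so looking a constructor up by the argument's runtime class fails when the argument is a subclass of the declared parameter type. Capturing the declared type with a `ClassTag` avoids that. `Context`, `TestContext`, `State`, and both helper methods are hypothetical stand-ins, not Spark classes.

```scala
import scala.reflect.ClassTag

object ReflectionSketch {
  class Context                      // stands in for SparkContext
  class TestContext extends Context  // a subclass, as a test harness might pass in
  class State(val ctx: Context)      // stands in for a state class with a (Context) constructor

  // Fragile: the constructor is looked up by the argument's runtime class.
  // Passing a TestContext asks for State(TestContext), which does not exist.
  def instantiateByRuntimeClass(className: String, arg: AnyRef): AnyRef = {
    val ctor = Class.forName(className).getConstructor(arg.getClass)
    ctor.newInstance(arg).asInstanceOf[AnyRef]
  }

  // Sturdier: the ClassTag records the static type A chosen at the call site,
  // so the lookup asks for State(Context) even when arg is a TestContext.
  def instantiate[A <: AnyRef: ClassTag](className: String, arg: A): AnyRef = {
    val argClass = implicitly[ClassTag[A]].runtimeClass
    val ctor = Class.forName(className).getConstructor(argClass)
    ctor.newInstance(arg).asInstanceOf[AnyRef]
  }
}
```

Callers need to name the declared type, e.g. `instantiate[Context](name, ctx)`. If the type parameter is inferred as the subclass, the lookup fails in the same way as the runtime-class version.
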
Avoid some unnecessary casts.
The problem was that we weren't using the right QueryExecution
when we called TestHive.sessionState.executePlan. We were using
HiveQueryExecution instead of the custom one that we created
in TestHiveContext.

This turned out to be very difficult to fix due to the tight
coupling of QueryExecution within TestHiveContext. I had to
refactor this code significantly to extract the nested logic
one by one.
The problem was that we were getting everything from
executionHive's hiveconf and setting that in metadataHive,
overriding the value of `hive.metastore.warehouse.dir`,
which we customize in TestHive. This resulted in a bunch
of "Table src does not exist" errors from Hive.
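
The override described above can be reproduced with plain maps (a sketch of the failure mode only; the conf objects are modeled as `Map[String, String]`, the paths are made up, and `copyKeepingOverrides` is one possible remedy, not necessarily what this PR does):

```scala
object ConfMergeSketch {
  val WarehouseKey = "hive.metastore.warehouse.dir"

  // Copies every entry of `source` into `target`. With ++ the right-hand
  // operand wins on key collisions, so customized values in `target` are lost.
  def copyAll(target: Map[String, String], source: Map[String, String]): Map[String, String] =
    target ++ source

  // Same copy, but entries already customized in `target` take precedence.
  def copyKeepingOverrides(target: Map[String, String], source: Map[String, String]): Map[String, String] =
    source ++ target
}
```
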
@SparkQA

SparkQA commented Apr 20, 2016

Test build #56340 has finished for PR 12522 at commit af42981.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 20, 2016

Test build #2835 has finished for PR 12522 at commit af42981.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 20, 2016

Test build #2836 has finished for PR 12522 at commit af42981.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…use SQLContext. So, let's change the catalog conf's default value to in-memory. In the constructor of HiveContext, we will set this conf to hive.
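
The arrangement described in this commit message can be sketched roughly as follows (an illustration only, not the PR's code): the catalog setting defaults to in-memory, and the Hive-flavored context sets it to hive in its constructor. The key name `catalogImplementation` and the classes `Conf`, `PlainContext`, and `HiveLikeContext` are hypothetical stand-ins.

```scala
import scala.collection.mutable

object CatalogConfSketch {
  // Hypothetical key name; the real conf key is not shown in this thread.
  val CatalogKey = "catalogImplementation"
  val DefaultCatalog = "in-memory"

  // A tiny stand-in for a mutable SQL conf.
  class Conf {
    private val settings = mutable.Map.empty[String, String]
    def set(key: String, value: String): Unit = settings.update(key, value)
    def get(key: String, default: String): String = settings.getOrElse(key, default)
  }

  // Stand-in for SQLContext: leaves the conf alone, so the default applies.
  class PlainContext(val conf: Conf) {
    def catalogImplementation: String = conf.get(CatalogKey, DefaultCatalog)
  }

  // Stand-in for HiveContext: its constructor switches the catalog to hive.
  class HiveLikeContext(c: Conf) extends PlainContext(c) {
    c.set(CatalogKey, "hive")
  }
}
```
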
@SparkQA

SparkQA commented Apr 20, 2016

Test build #56376 has finished for PR 12522 at commit 863976c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor Author

yhuai commented Apr 20, 2016

test this please

@SparkQA

SparkQA commented Apr 20, 2016

Test build #56381 has finished for PR 12522 at commit 863976c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Apr 20, 2016

Merging this in master.

asfgit closed this in 8fc267a Apr 20, 2016