[SPARK-14720][SPARK-13643] Remove HiveContext (step 1) #12485
This requires changing all the downstream places that take in HiveContext and replacing that with (SQLContext, HiveSessionState).
Now both shared state and session state are tracked in SparkSession, and we use reflection to instantiate them. After this commit, SQLContext and HiveContext are just wrappers around SparkSession.
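The wrapper arrangement described above can be sketched as follows. This is an illustrative simplification, not Spark's actual classes: the `*Sketch`/`*Wrapper` names and fields are made up, but the shape matches the description (the session owns shared state and session state; the contexts hold no state and delegate).

```scala
class SharedStateSketch                            // e.g. caches shared across sessions
class SessionStateSketch(val catalogImpl: String)  // e.g. per-session conf and catalog

class SparkSessionSketch(catalogImpl: String) {
  val sharedState = new SharedStateSketch
  val sessionState = new SessionStateSketch(catalogImpl)
}

// The contexts own no state of their own; they forward to the session.
class SQLContextWrapper(val sparkSession: SparkSessionSketch) {
  def sessionState: SessionStateSketch = sparkSession.sessionState
}

class HiveContextWrapper(sparkSession: SparkSessionSketch)
  extends SQLContextWrapper(sparkSession)
```

With this shape, removing HiveContext later only deletes a thin wrapper; the state it exposed lives on in the session.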
def setConf(props: Properties): Unit = sessionState.setConf(props)

/** Set the given Spark SQL configuration property. */
private[sql] def setConf[T](entry: ConfigEntry[T], value: T): Unit = conf.setConf(entry, value)
Seems we also need to change this?
Test build #56166 has finished for PR 12485 at commit

Test build #56170 has finished for PR 12485 at commit

Test build #56174 has finished for PR 12485 at commit
Previously we still tried to load HiveContext even if the user explicitly specified an "in-memory" catalog implementation. Now it will load a SQLContext in this case.
It was failing because we were passing in a subclass of SparkContext into SparkSession, and the reflection was using the wrong class to get the constructor. This is now fixed with ClassTags.
Avoid some unnecessary casts.
Test build #56254 has finished for PR 12485 at commit
 */
private def reflect[T, Arg <: AnyRef](
    className: String,
    ctorArg: Arg)(implicit ctorArgTag: ClassTag[Arg]): T = {
You don't need a class tag. You can call ctorArg.getClass().
Didn't work because there are places where we pass subclasses of Arg in here (see SQLExecutionSuite).
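The point of the thread above can be demonstrated standalone. In this sketch, `Base`/`Sub`/`Wrapper` are made-up stand-ins (think SparkContext, a test subclass of it, and the reflectively constructed state class): looking up the constructor with `ctorArg.getClass` uses the runtime class of the argument, which fails when a subclass is passed in, while the ClassTag captures the statically declared type.

```scala
import scala.reflect.ClassTag

class Base
class Sub extends Base
class Wrapper(val arg: Base)

// Without a ClassTag: the runtime class of the argument (Sub) is used to
// look up the constructor, and Wrapper declares no (Sub) constructor.
def reflectWithGetClass[T](className: String, ctorArg: AnyRef): T =
  Class.forName(className)
    .getDeclaredConstructor(ctorArg.getClass)
    .newInstance(ctorArg)
    .asInstanceOf[T]

// With a ClassTag: the statically declared type (Base) is used instead,
// so passing a Sub instance still finds Wrapper's (Base) constructor.
def reflect[T, Arg <: AnyRef](className: String, ctorArg: Arg)(
    implicit ctorArgTag: ClassTag[Arg]): T =
  Class.forName(className)
    .getDeclaredConstructor(ctorArgTag.runtimeClass)
    .newInstance(ctorArg)
    .asInstanceOf[T]
```

Calling `reflect[Wrapper, Base](..., new Sub)` succeeds, while the `getClass`-based variant throws `NoSuchMethodException` for the same arguments.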
The problem was that we weren't using the right QueryExecution when we called TestHive.sessionState.executePlan. We were using HiveQueryExecution instead of the custom one that we created in TestHiveContext. This turned out to be very difficult to fix due to the tight coupling of QueryExecution within TestHiveContext. I had to refactor this code significantly to extract the nested logic one by one.
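The decoupling described above can be sketched with a factory-injection pattern. All names here are simplified stand-ins for the real Spark classes: instead of the session state hard-coding which QueryExecution subclass to build, the test context passes in its own factory, so `executePlan` returns the custom execution.

```scala
class QueryExecution(val plan: String) {
  def executedPlan: String = s"executed: $plan"
}

// A test context's variant, e.g. one that loads test tables first.
class TestQueryExecution(plan: String) extends QueryExecution(plan) {
  override def executedPlan: String = s"test-executed: $plan"
}

// The session state no longer constructs a fixed QueryExecution type;
// it calls whatever factory it was created with.
class SessionState(createQueryExecution: String => QueryExecution) {
  def executePlan(plan: String): QueryExecution = createQueryExecution(plan)
}
```

Production wiring would pass `new QueryExecution(_)`; a TestHive-style context passes `new TestQueryExecution(_)`, and everything downstream of `executePlan` picks up the custom behavior without further changes.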
Test build #56276 has finished for PR 12485 at commit

Test build #56273 has finished for PR 12485 at commit

Test build #56278 has finished for PR 12485 at commit
The problem was that we were getting everything from executionHive's hiveconf and setting that in metadataHive, overriding the value of `hive.metastore.warehouse.dir`, which we customize in TestHive. This resulted in a bunch of "Table src does not exist" errors from Hive.
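The bug boils down to merge order. The conf key below is a real Hive key, but the merge helpers are an illustrative simplification of the copy logic: if the bulk copy from executionHive's conf is applied after the target's customizations, it clobbers them (here, TestHive's custom warehouse dir); applying the customizations last preserves them.

```scala
object ConfMerge {
  // Buggy order: the bulk copy from the execution conf comes last,
  // overwriting values the target had customized.
  def buggyMerge(executionConf: Map[String, String],
                 customized: Map[String, String]): Map[String, String] =
    customized ++ executionConf

  // Fixed order: bulk copy first, then re-apply customizations so they win.
  def fixedMerge(executionConf: Map[String, String],
                 customized: Map[String, String]): Map[String, String] =
    executionConf ++ customized
}
```

With the buggy order, `hive.metastore.warehouse.dir` comes out as the execution default rather than the test warehouse, which is exactly how the "Table src does not exist" failures arose: Hive was looking in the wrong warehouse directory.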
Test build #56303 has finished for PR 12485 at commit

Test build #56305 has finished for PR 12485 at commit
Hey @andrewor14 - I took a quick look at this. Shouldn't we move the SessionState stuff into its own PR, since it would be easier to get in and also easier to review?
It may take time to track all places where we only use SQLContext. So, let's change the catalog conf's default value to in-memory. In the constructor of HiveContext, we will set this conf to hive.
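A minimal sketch of that proposal, assuming the conf key is Spark's `spark.sql.catalogImplementation` and using made-up context classes with a simplified global settings map: the conf defaults to "in-memory", and the HiveContext constructor body flips it to "hive" on instantiation.

```scala
object CatalogConf {
  val KEY = "spark.sql.catalogImplementation"
  // Default is in-memory; nothing Hive-related is loaded unless asked for.
  val settings = scala.collection.mutable.Map(KEY -> "in-memory")
}

class SQLContextSketch {
  def catalogImplementation: String = CatalogConf.settings(CatalogConf.KEY)
}

class HiveContextSketch extends SQLContextSketch {
  // Constructor body runs on instantiation: creating a HiveContext
  // is what switches the catalog implementation to hive.
  CatalogConf.settings(CatalogConf.KEY) = "hive"
}
```

This avoids having to find every SQLContext-only call site up front: plain contexts see the in-memory default, and only constructing a HiveContext opts in to the hive catalog.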
Test build #56374 has finished for PR 12485 at commit

Test build #56382 has finished for PR 12485 at commit
…nState and Create a SparkSession class

## What changes were proposed in this pull request?

This PR has two main changes.

1. Move Hive-specific methods from HiveContext to HiveSessionState, which help the work of removing HiveContext.
2. Create a SparkSession class, which will later be the entry point of Spark SQL users.

## How was this patch tested?

Existing tests.

This PR is trying to fix test failures of apache#12485.

Author: Andrew Or <[email protected]>
Author: Yin Huai <[email protected]>

Closes apache#12522 from yhuai/spark-session.
## What changes were proposed in this pull request?

This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.

Note: A couple of things will break after this patch. These will be fixed separately.

- the python HiveContext
- all the documentation / comments referencing HiveContext
- there will be no more HiveContext in the REPL (fixed by #12589)

## How was this patch tested?

No change in functionality.

Author: Andrew Or <[email protected]>

Closes #12585 from andrewor14/delete-hive-context.
What changes were proposed in this pull request?

In Spark 2.0 we will have a new entry point for users known as the SparkSession. This class will handle the lazy initialization of the Hive metastore if the user runs commands that require interaction with the metastore (e.g. CREATE TABLE). With this, we can remove the HiveContext, which is an odd API to be exposed to Spark users.

This patch doesn't fully remove HiveContext but does most of the work. A follow-up patch will actually delete the file itself. I've left all the variable naming and any further refactoring for later.

How was this patch tested?

Jenkins.
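The lazy metastore initialization described in the summary can be sketched with a `lazy val`. The class and method names below are made up, not SparkSession's actual API: the point is only that the (expensive) Hive client is constructed on first access, so sessions that never touch the metastore never pay for it.

```scala
class HiveClientSketch {
  def runDdl(sql: String): String = s"ran: $sql"
}

class LazySession {
  var metastoreInitialized = false

  // Built only on first use, i.e. the first metastore-touching command.
  lazy val hiveClient: HiveClientSketch = {
    metastoreInitialized = true
    new HiveClientSketch
  }

  def sql(query: String): String =
    if (query.trim.toUpperCase.startsWith("CREATE TABLE")) hiveClient.runDdl(query)
    else s"planned: $query"   // pure SQL path never touches the metastore
}
```

Running only `SELECT` queries leaves the client unbuilt; the first `CREATE TABLE` triggers the one-time initialization.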