[SPARK-46575][SQL][HIVE] Make HiveThriftServer2.startWithContext DevelopApi retriable and fix flakiness of ThriftServerWithSparkContextInHttpSuite #44575
Conversation
...-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SharedThriftServer.scala
@dongjoon-hyun @LuciferYang @cloud-fan, PTAL, thanks
Isn't it a little weird to have a configuration to fix flakiness?
Thank you for pointing this out, @dongjoon-hyun. I completely agree that creating a new configuration just for a test fix would be excessive. However, in this case we can address both the developer API issue and the test-side issue simultaneously, with the test acting as a typical use case. This way, the solution is not limited to just fixing the test.
When would we want it to be false?
For instance, when we retry starting the thrift server on the same SparkContext. If this is true, the SparkContext will be stopped, and weird things follow in the next retries:
```
[info] org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite *** ABORTED *** (151 milliseconds)
[info] java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
[info] This stopped SparkContext was created at:
[info]
[info] org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInHttpSuite.beforeAll(ThriftServerWithSparkContextSuite.scala:279)
[info] org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info] org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info] org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info] org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info] org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info] org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info] sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info] java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info] java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info] java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info] java.base/java.lang.Thread.run(Thread.java:840)
[info]
[info] The currently active SparkContext was created at:
[info]
[info] (No active SparkContext.)
[info] at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:122)
[info] at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:115)
[info] at org.apache.spark.sql.SparkSession.newSession(SparkSession.scala:274)
[info] at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.startThriftServer(SharedThriftServer.scala:130)
```
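The retry situation described above can be sketched as follows. This is a hypothetical, self-contained illustration (names are mine, not Spark's actual code): starting the server is attempted a few times, which only works if a failed attempt does NOT call `System.exit` and thereby stop the shared context that later attempts depend on.

```scala
// Illustrative retry loop: a failed attempt is swallowed and retried
// in-process, keeping shared state (e.g. a SparkContext) alive.
object RetrySketch {
  def startWithRetries(maxAttempts: Int)(start: () => Unit): Int = {
    var attempt = 0
    while (attempt < maxAttempts) {
      attempt += 1
      try {
        start()          // e.g. bind the server; may throw on a port clash
        return attempt   // success: report which attempt worked
      } catch {
        // Swallow the failure and retry instead of exiting the JVM;
        // on the final attempt, let the exception propagate.
        case e: Exception if attempt < maxAttempts => ()
      }
    }
    throw new IllegalStateException(s"failed after $maxAttempts attempts")
  }
}
```

If `start()` instead exited the JVM (or stopped the shared context), every attempt after the first failure would be doomed, which is exactly the flakiness seen in the aborted suite above.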
How about non-thriftserver? Maybe we should clarify what "retry" means; then people can understand when to set this config.
> How about non-thriftserver?
SparkContext registers a shutdown hook that performs self-termination, so for self-contained user apps that call System.exit, whether unexpectedly or intentionally, the context is torn down regardless. This configuration only controls the part where the thrift server binds to the SparkContext, which shouldn't be affected by the shutdown hook. In other words, it does not affect non-thriftserver usage.
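To make the shutdown-hook point concrete, here is a self-contained sketch (illustrative only, not Spark's actual code): a hook registered with the JVM runs when `System.exit` is called and tears down shared state, so anything retried in-process afterwards finds the state already stopped.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Toy stand-in for a context that self-terminates via a JVM shutdown hook.
class SharedState {
  val stopped = new AtomicBoolean(false)
  private val hook = new Thread(() => stopped.set(true))
  Runtime.getRuntime.addShutdownHook(hook)

  def use(): Unit =
    if (stopped.get()) throw new IllegalStateException("Cannot use a stopped state")

  // The hook can be deregistered while the JVM is running normally;
  // removeShutdownHook returns true if the hook was previously registered.
  def close(): Boolean = Runtime.getRuntime.removeShutdownHook(hook)
}
```

The point of the config is that the thrift server's own error path should not reach `System.exit` at all, leaving the shutdown hook for genuine JVM termination.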
Will anyone set this config in production? It looks fishy to me to add a new config to fix flaky tests.
> Will anyone set this config in production?

This question is difficult to answer, but it can be used in production when calling `startWithContext`.
> It looks fishy to me to add a new config to fix flaky tests.

Alternatively, we can try modifying `startWithContext` directly to add a new boolean parameter that works the same as the new config. But because the Spark thrift server is initialized from a `HiveConf` instance, we would still need a new key to store this boolean value in order to pass it into the `ThriftCLIService`. Otherwise, we would need extra work to refactor the thrift server bootstrapping.
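The "new key" idea can be sketched as follows. This is a hedged illustration; the key name and helper methods are hypothetical, not Spark's actual identifiers. Because the service layer is bootstrapped from a conf object rather than by direct parameter passing, the boolean is stashed under an internal conf key on the way in and read back on the way out.

```scala
import scala.collection.mutable

object ExitOnErrorKey {
  // Hypothetical internal key name, acting like a function parameter.
  val Key = "internal.thriftserver.exitOnError"

  // Writer side: the entry point records the caller's choice in the conf.
  def store(conf: mutable.Map[String, String], exitOnError: Boolean): Unit =
    conf(Key) = exitOnError.toString

  // Reader side: the service defaults to the old behavior (exit) when unset,
  // so existing deployments are unaffected.
  def read(conf: collection.Map[String, String]): Boolean =
    conf.getOrElse(Key, "true").toBoolean
}
```

The default of `true` preserves the pre-change behavior, which matches the PR's claim that the default behavior is unchanged.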
> we still need a new key

I think a new local key that works like a function parameter is better.
The flu hit me. I'm sorry for not getting back to you sooner. The comments are addressed.
```diff
  try {
-   hiveServer2 = HiveThriftServer2.startWithContext(sqlContext)
+   hiveServer2 = HiveThriftServer2.startWithContext(sqlContext, exitOnError = false)
```
shall we add some comments to explain why we don't want to exit on error here?
OK. Please check whether the comments added are informative or not.
```scala
serverPort = t.getPortNumber
if (t.isInstanceOf[ThriftBinaryCLIService] && mode == ServerMode.http) {
  logError("A previous Hive's SessionState is leaked, aborting this retry")
  throw new IllegalStateException("HiveThriftServer2 started in binary mode " +
```
shall we use `SparkException.internalError`?
addressed
Thank you for the review, @cloud-fan & @dongjoon-hyun. Merged to master.
Thank you, @yaooqinn and all.
Follow-up: keep the `startWithContext(SQLContext)` method for compatibility

### What changes were proposed in this pull request?
#44575 added a default parameter to the `HiveThriftServer2.startWithContext` API. Although this is source-compatible, it is not binary-compatible: for example, a binary built against the new code cannot run against the old code. In this PR, we maintain forward and backward compatibility by keeping the API the same and introducing a separate API for the additional parameter.

### Why are the changes needed?
See above.

### Does this PR introduce _any_ user-facing change?
Only a developer API change.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45727 from dragqueen95/thriftserver-back-compat.

Lead-authored-by: Saksham Garg <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
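The binary-compatibility issue the follow-up fixes can be shown with a toy example (illustrative only, not Spark's actual code): a Scala default parameter changes the compiled method signature, so old bytecode that calls the one-argument form no longer links. Keeping the original one-argument method and adding an explicit two-argument overload preserves both source and binary compatibility.

```scala
object ToyServer {
  // Original API, unchanged: binaries compiled against the old version
  // still resolve this exact one-argument method in bytecode.
  def startWithContext(ctx: String): String =
    startWithContext(ctx, exitOnError = true)

  // New API as an explicit overload rather than a default parameter,
  // so the one-argument signature above survives untouched.
  def startWithContext(ctx: String, exitOnError: Boolean): String =
    s"started($ctx, exitOnError=$exitOnError)"
}
```

Had the new parameter been added with a default value instead, the single method would compile to a two-argument signature plus a synthetic default accessor, and pre-existing callers of the one-argument form would fail at link time.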
What changes were proposed in this pull request?
This PR adds a new parameter to `HiveThriftServer2.startWithContext` to tell the `ThriftCLIService` whether or not to call `System.exit` when encountering errors. Currently, when developers call `HiveThriftServer2.startWithContext` and an error occurs, `System.exit` is performed, which stops the existing `SQLContext`/`SparkContext` and crashes the user app.

There is also such a use case in our tests: we intended to retry starting a thrift server three times in total, but a failed attempt might stop the underlying SparkContext early and fail the remaining attempts.
For example
https://github.com/apache/spark/actions/runs/7271496487/job/19812142981
Why are the changes needed?
To make the `HiveThriftServer2.startWithContext` developer API retriable.

Does this PR introduce any user-facing change?
No. This is a developer API change, and the default behavior is unchanged.
How was this patch tested?
Verified `ThriftServerWithSparkContextInHttpSuite` locally.
Was this patch authored or co-authored using generative AI tooling?
no