-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-28260][SQL] Add CLOSED state to ExecutionState #25062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #107295 has finished for PR 25062 at commit
|
| import org.apache.spark.sql.{DataFrame, Row => SparkRow, SQLContext} | ||
| import org.apache.spark.sql.execution.HiveResult | ||
| import org.apache.spark.sql.execution.command.SetCommand | ||
| import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.listener |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: I would avoid importing the listener field directly, as it creates extra changes, and I think it's not common practice in Spark to import field from within an object (except for dsl, like, org.apache.spark.functions).
| } | ||
|
|
||
| def onOperationClosed(id: String): Unit = synchronized { | ||
| executionList(id).finishTimestamp = System.currentTimeMillis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a separate field closedTimestamp, and add a separate column in the tables in the UI (overall and within session). Both the time it finished execution, and was closed are interesting information to show - a long time between finished and closed shows that it spent a lot of time returning the result.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just add Fetch result time (Fetch result time = Close Time - Finish Time):
| Start Time | Finish Time | Duration | Fetch result time | Statement | State |
|---|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote for the former - add Close time and Execution time and Duration.
Fetch result time is a bit of a long label, and also I think it might be slightly misleading - this may also be the client being slow in processing the results, just sitting on an idle cursor, or actually closing it without fetching all results etc.
Having Duration for the total time gives all the same information and I think it's also more relevant, to have the time from start to close.
|
Test build #107347 has finished for PR 25062 at commit
|
juliuszsompolski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
|
cc @gatorsmile , @hvanhovell |
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks! Merged to master.
|
It might be misleading to end users. Could you create a doc for Web UI? See the JIRA https://issues.apache.org/jira/browse/SPARK-28373 |
## What changes were proposed in this pull request? The `ThriftServerTab` displays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState has state CLOSED for when after the iterator is closed. This PR add CLOSED state to ExecutionState, and override the `close()` in SparkExecuteStatementOperation, SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation. ## How was this patch tested? manual tests 1. Add `Thread.sleep(10000)` before [SparkExecuteStatementOperation.scala#L112](https://github.com/apache/spark/blob/b2e7677f4d3d8f47f5f148680af39d38f2b558f0/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L112) 2. Switch to `ThriftServerTab`:  3. After a while:  Closes apache#25062 from wangyum/SPARK-28260. Authored-by: Yuming Wang <[email protected]> Signed-off-by: gatorsmile <[email protected]>

What changes were proposed in this pull request?
The
ThriftServerTabdisplays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState has state CLOSED for when after the iterator is closed. This PR add CLOSED state to ExecutionState, and override theclose()in SparkExecuteStatementOperation, SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation.How was this patch tested?
manual tests
Thread.sleep(10000)before SparkExecuteStatementOperation.scala#L112ThriftServerTab: