
Conversation

@wangyum (Member) commented Jul 6, 2019

What changes were proposed in this pull request?

The ThriftServerTab displays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState already has a CLOSED state for after the iterator is closed. This PR adds a CLOSED state to ExecutionState and overrides close() in SparkExecuteStatementOperation, SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation.
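A minimal sketch of the shape of the change, for orientation only: the enum value and the close() override mirror the description above, but the class and listener names below are simplified stand-ins, not the PR's actual diff.

```scala
// Sketch only: ExecutionState gains a CLOSED value, and operations report it
// from close(). Everything except the ExecutionState/close() names is illustrative.
object ExecutionState extends Enumeration {
  type ExecutionState = Value
  val STARTED, COMPILED, FAILED, FINISHED, CLOSED = Value
}

trait ThriftServerUiListener {
  def onOperationClosed(id: String): Unit
}

abstract class SketchOperation(val statementId: String) {
  def close(): Unit = ()
}

// Mirrors the idea of overriding close() in SparkExecuteStatementOperation,
// SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation.
class SketchStatementOperation(id: String, listener: ThriftServerUiListener)
    extends SketchOperation(id) {
  override def close(): Unit = {
    super.close()
    listener.onOperationClosed(statementId)  // UI now sees CLOSED, not just FINISHED
  }
}
```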

How was this patch tested?

manual tests

  1. Add Thread.sleep(10000) before SparkExecuteStatementOperation.scala#L112
  2. Switch to ThriftServerTab:
    (screenshot: https://user-images.githubusercontent.com/5399861/60809590-9dcf2500-a1bd-11e9-826e-33729bb97daf.png)
  3. After a while:
    (screenshot: https://user-images.githubusercontent.com/5399861/60809719-e850a180-a1bd-11e9-9a6a-546146e626ab.png)

@SparkQA commented Jul 6, 2019

Test build #107295 has finished for PR 25062 at commit e9b8520.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum (Member, Author) commented Jul 6, 2019

cc @juliuszsompolski

import org.apache.spark.sql.{DataFrame, Row => SparkRow, SQLContext}
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.execution.command.SetCommand
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.listener
@juliuszsompolski (Contributor) commented:

style nit: I would avoid importing the listener field directly, as it creates extra changes, and I think it's not common practice in Spark to import a field from within an object (except for DSLs, like org.apache.spark.functions).
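A tiny sketch of the style being suggested, with simplified stand-in names (not the PR's actual code): keep the reference qualified through its enclosing object instead of importing the field.

```scala
// Illustrative only: a stand-in for HiveThriftServer2 and its listener field.
object SketchThriftServer {
  object listener {
    def onOperationClosed(id: String): Unit = println(s"closed $id")
  }
}

object CallSite {
  // Discouraged: `import SketchThriftServer.listener`, then `listener.onOperationClosed(id)`.
  // Preferred: qualify the field so its origin stays obvious at the call site.
  def reportClosed(id: String): Unit =
    SketchThriftServer.listener.onOperationClosed(id)
}
```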

}

def onOperationClosed(id: String): Unit = synchronized {
  executionList(id).finishTimestamp = System.currentTimeMillis
@juliuszsompolski (Contributor) commented Jul 8, 2019

I would add a separate field closedTimestamp, and a separate column in the tables in the UI (overall and within session). Both the time it finished execution and the time it was closed are interesting to show: a long gap between finished and closed shows that it spent a lot of time returning the result.
What do you think?
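A minimal sketch of what the suggestion implies, assuming the listener shape from the snippet above (field and class names simplified for illustration):

```scala
// Sketch only: track the close time separately from the finish time so the UI
// can surface both. `ExecutionInfo` and the listener class are stand-ins.
import scala.collection.mutable

class ExecutionInfo(val id: String, val startTimestamp: Long) {
  var finishTimestamp: Long = 0L   // set when execution finishes (FINISHED)
  var closedTimestamp: Long = 0L   // set when the operation is closed (CLOSED)
}

class SketchListener {
  private val executionList = mutable.HashMap[String, ExecutionInfo]()

  def onOperationFinished(id: String): Unit = synchronized {
    executionList(id).finishTimestamp = System.currentTimeMillis
  }

  // Separate callback so finishTimestamp is no longer overwritten on close.
  def onOperationClosed(id: String): Unit = synchronized {
    executionList(id).closedTimestamp = System.currentTimeMillis
  }
}
```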

@wangyum (Member, Author) commented:

+1

@wangyum (Member, Author) commented:

How about this?
(screenshot of the proposed columns)
Execution Time = Finish Time - Start Time
Duration = Close Time - Start Time
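A small sketch of how those two columns could be derived, using the timestamp fields discussed above (names are illustrative, not the PR's code):

```scala
// Sketch only: the proposed UI columns as simple differences of timestamps.
case class OperationTimes(startTimestamp: Long, finishTimestamp: Long, closedTimestamp: Long)

object UiColumns {
  // Execution Time = Finish Time - Start Time
  def executionTimeMs(t: OperationTimes): Long = t.finishTimestamp - t.startTimestamp

  // Duration = Close Time - Start Time
  def durationMs(t: OperationTimes): Long = t.closedTimestamp - t.startTimestamp
}
```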

@wangyum (Member, Author) commented:

or just add Fetch result time (Fetch result time = Close Time - Finish Time):

Start Time | Finish Time | Duration | Fetch result time | Statement | State

@juliuszsompolski (Contributor) commented:

I would vote for the former: add Close time, Execution time and Duration.
Fetch result time is a bit of a long label, and I also think it might be slightly misleading: the gap may also be the client being slow to process the results, just sitting on an idle cursor, or closing it without fetching all results, etc.
Having Duration for the total time gives the same information, and I think the time from start to close is also more relevant.

@SparkQA commented Jul 8, 2019

Test build #107347 has finished for PR 25062 at commit 7a4f416.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski (Contributor) left a comment

Looks good to me, thanks!

@juliuszsompolski (Contributor) commented:

cc @gatorsmile , @hvanhovell

@gatorsmile (Member) left a comment

LGTM

Thanks! Merged to master.

@gatorsmile (Member) commented:

It might be misleading to end users. Could you create a doc for the Web UI? See the JIRA: https://issues.apache.org/jira/browse/SPARK-28373

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019

Closes apache#25062 from wangyum/SPARK-28260.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
@wangyum deleted the SPARK-28260 branch July 26, 2019 11:00