
Conversation

@wangyum (Member) commented Jul 6, 2019

What changes were proposed in this pull request?

The ThriftServerTab displays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState already has a CLOSED state for after the iterator is closed. This PR adds a CLOSED state to ExecutionState and overrides close() in SparkExecuteStatementOperation, SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation.
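A minimal sketch of the shape of the change, for orientation only: the enum value and the close() override mirror the description above, but the class and listener names below are simplified stand-ins, not the PR's actual diff.

```scala
// Sketch only: ExecutionState gains a CLOSED value, and operations report it
// from close(). Everything except the ExecutionState/close() names is illustrative.
object ExecutionState extends Enumeration {
  type ExecutionState = Value
  val STARTED, COMPILED, FAILED, FINISHED, CLOSED = Value
}

trait ThriftServerUiListener {
  def onOperationClosed(id: String): Unit
}

abstract class SketchOperation(val statementId: String) {
  def close(): Unit = ()
}

// Mirrors the idea of overriding close() in SparkExecuteStatementOperation,
// SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation.
class SketchStatementOperation(id: String, listener: ThriftServerUiListener)
    extends SketchOperation(id) {
  override def close(): Unit = {
    super.close()
    listener.onOperationClosed(statementId)  // UI now sees CLOSED, not just FINISHED
  }
}
```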

How was this patch tested?

manual tests

  1. Add Thread.sleep(10000) before SparkExecuteStatementOperation.scala#L112
  2. Switch to ThriftServerTab:
    (screenshot: https://user-images.githubusercontent.com/5399861/60809590-9dcf2500-a1bd-11e9-826e-33729bb97daf.png)
  3. After a while:
    (screenshot: https://user-images.githubusercontent.com/5399861/60809719-e850a180-a1bd-11e9-9a6a-546146e626ab.png)

@SparkQA commented Jul 6, 2019

Test build #107295 has finished for PR 25062 at commit e9b8520.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum (Member, Author) commented Jul 6, 2019

cc @juliuszsompolski

import org.apache.spark.sql.{DataFrame, Row => SparkRow, SQLContext}
import org.apache.spark.sql.execution.HiveResult
import org.apache.spark.sql.execution.command.SetCommand
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.listener
@juliuszsompolski (Contributor) commented:

style nit: I would avoid importing the listener field directly, as it creates extra changes, and I think it's not common practice in Spark to import a field from within an object (except for DSLs, like org.apache.spark.functions).
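A tiny sketch of the style being suggested, with simplified stand-in names (not the PR's actual code): keep the reference qualified through its enclosing object instead of importing the field.

```scala
// Illustrative only: a stand-in for HiveThriftServer2 and its listener field.
object SketchThriftServer {
  object listener {
    def onOperationClosed(id: String): Unit = println(s"closed $id")
  }
}

object CallSite {
  // Discouraged: `import SketchThriftServer.listener`, then `listener.onOperationClosed(id)`.
  // Preferred: qualify the field so its origin stays obvious at the call site.
  def reportClosed(id: String): Unit =
    SketchThriftServer.listener.onOperationClosed(id)
}
```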

}

def onOperationClosed(id: String): Unit = synchronized {
  executionList(id).finishTimestamp = System.currentTimeMillis
@juliuszsompolski (Contributor) commented Jul 8, 2019

I would add a separate field closedTimestamp, and a separate column in the tables in the UI (overall and within session). Both the time it finished execution and the time it was closed are interesting to show: a long gap between finished and closed shows that it spent a lot of time returning the result.
What do you think?
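A minimal sketch of what the suggestion implies, assuming the listener shape from the snippet above (field and class names simplified for illustration):

```scala
// Sketch only: track the close time separately from the finish time so the UI
// can surface both. `ExecutionInfo` and the listener class are stand-ins.
import scala.collection.mutable

class ExecutionInfo(val id: String, val startTimestamp: Long) {
  var finishTimestamp: Long = 0L   // set when execution finishes (FINISHED)
  var closedTimestamp: Long = 0L   // set when the operation is closed (CLOSED)
}

class SketchListener {
  private val executionList = mutable.HashMap[String, ExecutionInfo]()

  def onOperationFinished(id: String): Unit = synchronized {
    executionList(id).finishTimestamp = System.currentTimeMillis
  }

  // Separate callback so finishTimestamp is no longer overwritten on close.
  def onOperationClosed(id: String): Unit = synchronized {
    executionList(id).closedTimestamp = System.currentTimeMillis
  }
}
```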

@wangyum (Member, Author) commented:

+1

@wangyum (Member, Author) commented:

How about this?
(screenshot of the proposed columns)
Execution Time = Finish Time - Start Time
Duration = Close Time - Start Time
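A small sketch of how those two columns could be derived, using the timestamp fields discussed above (names are illustrative, not the PR's code):

```scala
// Sketch only: the proposed UI columns as simple differences of timestamps.
case class OperationTimes(startTimestamp: Long, finishTimestamp: Long, closedTimestamp: Long)

object UiColumns {
  // Execution Time = Finish Time - Start Time
  def executionTimeMs(t: OperationTimes): Long = t.finishTimestamp - t.startTimestamp

  // Duration = Close Time - Start Time
  def durationMs(t: OperationTimes): Long = t.closedTimestamp - t.startTimestamp
}
```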

@wangyum (Member, Author) commented:

or just add Fetch result time (Fetch result time = Close Time - Finish Time):

Start Time | Finish Time | Duration | Fetch result time | Statement | State

@juliuszsompolski (Contributor) commented:

I would vote for the former: add Close time, Execution time and Duration.
Fetch result time is a bit of a long label, and I also think it might be slightly misleading: the gap may also be the client being slow to process the results, just sitting on an idle cursor, or closing it without fetching all results, etc.
Having Duration for the total time gives the same information, and I think the time from start to close is also more relevant.

@SparkQA commented Jul 8, 2019

Test build #107347 has finished for PR 25062 at commit 7a4f416.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@juliuszsompolski (Contributor) left a comment

Looks good to me, thanks!

@juliuszsompolski (Contributor) commented:

cc @gatorsmile , @hvanhovell

@gatorsmile (Member) left a comment

LGTM

Thanks! Merged to master.

@gatorsmile (Member) commented:

It might be misleading to end users. Could you create a doc for the Web UI? See the JIRA: https://issues.apache.org/jira/browse/SPARK-28373

vinodkc pushed a commit to vinodkc/spark that referenced this pull request Jul 18, 2019

Closes apache#25062 from wangyum/SPARK-28260.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
@wangyum deleted the SPARK-28260 branch July 26, 2019 11:00