
@iRakson
Contributor

@iRakson iRakson commented Dec 4, 2019

What changes were proposed in this pull request?

Add pagination support to the Completed Batches table in the Streaming tab, using the existing pagination framework. See PR #26215.

Why are the changes needed?

If a streaming job runs for a long time and the number of batches is large, loading the Streaming page may fail with an out-of-memory error. Pagination addresses this and also reduces page load time. The Jobs, Stages, SQL, and Thrift Server pages already support pagination, so this change also brings consistency.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Manually tested.

Before
Screenshot from 2019-12-04 15-44-02

After
Screenshot from 2019-12-15 12-54-48

@iRakson iRakson changed the title [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streamin… [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab Dec 4, 2019
@iRakson
Contributor Author

iRakson commented Dec 4, 2019

Please review @srowen

@iRakson
Contributor Author

iRakson commented Dec 5, 2019

cc @shahidki31

@shahidki31
Contributor

ok to test

@shahidki31
Contributor

@iRakson, haven't the comments above been resolved?

@iRakson
Contributor Author

iRakson commented Dec 5, 2019

I will push all the changes in a few minutes.

Contributor

@shahidki31 shahidki31 left a comment

Made one pass

Contributor

Is the tooltip newly added in this PR or was that already there?

Contributor Author

These tooltips were already there.

Contributor

Is it necessary to add sortable_customkey?

Contributor Author

No, it's not required. I removed it.

Contributor

I think CompletedBatchTableRow isn't required. You can get the data for sorting from BatchUIData itself.

Contributor Author

Oh, I missed this. I followed the format from other pages. We can use BatchUIData directly.
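As a rough illustration of sorting on the batch data itself rather than through a separate row wrapper class, here is a minimal Scala sketch. `BatchRow` is a hypothetical, simplified stand-in for `BatchUIData` (only the fields needed for sorting), the column labels are assumed to match the table headers, and placing batches that have no delay value last in ascending order is an arbitrary choice made for this sketch.

```scala
object BatchSorting {
  // Hypothetical, simplified stand-in for BatchUIData (assumption): delays are
  // optional because they are unknown until the batch has finished.
  final case class BatchRow(
      batchTimeMs: Long,
      numRecords: Long,
      schedulingDelay: Option[Long],
      processingDelay: Option[Long],
      totalDelay: Option[Long])

  // Map a column header (assumed labels) to an Ordering on the row itself,
  // so no extra "table row" wrapper is needed just for sorting.
  def ordering(column: String): Ordering[BatchRow] = {
    // Missing delays sort as Long.MaxValue, i.e. last in ascending order.
    def delay(d: Option[Long]): Long = d.getOrElse(Long.MaxValue)
    column match {
      case "Batch Time"       => Ordering.by[BatchRow, Long](_.batchTimeMs)
      case "Records"          => Ordering.by[BatchRow, Long](_.numRecords)
      case "Scheduling Delay" => Ordering.by[BatchRow, Long](r => delay(r.schedulingDelay))
      case "Processing Time"  => Ordering.by[BatchRow, Long](r => delay(r.processingDelay))
      case "Total Delay"      => Ordering.by[BatchRow, Long](r => delay(r.totalDelay))
      case other => throw new IllegalArgumentException(s"Unknown column: $other")
    }
  }

  // Sort ascending or descending by the requested column.
  def sortRows(rows: Seq[BatchRow], column: String, desc: Boolean): Seq[BatchRow] = {
    val ord = ordering(column)
    rows.sorted(if (desc) ord.reverse else ord)
  }
}
```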

Contributor

nit: space after by

Contributor Author

Done.

Contributor

Is it necessary to add these methods here?

Contributor

Is it necessary to add these methods here?

Contributor Author

I mistakenly added these functions instead of using imports. Thanks.

Contributor Author

@shahidki31, all review comments have been addressed.

Contributor

@shahidki31 shahidki31 left a comment

Are we not adding pagination support for the Active Batches table?

Contributor

I think the blank line between these lines can be removed.

Contributor

Isn't the same code already in the method createOutputOperationProgressBar? Can't we reuse that?

Contributor

Is this newly added? The getFirstFailureReason method already exists, right? Can't we reuse it?

Contributor

Same here. Let's avoid duplicating these methods.

@shahidki31
Contributor

Jenkins, test this please

@iRakson
Contributor Author

iRakson commented Dec 9, 2019

@shahidki31, all the methods you mentioned are part of the abstract class BatchTableBase.

@shahidki31
Contributor

shahidki31 commented Dec 9, 2019

If we create a generic class for both the Active and Completed tables, can't we avoid repeating the same methods? I think I did something similar in AllExecutionPage.scala.

@iRakson
Contributor Author

iRakson commented Dec 9, 2019

So far I have not added pagination support for active batches. I will add it in this PR as well, which will also help us avoid code duplication.

@iRakson iRakson force-pushed the streaming_pagination branch from 796293f to 84a5e77 December 14, 2019 08:29
@iRakson
Contributor Author

iRakson commented Dec 15, 2019

I have added pagination support for both the completed batch table and the active batch table in the Streaming tab.
A single common class is used to create both tables, which eliminates the code duplication.
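A minimal sketch of the kind of shared paging logic described above: one generic helper slices a single page out of an already sorted row list, so both batch tables can reuse it and only one page of rows is rendered at a time. The `BatchPaging` object and its `page` method are hypothetical; this is not Spark's own pagination framework.

```scala
object BatchPaging {
  // Result of slicing one page out of a sorted list of rows.
  final case class Page[T](pageNumber: Int, totalPages: Int, rows: Seq[T])

  // Generic helper that could be shared by the active and completed batch tables:
  // only the rows of the requested (1-based) page are returned for rendering.
  def page[T](sortedRows: Seq[T], pageNumber: Int, pageSize: Int): Page[T] = {
    require(pageSize > 0, s"pageSize must be positive, got $pageSize")
    // Report at least one page so that an empty table still renders.
    val totalPages = math.max(1, (sortedRows.size + pageSize - 1) / pageSize)
    require(pageNumber >= 1 && pageNumber <= totalPages,
      s"page $pageNumber is out of range [1, $totalPages]")
    val from = (pageNumber - 1) * pageSize
    Page(pageNumber, totalPages, sortedRows.slice(from, from + pageSize))
  }
}
```

Keeping the helper generic over the row type is what lets both tables share it; each table only supplies its own sorted rows.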

@iRakson
Contributor Author

iRakson commented Dec 15, 2019

cc @srowen @shahidki31

@iRakson iRakson changed the title [SPARK-30119][WebUI]Support Pagination for Completed Batch Table in Streaming Tab [SPARK-30119][WebUI]Support Pagination for Batch Tables in Streaming Tab Dec 15, 2019
s"&$pageNumberFormField=$page" +
s"&$streamingBatchTag.sort=$encodedSortColumn" +
s"&$streamingBatchTag.desc=$desc" +
s"&$pageSizeFormField=$pageSize"
Contributor

Don't we need to add #tableHeaderId here?

{SparkUIUtils.tooltip("Time taken to process all jobs of a batch", "top")}</th>
override def goButtonFormPath: String = {
val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name())
s"$parameterPath&$streamingBatchTag.sort=$encodedSortColumn&$streamingBatchTag.desc=$desc"
Contributor

Here too, it seems we need to add #tableHeaderId?

s"$parameterPath&$executionTag.sort=$encodedSortColumn&$executionTag.desc=$desc#$tableHeaderId"

Contributor Author

I will add the tableHeaderId in the link.
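Roughly, a page link that ends with the table's header id as a URL fragment could be built as below, so the browser scrolls back to the table after paging or sorting. This is a sketch only: `pageLink` and its parameter names are hypothetical, and the real table builds its query string from its own tag and form-field names.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets.UTF_8

object BatchTableLinks {
  // Hypothetical parameter names; tableTag plays the role of streamingBatchTag.
  def pageLink(
      basePath: String,
      tableTag: String,
      tableHeaderId: String,
      page: Int,
      pageSize: Int,
      sortColumn: String,
      desc: Boolean): String = {
    // Column names such as "Batch Time" contain spaces, so encode them.
    val encodedSortColumn = URLEncoder.encode(sortColumn, UTF_8.name())
    s"$basePath?$tableTag.page=$page" +
      s"&$tableTag.sort=$encodedSortColumn" +
      s"&$tableTag.desc=$desc" +
      s"&$tableTag.pageSize=$pageSize" +
      // The fragment makes the browser jump back to this table after reload.
      s"#$tableHeaderId"
  }
}
```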

failureReasonForUI, rowspan = 1, includeFirstLineInExpandDetails = false)
}.getOrElse(<td>-</td>)
override def headers: Seq[Node] = {
val completedBatchTableHeaders = Seq("Batch Time", "Records", "Scheduling Delay",
Contributor

Are these headers only for completedBatchTables?

Contributor Author

Both tables are identical, i.e. they have the same schema, so the headers are the same for both. But yes, the name completedBatchTableHeaders is misleading, so I will rename the variable.

Contributor

From the attached screenshot, it looks like the Total Delay column isn't in the Active Batches table and the Status column isn't in the Completed Batches table?

val waitingBatches = listener.waitingBatches.sortBy(_.batchTime.milliseconds).reverse
val completedBatches = listener.retainedCompletedBatches.
sortBy(_.batchTime.milliseconds).reverse
val activeBatchData = waitingBatches ++ runningBatches
Contributor

Can we simply append the data of the two tables? Could you please compare the page output before and after this PR for a case that has both runningBatches and waitingBatches?

Contributor Author

Previously, running batches and waiting batches were also shown in the same table (the Active Batches table).
To preserve that behavior, I concatenated the waiting batch data and the running batch data.

Contributor

I am not sure we can simply append the tables. Please refer to the earlier code. To check that nothing has changed, could you please attach screenshots (before and after this PR)?

override protected def renderRows: Seq[Node] = {
// The "batchTime"s of "waitingBatches" must be greater than "runningBatches"'s, so display
// waiting batches before running batches
waitingBatches.flatMap(batch => <tr>{waitingBatchRow(batch)}</tr>) ++
runningBatches.flatMap(batch => <tr>{runningBatchRow(batch)}</tr>)
}
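One way to keep the earlier behavior while feeding both lists into a single paged table is to carry each batch's status with its row, so the right status cell can still be rendered after the data is concatenated or re-sorted. The sketch below is an assumption, not the PR's code: `ActiveRow`, `Waiting`, and `Running` are hypothetical names, and batches are reduced to their batch times in milliseconds.

```scala
object ActiveBatches {
  // Hypothetical status tag. The earlier code instead chose waitingBatchRow or
  // runningBatchRow depending on which list a batch came from.
  sealed trait Status
  case object Waiting extends Status
  case object Running extends Status

  final case class ActiveRow(batchTimeMs: Long, status: Status)

  // Keep the status with each batch so that, after concatenation (and any later
  // re-sorting by column), the table can still render the correct status cell.
  def activeRows(waitingMs: Seq[Long], runningMs: Seq[Long]): Seq[ActiveRow] = {
    // Same default order as renderRows above: newest first, with waiting
    // batches (which have later batch times) before running batches.
    waitingMs.sorted.reverse.map(ActiveRow(_, Waiting)) ++
      runningMs.sorted.reverse.map(ActiveRow(_, Running))
  }
}
```

With the status stored on each row, sorting by any column no longer loses which kind of row a batch needs.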

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 26, 2020
@github-actions github-actions bot closed this Mar 27, 2020