Skip to content

Comments

[SPARK-31081][UI][SQL] Make display of stageId/stageAttemptId/taskId of sql metrics toggleable#27927

Closed
sarutak wants to merge 6 commits intoapache:masterfrom
sarutak:SPARK-31081
Closed

[SPARK-31081][UI][SQL] Make display of stageId/stageAttemptId/taskId of sql metrics toggleable#27927
sarutak wants to merge 6 commits intoapache:masterfrom
sarutak:SPARK-31081

Conversation

@sarutak
Copy link
Member

@sarutak sarutak commented Mar 16, 2020

What changes were proposed in this pull request?

This is another solution for SPARK-31081 and #27849 .
I added a checkbox which can toggle display of stageId/taskid in the SQL's DAG page.
Mainly, I implemented the toggleable texts in boxes with HTML label feature provided by dagre-d3.
The additional metrics are enclosed by <span> and control the appearance of the text.
But the exception is additional metrics in clusters.
We can use HTML label for cluster but layout will be broken so I choosed normal text label for clusters.
Due to that, this solution contains a little bit tricky code inspark-sql-viz.js to manipulate the metric texts and generate DOMs.

Why are the changes needed?

It makes metrics harder to read after #26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug.
#27849 control the appearance by a new configuration property but providing a checkbox is more flexible.

Does this PR introduce any user-facing change?

Yes.
[Additional metrics shown]
with-checked

[Additional metrics hidden]
without-chedked

How was this patch tested?

Tested manually with a simple DataFrame operation.

  • The appearance of additional metrics in the boxes are controlled by the newly added checkbox.
  • No error found with JS-debugger.
  • Checked/not-checked state is preserved after reloading.

@SparkQA
Copy link

SparkQA commented Mar 16, 2020

Test build #119879 has finished for PR 27927 at commit ecbcd54.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 16, 2020

Test build #119890 has finished for PR 27927 at commit e54ac44.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 17, 2020

Test build #119897 has finished for PR 27927 at commit 6c9814a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

});
var isChecked = window.localStorage.getItem("stageId-and-taskId-checked") == "true";
$("#stageId-and-taskId-checkbox").prop("checked", isChecked);
$(".stageId-and-taskId-metrics").hide();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line seems unneeded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll remove it.

}

// labelSeparator should be a non-graphical character in order not to affect the width of boxes.
var labelSeparator = "\x01";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain more about the labelSeparator as "\x01" here? It seem that the label of cluster will contain the <span class='stageId-and-taskId-metrics'> as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seem that the label of cluster will contain the as well

Yes, but we can't add <span class='stageId-and-and-taskId-metrics'> here because the label type of cluster is not html.
I tried to make the label type html but I found that the DAG had broken layout.

The width of boxes are determined based on the length of text by dagre-d3.
So if the separator character has width (graphical), the width of boxes are slightly widened against the original texts.

</div>
<div>
<input type="checkbox" id="stageId-and-taskId-checkbox"></input>
<span>Show Stage/Task IDs</span>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

Show Stage/Task IDs with the max metrics

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just have this be the entire format:
Show the Stage (Stage Attempt): Task ID that corresponds to the max metric

@gengliangwang
Copy link
Member

gengliangwang commented Mar 18, 2020

@sarutak Thanks for improving this!!

cc @tgravescs @Ngone51 as well
BTW, I try the PR on my local setup. The checkbox size is normal, not like the one the PR description..
image

@tgravescs
Copy link
Contributor

@sarutak thanks for picking this up.

Copy link
Member

@Ngone51 Ngone51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know this part well but only a minor nit. Overall, it looks good and cool from the screenshot!

Comment on lines 225 to 229
if (isChecked) {
additionalMetrics.show();
} else {
additionalMetrics.hide();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll fix it.

@sarutak
Copy link
Member Author

sarutak commented Mar 22, 2020

BTW, I try the PR on my local setup. The checkbox size is normal, not like the one the PR description..

Maybe, it depends on OS. I use Pop!_OS which is based on Ubuntu.

@sarutak
Copy link
Member Author

sarutak commented Mar 22, 2020

I've reflected comments and modified the captures in the description.

@SparkQA
Copy link

SparkQA commented Mar 22, 2020

Test build #120155 has finished for PR 27927 at commit 22384e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 22, 2020

Test build #120157 has finished for PR 27927 at commit 377fb49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gengliangwang
Copy link
Member

Thanks, merging to master/3.0

gengliangwang pushed a commit that referenced this pull request Mar 24, 2020
…of sql metrics toggleable

### What changes were proposed in this pull request?

This is another solution for `SPARK-31081` and #27849 .
I added a checkbox which can toggle display of stageId/taskid in the SQL's DAG page.
Mainly, I implemented the toggleable texts in boxes with HTML label feature provided by `dagre-d3`.
The additional metrics are enclosed by `<span>` and control the appearance of the text.
But the exception is additional metrics in clusters.
We can use HTML label for cluster but layout will be broken so I choosed normal text label for clusters.
Due to that, this solution contains a little bit tricky code in`spark-sql-viz.js` to manipulate the metric texts and generate DOMs.

### Why are the changes needed?

It makes metrics harder to read after #26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug.
#27849 control the appearance by a new configuration property but providing a checkbox is more flexible.

### Does this PR introduce any user-facing change?

Yes.
[Additional metrics shown]
![with-checked](https://user-images.githubusercontent.com/4736016/77244214-0f6cd780-6c56-11ea-9275-a30758dd5339.png)

[Additional metrics hidden]
![without-chedked](https://user-images.githubusercontent.com/4736016/77244219-14ca2200-6c56-11ea-9874-33a466085fce.png)

### How was this patch tested?

Tested manually with a simple DataFrame operation.
* The appearance of additional metrics in the boxes are controlled by the newly added checkbox.
* No error found with JS-debugger.
* Checked/not-checked state is preserved after reloading.

Closes #27927 from sarutak/SPARK-31081.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(cherry picked from commit 999c9ed)
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>

// labelSeparator should be a non-graphical character in order not to affect the width of boxes.
var labelSeparator = "\x01";
var stageAndTaskMetricsPattern = "^(.*)(\\(stage.*attempt.*task[^)]*\\))(.*)$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but a minor comment: now we display task info like stage 1 (attempt 2): task 3. A more standard (and simpler) way to refer to a stage is stage 1.2: task 3.

@sarutak can you help fix it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll fix it.

cloud-fan pushed a commit that referenced this pull request Mar 27, 2020
…StageId

### What changes were proposed in this pull request?

In ExecutionPage, metrics format for stageId, attemptId and taskId are displayed like `(stageId (attemptId): taskId)` for now.
I changed this format like `(stageId.attemptId taskId)`.

### Why are the changes needed?

As cloud-fan suggested  [here](#27927 (comment)), `stageId.attemptId` is more standard in Spark.

### Does this PR introduce any user-facing change?

Yes. Before applying this change, we can see the UI like as follows.
![with-checked](https://user-images.githubusercontent.com/4736016/77682421-42a6c200-6fda-11ea-92e4-e9f4554adb71.png)

And after this change applied, we can like as follows.
![fix-merics-format-with-checked](https://user-images.githubusercontent.com/4736016/77682493-61a55400-6fda-11ea-801f-91a67da698fd.png)

### How was this patch tested?

Modified `SQLMetricsSuite` and manual test.

Closes #28039 from sarutak/improve-metrics-format.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Mar 27, 2020
…StageId

### What changes were proposed in this pull request?

In ExecutionPage, metrics format for stageId, attemptId and taskId are displayed like `(stageId (attemptId): taskId)` for now.
I changed this format like `(stageId.attemptId taskId)`.

### Why are the changes needed?

As cloud-fan suggested  [here](#27927 (comment)), `stageId.attemptId` is more standard in Spark.

### Does this PR introduce any user-facing change?

Yes. Before applying this change, we can see the UI like as follows.
![with-checked](https://user-images.githubusercontent.com/4736016/77682421-42a6c200-6fda-11ea-92e4-e9f4554adb71.png)

And after this change applied, we can like as follows.
![fix-merics-format-with-checked](https://user-images.githubusercontent.com/4736016/77682493-61a55400-6fda-11ea-801f-91a67da698fd.png)

### How was this patch tested?

Modified `SQLMetricsSuite` and manual test.

Closes #28039 from sarutak/improve-metrics-format.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit bc37fdc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…of sql metrics toggleable

### What changes were proposed in this pull request?

This is another solution for `SPARK-31081` and apache#27849 .
I added a checkbox which can toggle display of stageId/taskid in the SQL's DAG page.
Mainly, I implemented the toggleable texts in boxes with HTML label feature provided by `dagre-d3`.
The additional metrics are enclosed by `<span>` and control the appearance of the text.
But the exception is additional metrics in clusters.
We can use HTML label for cluster but layout will be broken so I choosed normal text label for clusters.
Due to that, this solution contains a little bit tricky code in`spark-sql-viz.js` to manipulate the metric texts and generate DOMs.

### Why are the changes needed?

It makes metrics harder to read after apache#26843 and user may not interest in extra info(stageId/StageAttemptId/taskId ) when they do not need debug.
apache#27849 control the appearance by a new configuration property but providing a checkbox is more flexible.

### Does this PR introduce any user-facing change?

Yes.
[Additional metrics shown]
![with-checked](https://user-images.githubusercontent.com/4736016/77244214-0f6cd780-6c56-11ea-9275-a30758dd5339.png)

[Additional metrics hidden]
![without-chedked](https://user-images.githubusercontent.com/4736016/77244219-14ca2200-6c56-11ea-9874-33a466085fce.png)

### How was this patch tested?

Tested manually with a simple DataFrame operation.
* The appearance of additional metrics in the boxes are controlled by the newly added checkbox.
* No error found with JS-debugger.
* Checked/not-checked state is preserved after reloading.

Closes apache#27927 from sarutak/SPARK-31081.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…StageId

### What changes were proposed in this pull request?

In ExecutionPage, metrics format for stageId, attemptId and taskId are displayed like `(stageId (attemptId): taskId)` for now.
I changed this format like `(stageId.attemptId taskId)`.

### Why are the changes needed?

As cloud-fan suggested  [here](apache#27927 (comment)), `stageId.attemptId` is more standard in Spark.

### Does this PR introduce any user-facing change?

Yes. Before applying this change, we can see the UI like as follows.
![with-checked](https://user-images.githubusercontent.com/4736016/77682421-42a6c200-6fda-11ea-92e4-e9f4554adb71.png)

And after this change applied, we can like as follows.
![fix-merics-format-with-checked](https://user-images.githubusercontent.com/4736016/77682493-61a55400-6fda-11ea-801f-91a67da698fd.png)

### How was this patch tested?

Modified `SQLMetricsSuite` and manual test.

Closes apache#28039 from sarutak/improve-metrics-format.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@sarutak sarutak deleted the SPARK-31081 branch June 4, 2021 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants