Skip to content

Conversation

@fjh100456
Copy link
Contributor

What changes were proposed in this pull request?

Modified succeeded num in job detail page from "completed = stageData.completedIndices.size" to "completed = stageData.numCompleteTasks",which making succeeded tasks num in all jobs page and job detail page look more consistent, and more easily to find which stages the speculative task(s) were in.

How was this patch tested?

manual tests

…and job detail page on spark web ui when speculative task(s) exist.

## What changes were proposed in this pull request?

Modified succeeded num in job detail page from "completed = stageData.completedIndices.size" to "completed = stageData.numCompleteTasks",which making succeeded tasks num in all jobs page and job detail page look more consistent, and more easily to find which stages the speculative task(s) were in.

## How was this patch tested?

manual tests
@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #3706 has finished for PR 17923 at commit 78b0f97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajbozarth
Copy link
Member

This is a rollback of an old Spark 1.1 blocker by @pwendell SPARK-3020. it seems this was intentionally done this way

@fjh100456
Copy link
Contributor Author

I see, but really it's not easy to find the speculative tasks, especially when the succeeded task num are inconsistent on the all jobs page and the job detail page.
Shall we let them seems consistent by modifying the num on all jobs page? Or it's also on purpose? @ajbozarth @pwendell

@srowen
Copy link
Member

srowen commented May 10, 2017

The two displays are at least inconsistent. If displaying "50/50" instead of "53/50" is intentional and been the behavior for a while, let's stick with that. However if some page still shows things like "53/50" we should fix that here.

@ajbozarth
Copy link
Member

I don't know @pwendell original reasoning for the change and I have no personal opinion. Given @pwendell seems to have been inactive for a while, @rxin you reviewed the original PR do you remember the reasoning?

@rxin
Copy link
Contributor

rxin commented May 10, 2017

sry too long ago

@ajbozarth
Copy link
Member

If no one can think of a reason to keep it this way I'm ok with this change, but should we get rid of stageData.completedIndices since it was only added for this purpose or is it used in other places now?

@fjh100456
Copy link
Contributor Author

The description of @pwendell 's jira looks as if it only because the number of tasks completed will exceed the total number of tasks. But I think this excess can remind the user that the "speculative" has been triggered, and users like me may check to be a host running problem or a data problem or other problem. The excess can be more convenient for me to discover speculative tasks and find them.

So I prefer to allow the "excess". I have checked the code, and "completedIndices" is used only for calculating the number of tasks, and no other place to use. If this is ok, I will remove "completedIndices".

@mridulm
Copy link
Contributor

mridulm commented May 11, 2017

The intent, if I remember correctly, was that we know exactly how many unfinished partitions remaining.
For example, 499/500 means there is one task (partition) left to be computed.
Adding total tasks there will mean that X / 500 does not give indication of how many partitions are actually remaining : whether this is still relevant, I have not looked into.

@srowen
Copy link
Member

srowen commented May 11, 2017

Yeah, I think the problem is: if you had 2 speculated tasks, and 2 failed, you'd still show 50/50 and that's indistinguishable from 0 speculated and 0 failed.

If there's doubt about the intent, I'd leave the behavior that was, at least, implemented intentionally, and match it instead of the other way around.

@fjh100456
Copy link
Contributor Author

Well, I'll change the succeeded tasks of the all jobs page to the same as the job detail page, and also record the completed tskid by a "completedIndices".
Is it ok?

@sbyim
Copy link

sbyim commented May 12, 2017 via email

…he same as the job detail page(by excluding speculated tasks).

## What changes were proposed in this pull request?

Count the completed tasks num by taskId to excluding speculated tasks.
Add a "completedIndices"(OpenHashSet) to record the the completed tasks by id, and show the completed tasks num by "job.completedIndices.size" instead of "job.numCompletedTasks".

## How was this patch tested?

manual tests
Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the right change to me. Jobs now need to remember which task IDs have completed to deliver a count of completed indices, like stages do.

@SparkQA
Copy link

SparkQA commented May 15, 2017

Test build #3712 has started for PR 17923 at commit d80f140.

@fjh100456
Copy link
Contributor Author

The build seems to have failed, but it should have no relationship with my changes.
Why does it show "No changes"? Something error?

@srowen
Copy link
Member

srowen commented May 17, 2017

@fjh100456 Jenkins is having problems

@SparkQA
Copy link

SparkQA commented May 19, 2017

Test build #3729 has started for PR 17923 at commit d80f140.

@SparkQA
Copy link

SparkQA commented May 19, 2017

Test build #3737 has finished for PR 17923 at commit d80f140.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 19, 2017

Test build #3739 has finished for PR 17923 at commit d80f140.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 19, 2017

Test build #3742 has finished for PR 17923 at commit d80f140.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

…Id, taskIndex) tuples in JobUIData

## What changes were proposed in this pull request?

Task index is not unique when there are multiple stages, so it is not possible to record the number of successful tasks through task index in the entire job, but should be recorded by StageId and TaskIndex.
If there is a retry stage, it will no longer exceed the total number of tasks.

## How was this patch tested?

manual tests
@SparkQA
Copy link

SparkQA commented May 20, 2017

Test build #3743 has finished for PR 17923 at commit d80f140.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 20, 2017

Oh @fjh100456 actually this test failure is legitimate, not due to recent Jenkins problems. The UI test needs to reflect the new display.

Cause: org.scalatest.exceptions.TestFailedException: "[2]/2 (1 failed)" was not equal to "[3]/2 (1 failed)"

@fjh100456
Copy link
Contributor Author

Yes, I've fixed the use case and bugs. @srowen

@fjh100456
Copy link
Contributor Author

test

@srowen
Copy link
Member

srowen commented May 22, 2017

@fjh100456 yes, can you fix the test?

@fjh100456
Copy link
Contributor Author

I had done.
Please rebuild again,thanks.

@SparkQA
Copy link

SparkQA commented May 22, 2017

Test build #3749 has finished for PR 17923 at commit 89c78ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented May 22, 2017

Merged to master

@asfgit asfgit closed this in 190d8b0 May 22, 2017
lycplus pushed a commit to lycplus/spark that referenced this pull request May 24, 2017
…and job detail page on spark web ui when speculative task(s) exist.

## What changes were proposed in this pull request?

Modified succeeded num in job detail page from "completed = stageData.completedIndices.size" to "completed = stageData.numCompleteTasks",which making succeeded tasks num in all jobs page and job detail page look more consistent, and more easily to find which stages the speculative task(s) were in.

## How was this patch tested?

manual tests

Author: fjh100456 <[email protected]>

Closes apache#17923 from fjh100456/master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants