@witgo
Contributor

@witgo witgo commented Jul 28, 2014

No description provided.

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17280/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17280/consoleFull

@pwendell
Contributor

Can you create a test for this? I'm not sure what happens here if the timeout is encountered.

@sarutak
Member

sarutak commented Jul 29, 2014

@witgo @pwendell I had already noticed that ConnectionManager has no timeout configuration, but adding a timeout to ConnectionManager does not resolve this issue, because the channel used for receiving acks is implemented with non-blocking I/O, and SO_TIMEOUT only affects reads after a connection has been established. So if the remote executor hangs, it cannot establish connections with the fetching executors.

Additionally, BasicBlockFetcherIterator waits on LinkedBlockingQueue#take (result.take), so we should put a FetchResult object whose size is -1 into the result queue of BasicBlockFetcherIterator.
(A FetchResult whose size is -1 means the fetch failed.)
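
To illustrate the idea of unblocking a consumer with a failure marker, here is a minimal, self-contained Java sketch. It is not Spark code: the `FetchResult` class, the `takeAfterFailure` helper, and the block id are hypothetical stand-ins, and only `LinkedBlockingQueue` and its `take` method come from the JDK.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FetchSentinelSketch {
    // Hypothetical stand-in for a fetch result; size == -1 marks a failed fetch.
    static final class FetchResult {
        final String blockId;
        final long size;

        FetchResult(String blockId, long size) {
            this.blockId = blockId;
            this.size = size;
        }

        boolean failed() {
            return size == -1;
        }
    }

    // Simulates the error path: the producer enqueues a failure marker
    // instead of staying silent, so the consumer blocked in take() wakes up.
    static FetchResult takeAfterFailure() {
        BlockingQueue<FetchResult> results = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> results.add(new FetchResult("block_0", -1)));
        producer.start();
        try {
            // take() blocks until an element is available.
            FetchResult r = results.take();
            producer.join();
            return r;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a fetch result", e);
        }
    }

    public static void main(String[] args) {
        FetchResult r = takeAfterFailure();
        System.out.println(r.blockId + " failed=" + r.failed());
    }
}
```

Without the marker, nothing would ever be added to the queue in the failure case and `take()` would never return.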

I think remote errors can be classified into the following 2 cases.

  1. The remote executor hangs
    In this case, we need a timeout for the fetch request (not a read timeout).
    I'm trying to resolve this case in [SPARK-2677] BasicBlockFetchIterator#next can wait forever #1632

  2. The remote executor does not hang, but an error occurs
    In this case, the remote executor should send a message indicating that an error occurred on the remote executor.
    I'm trying to resolve this case in [SPARK-2583] ConnectionManager cannot distinguish whether error occurred or not #1490
    This is ongoing.
    Can anyone review this too?
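
As a rough illustration of case 1, the following Java sketch shows what a request-level timeout (as opposed to a socket read timeout) could look like. It is only a sketch under stated assumptions, not the approach taken in either pull request: `requestWithTimeout` and `hungRemoteTimesOut` are hypothetical names, and the "ack" is modeled as a plain `CompletableFuture` that nobody completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FetchRequestTimeoutSketch {
    // Hypothetical helper: returns a future that fails with TimeoutException
    // unless something completes it (the "ack") within timeoutMillis.
    // A fuller version would also cancel the timer task once the ack arrives.
    static CompletableFuture<String> requestWithTimeout(
            ScheduledExecutorService timer, long timeoutMillis) {
        CompletableFuture<String> ack = new CompletableFuture<>();
        timer.schedule(() -> {
            // No effect if the future has already been completed.
            ack.completeExceptionally(
                new TimeoutException("no ack within " + timeoutMillis + " ms"));
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        return ack;
    }

    // Simulates a hung remote executor: the ack future is never completed,
    // so the only way out of get() is the timer firing.
    static boolean hungRemoteTimesOut(long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            requestWithTimeout(timer, timeoutMillis).get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + hungRemoteTimesOut(100));
    }
}
```

The timer runs independently of any socket, so it fires even when no connection was ever established.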

@SparkQA

SparkQA commented Jul 29, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17350/consoleFull

@witgo
Contributor Author

witgo commented Jul 29, 2014

@sarutak I think adding a heartbeat detection mechanism is a good solution.
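
For readers unfamiliar with the idea, a heartbeat check can be sketched in a few lines of Java. This is a generic illustration, not a proposal for Spark's actual design: the class name, the executor ids, and the timeout value are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatMonitorSketch {
    // Last time (in milliseconds) a heartbeat was seen from each executor.
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat message arrives from an executor.
    void recordHeartbeat(String executorId, long nowMillis) {
        lastSeenMillis.put(executorId, nowMillis);
    }

    // An executor counts as alive only if it has sent a heartbeat
    // and the most recent one is no older than the timeout.
    boolean isAlive(String executorId, long nowMillis) {
        Long last = lastSeenMillis.get(executorId);
        return last != null && nowMillis - last <= timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitorSketch monitor = new HeartbeatMonitorSketch(1000);
        monitor.recordHeartbeat("exec-1", 0);
        System.out.println(monitor.isAlive("exec-1", 500));   // heartbeat is recent
        System.out.println(monitor.isAlive("exec-1", 5000));  // heartbeat is stale
    }
}
```

A fetcher could consult such a monitor and fail pending requests to an executor that is no longer considered alive.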

@SparkQA

SparkQA commented Jul 29, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17350/consoleFull

@witgo
Contributor Author

witgo commented Jul 29, 2014

@sarutak ConnectionManager.scala#L259 deals with the case where a connection cannot be established.

@SparkQA

SparkQA commented Jul 29, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17356/consoleFull

@SparkQA

SparkQA commented Jul 29, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17356/consoleFull

@witgo witgo changed the title from "[WIP][SPARK-2677]BasicBlockFetchIterator#next can wait forever" to "[SPARK-2677][SPARK-2717]BasicBlockFetchIterator#next can wait forever" Jul 30, 2014
@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17449/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17450/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17451/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17449/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17450/consoleFull

@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17451/consoleFull

@SparkQA

SparkQA commented Jul 31, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17580/consoleFull

@SparkQA

SparkQA commented Jul 31, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17580/consoleFull

@SparkQA

SparkQA commented Jul 31, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17583/consoleFull

@SparkQA

SparkQA commented Jul 31, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17583/consoleFull

@SparkQA

SparkQA commented Aug 7, 2014

QA tests have started for PR 1619. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18097/consoleFull

@SparkQA

SparkQA commented Aug 7, 2014

QA results for PR 1619:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18097/consoleFull

@witgo witgo closed this Aug 17, 2014
@witgo witgo deleted the SPARK-2677 branch August 17, 2014 00:52
4 participants