@AngersZhuuuu
Contributor

@AngersZhuuuu AngersZhuuuu commented Aug 16, 2019

What changes were proposed in this pull request?

PR #24533 prevents retrying a fetch against a removed executor.
In my tests, however, I could not catch any exception from
new OneForOneBlockFetcher(client, appId, execId, blockIds, listener, transportConf, tempFileManager).start()
After checking the code carefully, I found that start() handles IOException inside its retry logic and does not rethrow it until it reaches the maximum number of retries or meets an exception that is not an IOException.

If the executor dies while we are fetching a block, then when we rerun
RetryingBlockFetcher.BlockFetchStarter.createAndStart()
creating a transport client to the dead executor may fail and throw an IOException.
We should catch this IOException.
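
The idea above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the code in this patch: `FetchStarter`, `ExecutorLiveness`, `Outcome`, and `fetchWithRetry` are hypothetical names that stand in for `createAndStart()` and for whatever check tells us the executor is still alive.

```java
import java.io.IOException;

/** Hypothetical sketch; the names here are illustrative and are not Spark's API. */
class ExecutorAwareRetrySketch {

    /** Stand-in for RetryingBlockFetcher.BlockFetchStarter.createAndStart(). */
    interface FetchStarter {
        void createAndStart() throws IOException;
    }

    /** Assumed hook that reports whether the target executor is still alive. */
    interface ExecutorLiveness {
        boolean isAlive(String execId);
    }

    enum Outcome { SUCCESS, EXECUTOR_DEAD, RETRIES_EXHAUSTED }

    /**
     * Makes one initial attempt plus up to maxRetries retries.
     * An IOException (for example, from creating a client to a dead executor)
     * is caught here, and a retry happens only if the executor is still alive.
     */
    static Outcome fetchWithRetry(String execId, int maxRetries,
                                  FetchStarter starter, ExecutorLiveness liveness) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                starter.createAndStart();
                return Outcome.SUCCESS;
            } catch (IOException e) {
                // Retrying against a removed executor cannot succeed, so stop early.
                if (!liveness.isAlive(execId)) {
                    return Outcome.EXECUTOR_DEAD;
                }
                // Otherwise fall through and try again (a real fetcher would also wait).
            }
        }
        return Outcome.RETRIES_EXHAUSTED;
    }
}
```

With a starter that always throws IOException and a liveness check that returns false, the sketch gives up after the first attempt instead of using every retry.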

Why are the changes needed?

The previous solution was not comprehensive and did not cover all cases.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests.

@AngersZhuuuu
Contributor Author

gentle ping @cloud-fan @felixcheung

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Aug 16, 2019

Test build #109174 has finished for PR 25469 at commit e2dbe4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@cloud-fan The unit test result is strange. The unit tests reported as failed run correctly on my machine.

@maropu
Member

maropu commented Aug 16, 2019

retest this please

@maropu
Member

maropu commented Aug 16, 2019

@AngersZhuuuu ok, the other prs hit the same errors, so they are not related to this pr.

@AngersZhuuuu
Contributor Author

> @AngersZhuuuu ok, the other prs hit the same errors, so they are not related to this pr.

Got it, thanks.

@SparkQA

SparkQA commented Aug 16, 2019

Test build #109183 has finished for PR 25469 at commit e2dbe4b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@maropu The tests passed, but the build seems to have failed for some strange reason.

@cloud-fan
Contributor

retest this please

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-27637][Shuffle][FLLOW-UP]For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533" to "[SPARK-27637][Shuffle][FOLLOW-UP]For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533" on Aug 16, 2019

@SparkQA

SparkQA commented Aug 16, 2019

Test build #109199 has finished for PR 25469 at commit e2dbe4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!
