Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

NOTE: Do not merge. If the test passes and we are OK with the direction, I'll cherry-pick these commits one by one manually.

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

yanxiaole and others added 2 commits October 15, 2020 15:28
… History server

### What changes were proposed in this pull request?
This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry.

### Why are the changes needed?
As discussed in apache#29350 , one entry failure shouldn't affect others.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually tested.

Closes apache#29374 from yanxiaole/SPARK-32557.

Authored-by: Yan Xiaole <xiaole.yan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…oading new applications in SHS""

This reverts commit e40c147.
@HeartSaVioR
Copy link
Contributor Author

This is a proposal on handling backport of #30037 to branch-3.0.

@dongjoon-hyun
Copy link
Member

Thank you for investigating, @HeartSaVioR ! Let's see.

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34418/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34418/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Test build #129813 has finished for PR 30051 at commit 3a0af10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

@dongjoon-hyun The tests passed - are you OK with the approach I proposed?

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Test build #129975 has finished for PR 30051 at commit 3a0af10.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

retest this, please

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34583/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34585/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34583/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34585/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Test build #129977 has finished for PR 30051 at commit 3a0af10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Copy link
Contributor Author

I'll go with this approach as this is a simplest way to deal with. Cherry-picking two commits to branch-3.0.

@HeartSaVioR
Copy link
Contributor Author

Following commits are just landed to branch-3.0.
15ed312
02f80cf

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Oct 19, 2020

Hey, @HeartSaVioR . I don't think the revert of revert is a correct way here. At least, if you really want to revert of revert, you should make a PR and pass the UT in the community instead of silent reverting of reverting.

02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""

The way I see this is the following.

  1. The original commit was wrong because it's committed without testing.
  2. Hence, the revert is legitimate to recover branch-3.0.
  3. After you tweaks branch-3.0 yesterday, you cannot claim of revert of revert. Instead, you had better land SPARK-33146 as a normal backporting PR with [3.0] tag.

In general, Revert of revert is used when the first revert decision was wrong. Here, it looks like you use it as a claim that your first commit was right.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Oct 19, 2020

There're two points here if I understand correctly, so I'll comment each point.

  1. Using revert of revert

I never used revert to claim or blame something. That is simply moving the commit out of, nothing else. Same applies to this case, revert of revert. I didn't intend anything, but I'm sorry if you feel I use revert to disagree with your revert. I agree I should cherry-pick the origin commit again instead of doing revert of revert. My bad.

  1. Necessity of creating a backport PR

I'm sorry but I disagree, because of the different view on root cause of the issue.

The original commit was wrong because SPARK-32557 didn't land to branch-3.0, despite it was a follow-up of SPARK-32529 which was landed to branch-3.0. While the type of SPARK-32557 is marked as "improvement" (I guess that was the reason it didn't land to branch-3.0), I can simply turn it to "bug" as it actually fixes a bug in real world.

That's a two sides of the coin - unless the improvement doesn't change the functionality, it is somewhat likely possible to fix a bug. The strict rule about when to backport (backport only for a bug type) is really not pragmatic and that should be only enforced when we don't believe others (while that can be still used as a guideline). Is it the case?

In my view I did two works 1) SPARK-32557 is missing in branch-3.0 so I "corrected" it. 2) After that I cherry picked SPARK-33146 to branch-3.0 again as "usual practice" on merging phase. To not break again I made WIP PR to confirm these works don't break anything. I'm not sure I didn't respect some policy here.

Btw, that was my bad I didn't do some verification after cherry-picking and simply pushing. Though that was my bad, I think it's arguable (and probably meaningful) topic that how many verifications can be done during cherry-picking.

IMHO, enforcing to build "without test" on cherry-picked branch is somewhat pragmatic, as it requires around 15+ mins which may be acceptable. That still breaks the flow, but well, we may account it for "responsibility".

Building with test is completely different story - we'll let our development environment be stuck for 3~4 hours, which doesn't seem to be something we want to enforce to mergers. If that is enforced I expect the negative impact on avoiding to port back while it's ideal to port back. (Again this wouldn't happen if we ported back SPARK-32557.)

Always requesting to contributors to make a backport PRs and require them to hang around for more than 4 hours isn't also the solution - I think it's too hard for volunteers (both mergers and contributors). Accounting all of questions I raised, I guess it costs less we tolerate the possibility of test failures and fix it later.

@dongjoon-hyun
Copy link
Member

Okay, it seems that we agree to disagree. Thank you for sharing your opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants