@holdenk
Contributor

@holdenk holdenk commented May 22, 2020

What changes were proposed in this pull request?

Increase the timeout and register the listener earlier to avoid any race condition of the job starting before the listener is registered.
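
The race described above is a generic one: an event that fires before a listener is registered is never delivered to that listener. The toy model below sketches this in bash. It is only an illustration under stated assumptions; `register_listener`, `emit`, and `on_task_start` are hypothetical names and are not part of Spark or of this patch.

```shell
#!/usr/bin/env bash
# Toy model only: these functions are hypothetical and are not Spark's listener bus.
listeners=()   # names of callback functions currently registered
seen=()        # events that the listener actually observed

register_listener() { listeners+=("$1"); }

# Deliver an event to every listener registered at the time of the call.
emit() {
  local l
  for l in "${listeners[@]}"; do "$l" "$1"; done
}

on_task_start() { seen+=("$1"); }

# Racy order: the "job" emits before the listener exists, so the event is lost.
emit "taskStart-early"
register_listener on_task_start

# Safe order: the listener is already registered when the event fires.
emit "taskStart-late"

echo "observed ${#seen[@]} event(s): ${seen[*]}"
```

Only the second event is recorded, which is why registering the listener before the job starts removes the window in which a task-start event can be missed.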

Why are the changes needed?

The test is currently semi-flaky.

Does this PR introduce any user-facing change?

No

How was this patch tested?

I'm running the following bash script on my dev machine to verify that the flakiness decreases. It has completed 356 iterations without any test failures, so I believe the issue is fixed.

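set -ex
./build/sbt clean compile package
# Plain assignments: under `set -e`, an arithmetic command such as ((failures=0))
# returns a non-zero status when its value is 0, which can abort the script.
failures=0
for (( i=0; i<1000; ++i )); do
  echo "Run $i"
  failed=0
  ./build/sbt "core/testOnly org.apache.spark.scheduler.WorkerDecommissionSuite" || failed=1
  echo "Resulted in $failed"
  failures=$((failures + failed))
  # i is zero-based, so i+1 runs have completed at this point.
  echo "Current status is failures: $failures out of $((i + 1)) runs"
done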
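
One rough way to read a result such as "356 runs, 0 failures" is the statistical rule of three: if no failures occur in n independent runs, the approximate 95% upper bound on the true failure rate is 3/n. The sketch below assumes the runs are independent; the function name `flake_upper_bound_permille` is illustrative and not part of the patch.

```shell
#!/usr/bin/env bash
# Rule of three: with 0 failures in n independent runs, the approximate 95%
# upper confidence bound on the failure probability is 3/n.
# Prints the bound per 1000 runs, rounded up, using integer arithmetic only.
flake_upper_bound_permille() {
  local n=$1
  echo $(( (3000 + n - 1) / n ))
}

bound=$(flake_upper_bound_permille 356)
echo "0 failures in 356 runs: failure rate is below roughly ${bound} per 1000 runs (95% confidence)"
```

For 356 clean runs this gives a bound of about 9 failures per 1000 runs, so a clean streak of this length is consistent with the test being fixed or with it still failing somewhat less than 1% of the time.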

holdenk added 2 commits May 21, 2020 21:39
…eached the point in the scheduling, make the task take slightly longer and wait an extra two seconds for scheduling to run its course
@holdenk
Contributor Author

holdenk commented May 22, 2020

cc @dongjoon-hyun

@SparkQA

SparkQA commented May 22, 2020

Test build #123012 has finished for PR 28614 at commit b5e83de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor Author

holdenk commented May 22, 2020

I intend to merge by 5pm Pacific if no one has suggestions, as it is a test-only improvement.

@asfgit asfgit closed this in 721cba5 May 23, 2020
@holdenk
Contributor Author

holdenk commented May 23, 2020

Merged to master.

@dongjoon-hyun
Member

+1, late LGTM.

holdenk added a commit to holdenk/spark that referenced this pull request Jun 25, 2020
### What changes were proposed in this pull request?

Increase the timeout and register the listener earlier to avoid any race condition of the job starting before the listener is registered.

### Why are the changes needed?

The test is currently semi-flaky.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
I'm running the following bash script on my dev machine to verify that the flakiness decreases. It has completed 356 iterations without any test failures, so I believe the issue is fixed.

```
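set -ex
./build/sbt clean compile package
# Plain assignments: under `set -e`, an arithmetic command such as ((failures=0))
# returns a non-zero status when its value is 0, which can abort the script.
failures=0
for (( i=0; i<1000; ++i )); do
  echo "Run $i"
  failed=0
  ./build/sbt "core/testOnly org.apache.spark.scheduler.WorkerDecommissionSuite" || failed=1
  echo "Resulted in $failed"
  failures=$((failures + failed))
  # i is zero-based, so i+1 runs have completed at this point.
  echo "Current status is failures: $failures out of $((i + 1)) runs"
done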
```

Closes apache#28614 from holdenk/SPARK-31791-improve-cache-block-migration-test-reliability.

Authored-by: Holden Karau <[email protected]>
Signed-off-by: Holden Karau <[email protected]>
holdenk added a commit to holdenk/spark that referenced this pull request Oct 27, 2020