[Test] Split E2E nightly operator tests into RayCluster/GCS and RayJob runners #3932
…b runners Signed-off-by: Future-Outlier <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
.buildkite/test-e2e.yml
Outdated
| @@ -1,4 +1,4 @@ | |||
| - label: 'Test E2E (nightly operator)' | |||
| - label: 'Test E2E (nightly operator) - RayCluster & GCS' | |||
| - label: 'Test E2E (nightly operator) - RayCluster & GCS' | |
| - label: 'Test RayCluster and GCS E2E (nightly operator)' |
nit: align naming format with others
.buildkite/test-e2e.yml
Outdated
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | ||
| - echo "--- END:e2e (nightly operator) tests finished" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayCluster|TestGcs" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | ||
| - echo "--- END:e2e (nightly operator) RayCluster & GCS tests finished" |
RayCluster and GCS E2E
| - echo "--- END:e2e (nightly operator) RayCluster & GCS tests finished" | |
| - echo "--- END:RayCluster and GCS E2E (nightly operator) tests finished" |
nit
Do we also need to align the naming format of the other e2e tests in the same file?
For example,
kuberay/.buildkite/test-e2e.yml
Line 33 in d1e750d
| - echo "--- START:Running e2e rayservice (nightly operator) tests" |
WDYT, @machichima @Future-Outlier
Good question, I think we can open a follow-up PR after this PR is merged
.buildkite/test-e2e.yml
Outdated
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayCluster|TestGcs" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | ||
| - echo "--- END:e2e (nightly operator) RayCluster & GCS tests finished" | ||
| - label: 'Test E2E (nightly operator) - RayJob' |
| - label: 'Test E2E (nightly operator) - RayJob' | |
| - label: 'Test RayJob E2E (nightly operator)' |
nit: align naming format with others
.buildkite/test-e2e.yml
Outdated
| - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp | ||
| - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | ||
| - echo "--- END:e2e (nightly operator) RayJob tests finished" |
| - echo "--- END:e2e (nightly operator) RayJob tests finished" | |
| - echo "--- END:RayJob E2E (nightly operator) tests finished" |
nit
.buildkite/test-e2e.yml
Outdated
| - set -o pipefail | ||
| - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp | ||
| - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
Instead of filtering with the -run flag (go test -timeout 30m -v ./test/e2e -run "TestRayJob"), I think it would be better to split the RayJob tests into a new folder, e.g. e2e-rayjob.
Thank you, @owenowenisme!
I think this is a great idea, and the current structure also works well. Perhaps we could leave it to the maintainer or committer to decide?
Thanks, @Future-Outlier !
The reason I suggested splitting the tests out is that future tests may not all follow the TestRayJobxxx naming convention.
Also, the RayService E2E tests and the other E2E tests are already split out, so we might want to follow that pattern?
I tried this solution, but I found that I need to make some changes to this file:
https://github.com/ray-project/kuberay/blob/master/ray-operator/test/e2e/support.go
Maybe we can address this in a separate PR?
Update: I split the RayJob tests into a new folder, e2erayjob. Thank you all!
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: machichima <[email protected]>
win5923
left a comment
Thanks!
400Ping
left a comment
LGTM!
.buildkite/test-e2e.yml
Outdated
| - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | ||
| - echo "--- END:e2e (nightly operator) tests finished" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayCluster|TestGcs" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
nit: the pattern could be anchored with ^ for a stricter match.
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayCluster|TestGcs" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | |
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "^TestRayCluster|^TestGcs" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
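For illustration, the difference between the unanchored and anchored patterns can be sketched with Go's regexp package. go test -run does an unanchored regular-expression match, so a bare TestRayJob also matches any name that merely contains it. The test names below are hypothetical and not taken from the repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern finds a match anywhere in name,
// which is how an unanchored regular-expression filter behaves.
func matches(pattern, name string) bool {
	return regexp.MustCompile(pattern).MatchString(name)
}

func main() {
	// Hypothetical test names, for illustration only.
	names := []string{"TestRayJobSubmit", "TestGcsWithTestRayJobHelper"}
	for _, n := range names {
		fmt.Printf("%-30s unanchored=%v anchored=%v\n",
			n, matches("TestRayJob", n), matches("^TestRayJob", n))
	}
}
```

With the ^ anchor, only names that start with the prefix are selected, so a test in one group cannot be picked up by the other runner just because its name contains the other prefix.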
.buildkite/test-e2e.yml
Outdated
| - set -o pipefail | ||
| - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp | ||
| - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
nit: ditto
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | |
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "^TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
machichima
left a comment
LGTM
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: fscnick <[email protected]>
Signed-off-by: Future-Outlier <[email protected]>
.buildkite/test-e2e.yml
Outdated
| - set -o pipefail | ||
| - mkdir -p "$(pwd)/tmp" && export KUBERAY_TEST_OUTPUT_DIR=$(pwd)/tmp | ||
| - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -run "^TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
Filtering both ^TestRayCluster|^TestGcs and ^TestRayJob could be risky, as it may silently ignore some tests that do not match the filters in the future. Or do we have a mechanism to force developers to use these prefixes? Otherwise, we'd better either split the test packages or use filters like ^TestRayJob and ^(?!TestRayJob) to cover all tests.
I prefer to use filters like ^TestRayJob and ^(?!TestRayJob) to include or exclude relevant tests in this PR.
Once this PR is merged, I can open an issue to split the test packages. Does this look good to you?
Go's regexp package follows Google's RE2 syntax, which does not support negative lookahead or lookbehind.
google/re2#156
So in this PR I use go test -timeout 30m -v ./test/e2e -skip "^TestRayJob" in the step labeled Test RayCluster and GCS E2E (nightly operator) instead.
BTW, thank you for the review!
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: Rueian <[email protected]>
.buildkite/test-e2e.yml
Outdated
| - echo "KUBERAY_TEST_OUTPUT_DIR=$$KUBERAY_TEST_OUTPUT_DIR" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) | ||
| - echo "--- END:e2e (nightly operator) tests finished" | ||
| - KUBERAY_TEST_TIMEOUT_SHORT=1m KUBERAY_TEST_TIMEOUT_MEDIUM=5m KUBERAY_TEST_TIMEOUT_LONG=10m go test -timeout 30m -v ./test/e2e -skip "^TestRayJob" 2>&1 | awk -f ../.buildkite/format.awk | tee $$KUBERAY_TEST_OUTPUT_DIR/gotest.log || (kubectl logs --tail -1 -l app.kubernetes.io/name=kuberay | tee $$KUBERAY_TEST_OUTPUT_DIR/kuberay-operator.log && cd $$KUBERAY_TEST_OUTPUT_DIR && find . -name "*.log" | tar -cf /artifact-mount/e2e-log.tar -T - && exit 1) |
go test -timeout 30m -v ./test/e2e -skip "^TestRayJob"
This is pretty hacky. Can you create a new directory e2erayjob instead?
no problem, just updated, thank you!
Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: kaihsun <[email protected]>
kenchung285
left a comment
LGTM
Please avoid defining the same functions multiple times in the codebase. We can add some util functions which are used in multiple Golang packages to https://github.com/ray-project/kuberay/tree/master/ray-operator/test/support.
Note that I didn't list all of them.
ray-operator/test/e2e/support.go
Outdated
| func newRayClusterSpec(options ...option[rayv1ac.RayClusterSpecApplyConfiguration]) *rayv1ac.RayClusterSpecApplyConfiguration { | ||
| return rayClusterSpecWith(rayClusterSpec(), options...) | ||
| func newRayClusterSpec() *rayv1ac.RayClusterSpecApplyConfiguration { |
This function is used in both e2e/ and e2erayjob/. Could you avoid defining it multiple times?
| })))) | ||
| } | ||
| func workerPodTemplateApplyConfiguration() *corev1ac.PodTemplateSpecApplyConfiguration { |
This function is used in both e2e/ and e2erayjob/. Could you avoid defining it multiple times?
| return apply(template, options...) | ||
| } | ||
| func headPodTemplateApplyConfiguration() *corev1ac.PodTemplateSpecApplyConfiguration { |
This function is used in both e2e/ and e2erayjob/. Could you avoid defining it multiple times?
Signed-off-by: Future-Outlier <[email protected]>
…rt readfile Signed-off-by: Future-Outlier <[email protected]>
kevin85421
left a comment
Overall LGTM
| //go:embed *.py | ||
| var _files embed.FS | ||
| func ReadFile(t Test, fileName string) []byte { |
Please replace it with the upper-level ReadFile.
| //go:embed *.py | ||
| var _files embed.FS | ||
| func ReadFile(t Test, fileName string) []byte { |
Please replace it with the upper-level ReadFile.
The initial issue has been fixed, but there are still some duplicated functions. I’ve opened a follow-up issue for additional cleanup: #3947
Thank you for the advice!
Signed-off-by: Future-Outlier <[email protected]>
…b runners (ray-project#3932) Signed-off-by: Future-Outlier <[email protected]> Co-authored-by: machichima <[email protected]> Co-authored-by: fscnick <[email protected]> Co-authored-by: Rueian <[email protected]> Co-authored-by: kaihsun <[email protected]>
…e2erayservice/support.go by reusing test/support/support.go implementations to improve maintainability and reduce redundancy. Related to ray-project#3932 Signed-off-by: HSIU-CHI LIU (Tomlord) <[email protected]>
…e2erayservice/support.go by reusing test/support/support.go implementations to improve maintainability and reduce redundancy. Related to #3932 (#4038) Signed-off-by: HSIU-CHI LIU (Tomlord) <[email protected]> Signed-off-by: Hsiu-Chi Liu (Tomlord) <[email protected]>


Why are these changes needed?
We have recently noticed that the E2E nightly operator tests in CI have become flaky.
https://buildkite.com/ray-project/ray-ecosystem-ci-kuberay-ci/builds/10289#01988d2f-1b10-46c7-825e-826f7c895dfb
It appears that the “Test E2E (nightly operator)” step takes more than 30 minutes, causing it to time out.
To address this, we can split the nightly operator tests into two runners:
• Runner 1: RayCluster / GCS tests
• Runner 2: RayJob tests
The execution times for these two groups are similar, and splitting them will make debugging easier in the future.
Related issue number
#3930
Checks