[SPARK-44544][INFRA] Deduplicate run_python_packaging_tests
#42146
Conversation
dev/run-tests.py
Yeah, we should probably deduplicate it, because it runs for every Python module test. Another way is to just add an env variable and enable it in only one split.
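The env-variable approach suggested above could look roughly like this. This is a hypothetical sketch, not the actual `dev/run-tests.py` code; `SKIP_PACKAGING_TESTS` and the function name are illustrative:

```python
import os

def should_run_packaging_tests(test_modules):
    """Decide whether this CI split runs the packaging tests.

    SKIP_PACKAGING_TESTS is a hypothetical env variable that every split
    except one would set, so the packaging tests run exactly once.
    """
    if os.environ.get("SKIP_PACKAGING_TESTS", "false").lower() == "true":
        return False
    # Alternatively, tie the tests to a single small module such as
    # pyspark-errors, as this PR ultimately does.
    return "pyspark-errors" in test_modules
```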
I'm just checking whether disabling this test works.
Will do so if this is the root cause.
From this workflow: https://github.com/panbingkun/spark/actions/runs/5653268655/job/15314215187 — it looks like a problem with run_python_packaging_tests.
All the tests in pyspark-pandas-connect-part-3 had finished running, and then run_python_packaging_tests caused an error.
I am not 100% sure about this, since in https://github.com/zhengruifeng/spark/actions/runs/5652363908/job/15311889590 I also disabled all the packaging tests (you can check that there are no packaging tests in pyspark-core),
but pyspark-sql and pyspark-pandas-slow-connect still failed ...
I just rebased the PR to re-test whether disabling works.
And I see the packaging test in pyspark-core succeeded in https://github.com/apache/spark/actions/runs/5652170312/job/15311427056, so maybe the packaging test itself is fine?
TBH, I can't figure out what is happening.
From the test results, it was successful after disabling run_python_packaging_tests and splitting.
https://github.com/panbingkun/spark/actions/runs/5654907063/job/15318981268
force-pushed from caf49cd to 7bd48e9
force-pushed from 7bd48e9 to 35ccb99
It seems disabling works: https://github.com/zhengruifeng/spark/actions/runs/5654026436/job/15316814903 https://github.com/zhengruifeng/spark/actions/runs/5654706264/job/15324666018
force-pushed from a1f5da4 to ba071c4
also cc @Yikun
Let me take a quick look at #42159 too. I think we can just remove some directories before running the pip test.
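The idea of removing directories to free disk space before the pip packaging test could be sketched like this. This is hypothetical; the candidate paths and helper name are examples only, not what #42159 actually deletes:

```python
import shutil
from pathlib import Path

def free_disk_space(candidate_dirs):
    """Delete build artifacts that are no longer needed before the pip
    packaging tests, returning the directories actually removed."""
    removed = []
    for d in candidate_dirs:
        p = Path(d)
        if p.is_dir():
            shutil.rmtree(p, ignore_errors=True)
            removed.append(str(p))
    return removed
```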
@zhengruifeng @HyukjinKwon
Yeah, let me take a quick look at #42159.
force-pushed from b204c5c to 134a4ff
run_python_packaging_tests
force-pushed from 604131c to c0a46fd
fix bash again
force-pushed from c0a46fd to 04c271e
@HyukjinKwon I think this PR is orthogonal to #42159:
shall we go with this PR first (after all tests pass)?
also cc @LuciferYang @dongjoon-hyun
### What changes were proposed in this pull request?
It seems that `run_python_packaging_tests` requires significant disk space and causes some PySpark modules to fail. This PR makes `run_python_packaging_tests` enabled only within `pyspark-errors` (which is the smallest PySpark test module).

### Why are the changes needed?
1. It seems it is `run_python_packaging_tests` that causes the `No space left` error;
2. `run_python_packaging_tests` is run in all `pyspark-*` test modules and should be deduplicated.

### Does this PR introduce _any_ user-facing change?
No, infra-only.

### How was this patch tested?
Updated CI.

Closes #42146 from zhengruifeng/infra_skip_py_packing_tests.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 748eaff)
Signed-off-by: Ruifeng Zheng <[email protected]>
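The deduplication described above can be illustrated with a simplified model of the test driver. This is a hypothetical structure, not the real `dev/run-tests.py`: the packaging step is hoisted out of the per-module loop and keyed to `pyspark-errors`.

```python
def plan_test_steps(selected_modules):
    """Return the ordered test steps for one CI split.

    Previously the packaging tests ran once per pyspark-* module; here
    they run at most once, only in the split containing pyspark-errors.
    """
    steps = [("unit-tests", m) for m in selected_modules]
    if "pyspark-errors" in selected_modules:
        steps.append(("packaging-tests", None))
    return steps
```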
All Python tests passed; merged to master/branch-3.5.
LuciferYang
left a comment
late LGTM
@zhengruifeng Maybe branch-3.4 also needs it?
@panbingkun good catch! Let me backport it to 3.4 |
### What changes were proposed in this pull request?
Cherry-pick #42146 to 3.4.

### Why are the changes needed?
It cannot be cherry-picked cleanly, so make this PR.

### Does this PR introduce _any_ user-facing change?
No, infra-only.

### How was this patch tested?
Updated CI.

Closes #42172 from zhengruifeng/cp_fix.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request?
Run `run_python_packaging_tests` when there are any changes in PySpark.

### Why are the changes needed?
#42146 made CI run `run_python_packaging_tests` only within `pyspark-errors` (see https://github.com/apache/spark/actions/runs/5666118302/job/15359190468 and https://github.com/apache/spark/actions/runs/5668071930/job/15358091003), but it overlooked that `pyspark-errors` may itself be skipped (because of no related source changes), so `run_python_packaging_tests` may also be skipped unexpectedly (see https://github.com/apache/spark/actions/runs/5666523657/job/15353485731). This PR runs `run_python_packaging_tests` even if `pyspark-errors` is skipped.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Updated CI.

Closes #42173 from zhengruifeng/infra_followup.

Lead-authored-by: Ruifeng Zheng <[email protected]>
Co-authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
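The follow-up condition can be sketched with a hypothetical helper (module names are illustrative): trigger the packaging tests off any PySpark change rather than off `pyspark-errors` being selected.

```python
def should_run_packaging_tests(changed_modules):
    """Run the packaging tests whenever any PySpark module changed,
    even if pyspark-errors itself was skipped for lack of changes."""
    return any(m.startswith("pyspark") for m in changed_modules)
```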
…das-slow-connect GA testing time

### What changes were proposed in this pull request?
The PR aims to balance the `pyspark-pandas-connect` and `pyspark-pandas-slow-connect` GA testing time.

### Why are the changes needed?
After PR #42146, the difference in testing time between `pyspark-pandas-connect` and `pyspark-pandas-slow-connect` is significant, which affects the overall running time. This makes GA runs more efficient and stable.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Pass GA.
- Manually monitor GA.

Closes #42115 from panbingkun/free_disk_space.

Lead-authored-by: panbingkun <[email protected]>
Co-authored-by: panbingkun <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
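Balancing test time between splits, as described above, amounts to partitioning test groups by estimated duration. A simple greedy longest-first sketch (illustrative only, not Spark's actual scheduling):

```python
def balance_splits(durations, n_splits=2):
    """Greedy longest-first assignment: place each test group on the
    currently lightest split so total runtimes stay close together.

    durations maps group name -> estimated runtime.
    """
    splits = [[] for _ in range(n_splits)]
    totals = [0.0] * n_splits
    for group, cost in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        i = totals.index(min(totals))  # lightest split so far
        splits[i].append(group)
        totals[i] += cost
    return splits, totals
```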