@HyukjinKwon HyukjinKwon commented Jan 23, 2024

What changes were proposed in this pull request?

This PR cleans up obsolete code in the PySpark coverage script.

Why are the changes needed?

We used to use coverage_daemon.py so that Python workers could track coverage on the worker side (e.g., coverage inside Python UDFs); it was added in #20204. However, it seems that it no longer works, and in fact it stopped working multiple years ago. Replacing the Python worker itself was also a somewhat hacky workaround. We should remove this code first, and then find a proper way to track worker-side coverage.

This should also deflake the scheduled jobs, and speed up the build.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested via:

./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
...
Finished test(python3): pyspark.sql.tests.test_functions (87s)
Tests passed in 87 seconds
Combining collected coverage data under
...
Creating XML report file at python/coverage.xml
Wrote XML report to coverage.xml
Reporting the coverage data at /.../spark/python/test_coverage/coverage_data/coverage
Name                                    Stmts   Miss Branch BrPart  Cover
-------------------------------------------------------------------------
pyspark/__init__.py                        48      7     10      3    76%
pyspark/_globals.py                        16      3      4      2    75%
pyspark/accumulators.py                   123     38     26      5    66%
pyspark/broadcast.py                      121     79     40      3    33%
pyspark/conf.py                            99     33     50      5    64%
pyspark/context.py                        451    216    151     26    51%
pyspark/errors/__init__.py                  3      0      0      0   100%
pyspark/errors/error_classes.py             3      0      0      0   100%
pyspark/errors/exceptions/__init__.py       0      0      0      0   100%
pyspark/errors/exceptions/base.py          91     15     24      4    83%
pyspark/errors/exceptions/captured.py     168     81     57     17    48%
pyspark/errors/utils.py                    34      8      6      2    70%
pyspark/files.py                           34     15     12      3    57%
pyspark/find_spark_home.py                 30     24     12      2    19%
pyspark/java_gateway.py                   114     31     30     12    69%
pyspark/join.py                            66     58     58      0     6%
pyspark/profiler.py                       244    182     92      3    22%
pyspark/rdd.py                           1064    741    378      9    27%
pyspark/rddsampler.py                      68     50     32      0    18%
pyspark/resource/__init__.py                5      0      0      0   100%
pyspark/resource/information.py            11      4      4      0    73%
pyspark/resource/profile.py               110     82     58      1    27%
pyspark/resource/requests.py              139     90     70      0    35%
pyspark/resultiterable.py                  14      6      2      1    56%
pyspark/serializers.py                    349    185     90     13    43%
pyspark/shuffle.py                        397    322    180      1    13%
pyspark/sql/__init__.py                    14      0      0      0   100%
pyspark/sql/catalog.py                    203    127     66      2    30%
pyspark/sql/column.py                     268     78     64     12    67%
pyspark/sql/conf.py                        40     16     10      3    58%
pyspark/sql/context.py                    170     95     58      2    47%
pyspark/sql/dataframe.py                  900    475    459     40    45%
pyspark/sql/functions/__init__.py           3      0      0      0   100%
pyspark/sql/functions/builtin.py         1741    542   1126     26    76%
pyspark/sql/functions/partitioning.py      41     19     18      3    59%
pyspark/sql/group.py                       81     30     32      3    65%
pyspark/sql/observation.py                 54     37     22      1    26%
pyspark/sql/pandas/__init__.py              1      0      0      0   100%
pyspark/sql/pandas/conversion.py          277    249    156      2     8%
pyspark/sql/pandas/functions.py            67     49     34      0    18%
pyspark/sql/pandas/group_ops.py            89     65     22      2    25%
pyspark/sql/pandas/map_ops.py              37     27     10      2    26%
pyspark/sql/pandas/serializers.py         381    323    172      0    10%
pyspark/sql/pandas/typehints.py            41     32     26      1    15%
pyspark/sql/pandas/types.py               407    383    326      1     3%
pyspark/sql/pandas/utils.py                29     11     10      5    59%
pyspark/sql/profiler.py                    80     47     54      1    39%
pyspark/sql/readwriter.py                 362    253    146      7    27%
pyspark/sql/session.py                    469    206    228     22    56%
pyspark/sql/sql_formatter.py               41     26     16      1    28%
pyspark/sql/streaming/__init__.py           4      0      0      0   100%
pyspark/sql/streaming/listener.py         400    200    186      1    61%
pyspark/sql/streaming/query.py            102     63     40      1    39%
pyspark/sql/streaming/readwriter.py       268    207    118      2    21%
pyspark/sql/streaming/state.py            100     68     44      0    29%
pyspark/sql/tests/__init__.py               0      0      0      0   100%
pyspark/sql/tests/test_functions.py       646      2    244      7    99%
pyspark/sql/types.py                     1013    355    528     74    62%
pyspark/sql/udf.py                        240    132     90     20    42%
pyspark/sql/udtf.py                       152     98     52      2    33%
pyspark/sql/utils.py                      160     83     54     10    45%
pyspark/sql/window.py                      89     23     56      5    77%
pyspark/statcounter.py                     79     58     20      0    21%
pyspark/status.py                          36     13      6      0    55%
pyspark/storagelevel.py                    41      9      0      0    78%
pyspark/taskcontext.py                    111     63     40      1    40%
pyspark/testing/__init__.py                 2      0      0      0   100%
pyspark/testing/sqlutils.py               149     44     52      1    75%
pyspark/testing/utils.py                  312    238    162      2    17%
pyspark/traceback_utils.py                 38      4     14      6    81%
pyspark/util.py                           153    120     56      2    18%
pyspark/version.py                          1      0      0      0   100%
...
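
When checking a report like the one above by hand, it can help to pull out the least-covered modules. The following is only a minimal sketch, not part of this PR: the names `CoverageRow`, `parse_report`, and `least_covered` are hypothetical helpers, and the sample text is a few rows copied from the output above. It assumes the five-column layout shown (Stmts, Miss, Branch, BrPart, Cover).

```python
import re
from typing import List, NamedTuple


class CoverageRow(NamedTuple):
    """One data row of the text report (hypothetical helper type)."""
    name: str
    stmts: int
    miss: int
    branch: int
    br_part: int
    cover: int  # percentage as printed, without the '%' sign


# Matches rows such as "pyspark/join.py   66   58   58   0   6%".
# The header and the dashed separator do not match and are skipped.
ROW_RE = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%$")


def parse_report(text: str) -> List[CoverageRow]:
    """Parse the data rows of a branch-coverage text report."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            name, *numbers = match.groups()
            rows.append(CoverageRow(name, *map(int, numbers)))
    return rows


def least_covered(rows: List[CoverageRow], n: int = 3) -> List[CoverageRow]:
    """Return the n rows with the lowest printed coverage percentage."""
    return sorted(rows, key=lambda row: (row.cover, row.name))[:n]


# A few rows copied from the report above, used only as sample input.
SAMPLE = """\
Name                                    Stmts   Miss Branch BrPart  Cover
-------------------------------------------------------------------------
pyspark/join.py                            66     58     58      0     6%
pyspark/sql/pandas/types.py               407    383    326      1     3%
pyspark/sql/functions/builtin.py         1741    542   1126     26    76%
pyspark/version.py                          1      0      0      0   100%
"""

if __name__ == "__main__":
    for row in least_covered(parse_report(SAMPLE), n=2):
        print(f"{row.cover:>3}%  {row.name}")
```

Sorting on the printed percentage keeps the sketch independent of how the tool weighs statements against branches.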

Was this patch authored or co-authored using generative AI tooling?

No.


@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@HyukjinKwon

Merged to master.

dongjoon-hyun pushed a commit that referenced this pull request Feb 26, 2024
…-tests-with-coverage

### What changes were proposed in this pull request?

This PR is a followup of #44842 that removes an obsolete comment in run-tests-with-coverage.

### Why are the changes needed?

To remove obsolete information.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45245 from HyukjinKwon/SPARK-46802-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024