Conversation

@wgtmac
Owner

@wgtmac wgtmac commented Mar 17, 2024

No description provided.

@github-actions github-actions bot added the BUILD label Mar 17, 2024
@wgtmac wgtmac mentioned this pull request Mar 17, 2024
15 tasks
@wgtmac
Owner Author

wgtmac commented Mar 17, 2024

The Python linter failure in https://github.com/wgtmac/spark/actions/runs/8312721080/job/22748412563?pr=6 should be unrelated to this PR.

starting mypy annotations test...
annotations failed mypy checks:
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/mutation_guard.py:1: error: disable_error_code: Invalid error code(s): method-assign  [misc]
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/eval_frame.py:1: error: disable_error_code: Invalid error code(s): method-assign  [misc]
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/debug_utils.py:1: error: disable_error_code: Invalid error code(s): method-assign  [misc]
python/pyspark/pandas/plot/matplotlib.py:23: error: Module "matplotlib.axes._base" has no attribute "_process_plot_format"  [attr-defined]
Found 4 errors in 4 files (checked 688 source files)
1
Error: Process completed with exit code 1.

zhengruifeng and others added 2 commits March 17, 2024 14:28
### What changes were proposed in this pull request?
Pin `pyarrow==12.0.1` in CI

### Why are the changes needed?
To fix the test failure in https://github.com/apache/spark/actions/runs/6167186123/job/16738683632

```
======================================================================
FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal
    assert_series_equal(
  File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal
    assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal
    raise_assert_detail(obj, msg, left_attr, right_attr)
  File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes of Series are different

Attribute "dtype" are different
[left]:  datetime64[ns]
[right]: datetime64[us]
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI and manual testing.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#42897 from zhengruifeng/pin_pyarrow.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit e3d2dfa)
Signed-off-by: Dongjoon Hyun <[email protected]>
…equirement, `<13.0.0`

### What changes were proposed in this pull request?

This PR aims to add `pyarrow` upper bound requirement, `<13.0.0`, to Apache Spark 3.5.x.

### Why are the changes needed?

PyArrow 13.0.0 has breaking changes mentioned by apache#42920 which is a part of Apache Spark 4.0.0.
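A minimal sketch of what an upper bound like `<13.0.0` means in practice, assuming plain numeric release strings (no pre-release or local suffixes). The helper names `parse_version` and `below_upper_bound` are hypothetical and are not part of Spark or PyArrow; real packaging tools apply fuller version-comparison rules.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a plain numeric version such as '12.0.1' into integers.

    Assumption: the string contains only dot-separated integers.
    """
    return tuple(int(part) for part in version.split("."))


def below_upper_bound(installed: str, upper: str = "13.0.0") -> bool:
    """Return True when `installed` is strictly below `upper`."""
    return parse_version(installed) < parse_version(upper)


if __name__ == "__main__":
    # 12.0.1 is the version pinned in CI by the earlier commit.
    for candidate in ("12.0.1", "13.0.0"):
        print(candidate, below_upper_bound(candidate))
```

Comparing integer tuples rather than raw strings avoids lexical mistakes such as treating `"4.0.0"` as greater than `"13.0.0"`.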

### Does this PR introduce _any_ user-facing change?

No, this only clarifies the upper bound.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45553 from dongjoon-hyun/SPARK-47432.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@wgtmac
Owner Author

wgtmac commented Mar 18, 2024

Same as with RC0. I will ignore those unrelated failures: #7

### What changes were proposed in this pull request?

Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. It changes the generated link to be consistent with other links by including a trailing slash.

### Why are the changes needed?

When using a proxy, an invalid redirect is issued if the trailing slash is not included.
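As a rough illustration of the idea (not the actual Spark change, which is not shown in this conversation), a link can be normalized to end with a trailing slash before it is emitted, so that the server has no reason to answer with a redirect. The helper name `with_trailing_slash` and the example URL are assumptions made for this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url: str) -> str:
    """Return `url` with a trailing slash on its path component.

    The query string and fragment, if any, are preserved.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical proxied link, used only for illustration.
    print(with_trailing_slash("http://proxy.example/app/history/app-1"))
```

Emitting the final form of the link directly matters behind a proxy because the redirect target is built by the backend, which may not know the externally visible address.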

### Does this PR introduce _any_ user-facing change?

Only that people will be able to use these links if they are using a proxy.

### How was this patch tested?

With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it currently exists triggers a redirect.

Edit: Further tested by building a version of our application with this patch applied; the links work now.

### Was this patch authored or co-authored using generative AI tooling?

No.

Page with working link
<img width="913" alt="Screenshot 2024-03-18 at 4 45 27 PM" src="https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3">

Goes correctly to
<img width="539" alt="Screenshot 2024-03-18 at 4 45 36 PM" src="https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5">

Before, it would redirect and we'd get a 404.

<img width="639" alt="image" src="https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef">

Closes apache#45527 from HuwCampbell/patch-1.

Authored-by: Huw Campbell <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9b466d3)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun

Could you rebase once more, please?
PySpark tests should pass now (except the Python linter failure)

@wgtmac
Owner Author

wgtmac commented Mar 19, 2024

Done!

@wgtmac wgtmac closed this Mar 22, 2024
@williamhyun williamhyun mentioned this pull request Jul 13, 2024
13 tasks
wgtmac pushed a commit that referenced this pull request May 3, 2025
…anRelationPushDown

### What changes were proposed in this pull request?

Add the timezone information to a cast expression when the destination type requires it.

### Why are the changes needed?

When current_timestamp() is materialized as a string, the timezone information is gone (e.g., 2024-12-27 10:26:27.684158) which prohibits further optimization rules from being applied to the affected data source.

For example,

```
Project [1735900357973433#10 AS current_timestamp()#6]
+- 'Project [cast(2025-01-03 10:32:37.973433#11 as timestamp) AS 1735900357973433#10]
   +- RelationV2[2025-01-03 10:32:37.973433#11] xxx
```

-> This query fails to execute because the injected cast expression lacks the timezone information.
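To see why the missing timezone matters, here is a small sketch in plain Python (not Spark code) showing that the same timestamp string maps to different instants depending on the zone it is interpreted in. The helper name `to_epoch_micros` is hypothetical; the string and the epoch value are taken from the plan above, and the UTC+9 offset is an arbitrary choice for contrast.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(text: str, zone: tzinfo) -> int:
    """Interpret a zone-less timestamp string in `zone`, return epoch microseconds."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    aware = naive.replace(tzinfo=zone)
    # Integer division by one microsecond avoids floating-point rounding.
    return (aware - EPOCH) // timedelta(microseconds=1)


if __name__ == "__main__":
    text = "2025-01-03 10:32:37.973433"
    print(to_epoch_micros(text, timezone.utc))                  # 1735900357973433
    print(to_epoch_micros(text, timezone(timedelta(hours=9))))  # 1735867957973433
```

Only the UTC interpretation reproduces the literal `1735900357973433` from the plan, which is why a cast from the string back to a timestamp needs to carry the timezone it should be evaluated in.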

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49549 from changgyoopark-db/SPARK-50870.

Authored-by: changgyoopark-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 24abb0f)
Signed-off-by: Wenchen Fan <[email protected]>

5 participants