Conversation

@wgtmac wgtmac commented Mar 17, 2024

No description provided.

@github-actions github-actions bot added the BUILD label Mar 17, 2024
@wgtmac wgtmac mentioned this pull request Mar 17, 2024

wgtmac commented Mar 17, 2024

It seems that all the failures are in Python tests and are unrelated to this PR.

dongjoon-hyun commented Mar 17, 2024

Yes, I found that two days ago and have been investigating it here.

Apache Spark CI will be recovered over this weekend, @wgtmac.

zhengruifeng and others added 2 commits March 17, 2024 14:28
### What changes were proposed in this pull request?
Pin `pyarrow==12.0.1` in CI

### Why are the changes needed?
To fix the test failure: https://github.com/apache/spark/actions/runs/6167186123/job/16738683632

```
======================================================================
FAIL [0.095s]: test_from_to_pandas (pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, in _assert_pandas_equal
    assert_series_equal(
  File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal
    assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal
    raise_assert_detail(obj, msg, left_attr, right_attr)
  File "/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes of Series are different

Attribute "dtype" are different
[left]:  datetime64[ns]
[right]: datetime64[us]
```
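For context, here is a minimal sketch of the Arrow-to-pandas behavior change behind this dtype mismatch (illustrative only, assuming pandas 2.x is installed; not taken from the Spark test suite):

```python
import pandas as pd
import pyarrow as pa

# Arrow stores this timestamp column at microsecond resolution.
tbl = pa.table({"ts": pa.array([pd.Timestamp("2024-03-17")], type=pa.timestamp("us"))})

# pyarrow <= 12 coerces temporal types to nanoseconds on conversion (datetime64[ns]);
# pyarrow 13 with pandas 2.x can preserve the microsecond unit (datetime64[us]),
# which is exactly the dtype mismatch the assertion above reports.
print(tbl.to_pandas()["ts"].dtype)
```

Pinning `pyarrow==12.0.1` in CI keeps the conversion on the old nanosecond behavior until the tests are updated.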

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI and manual test.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#42897 from zhengruifeng/pin_pyarrow.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit e3d2dfa)
Signed-off-by: Dongjoon Hyun <[email protected]>
…equirement, `<13.0.0`

### What changes were proposed in this pull request?

This PR aims to add `pyarrow` upper bound requirement, `<13.0.0`, to Apache Spark 3.5.x.

### Why are the changes needed?

PyArrow 13.0.0 has breaking changes, as mentioned in apache#42920, which is part of Apache Spark 4.0.0.
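For illustration only (Spark enforces this through its packaging requirements rather than through code like this; the `packaging` dependency here is an assumption), a runtime guard shows what the `<13.0.0` bound means:

```python
from packaging.version import Version

import pyarrow

# Hypothetical check mirroring the new upper bound; the real constraint is a
# requirement string in Spark 3.5's packaging metadata, not runtime code.
if Version(pyarrow.__version__) >= Version("13.0.0"):
    raise RuntimeError(
        f"pyarrow {pyarrow.__version__} is outside the supported range (<13.0.0)"
    )
```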

### Does this PR introduce _any_ user-facing change?

No, this only clarifies the upper bound.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45553 from dongjoon-hyun/SPARK-47432.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

wgtmac commented Mar 18, 2024

@dongjoon-hyun Thanks! Then I will ignore these unrelated failures.

### What changes were proposed in this pull request?

Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. It changes the generated link to be consistent with other links and to include a trailing slash.

### Why are the changes needed?

When using a proxy, an invalid redirect is issued if the trailing slash is not included.
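A small illustration of why the trailing slash matters behind a proxy prefix (plain Python with made-up URLs; the actual change is elsewhere in Spark):

```python
from urllib.parse import urljoin

# Without the trailing slash, a relative link resolves against the parent path
# and silently drops the proxied application segment; with the slash it is kept.
print(urljoin("https://gateway/proxy/app-1234", "jobs/"))   # https://gateway/proxy/jobs/
print(urljoin("https://gateway/proxy/app-1234/", "jobs/"))  # https://gateway/proxy/app-1234/jobs/
```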

### Does this PR introduce _any_ user-facing change?

Only that people will be able to use these links if they are using a proxy.

### How was this patch tested?

With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it currently exists redirects incorrectly.

Edit: further tested by building a version of our application with this patch applied; the links work now.

### Was this patch authored or co-authored using generative AI tooling?

No.

Page with working link
<img width="913" alt="Screenshot 2024-03-18 at 4 45 27 PM" src="https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3">

Goes correctly to
<img width="539" alt="Screenshot 2024-03-18 at 4 45 36 PM" src="https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5">

Before, it would redirect and we'd get a 404.

<img width="639" alt="image" src="https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef">

Closes apache#45527 from HuwCampbell/patch-1.

Authored-by: Huw Campbell <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9b466d3)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun

Could you rebase once more, please?
PySpark tests should pass now (except the Python linter failure).


wgtmac commented Mar 19, 2024

Done!

@dongjoon-hyun

Thank you.

Could you re-trigger only the failed pipeline?

@wgtmac wgtmac closed this Mar 22, 2024
@williamhyun williamhyun mentioned this pull request Jul 13, 2024
wgtmac pushed a commit that referenced this pull request Nov 4, 2024
…plan properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

cherry-pick bugfix apache#45214 to 3.4

### Why are the changes needed?
Bug fix for Spark Connect; it won't affect classic Spark SQL.

Before this PR:
```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with
```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

That is because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one
```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```
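As a toy sketch of the caching pitfall described above (plain Python with hypothetical names, not the actual `ResolveRelations` code): if resolved relations are cached without the per-DataFrame plan id that Spark Connect attaches, the two reads of `test_table_1` collapse into one entry.

```python
# Hypothetical illustration: the cache shape and `plan_id` argument are made up.
cache = {}

def resolve(table_name, plan_id):
    # Keying only on table_name would hand the plan-7 relation the entry cached
    # for plan 9, which is how "id" becomes unresolvable in the log above.
    key = (table_name, plan_id)
    return cache.setdefault(key, f"SubqueryAlias({table_name}, plan={plan_id})")

print(resolve("test_table_1", 9))
print(resolve("test_table_1", 7))  # resolved independently, not reusing plan 9's entry
```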

### Does this PR introduce _any_ user-facing change?
Yes, bug fix.

### How was this patch tested?
Added UT.

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes apache#46290 from zhengruifeng/connect_fix_read_join_34.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
wgtmac pushed a commit that referenced this pull request May 3, 2025
…plan properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

cherry-pick bugfix apache#45214 to 3.5

### Why are the changes needed?
Bug fix for Spark Connect; it won't affect classic Spark SQL.

Before this PR:
```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with
```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

That is because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one
```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```

### Does this PR introduce _any_ user-facing change?
Yes, bug fix.

### How was this patch tested?
Added UT.

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes apache#46291 from zhengruifeng/connect_fix_read_join_35.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>