@yihua yihua commented Feb 14, 2023

Change Logs

Previously, we found that reading the metadata table through Spark Datasource was broken; that issue is fixed by #7924. However, TestMetadataTableWithSparkDataSource, which guards exactly this functionality, did not fail in CI or with the local mvn command below. Investigation showed that the Hudi Spark configs (spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog, spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension) were not properly added to the Spark session in the test environment.

This PR sets the proper Hudi Spark configs for Spark Datasource tests and adds one more test that reads the metadata table through Spark Datasource.
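
For illustration only, here is a minimal sketch of how a test utility could detect the missing configs described above. It is not the code in this PR: the class name `HudiSparkTestConf`, its method names, and the `spark.master` value are hypothetical, and the session config is modeled as a plain `Map` so the sketch runs without Spark on the classpath. Only the two config keys and values come from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the name and shape are assumptions, not Hudi's actual test utilities.
public class HudiSparkTestConf {
    // The two config keys and values quoted in the PR description.
    static final String CATALOG_KEY = "spark.sql.catalog.spark_catalog";
    static final String CATALOG_VALUE = "org.apache.spark.sql.hudi.catalog.HoodieCatalog";
    static final String EXTENSIONS_KEY = "spark.sql.extensions";
    static final String EXTENSIONS_VALUE = "org.apache.spark.sql.hudi.HoodieSparkSessionExtension";

    /** Returns the Hudi session configs a Spark Datasource test is expected to carry. */
    public static Map<String, String> requiredHudiConfigs() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(CATALOG_KEY, CATALOG_VALUE);
        conf.put(EXTENSIONS_KEY, EXTENSIONS_VALUE);
        return conf;
    }

    /** Lists required keys that are absent from sessionConf or set to a different value. */
    public static List<String> missingHudiConfigs(Map<String, String> sessionConf) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> e : requiredHudiConfigs().entrySet()) {
            if (!e.getValue().equals(sessionConf.get(e.getKey()))) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A session config without the Hudi entries (the master value is illustrative).
        Map<String, String> sessionConf = new LinkedHashMap<>();
        sessionConf.put("spark.master", "local[2]");
        System.out.println("missing before: " + missingHudiConfigs(sessionConf));

        // Add the required entries and check again.
        sessionConf.putAll(requiredHudiConfigs());
        System.out.println("missing after: " + missingHudiConfigs(sessionConf));
    }
}
```

In an actual Spark test, each entry would presumably be applied with `SparkSession.builder().config(key, value)` before the session is created; a check like `missingHudiConfigs` is one way a test could fail fast instead of passing silently when the configs are absent.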

Impact

After this change, and without the fix in #7924, the following test fails, which is consistent with the behavior in spark-shell. Previously, the test passed without raising the alarm.

mvn clean test -Dspark3.3 -Dscala-2.12 -DwildcardSuites="abc" -Dtest=TestMetadataTableWithSparkDataSource -DfailIfNoTests=false -pl hudi-spark-datasource/hudi-spark -am

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@yihua yihua added priority:critical Production degraded; pipelines stalled engine:spark Spark integration labels Feb 14, 2023
@yihua yihua force-pushed the HUDI-5785-enhance-mdt-spark-datasource-test branch from 9223732 to 4e7da70 Compare February 14, 2023 07:03
@xushiyan xushiyan left a comment

LGTM

@apache apache deleted a comment from hudi-bot Feb 14, 2023
@alexeykudinkin alexeykudinkin left a comment

Let's start the process of unifying all of the utilities to make sure we're not getting bitten by the same thing again
https://github.com/apache/hudi/pull/7702/files#diff-93d5c78a2db3470cef4a643a3b41b8b97876f411310a5653d232525c87a6d749

yihua commented Feb 14, 2023

> Let's start the process of unifying all of the utilities to make sure we're not getting bitten by the same thing again https://github.com/apache/hudi/pull/7702/files#diff-93d5c78a2db3470cef4a643a3b41b8b97876f411310a5653d232525c87a6d749

I created this to unify all APIs to construct Spark configs: HUDI-5788

@nsivabalan nsivabalan force-pushed the HUDI-5785-enhance-mdt-spark-datasource-test branch from 4e7da70 to 1bf2b8c Compare March 17, 2023 21:20
@nsivabalan
Since this has been approved already, will go ahead and merge once CI is green.

@nsivabalan nsivabalan dismissed alexeykudinkin’s stale review March 18, 2023 16:45

We have 2 approvals, so moving ahead.

@nsivabalan nsivabalan merged commit 102b535 into apache:master Mar 18, 2023
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 18, 2023
- Enhancing spark ds tests to ensure tests for MDT spark datasource read tests are robust
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 22, 2023
- Enhancing spark ds tests to ensure tests for MDT spark datasource read tests are robust
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
- Enhancing spark ds tests to ensure tests for MDT spark datasource read tests are robust
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 20, 2023
- Enhancing spark ds tests to ensure tests for MDT spark datasource read tests are robust
KnightChess pushed a commit to KnightChess/hudi that referenced this pull request Jan 2, 2024
- Enhancing spark ds tests to ensure tests for MDT spark datasource read tests are robust
