
[HUDI-5193] Enhancing spark-ds write tests for some of the core user flows #7179

Merged
xushiyan merged 6 commits into apache:master from nsivabalan:spark-ds-tests-enhance
May 29, 2023

Conversation


@nsivabalan nsivabalan commented Nov 10, 2022

Change Logs

We realized we don't have good test coverage for some of the core user flows with Spark datasource writes, so this PR enhances the tests to cover those scenarios.

Test coverage added for Spark datasource writes:

COW and MOR (each w/ and w/o metadata table)
    Partitioned (BLOOM, SIMPLE, GLOBAL_BLOOM) and non-partitioned (GLOBAL_BLOOM):
        Immutable data: pure bulk_insert row writing.
        Immutable data w/ file sizing: pure inserts.
        Initial bulk ingest, followed by updates.
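The flows above can be sketched as option maps for the Hudi datasource writer. This is a hypothetical illustration using standard Hudi config keys, not the exact code added by this PR:

```scala
// Bulk-insert with the row-writer path enabled (the "pure bulk_insert
// row writing" flow): COW table with a BLOOM index and metadata table on.
val bulkInsertOpts = Map(
  "hoodie.datasource.write.operation" -> "bulk_insert",
  "hoodie.datasource.write.row.writer.enable" -> "true",
  "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE",
  "hoodie.index.type" -> "BLOOM",
  "hoodie.metadata.enable" -> "true"
)

// Follow-up "updates" flow: same table, switching the operation to upsert
// (the row writer only applies to bulk_insert, so that flag is dropped).
val upsertOpts = bulkInsertOpts +
  ("hoodie.datasource.write.operation" -> "upsert") -
  "hoodie.datasource.write.row.writer.enable"

// A test would then write roughly like:
//   inputDf.write.format("hudi").options(bulkInsertOpts).mode("Overwrite").save(basePath)
//   updateDf.write.format("hudi").options(upsertOpts).mode("Append").save(basePath)
```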

Impact

Will help catch any bugs around core user flows upfront.

Risk level (write none, low, medium or high below)

low.

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan nsivabalan changed the title [WIP] enhancing spark-ds tests [HUDI-5193] Enhancing spark-ds write tests for some of the core user flows Nov 10, 2022
@nsivabalan nsivabalan marked this pull request as ready for review November 10, 2022 20:11
@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label Nov 10, 2022
@xushiyan xushiyan self-assigned this Nov 13, 2022

jonvex commented Nov 16, 2022

def compareUpdateDfWithHudiDf(inputDf: Dataset[Row], hudiDf: Dataset[Row],
                              beforeDf: Dataset[Row], colsToCompare: String): Unit = {
    // Drop the Hudi metadata columns so only data columns are compared.
    val hudiWithoutMeta = hudiDf.drop(
      HoodieRecord.RECORD_KEY_METADATA_FIELD,
      HoodieRecord.PARTITION_PATH_METADATA_FIELD,
      HoodieRecord.COMMIT_SEQNO_METADATA_FIELD,
      HoodieRecord.COMMIT_TIME_METADATA_FIELD,
      HoodieRecord.FILENAME_METADATA_FIELD)
    // createOrReplaceTempView replaces the deprecated registerTempTable.
    hudiWithoutMeta.createOrReplaceTempView("hudiTbl")
    inputDf.createOrReplaceTempView("inputTbl")
    beforeDf.createOrReplaceTempView("beforeTbl")
    val hudiDfToCompare = spark.sql("select " + colsToCompare + " from hudiTbl")
    val inputDfToCompare = spark.sql("select " + colsToCompare + " from inputTbl")
    val beforeDfToCompare = spark.sql("select " + colsToCompare + " from beforeTbl")

    // Every updated input row must be present in the Hudi table...
    assertEquals(hudiDfToCompare.intersect(inputDfToCompare).count, inputDfToCompare.count)
    // ...and all remaining Hudi rows must come from the pre-update snapshot.
    assertEquals(hudiDfToCompare.except(inputDfToCompare).except(beforeDfToCompare).count, 0)
  }


jonvex commented Nov 16, 2022

In commonOpts you have

DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",

but then in options you set

(DataSourceWriteOptions.RECORDKEY_FIELD.key() -> recordKeys) +
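To illustrate the mismatch flagged here: in Scala, appending an entry to an immutable Map replaces any earlier binding for the same key, so the recordKeys value silently overrides the one from commonOpts. A plain-Scala sketch (the config string is what DataSourceWriteOptions.RECORDKEY_FIELD.key resolves to; no Spark needed):

```scala
// The key behind DataSourceWriteOptions.RECORDKEY_FIELD.
val recordKeyField = "hoodie.datasource.write.recordkey.field"

// commonOpts binds the record key to "_row_key"...
val commonOpts = Map(recordKeyField -> "_row_key")

// ...but the test later appends another binding for the same key,
// and the later binding wins in the combined options map.
val opts = commonOpts + (recordKeyField -> "recordKeys")
```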


jonvex commented Nov 16, 2022

I think doMORReadOptimizedQquery should call compareEntireInputDfWithHudiDf instead of compareUpdateDfWithHudiDf, but I'm less confident about this than my first two suggestions.


jonvex commented Nov 16, 2022

The incremental read query doesn't set .option(HoodieMetadataConfig.ENABLE.key, isMetadataEnabledOnRead)


jonvex commented Nov 16, 2022

I think it's supposed to be records4, not records3:

var inputDF4 = spark.read.json(spark.sparkContext.parallelize(records3, 2))


jonvex commented Nov 17, 2022

doMORReadOptimizedQquery has two Q's


jonvex commented Nov 18, 2022

Inconsistent use of DF and Df


jonvex commented Nov 21, 2022

Changes look good to me

@nsivabalan nsivabalan force-pushed the spark-ds-tests-enhance branch 2 times, most recently from c5c2e42 to d4e392c Compare November 23, 2022 00:18
@nsivabalan nsivabalan added the release-0.12.2 Patches targetted for 0.12.2 label Dec 6, 2022
@codope codope added test-coverage and removed release-0.12.2 Patches targetted for 0.12.2 labels Dec 7, 2022
@xushiyan xushiyan force-pushed the spark-ds-tests-enhance branch 2 times, most recently from 51623ae to a508f40 Compare May 28, 2023 10:49
@xushiyan xushiyan force-pushed the spark-ds-tests-enhance branch from a508f40 to c92b4ad Compare May 29, 2023 03:26
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build


Labels

priority:critical (Production degraded; pipelines stalled) · release-0.14.0


6 participants