[SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite` #38704

LuciferYang · 2022-11-18T04:43:03Z

What changes were proposed in this pull request?

This PR marks the test `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite` as ignored by default, because it cannot run successfully with Spark's default Java options.

Why are the changes needed?

Avoid local test failures.

Does this PR introduce any user-facing change?

No, this is a test-only change.

How was this patch tested?

Pass GA
Manual test: after changing `-Xmx4g` to `-Xmx10g` in my test environment, the test passes with both Maven and SBT.
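For readers wondering why the heap size matters here, the following is a rough sketch of the arithmetic only. A JVM byte array is indexed by `Int`, so it tops out just under 2 GiB, and a 4 GB heap can hold at most two such arrays even before any other allocation. The object and method names are hypothetical, and the thread does not state the test's actual memory profile.

```scala
// Illustrative helper, not part of Spark: names here are assumptions.
object HeapCheck {
  // JVM arrays are indexed by Int, so one Array[Byte] cannot exceed Int.MaxValue elements.
  val MaxByteArrayLength: Long = Int.MaxValue.toLong // 2147483647, just under 2 GiB

  // Parse a heap size written like the value of -Xmx ("4g", "10g", "512m") into bytes.
  def parseXmx(value: String): Long = {
    val (digits, unit) = value.span(_.isDigit)
    val multiplier = unit.toLowerCase match {
      case "k"   => 1L << 10
      case "m"   => 1L << 20
      case "g"   => 1L << 30
      case ""    => 1L
      case other => throw new IllegalArgumentException(s"unknown unit: $other")
    }
    digits.toLong * multiplier
  }

  // How many maximum-size byte arrays fit in a heap of this size,
  // ignoring every other allocation the JVM and Spark make.
  def maxArraysThatFit(xmx: String): Long = parseXmx(xmx) / MaxByteArrayLength
}
```

Under this upper bound, `-Xmx4g` leaves room for two maximum-size arrays and `-Xmx10g` for five, which is consistent with the test needing the larger setting.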

LuciferYang · 2022-11-18T04:43:26Z

cc @HyukjinKwon

HyukjinKwon

Thanks. LGTM

mridulm · 2022-11-18T17:10:20Z

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

+  // default Java Options, if user need do local test, please make the following changes:
+  // - Maven test: change `-Xmx4g` of `scalatest-maven-plugin` in `sql/core/pom.xml` to `-Xmx10g`
+  // - SBT test: change `-Xmx4g` of `Test / javaOptions` in `SparkBuild.scala` to `-Xmx10g`
+  ignore("collect data with single partition larger than 2GB bytes array limit") {


@liuzqt, I know this was iterated on multiple times to get it to work. Instead of the shared local Spark session, did it work locally when using a local Spark cluster?

liuzqt · 2022-11-18

Yes, @LuciferYang is right: we need to change `-Xmx4g` to `-Xmx10g` to make it work. With that change it works for both the shared local session and a local cluster; without it, neither works.

Thanks for the fix! Previously I only tested this using an IDE, and I guess it increased the memory under the hood. Sorry for the inconvenience.

LuciferYang · 2022-11-21

So how do we move forward? This is blocking developers.

liuzqt · 2022-11-21

I think we can leave it ignored for now, with the comments explaining that a larger heap is needed to make it work. I'm not sure whether we can configure the build args for a specific test suite.
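One alternative to changing build args, sketched here only as an illustration and not something proposed in this thread, is to check the JVM's maximum heap at run time and skip the test when it is too small. `HeapGuard` and the 8 GiB threshold are assumptions: the thread only reports failure with `-Xmx4g` and success with `-Xmx10g`.

```scala
// Illustrative guard, not part of Spark: names and threshold are assumptions.
object HeapGuard {
  // Assumed cut-off between the failing 4g and the passing 10g settings.
  val RequiredHeapBytes: Long = 8L << 30

  // Pure check, kept separate so it can be exercised without a large heap.
  def hasEnoughHeap(maxHeapBytes: Long, requiredBytes: Long = RequiredHeapBytes): Boolean =
    maxHeapBytes >= requiredBytes

  // Runtime.maxMemory reports the most memory the JVM will attempt to use,
  // which is roughly the -Xmx setting.
  def currentJvmHasEnoughHeap: Boolean =
    hasEnoughHeap(Runtime.getRuntime.maxMemory)
}
```

Inside a ScalaTest test body, calling `assume(HeapGuard.currentJvmHasEnoughHeap)` would cancel the test rather than fail it when the heap is too small. Whether that is preferable to `ignore` is a judgment call for the maintainers.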

LuciferYang · 2022-11-22T00:15:00Z

friendly ping @HyukjinKwon

HyukjinKwon · 2022-11-22T06:50:52Z

Merged to master.

LuciferYang · 2022-11-22T06:52:40Z

Thanks @HyukjinKwon @mridulm @liuzqt

…larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite`

### What changes were proposed in this pull request?

This PR marks the test `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollectingSuite` as ignored by default, because it cannot run successfully with Spark's default Java options.

### Why are the changes needed?

Avoid local test failures.

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

- Pass GA
- Manual test: after changing `-Xmx4g` to `-Xmx10g` in my test environment, the test passes with both Maven and SBT.

Closes apache#38704 from LuciferYang/SPARK-41193.

Authored-by: yangjie01
Signed-off-by: Hyukjin Kwon

ignore test

91c1da0

github-actions bot added the SQL label Nov 18, 2022

LuciferYang mentioned this pull request Nov 18, 2022

[SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB #38064

Closed

HyukjinKwon approved these changes Nov 18, 2022

mridulm reviewed Nov 18, 2022

HyukjinKwon closed this in 8496059 Nov 22, 2022
