@cloud-fan
Contributor

What changes were proposed in this pull request?

Unlike `DataFrameWriter.saveAsTable`, which explicitly reads the "path" option and treats it as the table location, `DataFrameWriterV2` does not do so: it treats "path" as an ordinary option, which has no real effect.

This PR fixes it, and adds a legacy config to restore the old behavior.

Why are the changes needed?

This is a bug fix: `DataFrameWriterV2` previously ignored the "path" option.

Does this PR introduce any user-facing change?

Yes, now DataFrameWriterV2 can correctly write data to the specified path for file source tables.

How was this patch tested?

new test

Was this patch authored or co-authored using generative AI tooling?

no

github-actions bot added the SQL label Feb 21, 2025

properties = properties.toMap,
provider = provider,
optionExpression = OptionList(Seq.empty),
location = if (ignorePathOption) None else CaseInsensitiveMap(options.toMap).get("path"),
Contributor Author

This is the actual change.
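
A minimal sketch of the location lookup in the changed line above, written in plain Scala so it runs without Spark. The object and method names (`PathOptionSketch`, `resolveLocation`) are hypothetical, and `collectFirst` with `equalsIgnoreCase` only approximates what `CaseInsensitiveMap(options.toMap).get("path")` does in the diff:

```scala
object PathOptionSketch {
  // Hypothetical stand-in for the changed line; this is not Spark's actual code.
  // `ignorePathOption` mirrors the legacy flag that appears in the diff.
  def resolveLocation(
      options: Map[String, String],
      ignorePathOption: Boolean): Option[String] = {
    if (ignorePathOption) {
      // Legacy behavior: no explicit location, whatever the options contain.
      None
    } else {
      // Case-insensitive lookup of "path" (an approximation of CaseInsensitiveMap).
      options.collectFirst { case (k, v) if k.equalsIgnoreCase("path") => v }
    }
  }

  def main(args: Array[String]): Unit = {
    val opts = Map("PATH" -> "/tmp/t1", "compression" -> "gzip")
    println(resolveLocation(opts, ignorePathOption = false)) // Some(/tmp/t1)
    println(resolveLocation(opts, ignorePathOption = true))  // None
  }
}
```

A result of `None` corresponds to the legacy behavior described in the config doc, where data is written to the default table location.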

@cloud-fan
Contributor Author

@dongjoon-hyun shall we include it in 3.5.5? Also cc @aokolnychyi

.internal()
.doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " +
"to the default table location.")
.version("4.0.0")
Member

+1 for 3.5.5. BTW, @cloud-fan, if we want to include this, we need to change this version to 3.5.5.

withSQLConf(SQLConf.LEGACY_DF_WRITER_V2_IGNORE_PATH_OPTION.key -> ignorePath.toString) {
withTable("t1", "t2") {
spark.range(10).writeTo("t1").using("json").create()
checkAnswer(spark.table("t1"), spark.range(10).toDF())
Member

dongjoon-hyun commented Feb 24, 2025

Nit: `spark.range(10).toDF()` seems to be repeated three times. Shall we define it once and reuse it in this unit test?

buildConf("spark.sql.legacy.dataFrameWriterV2IgnorePathOption")
.internal()
.doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " +
"to the default table location.")
Contributor

Could we fix the bug? It means users might already use the 'path' option.

Contributor

I got it. The bug has existed for about 3 years, so keeping the old behavior available as a legacy option looks better.

.internal()
.doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " +
"to the default table location.")
.version("3.5.5")
Contributor Author

I'll change it to 3.5.6 after the 3.5.5 RC1 vote passes.

Member

> I'll change it to 3.5.6 after the 3.5.5 RC1 vote passes.

Ya, let's wait and see the vote result because it's still open.

Member

@szehon-ho left a comment

nice find

parameters = Map("methodName" -> "`writeTo`"))
}

test("SPARK-51281: create/replace file source tables") {
Member

Question: are we testing replace here? Why not just use the name of the JIRA?

@manuzhang
Member

BTW, do we have documentation on the `DataFrameWriterV2` API? I couldn't find any by searching the Spark website.

@cloud-fan
Contributor Author

thanks for the review, merging to master/4.0/3.5!

cloud-fan closed this in a3671e5 Feb 27, 2025
cloud-fan added a commit that referenced this pull request Feb 27, 2025
Closes #50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a3671e5)
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan added a commit that referenced this pull request Feb 27, 2025
Closes #50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a3671e5)
Signed-off-by: Wenchen Fan <[email protected]>
@yaooqinn
Member

yaooqinn commented Mar 5, 2025

Hi, https://github.com/apache/spark/actions/runs/13567662565/job/37924455859

The doc build CI for branch-3.5 seems to be broken after this PR was merged.

@cloud-fan
Contributor Author

hmm how can it be? This PR doesn't touch any doc...

@dongjoon-hyun
Member

dongjoon-hyun commented Mar 6, 2025

@cloud-fan, it's a unidoc failure, and it happens on all commit builders and PR builders (on branch-3.5) after this PR.

[info] Main Scala API documentation to /__w/spark/spark/target/scala-2.12/unidoc...
[info] Main Java API documentation to /__w/spark/spark/target/javaunidoc...
[error] /__w/spark/spark/common/utils/target/java/org/apache/spark/ErrorInfo.java:11:1:  error: illegal combination of modifiers: abstract and static
...
[warn] javadoc exited with exit code 1
...
[error] (Javaunidoc / doc) sbt.inc.Doc$JavadocGenerationFailed
[error] Total time: 404 s (06:44), completed Mar 5, 2025 6:07:14 PM

@dongjoon-hyun
Member

I verified that reverting this commit recovers unidoc.

$ git log --oneline -n1
c8c0f1feb0f (HEAD -> branch-3.5) Revert "[SPARK-51281][SQL] DataFrameWriterV2 should respect the path option"

$ ./build/sbt unidoc
...
[info] Main Scala API documentation successful.
[success] Total time: 242 s (04:02), completed Mar 5, 2025 6:12:13 PM

Let me revert this from branch-3.5 to recover branch-3.5 CI and unblock other PRs.

@dongjoon-hyun
Member

Now, this is reverted from branch-3.5 via c8c0f1f due to the CI failure.

Please make a backport PR to branch-3.5 again, @cloud-fan.

@dongjoon-hyun
Member

For the record, branch-3.5 is recovered.

[Image: Screenshot 2025-03-05 at 20 15 03]

cloud-fan added a commit to cloud-fan/spark that referenced this pull request Mar 6, 2025
Closes apache#50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a3671e5)
Signed-off-by: Wenchen Fan <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Mar 15, 2025
backport #50040 to 3.5

Closes #50179 from cloud-fan/backport.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…ath option (apache#705)

* [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option

Closes apache#50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a3671e5)
Signed-off-by: Wenchen Fan <[email protected]>
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
Closes apache#50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b23ae55)
Signed-off-by: Wenchen Fan <[email protected]>

6 participants