[SPARK-51281][SQL] DataFrameWriterV2 should respect the path option #50040
properties = properties.toMap,
provider = provider,
optionExpression = OptionList(Seq.empty),
location = if (ignorePathOption) None else CaseInsensitiveMap(options.toMap).get("path"),
This is the actual change.
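To make the one-line change easier to follow, here is a minimal, Spark-free sketch of the lookup it performs. `PathOptionSketch`, `caseInsensitiveGet`, and `resolveLocation` are hypothetical names introduced for illustration only; the key lower-casing is an assumed stand-in for Spark's `CaseInsensitiveMap`, not its actual implementation.

```scala
import java.util.Locale

// Illustrative sketch only: these names are hypothetical and not part of Spark.
object PathOptionSketch {
  // Assumed stand-in for CaseInsensitiveMap: normalize keys to lower case before lookup.
  private def caseInsensitiveGet(options: Map[String, String], key: String): Option[String] = {
    val normalized = options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    normalized.get(key.toLowerCase(Locale.ROOT))
  }

  // Mirrors the shape of the changed line: the legacy flag suppresses the lookup,
  // otherwise the "path" option (in any letter case) becomes the table location.
  def resolveLocation(options: Map[String, String], ignorePathOption: Boolean): Option[String] =
    if (ignorePathOption) None else caseInsensitiveGet(options, "path")
}
```

With the legacy flag off, a "path" key in any letter case yields a location; with the flag on, the result is always `None`, so the table falls back to its default location.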
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala
@dongjoon-hyun shall we include it in 3.5.5? Also cc @aokolnychyi
.internal()
.doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " +
"to the default table location.")
.version("4.0.0")
+1 for 3.5.5. BTW, @cloud-fan , if we want to include this, we need to change this to 3.5.5.
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
withSQLConf(SQLConf.LEGACY_DF_WRITER_V2_IGNORE_PATH_OPTION.key -> ignorePath.toString) {
withTable("t1", "t2") {
spark.range(10).writeTo("t1").using("json").create()
checkAnswer(spark.table("t1"), spark.range(10).toDF())
nit. spark.range(10).toDF() seems to be repeated three times. Shall we define one to reuse during this unit test?
buildConf("spark.sql.legacy.dataFrameWriterV2IgnorePathOption")
.internal()
.doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " +
"to the default table location.")
Could we fix the bug? It means users might already use the 'path' option.
I got it. The bug has existed for about 3 years, so keeping the old behavior available as a legacy option looks better.
.internal()
.doc("When set to true, DataFrameWriterV2 ignores the 'path' option and always write data " +
"to the default table location.")
.version("3.5.5")
I'll change it to 3.5.6 after the 3.5.5 RC1 vote passes.
I'll change it to 3.5.6 after the 3.5.5 RC1 vote passes.
Ya, let's wait and see the vote result because it's still open.
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala
szehon-ho
left a comment
nice find
parameters = Map("methodName" -> "`writeTo`"))
}

test("SPARK-51281: create/replace file source tables") {
question: are we testing replace here? Why not just the name of the jira?
BTW, do we have documentation on the DataFrameWriterV2 API? I couldn't find any by searching the Spark website.
We only have the API doc: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameWriterV2.html
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
thanks for the review, merging to master/4.0/3.5!
### What changes were proposed in this pull request?

Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "path" option and treat it as table location, `DataFrameWriterV2` doesn't do it and treats the "path" option as a normal option which doesn't have any real impact.

This PR fixes it, and adds a legacy config to restore the old behavior.

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

Yes, now `DataFrameWriterV2` can correctly write data to the specified path for file source tables.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a3671e5)
Signed-off-by: Wenchen Fan <[email protected]>
Hi, the doc build CI for 3.5 seems to be broken after this PR was merged: https://github.com/apache/spark/actions/runs/13567662565/job/37924455859
Hmm, how can that be? This PR doesn't touch any doc...
@cloud-fan . It's a
I verified that the reverting recovers Let me revert this from
Now, this is reverted from branch-3.5 via c8c0f1f due to the CI failure. Please make a backporting PR to
backport #50040 to 3.5

### What changes were proposed in this pull request?

Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "path" option and treat it as table location, `DataFrameWriterV2` doesn't do it and treats the "path" option as a normal option which doesn't have any real impact.

This PR fixes it, and adds a legacy config to restore the old behavior.

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

Yes, now `DataFrameWriterV2` can correctly write data to the specified path for file source tables.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #50179 from cloud-fan/backport.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…ath option (apache#705)

* [SPARK-51281][SQL] DataFrameWriterV2 should respect the path option

Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "path" option and treat it as table location, `DataFrameWriterV2` doesn't do it and treats the "path" option as a normal option which doesn't have any real impact.

This PR fixes it, and adds a legacy config to restore the old behavior.

bug fix

Yes, now `DataFrameWriterV2` can correctly write data to the specified path for file source tables.

new test

no

Closes apache#50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a3671e5)
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?

Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "path" option and treat it as table location, `DataFrameWriterV2` doesn't do it and treats the "path" option as a normal option which doesn't have any real impact.

This PR fixes it, and adds a legacy config to restore the old behavior.

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

Yes, now `DataFrameWriterV2` can correctly write data to the specified path for file source tables.

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#50040 from cloud-fan/prop.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit b23ae55)
Signed-off-by: Wenchen Fan <[email protected]>

What changes were proposed in this pull request?
Unlike `DataFrameWriter.saveAsTable` where we explicitly get the "path" option and treat it as table location, `DataFrameWriterV2` doesn't do it and treats the "path" option as a normal option which doesn't have any real impact.
This PR fixes it, and adds a legacy config to restore the old behavior.
Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
Yes, now `DataFrameWriterV2` can correctly write data to the specified path for file source tables.
How was this patch tested?
new test
Was this patch authored or co-authored using generative AI tooling?
no
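As a rough illustration of the user-facing change, the sketch below writes a small file source table through `writeTo` with a "path" option and then reads the files back from that path. It assumes a Spark build that includes this fix and a local `SparkSession`; the object name `WriteToPathExample`, the table name `path_option_demo`, and the temporary directory are hypothetical and chosen only for this example.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Illustrative sketch only; assumes Spark (with this fix) is on the classpath.
object WriteToPathExample {
  // Writes 10 rows via DataFrameWriterV2 with a "path" option and returns
  // the number of rows found when reading the JSON files from that path.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("writeTo-path-sketch")
      .getOrCreate()
    try {
      // A not-yet-existing subdirectory of a fresh temp dir (hypothetical location).
      val dir = Files.createTempDirectory("writeTo-path").resolve("t").toString
      spark.sql("DROP TABLE IF EXISTS path_option_demo")
      spark.range(10).writeTo("path_option_demo").using("json").option("path", dir).create()
      val rowsAtPath = spark.read.json(dir).count()
      spark.sql("DROP TABLE IF EXISTS path_option_demo")
      rowsAtPath
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rows found at the requested path: ${run()}")
}
```

According to the PR description, setting `spark.sql.legacy.dataFrameWriterV2IgnorePathOption` to `true` restores the old behavior, in which the "path" option has no effect and the data goes to the default table location.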