Skip to content

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Oct 11, 2019

What changes were proposed in this pull request?

This PR aims to fix the behavior of mode("default") to set SaveMode.ErrorIfExists. Also, this PR updates the exception message by adding default explicitly.

Why are the changes needed?

This is reported during GRAPH API PR. This builder pattern should work like the documentation.

Does this PR introduce any user-facing change?

Yes if the app has multiple mode() invocation including mode("default") and the mode("default") is the last invocation. This is really a corner case.

  • Previously, the last invocation was handled as No-Op.
  • After this bug fix, it will work like the documentation.

How was this patch tested?

Pass the Jenkins with the newly added test case.

@MaxGekk
Copy link
Member

MaxGekk commented Oct 11, 2019

I am looking at DataStreamWriter, is this "default" mode specific to DataFrameWriter? DataStreamWriter uses

def apply(outputMode: String): OutputMode = {
outputMode.toLowerCase(Locale.ROOT) match {
case "append" =>
OutputMode.Append
case "complete" =>
OutputMode.Complete
case "update" =>
OutputMode.Update
case _ =>
throw new IllegalArgumentException(s"Unknown output mode $outputMode. " +
"Accepted output modes are 'append', 'complete', 'update'")
}
}
to convert string modes. Can we move the logic of default mode there, and reuse it from DataFrameWriter?

@MaxGekk
Copy link
Member

MaxGekk commented Oct 11, 2019

This list does not include default:

* `append`: Append contents of this :class:`DataFrame` to existing data.
* `overwrite`: Overwrite existing data.
* `error` or `errorifexists`: Throw an exception if data already exists.
* `ignore`: Silently ignore this operation if data already exists.
. Could you update it as well.

@dongjoon-hyun
Copy link
Member Author

This sounds like different one. We need a different JIRA.

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/streaming/InternalOutputModes.scala

For spark/python/pyspark/sql/readwriter.py, I'll updated the doc.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29442][SQL] Set default mode should override the existing mode [SPARK-29442][SQL][PYSPARK] Set default mode should override the existing mode Oct 11, 2019
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Oct 13, 2019

Test build #111980 has finished for PR 26094 at commit a4751f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29442][SQL][PYSPARK] Set default mode should override the existing mode [SPARK-29442][SQL] Set default mode should override the existing mode Oct 14, 2019
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 14, 2019

Hi, All. Thank you for review.
Advertising the hidden default seems to be a different issue because we need to update Scala/Python/R together. I removed the comment changes. Now, this PR only has a code fix (including exception message fix). If needed, I also can revert the exception message change, too.

@SparkQA
Copy link

SparkQA commented Oct 14, 2019

Test build #112043 has finished for PR 26094 at commit b752271.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 14, 2019

Thank you for review, @srowen , @MaxGekk , @viirya .
Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-29442 branch October 14, 2019 20:15
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Oct 14, 2019

cc, @cloud-fan and @brkyvz .
The original change was introduced at #24233 by my suggestion because DSv1 and DSv2 have different default behaviors at that time. However, recently #25876 recovers the default behavior design in order to simplify and use SaveMode.ErrorIfExists by default. Please let me know if there is some conflicts with DSv2 development plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants