
Conversation

@gatorsmile
Member

What changes were proposed in this pull request?

When users do not specify the path in DataFrameReader APIs, they get a confusing error message. For example:

spark.read.json()

Error message:

Unable to infer schema for JSON at . It must be specified manually;

After the fix, the error message becomes:

'path' is not specified

Another major goal of this PR is to add test cases for the latest changes in #13727.

  • orc read APIs
  • illegal format name
  • save API - empty path or illegal path
  • load API - empty path
  • illegal compression
  • fixed the existing test case "prevent all column partitioning"
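The new behavior can be sketched as a minimal, standalone check. This is illustrative only — `PathCheck` and `validatePaths` are hypothetical names, not the actual DataFrameReader code:

```scala
// Minimal sketch of the fail-fast path check this PR introduces.
// PathCheck and validatePaths are illustrative names, not Spark API.
object PathCheck {
  def validatePaths(paths: Seq[String]): Unit = {
    if (paths.isEmpty) {
      // Replaces the confusing "Unable to infer schema for JSON at ." message.
      throw new IllegalArgumentException("'path' is not specified")
    }
  }
}
```

Failing fast at the API boundary surfaces the real problem (no path given) instead of a downstream schema-inference failure.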

How was this patch tested?

Test cases are added.

@gatorsmile gatorsmile changed the title [SPARK-16126] [SQL] Better Message When using DataFrameReader without path [SPARK-16126] [SQL] Better Error Message When using DataFrameReader without path Jun 22, 2016
@SparkQA

SparkQA commented Jun 22, 2016

Test build #61016 has finished for PR 13837 at commit a1ae724.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OrcSourceSuite extends OrcSuite

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61044 has started for PR 13837 at commit 635046a.

@gatorsmile
Member Author

Weird. How do I stop this test run?

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jun 23, 2016

Test build #61074 has finished for PR 13837 at commit 635046a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @tdas @zsxwing Could you review this PR? It adds test cases for #13727.

Thanks!

val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
  throw new IllegalArgumentException(s"Codec [$codecName] " +
-   s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
+   s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
Contributor


why this change?

Member Author


Just to make it consistent with the output of the other cases. See the code:

case e: ClassNotFoundException =>
throw new IllegalArgumentException(s"Codec [$codecName] " +
s"is not available. Known codecs are ${shortCompressionCodecNames.keys.mkString(", ")}.")

Member

@HyukjinKwon HyukjinKwon Nov 14, 2016


Available was intentionally used because Parquet only supports snappy, gzip, or lzo, whereas Known was used for the text-based formats (please see #10805 (comment)): those support other compression codecs as well, and the message lists only the known ones.

@gatorsmile gatorsmile closed this Aug 22, 2016
@gatorsmile gatorsmile reopened this Nov 12, 2016
@SparkQA

SparkQA commented Nov 12, 2016

Test build #68556 has finished for PR 13837 at commit 635046a.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 12, 2016

Test build #68567 has finished for PR 13837 at commit b6bdf92.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2016

Test build #68579 has finished for PR 13837 at commit 4511037.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

expect_error(read.df(source = "json"),
-  paste("Error in loadDF : analysis error - Unable to infer schema for JSON at .",
-        "It must be specified manually"))
+  paste("Error in loadDF : illegal argument - 'path' is not specified"))
Member


I recall this test is intentionally testing without path argument?
cc @HyukjinKwon

Member

@HyukjinKwon HyukjinKwon Nov 14, 2016


Thanks for cc'ing me. Yes, I did. The changes seem reasonable, as this check applies to the data sources that need a path.

val equality = sparkSession.sessionState.conf.resolver
StructType(schema.filterNot(f => partitionColumns.exists(equality(_, f.name))))
}.orElse {
if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) {
Member

@HyukjinKwon HyukjinKwon Nov 14, 2016


Hi @gatorsmile, would it be better to explain here that the text data source is excluded because it always uses a schema consisting of a single string field when the schema is not explicitly given?

BTW, should we maybe change text.TextFileFormat to TextFileFormat https://github.com/gatorsmile/spark/blob/45110370fb1889f244a6750ef2a18dbc9f1ba9c2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L139 ?
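The fallback logic discussed above can be sketched standalone. Names here (`SketchFormat`, `resolveSchema`, `builtInSchema`) are illustrative assumptions, not Spark's API: the idea is that an empty path list is an error only when the format cannot supply a schema by itself, as the text format can (a single string column):

```scala
// Illustrative model: a format either has a built-in schema (like text's
// single string column) or needs paths to infer one from data files.
case class SketchFormat(name: String, builtInSchema: Option[String])

def resolveSchema(userSchema: Option[String],
                  paths: Seq[String],
                  format: SketchFormat): String = {
  userSchema.orElse {
    // No user schema: fail fast if we have neither paths nor a built-in schema.
    if (paths.isEmpty && format.builtInSchema.isEmpty) {
      throw new IllegalArgumentException("'path' is not specified")
    }
    format.builtInSchema
  }.getOrElse(s"schema inferred from ${paths.mkString(", ")}")
}
```

With this shape, `resolveSchema(None, Seq.empty, text)` succeeds via the built-in schema, while the same call for a JSON-like format throws the clearer message.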

@felixcheung
Member

hi - where are we on this one?

@HyukjinKwon
Member

(gentle ping)

@asfgit asfgit closed this in 5d2750a May 18, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
## What changes were proposed in this pull request?

This PR proposes to close PRs ...

  - inactive on the review comments for more than a month
  - WIP and inactive for more than a month
  - with a Jenkins build failure and inactive for more than a month
  - suggested to be closed, with no comment against that
  - obviously inappropriate (e.g., targeting branch 0.5)

To make sure, I left a comment on each PR about a week ago, and I did not receive a response from the author of any of the PRs below:

Closes apache#11129
Closes apache#12085
Closes apache#12162
Closes apache#12419
Closes apache#12420
Closes apache#12491
Closes apache#13762
Closes apache#13837
Closes apache#13851
Closes apache#13881
Closes apache#13891
Closes apache#13959
Closes apache#14091
Closes apache#14481
Closes apache#14547
Closes apache#14557
Closes apache#14686
Closes apache#15594
Closes apache#15652
Closes apache#15850
Closes apache#15914
Closes apache#15918
Closes apache#16285
Closes apache#16389
Closes apache#16652
Closes apache#16743
Closes apache#16893
Closes apache#16975
Closes apache#17001
Closes apache#17088
Closes apache#17119
Closes apache#17272
Closes apache#17971

Added:
Closes apache#17778
Closes apache#17303
Closes apache#17872

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#18017 from HyukjinKwon/close-inactive-prs.