
Conversation

@gatorsmile
Member

What changes were proposed in this pull request?

When users do not specify the path in DataFrameReader APIs, they get a confusing error message. For example:

spark.read.json()

Error message:

Unable to infer schema for JSON at . It must be specified manually;

After the fix, the error message becomes:

'path' is not specified

Another major goal of this PR is to add test cases for the latest changes in #13727.

  • orc read APIs
  • illegal format name
  • save API - empty path or illegal path
  • load API - empty path
  • illegal compression
  • fixed the existing test case "prevent all column partitioning"
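The new behavior can be sketched as a minimal, standalone check. This is illustrative only — `PathCheck` and `validatePaths` are hypothetical names, not the actual DataFrameReader code:

```scala
// Minimal sketch of the fail-fast path check this PR introduces.
// PathCheck and validatePaths are illustrative names, not Spark API.
object PathCheck {
  def validatePaths(paths: Seq[String]): Unit = {
    if (paths.isEmpty) {
      // Replaces the confusing "Unable to infer schema for JSON at ." message.
      throw new IllegalArgumentException("'path' is not specified")
    }
  }
}
```

Failing fast at the API boundary surfaces the real problem (no path given) instead of a downstream schema-inference failure.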

How was this patch tested?

Test cases are added.

@gatorsmile gatorsmile changed the title [SPARK-16126] [SQL] Better Message When using DataFrameReader without path [SPARK-16126] [SQL] Better Error Message When using DataFrameReader without path Jun 22, 2016
@SparkQA

SparkQA commented Jun 22, 2016

Test build #61016 has finished for PR 13837 at commit a1ae724.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OrcSourceSuite extends OrcSuite

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61044 has started for PR 13837 at commit 635046a.

@gatorsmile
Member Author

Weird. How do I stop this test run?

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jun 23, 2016

Test build #61074 has finished for PR 13837 at commit 635046a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @tdas @zsxwing Could you review this PR? It adds test cases for #13727.

Thanks!

val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
  throw new IllegalArgumentException(s"Codec [$codecName] " +
-   s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
+   s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
Contributor


why this change?

Member Author


Just to make it consistent with the output of the other cases. See the code:

case e: ClassNotFoundException =>
throw new IllegalArgumentException(s"Codec [$codecName] " +
s"is not available. Known codecs are ${shortCompressionCodecNames.keys.mkString(", ")}.")

Member

@HyukjinKwon HyukjinKwon Nov 14, 2016


Available was intentionally used because Parquet only supports snappy, gzip, or lzo, whereas Known was used for the text-based formats (please see #10805 (comment)): those support other compression codecs as well, and the message lists only the known ones.

@gatorsmile gatorsmile closed this Aug 22, 2016
@gatorsmile gatorsmile reopened this Nov 12, 2016
@SparkQA

SparkQA commented Nov 12, 2016

Test build #68556 has finished for PR 13837 at commit 635046a.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 12, 2016

Test build #68567 has finished for PR 13837 at commit b6bdf92.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 13, 2016

Test build #68579 has finished for PR 13837 at commit 4511037.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

expect_error(read.df(source = "json"),
-  paste("Error in loadDF : analysis error - Unable to infer schema for JSON at .",
-        "It must be specified manually"))
+  paste("Error in loadDF : illegal argument - 'path' is not specified"))
Member


I recall this test is intentionally testing without path argument?
cc @HyukjinKwon

Member

@HyukjinKwon HyukjinKwon Nov 14, 2016


Thanks for cc'ing me. Yes, I did. The changes seem reasonable, as this check applies to the data sources that need a path.

val equality = sparkSession.sessionState.conf.resolver
StructType(schema.filterNot(f => partitionColumns.exists(equality(_, f.name))))
}.orElse {
if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) {
Member

@HyukjinKwon HyukjinKwon Nov 14, 2016


Hi @gatorsmile, would it be better to explain here that the text data source is excluded because it always uses a schema consisting of a single string field when the schema is not explicitly given?

BTW, should we maybe change text.TextFileFormat to TextFileFormat https://github.com/gatorsmile/spark/blob/45110370fb1889f244a6750ef2a18dbc9f1ba9c2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L139 ?
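The fallback logic discussed above can be sketched standalone. Names here (`SketchFormat`, `resolveSchema`, `builtInSchema`) are illustrative assumptions, not Spark's API: the idea is that an empty path list is an error only when the format cannot supply a schema by itself, as the text format can (a single string column):

```scala
// Illustrative model: a format either has a built-in schema (like text's
// single string column) or needs paths to infer one from data files.
case class SketchFormat(name: String, builtInSchema: Option[String])

def resolveSchema(userSchema: Option[String],
                  paths: Seq[String],
                  format: SketchFormat): String = {
  userSchema.orElse {
    // No user schema: fail fast if we have neither paths nor a built-in schema.
    if (paths.isEmpty && format.builtInSchema.isEmpty) {
      throw new IllegalArgumentException("'path' is not specified")
    }
    format.builtInSchema
  }.getOrElse(s"schema inferred from ${paths.mkString(", ")}")
}
```

With this shape, `resolveSchema(None, Seq.empty, text)` succeeds via the built-in schema, while the same call for a JSON-like format throws the clearer message.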

@felixcheung
Member

hi - where are we on this one?

@HyukjinKwon
Member

(gentle ping)

@asfgit asfgit closed this in 5d2750a May 18, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
## What changes were proposed in this pull request?

This PR proposes to close PRs ...

  - inactive on the review comments for more than a month
  - WIP and inactive for more than a month
  - with a Jenkins build failure and inactive for more than a month
  - suggested to be closed, with no comment against that
  - obviously inappropriate (e.g., targeting branch 0.5)

To make sure, I left a comment on each PR about a week ago, and I did not receive a response from the author of any of the PRs below:

Closes apache#11129
Closes apache#12085
Closes apache#12162
Closes apache#12419
Closes apache#12420
Closes apache#12491
Closes apache#13762
Closes apache#13837
Closes apache#13851
Closes apache#13881
Closes apache#13891
Closes apache#13959
Closes apache#14091
Closes apache#14481
Closes apache#14547
Closes apache#14557
Closes apache#14686
Closes apache#15594
Closes apache#15652
Closes apache#15850
Closes apache#15914
Closes apache#15918
Closes apache#16285
Closes apache#16389
Closes apache#16652
Closes apache#16743
Closes apache#16893
Closes apache#16975
Closes apache#17001
Closes apache#17088
Closes apache#17119
Closes apache#17272
Closes apache#17971

Added:
Closes apache#17778
Closes apache#17303
Closes apache#17872

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#18017 from HyukjinKwon/close-inactive-prs.