Skip to content

Comments

[SPARK-30554][SQL] Return Iterable from FailureSafeParser.rawParser#27264

Closed
MaxGekk wants to merge 8 commits intoapache:masterfrom
MaxGekk:failuresafe-parser-seq
Closed

[SPARK-30554][SQL] Return Iterable from FailureSafeParser.rawParser#27264
MaxGekk wants to merge 8 commits intoapache:masterfrom
MaxGekk:failuresafe-parser-seq

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 17, 2020

What changes were proposed in this pull request?

Changed signature of rawParser passed to FailureSafeParser. I propose to change return type from Seq to Iterable. I took Iterable to easier port the changes on Scala collections 2.13. Also, I replaced Seq by Option in CSV datasource - UnivocityParser, and in JSON parser exception one place in the case when specified schema is StructType, and JSON input is an array.

Why are the changes needed?

Seq is unnecessary requirement for return type from rawParser which may not have multiple rows per input like CSV datasource.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By existing test suites JsonSuite, UnivocityParserSuite, JsonFunctionsSuite, JsonExpressionsSuite, CsvSuite, and CsvFunctionsSuite.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 17, 2020

@HyukjinKwon @srowen Please, review this PR.

@srowen
Copy link
Member

srowen commented Jan 17, 2020

Seems OK; is it just some code polish or does it make an impact on performance?

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 17, 2020

@srowen Main goal is code polish. Potentially, it could improve performance slightly because Seq builder is slower than Option.

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116956 has finished for PR 27264 at commit 1f7d9fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116964 has finished for PR 27264 at commit 04d8194.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…-parser-seq

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 19, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Jan 19, 2020

Test build #117002 has finished for PR 27264 at commit cf5ab4e.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 19, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Jan 19, 2020

Test build #117011 has finished for PR 27264 at commit cf5ab4e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 19, 2020

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Jan 19, 2020

Test build #117014 has finished for PR 27264 at commit cf5ab4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 19, 2020

Test build #117015 has finished for PR 27264 at commit cf5ab4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Thanks @MaxGekk for addressing my comment.

Merged to master.

@MaxGekk MaxGekk deleted the failuresafe-parser-seq branch June 5, 2020 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants