
[SPARK-27197][SQL][TEST] Add ReadNestedSchemaTest for file-based data sources #24139

Closed
dongjoon-hyun wants to merge 2 commits into apache:master from dongjoon-hyun:SPARK-27197


@dongjoon-hyun
Member

What changes were proposed in this pull request?

The reader schema is said to be evolved (or projected) when it has changed after the data was written. Apache Spark's file-based data sources already have test coverage for that case, e.g. ReadSchemaSuite.scala. This PR extends that coverage to nested columns by adding and hiding nested columns in the reader schema.
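To make "adding and hiding nested columns" concrete, here is a minimal sketch, not the test code from this PR. It assumes a local Spark session and the JSON data source; the object name, the column names (`c1` to `c5`), and the temporary path are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Hypothetical illustration; names are not taken from the PR.
object NestedSchemaEvolutionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-schema-evolution-sketch")
      .getOrCreate()

    try {
      val path = java.nio.file.Files
        .createTempDirectory("nested-schema")
        .resolve("data")
        .toString

      // Writer schema: c1 string, c2 struct<c3: bigint, c4: bigint>.
      spark.range(2)
        .selectExpr(
          "CAST(id AS STRING) AS c1",
          "named_struct('c3', id, 'c4', id * 10) AS c2")
        .write.json(path)

      // Reader schema that ADDS a nested column c2.c5 absent from the files.
      val addedSchema = new StructType()
        .add("c1", StringType)
        .add("c2", new StructType()
          .add("c3", LongType)
          .add("c4", LongType)
          .add("c5", LongType))
      val withAdded = spark.read.schema(addedSchema).json(path)
      // The added nested column is expected to read back as null.
      assert(withAdded.select("c2.c5").collect().forall(_.isNullAt(0)))

      // Reader schema that HIDES the nested column c2.c4.
      val hiddenSchema = new StructType()
        .add("c1", StringType)
        .add("c2", new StructType().add("c3", LongType))
      val withHidden = spark.read.schema(hiddenSchema).json(path)
      // Only c3 remains visible inside the struct; no rows are lost.
      assert(withHidden.select("c2.*").columns.sameElements(Array("c3")))
      assert(withHidden.count() == 2)
    } finally {
      spark.stop()
    }
  }
}
```

The same pattern should apply to other formats by swapping `json` for another built-in reader and writer, though which formats the new tests actually cover is determined by the PR itself, not by this sketch.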

How was this patch tested?

Pass Jenkins with the newly added tests.

@dongjoon-hyun
Member Author

cc @cloud-fan, @gatorsmile, @maropu, @dbtsai, @viirya, @HyukjinKwon.

@dbtsai
Member

dbtsai commented Mar 19, 2019

LGTM.

@SparkQA

SparkQA commented Mar 19, 2019

Test build #103653 has finished for PR 24139 at commit f903978.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • trait AddNestedColumnTest extends ReadSchemaTest
      • trait HideNestedColumnTest extends ReadSchemaTest

@viirya
Member

viirya commented Mar 19, 2019

retest this please.

@maropu
Member

maropu commented Mar 19, 2019

Is this PR related to #23964?

@SparkQA

SparkQA commented Mar 19, 2019

Test build #103661 has finished for PR 24139 at commit f903978.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • trait AddNestedColumnTest extends ReadSchemaTest
      • trait HideNestedColumnTest extends ReadSchemaTest

@dongjoon-hyun
Member Author

dongjoon-hyun commented Mar 19, 2019

Yes, @maropu. There was a question about whether the other data sources (like Avro/JSON) work with nested column pruning. This PR verifies that explicitly.
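As a rough illustration of what such a check looks like, the sketch below reads a single nested field from the same data written in two built-in formats and confirms that the result matches. It is an assumption-laden sketch, not the PR's test: the object name, column names, and paths are hypothetical, and the `spark.sql.optimizer.nestedSchemaPruning.enabled` setting is assumed to exist in the Spark version at hand. Avro is left out because it requires the separate spark-avro module.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical illustration; names are not taken from the PR.
object NestedPruningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-pruning-sketch")
      .getOrCreate()

    try {
      // Assumption: this optimizer flag is available in the Spark version used.
      spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")

      val base = java.nio.file.Files.createTempDirectory("nested-pruning")

      // name is struct<first: string, last: string>; id is bigint.
      val df = spark.range(3).selectExpr(
        "named_struct('first', CAST(id AS STRING), 'last', 'x') AS name",
        "id")

      for (format <- Seq("json", "parquet")) {
        val path = base.resolve(format).toString
        df.write.format(format).save(path)

        // Select only one nested field; the result should be the same
        // whether or not the source prunes the unused fields.
        val firstNames = spark.read
          .schema(df.schema)
          .format(format)
          .load(path)
          .select("name.first")
          .collect()
          .map(_.getString(0))
          .sorted

        assert(firstNames.sameElements(Array("0", "1", "2")))
      }
    } finally {
      spark.stop()
    }
  }
}
```

This only checks that results stay correct when one nested field is selected; it does not inspect the physical plan to confirm that pruning actually happened.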

@dongjoon-hyun
Member Author

Thank you for the review, @dbtsai, @viirya, @maropu. I'll update once more to address @maropu's comment.

@SparkQA

SparkQA commented Mar 19, 2019

Test build #103676 has started for PR 24139 at commit 4f148b9.

@dongjoon-hyun
Member Author

Retest this please.

@dbtsai
Member

dbtsai commented Mar 20, 2019

Thanks. Merged into master.

@dbtsai closed this in 4d52477 Mar 20, 2019

@HyukjinKwon
Member

HyukjinKwon commented Mar 20, 2019

@dbtsai, the tests were still running, and the changes made to address the last comments are fairly substantial code changes ...

@SparkQA

SparkQA commented Mar 20, 2019

Test build #103695 has finished for PR 24139 at commit 4f148b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

It finally passes. Thank you all, @dbtsai, @viirya, @maropu, @HyukjinKwon.

@dongjoon-hyun deleted the SPARK-27197 branch March 20, 2019 03:17