Skip to content

Conversation

@gengliangwang
Copy link
Member

What changes were proposed in this pull request?

File source V1 supports reading output of FileStreamSink as batch. #11897
We should support this in file source V2 as well. When reading with paths, we first check if there is metadata log of FileStreamSink. If yes, we use MetadataLogFileIndex for listing files; Otherwise, we use InMemoryFileIndex.

How was this patch tested?

Unit test

@gengliangwang
Copy link
Member Author

gengliangwang commented Jun 18, 2019

This PR also resolves several TODO testing items in #24830.

@SparkQA
Copy link

SparkQA commented Jun 18, 2019

Test build #106609 has finished for PR 24900 at commit f70fbe8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 18, 2019

Test build #106610 has finished for PR 24900 at commit e666203.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@gengliangwang
Copy link
Member Author

} finally {
q.stop()
}
withTempDir { output =>
Copy link
Member Author

@gengliangwang gengliangwang Jun 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just removing withSQLConf

spark.read.load(outputDir.getCanonicalPath).as[Int]
}
assertMigrationError(e.getMessage, sparkMetadataDir, legacySparkMetadataDir)
val e = intercept[SparkException] {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just removing withSQLConf and one TODO comment.

@SparkQA
Copy link

SparkQA commented Jun 18, 2019

Test build #106622 has finished for PR 24900 at commit e666203.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.set(SQLConf.USE_V1_SOURCE_READER_LIST, "csv,json,orc,text,parquet")
.set(SQLConf.USE_V1_SOURCE_WRITER_LIST, "csv,json,orc,text,parquet")

test("partitioned writing and batch reading") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you highlight the difference between this test case and the one in FileStreamSinkV2Suite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For V1 suite, it uses Parquet V1, and it matches HadoopFsRelation / FileScanRDD to check if the plan is as expected. Also, partition pruning is tested.

For V2 suite, it uses Parquet V2, and it matches FileTable / BatchScanExec to check if the plan is as expected. As partition pruning is not supported in V2 yet, we can't test it.

I can abstract the code if the two cases look duplicated to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, do you mean changing the test case name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's abstract the code, this test case is too long to duplicate.

@sundarcse1216
Copy link

sundarcse1216 commented Jun 19, 2019 via email

@SparkQA
Copy link

SparkQA commented Jun 19, 2019

Test build #106658 has finished for PR 24900 at commit 1ed79cf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Jun 19, 2019

Test build #106678 has finished for PR 24900 at commit 1ed79cf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Jun 19, 2019

Test build #106683 has finished for PR 24900 at commit 1ed79cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f510761 Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants