[SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme #16815
Conversation
Test build #72420 has finished for PR 16815 at commit
```diff
 /** Read the metadata from file if it exists */
 def read(metadataFile: Path, hadoopConf: Configuration): Option[StreamMetadata] = {
-  val fs = FileSystem.get(hadoopConf)
+  val fs = FileSystem.get(metadataFile.toUri, hadoopConf)
```
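For context, a minimal standalone sketch (not part of the patch; the bucket and path are placeholders) of why the original call fails: `FileSystem.get(conf)` always returns the filesystem for `fs.defaultFS` (typically `file:///`), so any operation on an `s3a://` path trips Hadoop's `checkPath` guard, while the URI-based overload resolves the filesystem from the path's own scheme.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WrongFsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration() // fs.defaultFS is file:/// unless overridden
    val metadataFile = new Path("s3a://somebucket/checkpoint/metadata")

    // Old code path: returns the filesystem for fs.defaultFS (the local FS here),
    // so exists() on an s3a:// path throws
    // java.lang.IllegalArgumentException: Wrong FS: ... expected: file:///
    val defaultFs = FileSystem.get(conf)
    // defaultFs.exists(metadataFile) // would throw "Wrong FS"

    // Fixed code path: resolve the filesystem from the path's own URI scheme
    // (needs hadoop-aws on the classpath and valid credentials for s3a).
    val fs = FileSystem.get(metadataFile.toUri, conf)
    println(fs.getScheme) // "s3a"
  }
}
```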
I think this should be `metadataFile.getFileSystem(hadoopConf)`
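The suggestion is equivalent to the fix as written: in Hadoop, `Path.getFileSystem(conf)` is a thin wrapper that delegates to `FileSystem.get(path.toUri, conf)`; it just reads better. A quick sketch (placeholder path):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
val metadataFile = new Path("s3a://somebucket/checkpoint/metadata")

// These resolve to the same (cached) filesystem instance; Path.getFileSystem
// simply calls FileSystem.get(this.toUri(), conf) internally.
val fs1 = FileSystem.get(metadataFile.toUri, hadoopConf)
val fs2 = metadataFile.getFileSystem(hadoopConf)
```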
retest this please.
Test build #72436 has finished for PR 16815 at commit
Test build #3558 has finished for PR 16815 at commit
Test build #72435 has finished for PR 16815 at commit
LGTM, though checkpointing to S3 has its own separate issues related to rename performance and listing inconsistency. While this fix lets people request different filesystems for the data, checkpointing is still at risk of not working on S3 or Swift, and probably not on GCS. It will work on Azure, though.
Yea! I found this earlier but forgot to track it.
LGTM. Merging to master and 2.1.
[SPARK-19407][SS] defaultFS is used FileSystem.get instead of getting it from uri scheme

## What changes were proposed in this pull request?

```
Caused by: java.lang.IllegalArgumentException: Wrong FS: s3a://**************/checkpoint/7b2231a3-d845-4740-bfa3-681850e5987f/metadata, expected: file:///
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
	at org.apache.spark.sql.execution.streaming.StreamMetadata$.read(StreamMetadata.scala:51)
	at org.apache.spark.sql.execution.streaming.StreamExecution.<init>(StreamExecution.scala:100)
	at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:232)
	at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:269)
	at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:262)
```

This is easy to reproduce on a Spark standalone cluster: provide a checkpoint location whose URI scheme is anything other than "file://" and do not override `fs.defaultFS` in the config.

Workaround: pass `--conf spark.hadoop.fs.defaultFS=s3a://somebucket`, or set it in `SparkConf` or `spark-defaults.conf`.

## How was this patch tested?

Existing unit tests.

Author: uncleGen <[email protected]>

Closes #16815 from uncleGen/SPARK-19407.

(cherry picked from commit 7a0a630)
Signed-off-by: Shixiong Zhu <[email protected]>
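To make the workaround above concrete, here is a hypothetical `SparkSession` setup that pins `fs.defaultFS` to the checkpoint bucket ("somebucket" is a placeholder); the `--conf` form on `spark-submit` achieves the same thing:

```scala
import org.apache.spark.sql.SparkSession

// Workaround on Spark versions without this fix: make the default filesystem
// match the checkpoint location's scheme. "somebucket" is a placeholder.
val spark = SparkSession.builder()
  .appName("streaming-checkpoint-workaround")
  .config("spark.hadoop.fs.defaultFS", "s3a://somebucket")
  .getOrCreate()
```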