@@ -36,6 +36,12 @@ public enum StorageSchemes {
S3A("s3a", false), S3("s3", false),
// Google Cloud Storage
GCS("gs", false),
// Azure WASB
WASB("wasb", false), WASBS("wasbs", false),
Contributor Author


I'm not sure how Spark manages append on WASB.
The Hadoop implementation of WASB is designed for single-writer append:
https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Append_API_Support_and_Configuration
https://issues.apache.org/jira/browse/HADOOP-12635

Contributor

@n3nash n3nash Nov 17, 2019


Interesting. The HDFS append implementation guarantees a single writer by holding a lease on the file; it looks like the Azure implementation doesn't provide that guarantee. In Hudi, we have a single-writer model for append usage, but Spark DAG retries complicate this and can introduce multi-writer appends.

Member


@n3nash Fair. Can we create a JIRA to track this? It seems like something we should play with and understand. I assume you have reading material on the append semantics.

// Azure ADLS
ADL("adl", false),
// Azure ADLS Gen2
ABFS("abfs", false), ABFSS("abfss", false),
// View FS for federated setups. If federating across cloud stores, then append support is false
VIEWFS("viewfs", true);

@@ -32,6 +32,9 @@ public void testStorageSchemes() {
assertFalse(StorageSchemes.isSchemeSupported("s2"));
assertFalse(StorageSchemes.isAppendSupported("s3a"));
assertFalse(StorageSchemes.isAppendSupported("gs"));
assertFalse(StorageSchemes.isAppendSupported("wasb"));
assertFalse(StorageSchemes.isAppendSupported("adl"));
assertFalse(StorageSchemes.isAppendSupported("abfs"));
assertTrue(StorageSchemes.isAppendSupported("viewfs"));
try {
StorageSchemes.isAppendSupported("s2");
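For readers following the diff without the full file, here is a minimal, self-contained sketch of how an enum like this could back the two lookups exercised by the test. It is not the actual Hudi source. The name `StorageSchemesSketch`, the field names, and the choice of `IllegalArgumentException` for an unknown scheme are assumptions made for illustration, and only the constants visible in the hunk above are included.

```java
import java.util.Arrays;

// Hypothetical sketch, not the real StorageSchemes class from this PR.
enum StorageSchemesSketch {
    S3A("s3a", false), S3("s3", false),
    // Google Cloud Storage
    GCS("gs", false),
    // Azure WASB
    WASB("wasb", false), WASBS("wasbs", false),
    // Azure ADLS
    ADL("adl", false),
    // Azure ADLS Gen2
    ABFS("abfs", false), ABFSS("abfss", false),
    // View FS for federated setups
    VIEWFS("viewfs", true);

    private final String scheme;          // URI scheme, e.g. "wasb"
    private final boolean supportsAppend; // whether append is treated as safe

    StorageSchemesSketch(String scheme, boolean supportsAppend) {
        this.scheme = scheme;
        this.supportsAppend = supportsAppend;
    }

    // True if the scheme matches any declared constant.
    static boolean isSchemeSupported(String scheme) {
        return Arrays.stream(values()).anyMatch(s -> s.scheme.equals(scheme));
    }

    // True if the scheme is declared with append support.
    // Assumption: unknown schemes are rejected with IllegalArgumentException.
    static boolean isAppendSupported(String scheme) {
        if (!isSchemeSupported(scheme)) {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        return Arrays.stream(values())
                .anyMatch(s -> s.supportsAppend && s.scheme.equals(scheme));
    }
}
```

Under this sketch, marking the Azure schemes with `false` means a caller that checks `isAppendSupported` before appending would avoid appends on WASB, ADLS, and ABFS, which matches the cautious stance taken in the review thread.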