Skip to content

Conversation

@pgaref
Copy link
Contributor

@pgaref pgaref commented Oct 2, 2020

This reverts commit c87e60d

Change-Id: Ie8b783e0b1e0e32d9a54f6663e9aae5dd0a0f94f

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

…licy for Random IO formats"

Change-Id: Icae85d2f6ece7ad3e18ab55ddca098684382173f
@abstractdog abstractdog merged commit e716463 into apache:master Oct 6, 2020
@steveloughran
Copy link
Contributor

why the revert?

@pgaref
Copy link
Contributor Author

pgaref commented Oct 7, 2020

Hey @steveloughran --- the approach of the above patch was a bit off, one problem was that the Fs objects were lazility initalized and could end up throwing exceptions when setting the option eagerly.
The most important issue was that the LLAP IO creates its own FS object (and the above where only used for output) so the option itself was not properly propagated.

A solution for all this could be the S3A openFileWithOptions call that adds file options for the open File call instead of on the FS (still needs to add support for fadvise though)
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L4828

Talking about this TODO:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1136

@steveloughran
Copy link
Contributor

you actually want to use the openFile() API Call which is already in hadoop-3.3, but which is being tweaked (apache/hadoop#2168) to have a standard option for seek policy for all object stores, and the ability to pass in file length -which will let the S3A & ABFS connectors skip a head probe

  @Override
  public SeekableInputStream newStream() throws IOException {
    FutureDataInputStreamBuilder builder = fs.openFile(getPath())
      .opt("fs.s3a.experimental.input.fadvise", "random")  // supported in Hadoop 3.3.0
      .opt("fs.option.openfile.fadvise", "random");     // this will be the new standard one

    if (length > 0) {
      builder.opt("fs.option.openfile.length", length);    // also in the PR. Pass in the end of the split & s3a will be happy too
    }
    CompletableFuture<FSDataInputStream> streamF = builder.build();
    return HadoopStreams.wrap(FutureIOSupport.awaitFuture(streamF));
  }

That API is there today, if you want to try playing with it,.

@steveloughran
Copy link
Contributor

(that openFileWithOptions() is how the openFile() is implemented, it's not meant to be the API call which apps make. That's covered: here

szehon-ho pushed a commit to szehon-ho/hive that referenced this pull request Oct 14, 2020
…licy for Random IO formats" (apache#1547) (Panagiotis Garefalakis reviewed by Laszlo Bodor)

Change-Id: Icae85d2f6ece7ad3e18ab55ddca098684382173f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants