@@ -1164,7 +1164,7 @@ deletion, preventing the stores' use as drop-in replacements for HDFS.

### `boolean rename(Path src, Path d)`

In terms of its specification, `rename()` is one of the most complex operations within a filesystem .
In terms of its specification, `rename()` is one of the most complex operations within a filesystem.

In terms of its implementation, it is the one with the most ambiguity regarding when to return false
versus raising an exception.
@@ -1187,7 +1187,6 @@ Source `src` must exist:

exists(FS, src) else raise FileNotFoundException


`dest` cannot be a descendant of `src`:

if isDescendant(FS, src, dest) : raise IOException
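The descendant precondition above can be illustrated with ordinary paths. This is a minimal sketch using `java.nio.file.Path` as a stand-in for Hadoop's `Path`; the helper `isDescendant` mirrors the spec predicate and is not a real Hadoop API:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RenamePreconditions {

  /** True iff dest lies strictly below src in the path hierarchy. */
  static boolean isDescendant(Path src, Path dest) {
    Path s = src.normalize();
    Path d = dest.normalize();
    // component-wise prefix test, so "/a/bc" is not under "/a/b"
    return !d.equals(s) && d.startsWith(s);
  }

  public static void main(String[] args) {
    System.out.println(isDescendant(Paths.get("/a/b"), Paths.get("/a/b/c")));
    System.out.println(isDescendant(Paths.get("/a/b"), Paths.get("/a/bc")));
  }
}
```

A rename whose destination satisfies this predicate would move a directory under itself, which is why the spec rejects it with an `IOException`.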
@@ -1283,6 +1282,15 @@ that the parent directories of the destination also exist.

exists(FS', parent(dest))

*S3A FileSystem*

The outcome is the same as a normal rename, with the additional (implicit) feature that
the parent directories of the destination then exist:
`exists(FS', parent(dest))`

There is a check that `parent(dest)` is not a file, with the rename rejected if it
is, but there are no checks on any other ancestors.

*Other Filesystems (including Swift)*

Other filesystems strictly reject the operation, raising a `FileNotFoundException`
@@ -1463,9 +1463,6 @@ public boolean rename(Path src, Path dst) throws IOException {
LOG.info("{}", e.getMessage());
LOG.debug("rename failure", e);
return e.getExitCode();
} catch (FileNotFoundException e) {
LOG.debug(e.toString());
return false;
}
}

@@ -1518,9 +1515,9 @@ private Pair<S3AFileStatus, S3AFileStatus> initiateRename(
// whether or not it can be the destination of the rename.
if (srcStatus.isDirectory()) {
if (dstStatus.isFile()) {
throw new RenameFailedException(src, dst,
"source is a directory and dest is a file")
.withExitCode(srcStatus.isFile());
throw new FileAlreadyExistsException(
"Failed to rename " + src + " to " + dst
+"; source is a directory and dest is a file");
} else if (dstStatus.isEmptyDirectory() != Tristate.TRUE) {
throw new RenameFailedException(src, dst,
"Destination is a non-empty directory")
@@ -1531,9 +1528,9 @@ private Pair<S3AFileStatus, S3AFileStatus> initiateRename(
// source is a file. The destination must be a directory,
// empty or not
if (dstStatus.isFile()) {
throw new RenameFailedException(src, dst,
"Cannot rename onto an existing file")
.withExitCode(false);
throw new FileAlreadyExistsException(
"Failed to rename " + src + " to " + dst
+ "; destination file exists");
}
}

@@ -1544,17 +1541,24 @@ private Pair<S3AFileStatus, S3AFileStatus> initiateRename(
if (!pathToKey(parent).isEmpty()
&& !parent.equals(src.getParent())) {
try {
// only look against S3 for directories; saves
// a HEAD request on all normal operations.
// make sure parent isn't a file.
// don't look for parent being a dir as there is a risk
// of a race between dest dir cleanup and rename in different
// threads.
S3AFileStatus dstParentStatus = innerGetFileStatus(parent,
false, StatusProbeEnum.DIRECTORIES);
false, StatusProbeEnum.FILE);
// if this doesn't raise an exception then it's one of
// raw S3: parent is a file: error
// guarded S3: parent is a file or a dir.
if (!dstParentStatus.isDirectory()) {
throw new RenameFailedException(src, dst,
"destination parent is not a directory");
}
} catch (FileNotFoundException e2) {
throw new RenameFailedException(src, dst,
"destination has no parent ");
} catch (FileNotFoundException expected) {
// nothing was found. Don't worry about it;
// expect rename to implicitly create the parent dir (raw S3)
// or the s3guard parents (guarded)

}
}
}
@@ -2761,7 +2765,8 @@ private void createFakeDirectoryIfNecessary(Path f)
* @throws IOException IO problem
*/
@Retries.RetryTranslated
void maybeCreateFakeParentDirectory(Path path)
@VisibleForTesting
protected void maybeCreateFakeParentDirectory(Path path)
throws IOException, AmazonClientException {
Path parent = path.getParent();
if (parent != null && !parent.isRoot()) {
@@ -1126,6 +1126,40 @@ We also recommend using applications/application
options which do not rename files when committing work or when copying data
to S3, but instead write directly to the final destination.

## Rename not behaving as "expected"

S3 is not a filesystem. The S3A connector mimics file and directory rename by

* HEAD then LIST of source path. The source MUST exist, else a `FileNotFoundException`
is raised.
* HEAD then LIST of the destination path.
This SHOULD NOT exist.
If it does and if the source is a directory, the destination MUST be an empty directory.
If the source is a file, the destination MAY be a directory, empty or not.
If the destination exists and relevant conditions are not met, a `FileAlreadyExistsException`
is raised.
* If the destination path does not exist, a HEAD request is made on the parent path
to verify that there is no object there.
Directory markers are not checked for, nor is there any check that the path has children.
* File-by-file copy of the source objects to the destination.
This is parallelized, with paged listings of the directory's objects and issuing of DELETE requests.
* Post-delete recreation of the source parent directory marker, if needed.
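The copy-then-delete sequence in the list above can be modeled against a toy flat key store. This is an illustrative sketch only, with invented names (`MockRename.rename`); the real connector issues COPY and DELETE requests against S3 rather than mutating a map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class MockRename {

  /** Mimic rename over a flat key store: copy every object under the
   *  source prefix to the destination prefix, then delete the originals.
   *  The cost is proportional to the data copied, hence O(data). */
  static boolean rename(SortedMap<String, byte[]> store, String src, String dst) {
    // keys under the source "directory": everything between "src/" and "src0"
    // ('0' is the character after '/' in ASCII, so it bounds the prefix range)
    SortedMap<String, byte[]> children = store.subMap(src + "/", src + "0");
    if (!store.containsKey(src) && children.isEmpty()) {
      return false; // no such source: S3A raises FileNotFoundException here
    }
    List<String> keys = new ArrayList<>(children.keySet());
    if (store.containsKey(src)) {
      keys.add(src); // source is a plain object ("file")
    }
    for (String k : keys) { // copy phase: one COPY per object
      store.put(dst + k.substring(src.length()), store.get(k));
    }
    keys.forEach(store::remove); // delete phase: DELETE of the originals
    return true;
  }

  public static void main(String[] args) {
    SortedMap<String, byte[]> store = new TreeMap<>();
    store.put("a/f1", new byte[]{1});
    store.put("a/f2", new byte[]{2});
    rename(store, "a", "b");
    System.out.println(store.keySet());
  }
}
```

Even in this toy model the copy phase scales with the number and size of the objects, which is the `O(data)` behavior this section describes.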

This is slow (`O(data)`) and can cause timeouts on code which is required
to send regular progress reports/heartbeats, for example distCp.
It is _very unsafe_ if the calling code expects atomic renaming as part
of any commit algorithm.
This is why the [S3A Committers](committers.md) or similar are needed to safely
commit output.

There is also the risk of race conditions arising if many processes/threads
are working with the same directory tree
[HADOOP-16721](https://issues.apache.org/jira/browse/HADOOP-16721).

To reduce this risk, since Hadoop 3.3.1, the S3A connector no longer verifies that the parent
directory of the destination of a rename is a directory, only that it is _not_ a file.
You can rename a directory or file deep under a file if you try, after which
there is no guarantee of the files being found in listings. Try not to do that.
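The relaxed check amounts to a single probe that the immediate parent is not a file. A hypothetical sketch against the same style of flat key store (all names invented; this is not the S3A implementation):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ParentCheck {

  /** Reject a rename only when an object exists exactly at the parent path;
   *  deliberately probe nothing else, saving HEAD requests and avoiding
   *  races with concurrent directory cleanup in other threads. */
  static void checkDestParent(Map<String, byte[]> store, String destParent)
      throws IOException {
    if (store.containsKey(destParent)) {
      throw new IOException("destination parent is a file: " + destParent);
    }
    // no check that destParent exists as a directory,
    // and no checks on any ancestor above it
  }

  public static void main(String[] args) throws IOException {
    Map<String, byte[]> store = new HashMap<>();
    store.put("p/file", new byte[0]);
    // absent parent "p/missing": allowed; rename implicitly "creates" it
    checkDestParent(store, "p/missing");
    // parent "p/file" is a plain object: rejected
    try {
      checkDestParent(store, "p/file");
    } catch (IOException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Note that an ancestor such as `p` in `p/file/sub/dest` is never probed, which is exactly how a rename can land deep under a file.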

## <a name="encryption"></a> S3 Server Side Encryption

@@ -38,6 +38,7 @@
import org.apache.hadoop.fs.s3a.Statistic;

import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
import static org.apache.hadoop.fs.contract.ContractTestUtils.skip;
import static org.apache.hadoop.fs.contract.ContractTestUtils.verifyFileContents;
import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
import static org.apache.hadoop.fs.s3a.Constants.METADATASTORE_AUTHORITATIVE;
@@ -106,9 +107,7 @@ public void setup() throws Exception {

@Override
public void testRenameDirIntoExistingDir() throws Throwable {
describe("Verify renaming a dir into an existing dir puts the files"
+" from the source dir into the existing dir"
+" and leaves existing files alone");
describe("S3A rename into an existing directory returns false");
FileSystem fs = getFileSystem();
String sourceSubdir = "source";
Path srcDir = path(sourceSubdir);
@@ -169,4 +168,9 @@ public void testRenamePopulatesFileAncestors2() throws Exception {
validateAncestorsMoved(src, dest, nestedFile);

}

@Override
public void testRenameFileUnderFileSubdir() throws Exception {
skip("Rename deep paths under files is allowed");
}
}
@@ -18,28 +18,28 @@

package org.apache.hadoop.fs.s3a;

import java.io.FileNotFoundException;

import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestName;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystemContractBaseTest;
import org.apache.hadoop.fs.Path;

import static org.apache.hadoop.fs.contract.ContractTestUtils.skip;
import static org.apache.hadoop.test.LambdaTestUtils.intercept;
import static org.junit.Assume.*;
import static org.junit.Assert.*;

/**
* Tests a live S3 system. If your keys and bucket aren't specified, all tests
* are marked as passed.
*
* This uses BlockJUnit4ClassRunner because FileSystemContractBaseTest extends
* TestCase, which uses the old JUnit 3 runner that doesn't ignore assumptions
* properly, making it impossible to skip the tests if we don't have a valid
* bucket.
*/
public class ITestS3AFileSystemContract extends FileSystemContractBaseTest {

protected static final Logger LOG =
@@ -77,7 +77,7 @@ public Path getTestBaseDir() {

@Test
public void testMkdirsWithUmask() throws Exception {
// not supported
skip("Not supported");
}

@Test
@@ -103,8 +103,38 @@ public void testRenameDirectoryAsExistingDirectory() throws Exception {
}

@Test
public void testMoveDirUnderParent() throws Throwable {
// not supported because it
// fails if dst is a directory that is not empty.
public void testRenameDirectoryAsExistingFile() throws Exception {
assumeTrue(renameSupported());

Path src = path("testRenameDirectoryAsExistingFile/dir");
fs.mkdirs(src);
Path dst = path("testRenameDirectoryAsExistingFileNew/newfile");
createFile(dst);
intercept(FileAlreadyExistsException.class,
() -> rename(src, dst, false, true, true));
}

@Test
public void testRenameDirectoryMoveToNonExistentDirectory()
throws Exception {
skip("does not fail");
}

@Test
public void testRenameFileMoveToNonExistentDirectory() throws Exception {
skip("does not fail");
}

@Test
public void testRenameFileAsExistingFile() throws Exception {
intercept(FileAlreadyExistsException.class,
() -> super.testRenameFileAsExistingFile());
}

@Test
public void testRenameNonExistentPath() throws Exception {
intercept(FileNotFoundException.class,
() -> super.testRenameNonExistentPath());

}
}
@@ -196,7 +196,7 @@ public void testRollingRenames() throws Exception {
}

S3AFileSystem fs = getFileSystem();
assertFalse("Renaming deleted file should have failed",
intercept(FileNotFoundException.class, () ->
fs.rename(dir2[0], dir1[0]));
assertTrue("Renaming over existing file should have succeeded",
fs.rename(dir1[0], dir0[0]));
@@ -332,7 +332,7 @@ void deleteObjectAtPath(Path f,
}

@Override
void maybeCreateFakeParentDirectory(Path path)
protected void maybeCreateFakeParentDirectory(Path path)
throws IOException, AmazonClientException {
// no-op
}
@@ -827,6 +827,16 @@ public static void removeBaseAndBucketOverrides(
removeBaseAndBucketOverrides(getTestBucketName(conf), conf, options);
}

/**
* Disable S3Guard from the test bucket in a configuration.
* @param conf configuration.
*/
public static void disableS3GuardInTestBucket(Configuration conf) {
removeBaseAndBucketOverrides(getTestBucketName(conf), conf,
S3_METADATA_STORE_IMPL,
DIRECTORY_MARKER_POLICY);
conf.set(S3_METADATA_STORE_IMPL, S3GUARD_METASTORE_NULL);
}

/**
* Call a function; any exception raised is logged at info.
* This is for test teardowns.