@@ -22,6 +22,7 @@
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
@@ -395,7 +396,8 @@ private Path createDirWithEmptySubFolder() throws IOException {
Path path = getContract().getTestPath();
fs.delete(path, true);
// create a - non-qualified - Path for a subdir
-Path subfolder = path.suffix('/' + this.methodName.getMethodName());
+Path subfolder = path.suffix('/' + this.methodName.getMethodName()
+    + "-" + UUID.randomUUID());
mkdirs(subfolder);
return subfolder;
}
@@ -545,7 +545,7 @@ private Constants() {
public static final String S3GUARD_METASTORE_LOCAL_ENTRY_TTL =
"fs.s3a.s3guard.local.ttl";
public static final int DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL
-    = 10 * 1000;
+    = 60 * 1000;

/**
* Use DynamoDB for the metadata: {@value}.
@@ -1343,6 +1343,16 @@ public boolean hasMetadataStore() {
return !S3Guard.isNullMetadataStore(metadataStore);
}

/**
* Does the filesystem have an authoritative metadata store?
* @return true if there is a metadata store and the authoritative flag
* is set for this filesystem.
*/
@VisibleForTesting
boolean hasAuthoritativeMetadataStore() {
return hasMetadataStore() && allowAuthoritative;
}

/**
* Get the metadata store.
* This will always be non-null, but may be bound to the
@@ -2382,6 +2392,38 @@ S3AFileStatus innerGetFileStatus(final Path f,
"deleted by S3Guard");
}

// If the metastore is not authoritative, check S3 for any recent
// modification: compare the modTime to check whether the metadata is up
// to date. Skip the S3 check if the path is a directory, because if the
// destination is also a directory there is no difference.
// TODO After HADOOP-16085 the modification detection can be done with
// etags or object versions instead of modTime.
if (!pm.getFileStatus().isDirectory() &&
!allowAuthoritative) {
LOG.debug("Metadata for {} found in the non-auth metastore.", path);
final long msModTime = pm.getFileStatus().getModificationTime();

S3AFileStatus s3AFileStatus;
try {
s3AFileStatus = s3GetFileStatus(path, key, tombstones);
} catch (FileNotFoundException fne) {
s3AFileStatus = null;
}
if (s3AFileStatus == null) {
LOG.warn("Failed to find file {}. Either it is not yet visible, or "
+ "it has been deleted.", path);
} else {
final long s3ModTime = s3AFileStatus.getModificationTime();

if(s3ModTime > msModTime) {
LOG.debug("S3Guard metadata for {} is outdated, updating it",
path);
Review comment (Contributor) on the LOG.debug line above: might include the 2 mod times in the debug log
return S3Guard.putAndReturn(metadataStore, s3AFileStatus,
instrumentation);
}
}
}

FileStatus msStatus = pm.getFileStatus();
if (needEmptyDirectoryFlag && msStatus.isDirectory()) {
if (pm.isEmptyDirectory() != Tristate.UNKNOWN) {
@@ -24,7 +24,9 @@
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
@@ -184,6 +186,9 @@ public static FileStatus[] dirMetaToStatuses(DirListingMetadata dirMeta) {
*
* Also update the MetadataStore to reflect the resulting directory listing.
*
* In the non-authoritative case: update the file metadata if the mod_time
* of a file in the listing is greater than what is currently in the ms.
*
* @param ms MetadataStore to use.
* @param path path to directory
* @param backingStatuses Directory listing from the backing store.
@@ -219,13 +224,26 @@ public static FileStatus[] dirListingUnion(MetadataStore ms, Path path,
// Since the authoritative case is already handled outside this function,
// we will basically start with the set of directory entries in the
// DirListingMetadata, and add any that only exist in the backingStatuses.

boolean changed = false;
final Map<Path, FileStatus> dirMetaMap = dirMeta.getListing().stream()
.collect(Collectors.toMap(
pm -> pm.getFileStatus().getPath(), PathMetadata::getFileStatus)
);

for (FileStatus s : backingStatuses) {
if (deleted.contains(s.getPath())) {
continue;
}

if (!isAuthoritative){
FileStatus status = dirMetaMap.get(s.getPath());
if (status != null
&& s.getModificationTime() > status.getModificationTime()) {
LOG.debug("Update ms with newer metadata of: {}", status);
ms.put(new PathMetadata(s));
}
}

// Minor race condition here. Multiple threads could add to this
// mutable DirListingMetadata. Since it is backed by a
// ConcurrentHashMap, the last put() wins.
@@ -105,8 +105,9 @@ two different reasons:

* Authoritative S3Guard
* S3Guard can be set as authoritative, which means that an S3A client will
-avoid round-trips to S3 when **getting directory listings** if there is a fully
-cached version of the directory stored in metadata store.
+avoid round-trips to S3 when **getting file metadata**, and **getting
+directory listings** if there is a fully cached version of the directory
+stored in the metadata store.
* This mode can be set as a configuration property
`fs.s3a.metadatastore.authoritative`
* All interactions with the S3 bucket(s) must be through S3A clients sharing
@@ -128,16 +129,20 @@ two different reasons:

More on Authoritative S3Guard:

-* It is not treating the MetadataStore (e.g. dynamodb) as the source of truth
-in general.
-* It is the ability to short-circuit S3 list objects and serve listings from
-the MetadataStore in some circumstances.
+* This setting is about treating the MetadataStore (e.g. dynamodb) as the source
+of truth in general, and also about short-circuiting S3 list objects and serving
+listings from the MetadataStore in some circumstances.
* For S3A to skip S3's get object metadata, and serve it directly from the
MetadataStore, the following things must all be true:
1. The S3A client is configured to allow the MetadataStore to be the
authoritative source of file metadata (`fs.s3a.metadatastore.authoritative=true`).
1. The MetadataStore has the file metadata for the path stored in it.
* For S3A to skip S3's list objects on some path, and serve it directly from
the MetadataStore, the following things must all be true:
1. The MetadataStore implementation persists the bit
`DirListingMetadata.isAuthoritative` set when calling
`MetadataStore#put` (`DirListingMetadata`)
-1. The S3A client is configured to allow metadatastore to be authoritative
+1. The S3A client is configured to allow MetadataStore to be authoritative
source of a directory listing (`fs.s3a.metadatastore.authoritative=true`).
1. The MetadataStore has a **full listing for path** stored in it. This only
happens if the FS client (s3a) explicitly has stored a full directory
@@ -154,8 +159,9 @@ recommended that you leave the default setting here:
</property>
```

-Note that a MetadataStore MAY persist this bit. (Not MUST).
-Setting this to `true` is currently an experimental feature.
+Note that a MetadataStore MAY persist this bit in the directory listings. (Not
+MUST).

Note that if this is set to true, it may exacerbate or persist existing race
conditions around multiple concurrent modifications and listings of a given
@@ -396,6 +402,48 @@ for two buckets with a shared table, while disabling it for the public
bucket.


### Out-of-band operations with S3Guard

We call an operation out-of-band (OOB) when a bucket is used by a client with
S3Guard, and another client runs a write operation (e.g. delete, move, rename,
overwrite) on an object in the same bucket without S3Guard.

The behaviour of S3AFileSystem and the MetadataStore in the case of OOB
operations is defined below, where:

* A: a client with S3Guard
* B: a client without S3Guard (directly to S3)


* OOB OVERWRITE, authoritative mode:
    * Client A creates file F1
    * Client B overwrites F1 with F2 (same or different file size)
    * Client A's getFileStatus returns the F1 metadata

* OOB OVERWRITE, non-authoritative mode:
    * Client A creates file F1
    * Client B overwrites F1 with F2 (same or different file size)
    * Client A's getFileStatus returns the F2 metadata. In non-authoritative
      mode, S3 is checked for the file: if the modification time of the file
      in S3 is greater than the one in S3Guard, the S3 file metadata can
      safely be returned and the cache updated.

* OOB DELETE, authoritative mode:
    * Client A creates file F
    * Client B deletes F
    * Client A's getFileStatus reports that the file still exists

* OOB DELETE, non-authoritative mode:
    * Client A creates file F
    * Client B deletes F
    * Client A's getFileStatus reports that the file still exists

Note: authoritative and non-authoritative mode behave the same in the
OOB DELETE case.
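The getFileStatus behaviour in the cases above can be sketched in Java as follows. This is a hypothetical, simplified model rather than the S3A implementation: the class `OobGetFileStatusSketch`, its `FileMeta` holder, and the two in-memory maps standing in for the MetadataStore and the S3 bucket are assumptions made for illustration, and tombstone handling is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of the OOB check in getFileStatus.
 * Not the S3A API: both stores are plain in-memory maps.
 */
final class OobGetFileStatusSketch {

  /** Minimal stand-in for a file status (hypothetical). */
  static final class FileMeta {
    final String path;
    final long modTime;
    final boolean isDirectory;

    FileMeta(String path, long modTime, boolean isDirectory) {
      this.path = path;
      this.modTime = modTime;
      this.isDirectory = isDirectory;
    }
  }

  /** Stand-in for the S3Guard MetadataStore. */
  final Map<String, FileMeta> metadataStore = new HashMap<>();
  /** Stand-in for the objects currently visible in S3. */
  final Map<String, FileMeta> s3 = new HashMap<>();
  private final boolean authoritative;

  OobGetFileStatusSketch(boolean authoritative) {
    this.authoritative = authoritative;
  }

  Optional<FileMeta> getFileStatus(String path) {
    FileMeta msMeta = metadataStore.get(path);
    if (msMeta == null) {
      // Not cached: ask S3 directly (simplified).
      return Optional.ofNullable(s3.get(path));
    }
    if (!authoritative && !msMeta.isDirectory) {
      FileMeta s3Meta = s3.get(path);
      if (s3Meta != null && s3Meta.modTime > msMeta.modTime) {
        // OOB overwrite detected: refresh the cache and return the S3 entry.
        metadataStore.put(path, s3Meta);
        return Optional.of(s3Meta);
      }
      // s3Meta == null: not yet visible or deleted OOB; keep the cached entry.
    }
    // Authoritative mode, a directory, or an up-to-date cache entry.
    return Optional.of(msMeta);
  }
}
```

In this model an OOB overwrite with a newer modification time is only picked up when `authoritative` is false, while an OOB delete leaves the cached entry in place in both modes, matching the note above.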

The behaviour when getting directory listings:

* In non-authoritative mode, the file status in the metadata store is updated
during the listing the same way as in getFileStatus.
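The listing merge can be sketched the same way. Again this is a hypothetical model, not the `dirListingUnion` code itself: `ListingUnionSketch`, its `Entry` holder, and the map standing in for the cached `DirListingMetadata` are assumptions. It follows the comments in that method: tombstoned paths are skipped, entries that exist only in S3 are added, and in non-authoritative mode a cached entry is replaced when S3 reports a newer modification time. The real method also writes the resulting listing back to the MetadataStore, which the sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the listing merge; not the S3Guard API. */
final class ListingUnionSketch {

  /** Minimal stand-in for a listing entry (hypothetical). */
  static final class Entry {
    final String path;
    final long modTime;

    Entry(String path, long modTime) {
      this.path = path;
      this.modTime = modTime;
    }
  }

  private ListingUnionSketch() {
  }

  /**
   * Merge an S3 listing into the cached listing.
   * @param cachedListing cached entries keyed by path; updated in place
   * @param backingStatuses entries returned by S3 for the same directory
   * @param deleted paths with tombstones, which are skipped
   * @param authoritative whether the metadata store is authoritative
   * @return the merged listing
   */
  static List<Entry> union(Map<String, Entry> cachedListing,
      List<Entry> backingStatuses, Set<String> deleted,
      boolean authoritative) {
    for (Entry s : backingStatuses) {
      if (deleted.contains(s.path)) {
        continue; // deleted through S3Guard: do not resurrect it
      }
      Entry cached = cachedListing.get(s.path);
      if (cached == null) {
        cachedListing.put(s.path, s); // only in S3: add it
      } else if (!authoritative && s.modTime > cached.modTime) {
        cachedListing.put(s.path, s); // newer OOB write: refresh it
      }
    }
    return new ArrayList<>(cachedListing.values());
  }
}
```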


## S3Guard Command Line Interface (CLI)

Note that in some cases an AWS region or `s3a://` URI can be provided.