HADOOP-19140. [ABFS, S3A] Add IORateLimiter API #6703
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.hadoop.fs; | ||
|
|
||
| import java.time.Duration; | ||
| import javax.annotation.Nullable; | ||
|
|
||
| import org.apache.hadoop.classification.InterfaceAudience; | ||
| import org.apache.hadoop.classification.InterfaceStability; | ||
|
|
||
| /** | ||
| * An optional interface for classes that provide rate limiters. | ||
| * For a filesystem source, the operation name SHOULD be one of | ||
| * those listed in | ||
| * {@link org.apache.hadoop.fs.statistics.StoreStatisticNames} | ||
| * if the operation is listed there. | ||
| * <p> | ||
| * This interface is intended to be exported by FileSystems so that | ||
| * applications wishing to perform bulk operations may request access | ||
| * to a rate limiter <i>which is shared across all threads interacting | ||
| * with the store</i>. | ||
| * That is: the rate limiting is global to the specific instance of the | ||
| * object implementing this interface. | ||
| * <p> | ||
| * It is not expected to be shared with other instances of the same | ||
| * class, or across processes. | ||
| * <p> | ||
| * This means it is primarily of benefit when limiting bulk operations | ||
| * which can overload an (object) store from a small pool of threads. | ||
| * Examples of this can include: | ||
| * <ul> | ||
| * <li>Bulk delete operations</li> | ||
| * <li>Bulk rename operations</li> | ||
| * <li>Completing many in-progress uploads</li> | ||
| * <li>Deep and wide recursive treewalks</li> | ||
| * <li>Reading/prefetching many blocks within a file</li> | ||
| * </ul> | ||
| * In cluster applications, it is more likely that rate limiting is | ||
| * useful during job commit operations, or processes with many threads. | ||
| */ | ||
| @InterfaceAudience.Public | ||
| @InterfaceStability.Unstable | ||
| public interface IORateLimiter { | ||
|
|
||
| /** | ||
| * Acquire IO capacity. | ||
| * <p> | ||
| * The implementation may assign different costs to the different | ||
| * operations. | ||
| * <p> | ||
| * If there is not enough capacity, the permits will still be acquired, | ||
| * but the subsequent call will block until the capacity has been | ||
| * refilled. | ||
| * <p> | ||
| * The path parameter is used to support stores where there may be different throttling | ||
| * under different paths. | ||
| * @param operation operation being performed. Must not be null, may be "", | ||
| * should be from {@link org.apache.hadoop.fs.statistics.StoreStatisticNames} | ||
| * where there is a matching operation. | ||
| * @param source path for operations. | ||
| * Use "/" for root/store-wide operations. | ||
| * @param dest destination path for rename operations or any other operation which | ||
| * takes two paths. | ||
| * @param requestedCapacity capacity to acquire. | ||
| * Must be greater than or equal to 0. | ||
| * @return time spent waiting for capacity. | ||
| */ | ||
| Duration acquireIOCapacity( | ||
| String operation, | ||
| Path source, | ||
| @Nullable Path dest, | ||
| int requestedCapacity); | ||
|
|
||
| } | ||
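As a reading aid for the interface above, here is a minimal sketch of how a caller might acquire capacity before each page of a bulk delete. It is not code from this PR: `SketchIORateLimiter` and `BulkDeleteSketch` are hypothetical stand-ins, paths are plain strings rather than `org.apache.hadoop.fs.Path`, and the cost model (one unit of capacity per path) and the operation string are assumptions.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical stand-in for IORateLimiter; paths are plain strings here. */
interface SketchIORateLimiter {
  Duration acquireIOCapacity(String operation, String source, String dest, int requestedCapacity);
}

/** Hypothetical caller: processes paths in pages, acquiring capacity before each page. */
final class BulkDeleteSketch {
  private BulkDeleteSketch() {
  }

  /**
   * Walk the list page by page and return the total time spent waiting.
   * The cost of one unit per path is an assumption for illustration only.
   */
  static Duration deleteInPages(SketchIORateLimiter limiter,
      String base,
      List<String> paths,
      int pageSize) {
    Duration waited = Duration.ZERO;
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      List<String> page = paths.subList(start, end);
      // Acquire capacity for the whole page against the base path;
      // dest is null because a delete takes only one path.
      waited = waited.plus(
          limiter.acquireIOCapacity("op_delete", base, null, page.size()));
      // The store's delete call for this page would go here.
    }
    return waited;
  }
}
```

Returning the accumulated wait lets the caller log or publish how much time throttling cost, which mirrors the `Duration` return value of `acquireIOCapacity`.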
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.hadoop.fs.impl; | ||
|
|
||
| import org.apache.hadoop.fs.IORateLimiter; | ||
| import org.apache.hadoop.fs.Path; | ||
| import org.apache.hadoop.util.RateLimiting; | ||
| import org.apache.hadoop.util.RateLimitingFactory; | ||
|
|
||
| import static org.apache.hadoop.util.Preconditions.checkArgument; | ||
|
|
||
| /** | ||
| * Implementation support for {@link IORateLimiter}. | ||
| */ | ||
| public final class IORateLimiterSupport { | ||
|
Contributor
This is just a wrapper on top of RestrictedRateLimiting with extra operation name validation, right?
Contributor
Author
With the op name and path you can be clever:
|
||
|
|
||
| private static final IORateLimiter UNLIMITED = createIORateLimiter(0); | ||
|
|
||
| private IORateLimiterSupport() { | ||
| } | ||
|
|
||
| /** | ||
| * Get a rate limiter source which has no rate limiting. | ||
| * @return a rate limiter source which has no rate limiting. | ||
| */ | ||
| public static IORateLimiter unlimited() { | ||
| return UNLIMITED; | ||
| } | ||
|
|
||
| /** | ||
| * Create a rate limiter with a fixed capacity. | ||
| * @param capacityPerSecond capacity per second. | ||
| * @return a rate limiter. | ||
| */ | ||
| public static IORateLimiter createIORateLimiter(int capacityPerSecond) { | ||
| final RateLimiting limiting = RateLimitingFactory.create(capacityPerSecond); | ||
| return (operation, source, dest, requestedCapacity) -> { | ||
| validateArgs(operation, source, dest, requestedCapacity); | ||
| return limiting.acquire(requestedCapacity); | ||
| }; | ||
| } | ||
|
|
||
| /** | ||
| * Validate the arguments. | ||
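| * @param operation operation name; must not be null. | ||
| * @param source source path; must not be null. | ||
| * @param dest destination path; may be null. | ||
| * @param requestedCapacity capacity being requested. | ||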
| */ | ||
| private static void validateArgs(String operation, | ||
| Path source, | ||
| Path dest, | ||
| int requestedCapacity) { | ||
| checkArgument(operation != null, "null operation"); | ||
| checkArgument(source != null, "null source path"); | ||
| } | ||
| } | ||
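The Javadoc for `acquireIOCapacity` says that an oversized request is granted and the subsequent call pays the wait. The sketch below illustrates that behaviour with a virtual clock so it can be followed without sleeping. It is not the `RateLimiting` implementation used in this PR; `DeferredCostLimiterSketch` and its `nowNanos` parameter are hypothetical names introduced only for illustration.

```java
import java.time.Duration;

/** Hypothetical limiter: grants the request now and charges the debt to the next caller. */
final class DeferredCostLimiterSketch {
  /** Nanoseconds of "budget" that one unit of capacity consumes. */
  private final long nanosPerUnit;
  /** Earliest virtual time at which the next acquire proceeds without waiting. */
  private long nextFreeNanos;

  DeferredCostLimiterSketch(int capacityPerSecond) {
    if (capacityPerSecond <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.nanosPerUnit = 1_000_000_000L / capacityPerSecond;
  }

  /**
   * Compute how long a caller arriving at nowNanos would have to wait.
   * Nothing sleeps here; the clock is passed in so the logic is deterministic.
   */
  synchronized Duration acquire(int requestedCapacity, long nowNanos) {
    if (requestedCapacity < 0) {
      throw new IllegalArgumentException("negative capacity");
    }
    // Wait only for debt left behind by earlier callers.
    long wait = Math.max(0L, nextFreeNanos - nowNanos);
    long grantedAt = nowNanos + wait;
    // This request's own cost is pushed onto whoever calls next.
    nextFreeNanos = grantedAt + requestedCapacity * nanosPerUnit;
    return Duration.ofNanos(wait);
  }
}
```

With a capacity of 10 per second, a first request for 20 units returns immediately, and a request arriving half a second later waits for the remaining 1.5 seconds of that debt.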
A multi-delete operation takes a list of paths. Although we have a concept of the base path, I don't think the S3 client cares whether every path is under the base path.
S3 throttling does, as it is per prefix.
Just to understand this better...
Suppose we have a list of paths on which we are attempting a bulk operation, and their only common prefix is the root itself.
Should we acquire IO capacity for each individual path, or for the root path itself?
really good q. will comment below
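To make the question concrete, here is a sketch of one way a caller could find the deepest ancestor shared by a list of paths, which could then be passed as the `source` argument. It does not answer which option the API should prefer, and it is not code from this PR: `CommonAncestorSketch` is a hypothetical helper, and paths are plain absolute strings rather than `Path` objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: deepest ancestor shared by a list of absolute paths. */
final class CommonAncestorSketch {
  private CommonAncestorSketch() {
  }

  /** Split "/a/b/c" into [a, b, c], dropping empty components. */
  private static List<String> components(String path) {
    List<String> parts = new ArrayList<>();
    for (String p : path.split("/")) {
      if (!p.isEmpty()) {
        parts.add(p);
      }
    }
    return parts;
  }

  /** Return the longest component-wise common prefix, or "/" if there is none. */
  static String commonAncestor(List<String> paths) {
    if (paths.isEmpty()) {
      return "/";
    }
    List<String> common = components(paths.get(0));
    for (String path : paths.subList(1, paths.size())) {
      List<String> parts = components(path);
      int n = 0;
      // Compare whole components so "/ab" and "/a" do not share "/a".
      while (n < common.size() && n < parts.size()
          && common.get(n).equals(parts.get(n))) {
        n++;
      }
      common = common.subList(0, n);
    }
    return "/" + String.join("/", common);
  }
}
```

When the paths share nothing but the root, this returns "/", which is exactly the case the question raises: a single acquisition against "/" would then treat the whole store as one throttling domain.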