HADOOP-19140. [ABFS, S3A] Add IORateLimiter API #6703

steveloughran · 2024-04-03T16:31:05Z

Adds an API (pulled from #6596) to allow callers to request IO capacity for a named operation with optional source and destination paths.

The first use of this would be the bulk delete operation of #6494: there would be some throttling within the S3A code that sets the maximum number of writes per bucket, and for a bulk delete the caller would ask for as many writes as there are entries.

Added new store operations for delete_bulk and delete_dir.
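A minimal sketch of the kind of API described above, assuming a limiter that returns how long the caller should wait before issuing its IO. The interface `IORateLimiterSketch`, its String path parameters and the token-bucket `TokenBucketLimiter` are hypothetical illustrations, not the classes in this patch; the clock is injectable so the behaviour can be checked without sleeping.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/**
 * Hypothetical shape of an IO rate limiter; the interface name, the String
 * path parameters and the int capacity are assumptions for this sketch.
 */
interface IORateLimiterSketch {
  /**
   * Requests IO capacity for a named operation.
   *
   * @param operation operation name, such as "delete_bulk"
   * @param source optional source path; may be null
   * @param dest optional destination path; may be null
   * @param requestedCapacity IO units wanted; 0 is allowed
   * @return how long the caller should wait before issuing the IO
   */
  Duration acquireIOCapacity(String operation, String source, String dest,
      int requestedCapacity);
}

/**
 * Token bucket: refills at a fixed rate and holds at most one second's worth
 * of capacity. Over-budget requests go into debt; the returned delay is the
 * time needed to repay that debt.
 */
final class TokenBucketLimiter implements IORateLimiterSketch {
  private final double permitsPerSecond;
  private final LongSupplier nanoClock;   // injectable for testing
  private double available;
  private long lastRefillNanos;

  TokenBucketLimiter(double permitsPerSecond, LongSupplier nanoClock) {
    if (permitsPerSecond <= 0) {
      throw new IllegalArgumentException("rate must be positive");
    }
    this.permitsPerSecond = permitsPerSecond;
    this.nanoClock = nanoClock;
    this.available = permitsPerSecond;    // start with a full bucket
    this.lastRefillNanos = nanoClock.getAsLong();
  }

  @Override
  public synchronized Duration acquireIOCapacity(String operation,
      String source, String dest, int requestedCapacity) {
    if (operation == null || operation.isEmpty()) {
      throw new IllegalArgumentException("operation name is required");
    }
    if (requestedCapacity < 0) {
      throw new IllegalArgumentException("negative capacity: " + requestedCapacity);
    }
    // Refill for the time elapsed since the last call, capped at one second's worth.
    long now = nanoClock.getAsLong();
    double elapsedSeconds = (now - lastRefillNanos) / 1e9;
    available = Math.min(permitsPerSecond,
        available + elapsedSeconds * permitsPerSecond);
    lastRefillNanos = now;
    // Take the capacity; a negative balance is debt that later callers also wait for.
    available -= requestedCapacity;
    if (available >= 0) {
      return Duration.ZERO;
    }
    return Duration.ofNanos((long) Math.ceil(-available / permitsPerSecond * 1e9));
  }
}
```

In this sketch a bulk delete of N entries would call `acquireIOCapacity("delete_bulk", base, null, N)` and sleep for the returned duration before sending the request. Letting an over-budget request go into debt, rather than rejecting it, means a large request is only ever delayed, never refused.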

How was this patch tested?

New tests.

For code changes:

Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

hadoop-yetus · 2024-04-03T18:50:40Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 19s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	32m 47s		trunk passed
+1 💚	compile	8m 56s		trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	compile	8m 7s		trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	checkstyle	0m 44s		trunk passed
+1 💚	mvnsite	1m 3s		trunk passed
+1 💚	javadoc	0m 48s		trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	javadoc	0m 34s		trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	spotbugs	1m 23s		trunk passed
+1 💚	shadedclient	20m 53s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 30s		the patch passed
+1 💚	compile	8m 30s		the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	javac	8m 30s		the patch passed
+1 💚	compile	8m 6s		the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	javac	8m 6s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 35s		the patch passed
+1 💚	mvnsite	0m 56s		the patch passed
+1 💚	javadoc	0m 43s		the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	javadoc	0m 37s		the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	spotbugs	1m 35s		the patch passed
+1 💚	shadedclient	21m 19s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	16m 31s		hadoop-common in the patch passed.
+1 💚	asflicense	0m 42s		The patch does not generate ASF License warnings.
		138m 38s

Subsystem	Report/Notes
Docker	ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/artifact/out/Dockerfile
GITHUB PR	#6703
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux e24358cb7c53 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `58fb6a3`
Default Java	Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/testReport/
Max. process+thread count	2150 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

mukund-thakur · 2024-04-05T22:28:55Z

...mmon-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/IORateLimiterSupport.java

+/**
+ * Implementation support for {@link IORateLimiter}.
+ */
+public final class IORateLimiterSupport {


This is just a wrapper on top of RestrictedRateLimiting with extra operation-name validation, right?
I think this can be extended to limit per operation.

steveloughran · 2024-04-11

With the operation name and path you can be clever:

limit by path;

use the operation name and apply a "multiplier" to the actual IO, to include the extra operations made (rename: list, copy, delete). For S3, separate read and write IO capacities would need to be requested;

consider some operations free and give them a cost of 0.
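The per-operation idea above could look something like this: a table mapping each operation name to the read and write units it consumes per item, with free operations registered at a cost of 0. `OperationCostTable`, `IOCost` and any multipliers used with them are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Read and write IO units; a hypothetical value type for this sketch. */
final class IOCost {
  final long reads;
  final long writes;

  IOCost(long reads, long writes) {
    this.reads = reads;
    this.writes = writes;
  }

  /** Scales a per-item cost by the number of items. */
  IOCost times(long items) {
    return new IOCost(reads * items, writes * items);
  }
}

/**
 * Hypothetical per-operation cost table: each operation name maps to the IO
 * it consumes per item, so the extra requests an operation makes are counted
 * and free operations can be given a cost of 0.
 */
final class OperationCostTable {
  private final Map<String, IOCost> perItem = new HashMap<>();

  OperationCostTable register(String operation, long reads, long writes) {
    if (reads < 0 || writes < 0) {
      throw new IllegalArgumentException("costs must be non-negative");
    }
    perItem.put(operation, new IOCost(reads, writes));
    return this;
  }

  /** Total cost of {@code items} invocations; unknown names are rejected. */
  IOCost costOf(String operation, long items) {
    IOCost cost = perItem.get(operation);
    if (cost == null) {
      throw new IllegalArgumentException("Unknown operation: " + operation);
    }
    return cost.times(items);
  }
}
```

For instance, registering `delete_bulk` at 0 reads and 1 write per entry makes a 250-entry bulk delete cost 250 writes; the real multipliers for each store would still need to be worked out.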

mukund-thakur · 2024-04-09T17:17:15Z

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/IORateLimiter.java

+   */
+  Duration acquireIOCapacity(
+      String operation,
+      Path source,


A multi-delete operation takes a list of paths. Although we have a concept of the base path, I don't think the S3 client cares whether every path is under the base path.

S3 throttling does, as it is per prefix.

Just to understand this better: if we are attempting a bulk operation on a list of paths whose only common prefix is the root itself, should we acquire IO capacity for each individual path or for the root path?

steveloughran · 2024-04-11

Really good question; I'll comment below.
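One way to approach the per-path versus root question is to group the keys by prefix and request capacity once per group. This sketch assumes an object's parent "directory" is a reasonable stand-in for the prefix S3 throttles on, which is only an approximation of how S3 partitions its key space; `PrefixGrouping` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: groups object keys by their parent "directory" prefix
 * so capacity could be requested per prefix rather than once for the root.
 */
final class PrefixGrouping {
  private PrefixGrouping() {}

  /** Parent prefix of a key, e.g. "a/b/c.txt" gives "a/b/"; top-level keys give "". */
  static String parentPrefix(String key) {
    int slash = key.lastIndexOf('/');
    return slash < 0 ? "" : key.substring(0, slash + 1);
  }

  /** Number of keys under each parent prefix, sorted by prefix. */
  static Map<String, Integer> countByPrefix(List<String> keys) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String key : keys) {
      counts.merge(parentPrefix(key), 1, Integer::sum);
    }
    return counts;
  }
}
```

Each entry in the returned map would then become one capacity request for that prefix, sized by its count, instead of a single request charged against the root.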

Adds an API (pulled from apache#6596) to allow callers to request IO capacity for a named operation with optional source and dest paths. Change-Id: I02aff4d3c90ac299c80f388e88195d69e1049fe0

hadoop-yetus · 2024-04-23T21:45:45Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 31s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	46m 17s		trunk passed
+1 💚	compile	17m 52s		trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	compile	17m 12s		trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	checkstyle	1m 15s		trunk passed
+1 💚	mvnsite	1m 39s		trunk passed
+1 💚	javadoc	1m 14s		trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	javadoc	0m 50s		trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	spotbugs	2m 35s		trunk passed
+1 💚	shadedclient	38m 40s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 55s		the patch passed
+1 💚	compile	16m 46s		the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	javac	16m 46s		the patch passed
+1 💚	compile	16m 7s		the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	javac	16m 7s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	1m 14s		the patch passed
+1 💚	mvnsite	1m 34s		the patch passed
+1 💚	javadoc	1m 4s		the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚	javadoc	0m 49s		the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚	spotbugs	2m 52s		the patch passed
+1 💚	shadedclient	38m 43s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	19m 54s		hadoop-common in the patch passed.
+1 💚	asflicense	0m 58s		The patch does not generate ASF License warnings.
		232m 44s

Subsystem	Report/Notes
Docker	ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/artifact/out/Dockerfile
GITHUB PR	#6703
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 1e9683e47802 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `d2e146e`
Default Java	Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/testReport/
Max. process+thread count	2038 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

anujmodi2021 · 2024-09-03T06:31:35Z

hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestIORateLimiter.java

+        .describedAs("delay for %d capacity", capacity)
+        .isEqualTo(Duration.ZERO);
+  }
+}


Nit: EOF warning (no newline at end of file).

steveloughran · 2024-09-04T10:06:48Z

@anujmodi2021

For the manifest committer work I was asking for some IOPS per rename, so that if there wasn't enough capacity, only the renames over capacity blocked. It also allows for incremental IO: you don't have to block to acquire everything up front; you just ask as you go along.
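A sketch of the incremental pattern described here, assuming a limiter callback that returns how long to wait: capacity is requested just before each rename, so only renames that arrive after the budget is exhausted are delayed. `CapacitySource` and `IncrementalRenamer` are hypothetical names, not manifest committer classes.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical limiter callback: returns how long to wait before the IO. */
@FunctionalInterface
interface CapacitySource {
  Duration acquire(int units);
}

/** Sketch of incremental, per-rename capacity acquisition. */
final class IncrementalRenamer {
  private IncrementalRenamer() {}

  /**
   * Renames files one at a time, asking for capacity just before each rename
   * rather than acquiring capacity for the whole batch up front, so only the
   * renames that exceed the available capacity are delayed.
   *
   * @return the delay imposed on each rename, in order
   */
  static List<Duration> renameAll(List<String> files, int unitsPerRename,
      CapacitySource limiter, Consumer<String> rename) {
    List<Duration> delays = new ArrayList<>();
    for (String file : files) {
      Duration delay = limiter.acquire(unitsPerRename);
      if (!delay.isZero()) {
        try {
          Thread.sleep(delay.toMillis());   // block only this rename
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted waiting for IO capacity", e);
        }
      }
      rename.accept(file);
      delays.add(delay);
    }
    return delays;
  }
}
```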

It gets a bit more complex for S3, where directory operations are mimicked file by file. There we'd ask for 2 read and 1 write operations per file rename (HEAD (read) + COPY (read + write)), and for a bulk delete to ask for the same number of writes as there are entries in the delete list. That is already done in its implementation of BulkDelete.
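The per-file arithmetic above, written out as a hypothetical cost model (`IOUnits` and `S3EmulatedOpCosts` are illustrative names, not S3A classes):

```java
/** Read and write IO units; a hypothetical value type for this sketch. */
final class IOUnits {
  final long reads;
  final long writes;

  IOUnits(long reads, long writes) {
    this.reads = reads;
    this.writes = writes;
  }

  /** Sum of two costs, e.g. renames followed by a bulk delete. */
  IOUnits plus(IOUnits other) {
    return new IOUnits(reads + other.reads, writes + other.writes);
  }
}

/**
 * Hypothetical cost model for S3 operations emulated file by file:
 * each file rename is a HEAD (1 read) plus a COPY (1 read + 1 write),
 * and a bulk delete costs one write per entry in the delete list.
 */
final class S3EmulatedOpCosts {
  private S3EmulatedOpCosts() {}

  static IOUnits fileRenames(int files) {
    return new IOUnits(2L * files, files);
  }

  static IOUnits bulkDelete(int entries) {
    return new IOUnits(0, entries);
  }
}
```

Under this model, renaming 100 files would request 200 reads and 100 writes; if the source objects are then removed with a bulk delete (an assumption in this example), that adds another 100 writes.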

Note that the AWS SDK does split large COPY operations into multipart copies, so really the IO capacity is (2 * file size / block size), but as these copies can be so slow I'm not worrying about it. We'd need to replace that bit of the SDK and while we've discussed it.
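A rough sketch of the multipart COPY estimate, reading it as two IO units per copied part. `MultipartCopyEstimate` is hypothetical, and rounding the part count up (with an empty file counting as one part) is an assumption rather than something stated above.

```java
/**
 * Hypothetical estimate of the IO capacity for a COPY that the SDK splits
 * into multipart copies, following the "2 * file size / block size" figure.
 */
final class MultipartCopyEstimate {
  private MultipartCopyEstimate() {}

  /** Number of parts: ceiling of fileSize / blockSize, at least 1 (assumed). */
  static long parts(long fileSize, long blockSize) {
    if (fileSize < 0 || blockSize <= 0) {
      throw new IllegalArgumentException("invalid sizes");
    }
    // Ceiling division without risking overflow on (fileSize + blockSize - 1).
    long count = fileSize / blockSize + (fileSize % blockSize == 0 ? 0 : 1);
    return Math.max(1, count);
  }

  /** Estimated IO units: two per part. */
  static long ioUnits(long fileSize, long blockSize) {
    return 2 * parts(fileSize, blockSize);
  }
}
```

With a 128 MiB block size, copying a 1 GiB file would be estimated at 8 parts, or 16 units.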

FYI, I've let this work lapse as other things took priority; if you want to take it up, feel free to do so.

github-actions · 2025-09-29T00:23:09Z

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

github-actions bot added trunk Common labels Apr 3, 2024

mukund-thakur reviewed Apr 5, 2024


mukund-thakur reviewed Apr 9, 2024


HADOOP-19140. [ABFS, S3A] Add IORateLimiter API

d2e146e


steveloughran force-pushed the fs/HADOOP-19140-ratelimiter branch from 58fb6a3 to d2e146e on April 23, 2024 17:51

anujmodi2021 reviewed Sep 3, 2024


github-actions bot added the Stale label Sep 29, 2025

github-actions bot closed this Sep 30, 2025
