

@steveloughran
Contributor

Adds an API (pulled from #6596) to allow callers to request IO capacity for a named operation, with optional source and destination paths.

The first use of this would be the bulk delete operation of #6494: there'd be some throttling within the S3A code which set a maximum number of writes per bucket, and for a bulk delete the caller would ask for as many writes as there were entries.

Added new store operations for delete_bulk and delete_dir
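A minimal sketch of how a caller might use such an API, assuming a hypothetical `IORateLimiter` interface whose `acquireIOCapacity` takes an operation name, optional source and destination paths, and a requested capacity, and returns the time spent waiting. Plain strings stand in for Hadoop `Path` objects so it compiles standalone; `UnlimitedIORateLimiter` and `BulkDeleteCaller` are illustrative names, not the patch's classes.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

final class BulkDeleteDemo {
  public static void main(String[] args) {
    BulkDeleteCaller caller =
        new BulkDeleteCaller(new UnlimitedIORateLimiter());
    Duration waited = caller.deleteAll("s3a://bucket/dir",
        Arrays.asList("dir/a", "dir/b", "dir/c"));
    System.out.println("waited " + waited);
  }
}

/** Hypothetical shape of the API; String replaces Path for this sketch. */
interface IORateLimiter {
  /**
   * Acquire IO capacity for a named operation, blocking if necessary.
   * @param operation operation name, e.g. "delete_bulk"
   * @param source optional source path; may be null
   * @param dest optional destination path; may be null
   * @param requestedCapacity number of IO operations requested
   * @return time spent waiting for the capacity
   */
  Duration acquireIOCapacity(String operation, String source, String dest,
      int requestedCapacity);
}

/** A limiter which never blocks; a plausible default. */
final class UnlimitedIORateLimiter implements IORateLimiter {
  @Override
  public Duration acquireIOCapacity(String operation, String source,
      String dest, int requestedCapacity) {
    return Duration.ZERO;
  }
}

/** Hypothetical bulk delete caller: one write per entry to delete. */
final class BulkDeleteCaller {
  private final IORateLimiter limiter;

  BulkDeleteCaller(IORateLimiter limiter) {
    this.limiter = limiter;
  }

  Duration deleteAll(String base, List<String> keys) {
    // Request as much capacity as there are entries in the delete list.
    Duration waited = limiter.acquireIOCapacity(
        "delete_bulk", base, null, keys.size());
    // ... the actual bulk DELETE request would be issued here ...
    return waited;
  }
}
```

Keeping the limiter behind an interface means the unlimited variant can be the default, with throttling only plugged in by stores that need it.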

How was this patch tested?

New tests.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 19s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 47s trunk passed
+1 💚 compile 8m 56s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 8m 7s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 44s trunk passed
+1 💚 mvnsite 1m 3s trunk passed
+1 💚 javadoc 0m 48s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 23s trunk passed
+1 💚 shadedclient 20m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 8m 30s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 8m 30s the patch passed
+1 💚 compile 8m 6s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 8m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 35s the patch passed
+1 💚 mvnsite 0m 56s the patch passed
+1 💚 javadoc 0m 43s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 37s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 35s the patch passed
+1 💚 shadedclient 21m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 16m 31s hadoop-common in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
138m 38s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/artifact/out/Dockerfile
GITHUB PR #6703
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux e24358cb7c53 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 58fb6a3
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/testReport/
Max. process+thread count 2150 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

/**
* Implementation support for {@link IORateLimiter}.
*/
public final class IORateLimiterSupport {
Contributor

This is just a wrapper on top of RestrictedRateLimiting with extra operation name validation, right?
I think this could be extended to limit per operation.

Contributor Author

With the operation name and path you can be clever:

  • limit by path
  • use the operation name and apply a "multiplier" to the actual IO, to include the extra operations made (rename: list, copy, delete). For S3, separate read and write IO capacities would need to be requested.
  • consider some operations free and give them a cost of 0
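One way to realise the "multiplier" and "free operation" ideas, sketched with a hypothetical `CostedIOLimiter` that wraps a raw permit source (modelled here as an `IntFunction<Duration>`). The multipliers in the demo (3 for rename, 0 for `getFileStatus`) are illustrative assumptions, not values from the patch.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class CostedLimiterDemo {
  public static void main(String[] args) {
    CostedIOLimiter limiter = new CostedIOLimiter(
        permits -> {
          System.out.println("charging " + permits + " permits");
          return Duration.ZERO;
        }, 1)
        .withCost("rename", 3)          // list + copy + delete
        .withCost("getFileStatus", 0);  // treated as free
    limiter.acquireIOCapacity("rename", 2);
    limiter.acquireIOCapacity("getFileStatus", 10);
  }
}

/**
 * Hypothetical limiter which converts a request for a named operation into
 * raw IO permits using a per-operation multiplier. A multiplier of 0 marks
 * the operation as free.
 */
final class CostedIOLimiter {
  private final IntFunction<Duration> rawLimiter;
  private final int defaultMultiplier;
  private final Map<String, Integer> multipliers = new HashMap<>();

  CostedIOLimiter(IntFunction<Duration> rawLimiter, int defaultMultiplier) {
    this.rawLimiter = rawLimiter;
    this.defaultMultiplier = defaultMultiplier;
  }

  CostedIOLimiter withCost(String operation, int multiplier) {
    if (multiplier < 0) {
      throw new IllegalArgumentException("negative cost for " + operation);
    }
    multipliers.put(operation, multiplier);
    return this;
  }

  /** Number of raw permits charged for a request. */
  int cost(String operation, int requested) {
    return multipliers.getOrDefault(operation, defaultMultiplier) * requested;
  }

  Duration acquireIOCapacity(String operation, int requested) {
    int permits = cost(operation, requested);
    // Free operations never touch the underlying limiter.
    return permits == 0 ? Duration.ZERO : rawLimiter.apply(permits);
  }
}
```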

*/
Duration acquireIOCapacity(
String operation,
Path source,
Contributor

A multi-delete operation takes a list of paths. Although we have a concept of the base path, I don't think the S3 client cares whether every path is under the base path.

Contributor Author

S3 throttling does, as it is per prefix.
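Since S3 throttles per prefix, a limiter could keep a separate budget for each prefix. This is a hypothetical sketch: a token bucket per prefix with an injectable nanosecond clock for testing. The class name, the one-second burst size and the non-blocking `tryAcquire` are assumptions, and a real implementation would still have to decide how to map a path onto the prefix S3 actually partitions on.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class PerPrefixDemo {
  public static void main(String[] args) {
    PerPrefixLimiter limiter = new PerPrefixLimiter(100, System::nanoTime);
    System.out.println(limiter.tryAcquire("bucket/logs/", 100));
    System.out.println(limiter.tryAcquire("bucket/logs/", 50));
    // A different prefix has its own, untouched budget.
    System.out.println(limiter.tryAcquire("bucket/data/", 50));
  }
}

/**
 * Hypothetical per-prefix token bucket: each prefix gets permitsPerSecond,
 * refilled continuously, with a maximum burst of one second's worth.
 */
final class PerPrefixLimiter {
  private static final double NANOS_PER_SECOND = 1_000_000_000d;

  private static final class Bucket {
    double available;
    long lastRefillNanos;

    Bucket(double available, long now) {
      this.available = available;
      this.lastRefillNanos = now;
    }
  }

  private final double permitsPerSecond;
  private final LongSupplier nanoClock;
  private final ConcurrentHashMap<String, Bucket> buckets =
      new ConcurrentHashMap<>();

  PerPrefixLimiter(double permitsPerSecond, LongSupplier nanoClock) {
    this.permitsPerSecond = permitsPerSecond;
    this.nanoClock = nanoClock;
  }

  /** Take permits for the prefix if available; never blocks. */
  boolean tryAcquire(String prefix, int permits) {
    Bucket bucket = buckets.computeIfAbsent(prefix,
        p -> new Bucket(permitsPerSecond, nanoClock.getAsLong()));
    synchronized (bucket) {
      long now = nanoClock.getAsLong();
      double refill = (now - bucket.lastRefillNanos) / NANOS_PER_SECOND
          * permitsPerSecond;
      bucket.available = Math.min(permitsPerSecond, bucket.available + refill);
      bucket.lastRefillNanos = now;
      if (bucket.available >= permits) {
        bucket.available -= permits;
        return true;
      }
      return false;
    }
  }
}
```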

Contributor

Just to understand this better...
If we are attempting a bulk operation on a list of paths and the only common prefix for them is the root itself, should we acquire IO capacity for each individual path or for the root path itself?
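One possible middle ground, sketched under the assumption that each key's immediate parent is a reasonable stand-in for its S3 prefix: group the bulk-delete keys by parent and make one capacity request per group, sized to the number of keys in it. `PrefixGrouping.capacityByParent` is a hypothetical helper, not part of the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PrefixGrouping {
  public static void main(String[] args) {
    List<String> keys = Arrays.asList(
        "tables/t1/part-0", "tables/t1/part-1", "tables/t2/part-0", "_SUCCESS");
    // One capacity request per parent prefix, rather than per key or per root.
    capacityByParent(keys).forEach((prefix, count) ->
        System.out.println("'" + prefix + "' -> " + count));
  }

  /** Count the keys under each immediate parent; root-level keys map to "". */
  static Map<String, Integer> capacityByParent(List<String> keys) {
    Map<String, Integer> result = new TreeMap<>();
    for (String key : keys) {
      int slash = key.lastIndexOf('/');
      String parent = slash < 0 ? "" : key.substring(0, slash);
      result.merge(parent, 1, Integer::sum);
    }
    return result;
  }
}
```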

Contributor Author

Really good question; will comment below.

Adds an API (pulled from apache#6596) to allow callers to request
IO capacity for a named operation with optional source and dest paths.

Change-Id: I02aff4d3c90ac299c80f388e88195d69e1049fe0
@steveloughran steveloughran force-pushed the fs/HADOOP-19140-ratelimiter branch from 58fb6a3 to d2e146e April 23, 2024 17:51
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 17s trunk passed
+1 💚 compile 17m 52s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 17m 12s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 1m 15s trunk passed
+1 💚 mvnsite 1m 39s trunk passed
+1 💚 javadoc 1m 14s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 50s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 2m 35s trunk passed
+1 💚 shadedclient 38m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 55s the patch passed
+1 💚 compile 16m 46s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 16m 46s the patch passed
+1 💚 compile 16m 7s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 16m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 14s the patch passed
+1 💚 mvnsite 1m 34s the patch passed
+1 💚 javadoc 1m 4s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 49s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 2m 52s the patch passed
+1 💚 shadedclient 38m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 54s hadoop-common in the patch passed.
+1 💚 asflicense 0m 58s The patch does not generate ASF License warnings.
232m 44s
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/artifact/out/Dockerfile
GITHUB PR #6703
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1e9683e47802 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d2e146e
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/testReport/
Max. process+thread count 2038 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6703/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

.describedAs("delay for %d capacity", capacity)
.isEqualTo(Duration.ZERO);
}
}
\ No newline at end of file
Contributor

Nit: EOF warning.

@steveloughran
Contributor Author

@anujmodi2021

For the work on the manifest committer I was asking for some IOPS per rename, so that if there wasn't enough capacity, only the renames over capacity blocked. It also allows for incremental IO: you don't have to block and acquire everything up front, just ask as you go along.
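A minimal sketch of that incremental pattern, assuming a hypothetical `Limiter` callback: capacity is requested for each rename just before it runs, so only the renames that exceed the available capacity block. `IncrementalRenamer` is an illustrative name, not manifest committer code.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

final class IncrementalRenameDemo {
  public static void main(String[] args) {
    IncrementalRenamer renamer = new IncrementalRenamer(
        (operation, capacity) -> Duration.ZERO);  // never blocks
    Map<String, String> renames = new LinkedHashMap<>();
    renames.put("task1/part-0", "dest/part-0");
    renames.put("task2/part-0", "dest/part-1");
    Duration waited = renamer.renameAll(renames,
        (src, dst) -> System.out.println("rename " + src + " -> " + dst));
    System.out.println("total wait " + waited);
  }
}

/** Hypothetical: acquires capacity one rename at a time. */
final class IncrementalRenamer {
  /** Hypothetical limiter callback: returns the time spent blocked. */
  interface Limiter {
    Duration acquire(String operation, int capacity);
  }

  private final Limiter limiter;

  IncrementalRenamer(Limiter limiter) {
    this.limiter = limiter;
  }

  Duration renameAll(Map<String, String> renames,
      BiConsumer<String, String> rename) {
    Duration totalWait = Duration.ZERO;
    for (Map.Entry<String, String> entry : renames.entrySet()) {
      // Ask for capacity for this rename only, just before performing it,
      // rather than acquiring capacity for every rename up front.
      totalWait = totalWait.plus(limiter.acquire("rename", 1));
      rename.accept(entry.getKey(), entry.getValue());
    }
    return totalWait;
  }
}
```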

It gets a bit more complex for S3, where directory operations are mimicked file by file. There we'd ask for 2 read and 1 write operations per file rename (HEAD (read) + COPY (read + write)), and for the bulk delete to be the same number of writes as the size of the delete list. That is already done in the S3A implementation of BulkDelete.
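The accounting above can be written down as a small read/write cost model. This is a sketch: `S3IOCost` is hypothetical, and treating a directory rename as per-file renames followed by one bulk delete of the source keys, while ignoring LIST calls, are assumptions made here.

```java
final class S3IOCostDemo {
  public static void main(String[] args) {
    int files = 1000;
    // Assumed model: rename each file, then bulk delete the source keys.
    S3IOCost total = S3IOCost.fileRenames(files)
        .plus(S3IOCost.bulkDelete(files));
    System.out.println("reads=" + total.reads + " writes=" + total.writes);
  }
}

/** Hypothetical read/write IO cost model for S3 operations. */
final class S3IOCost {
  final long reads;
  final long writes;

  S3IOCost(long reads, long writes) {
    this.reads = reads;
    this.writes = writes;
  }

  /** Per file: HEAD (1 read) + COPY (1 read + 1 write). */
  static S3IOCost fileRenames(long fileCount) {
    return new S3IOCost(2 * fileCount, fileCount);
  }

  /** Bulk delete: one write per entry in the delete list. */
  static S3IOCost bulkDelete(long entries) {
    return new S3IOCost(0, entries);
  }

  S3IOCost plus(S3IOCost other) {
    return new S3IOCost(reads + other.reads, writes + other.writes);
  }
}
```

Keeping reads and writes as separate counts matches the point that, for S3, separate read and write capacities would need to be requested.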

Note that the AWS SDK does split large COPY operations into multipart copies, so the IO capacity is really 2 * (file size / block size), but as these copies can be so slow I'm not worrying about it. We'd need to replace that bit of the SDK, and while we've discussed it.
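The multipart-copy correction can be expressed the same way. This sketch reads the factor of 2 as one read plus one write per part copy, rounds a partial final part up, and counts at least one part for small files; those details are assumptions, and `multipartCopyCapacity` is a hypothetical helper.

```java
final class MultipartCopyCost {
  public static void main(String[] args) {
    long mib = 1024L * 1024L;
    // A 100 MiB file copied in 32 MiB parts needs 4 part copies.
    System.out.println(multipartCopyCapacity(100 * mib, 32 * mib));
  }

  /**
   * IO capacity for a copy split into parts: each part copy is counted as
   * one read plus one write. Assumes at least one part, and rounds a
   * partial final part up.
   */
  static long multipartCopyCapacity(long fileSize, long partSize) {
    if (fileSize < 0 || partSize <= 0) {
      throw new IllegalArgumentException("invalid size");
    }
    long parts = Math.max(1, (fileSize + partSize - 1) / partSize);
    return 2 * parts;
  }
}
```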

FYI, I've let this work lapse as other things took priority; if you want to take it up, feel free to do so.

@github-actions
Contributor

We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again.
Thanks all for your contribution.

@github-actions github-actions bot added the Stale label Sep 29, 2025
@github-actions github-actions bot closed this Sep 30, 2025