HADOOP-19140. [ABFS, S3A] Add IORateLimiter API #6703
Conversation
🎊 +1 overall
This message was automatically generated.
/**
 * Implementation support for {@link IORateLimiter}.
 */
public final class IORateLimiterSupport {
This is just a wrapper on top of RestrictedRateLimiting with extra operation-name validation, right?
I think this could be extended to limit per operation.
With the op name and path you can be clever:
- limit by path
- use the operation name with a "multiplier" of actual IO, to include the extra requests an operation makes (rename: list, copy, delete); for S3, separate read/write IO capacities would need to be requested
- consider some operations free and give them a cost of 0 (a rough sketch of such a per-operation limiter follows below)
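
For illustration only, a minimal sketch of what such a per-operation limiter could look like. The Limiter interface, class name and multiplier map are hypothetical, not the API this PR adds.

import java.time.Duration;
import java.util.Map;

/**
 * Hypothetical sketch: apply a per-operation cost multiplier before
 * delegating to one shared limiter. Not part of this PR's API.
 */
public final class PerOperationRateLimiter {

  /** Minimal stand-in for a shared rate limiter. */
  public interface Limiter {
    /** Block until the capacity is available; return the time spent waiting. */
    Duration acquire(int requestedCapacity);
  }

  /** Multipliers per operation name; absent operations cost 1, "free" ones 0. */
  private final Map<String, Integer> costMultipliers;
  private final Limiter delegate;

  public PerOperationRateLimiter(Limiter delegate,
      Map<String, Integer> costMultipliers) {
    this.delegate = delegate;
    this.costMultipliers = costMultipliers;
  }

  public Duration acquireIOCapacity(String operation, int capacity) {
    final int cost = costMultipliers.getOrDefault(operation, 1) * capacity;
    // Free operations (cost 0) never block.
    return cost == 0 ? Duration.ZERO : delegate.acquire(cost);
  }
}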
 */
Duration acquireIOCapacity(
    String operation,
    Path source,
A multi-delete operation takes a list of paths. Although we have a concept of the base path, I don't think the S3 client requires every path to be under the base path.
S3 throttling does, as it is per prefix.
Just to understand this better...
If we have a list of paths on which we are attempting a bulk operation, and the only common prefix for them is the root itself, should we acquire IO capacity for each individual path or for the root path itself?
really good q. will comment below
Adds an API (pulled from apache#6596) to allow callers to request IO capacity for a named operation with optional source and dest paths. Change-Id: I02aff4d3c90ac299c80f388e88195d69e1049fe0
Force-pushed from 58fb6a3 to d2e146e
🎊 +1 overall
This message was automatically generated.
    .describedAs("delay for %d capacity", capacity)
    .isEqualTo(Duration.ZERO);
  }
}
\ No newline at end of file
Nit: missing newline at end of file.
For the work on the manifest committer I was asking for some IOPs per rename, so that if there wasn't enough capacity, only the over-capacity renames blocked. It also allows for incremental IO: you don't have to block acquiring everything up front, just ask as you go along.

It gets a bit more complex for S3, where directory operations are mimicked file by file. There we'd ask for 2 read and 1 write ops per file rename (HEAD (read) + COPY (read + write)), and for the bulk delete the same number of writes as there are entries in the delete list; that is already done in its implementation of BulkDelete.

Note that the AWS SDK does split up large COPY operations into multipart copies, so really the IO capacity is (2 * file-size / block-size), but as these copies can be so slow I'm not worrying about it. We'd need to replace that bit of the SDK, which we've discussed but not done.

FYI I've let this work lapse as other things took priority; if you want to take it up, feel free to do so.
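
A minimal sketch of that cost accounting, assuming separate read and write limiters; the Limiter interface and method names here are illustrative stand-ins, not the API in this PR.

import java.time.Duration;
import java.util.List;

/**
 * Illustrative sketch of the per-file costs described above.
 * The Limiter interface is a hypothetical stand-in.
 */
public final class S3RenameCostSketch {

  /** Hypothetical limiter, with separate instances for read and write capacity. */
  public interface Limiter {
    Duration acquire(int requestedCapacity);
  }

  private final Limiter readLimiter;
  private final Limiter writeLimiter;

  public S3RenameCostSketch(Limiter readLimiter, Limiter writeLimiter) {
    this.readLimiter = readLimiter;
    this.writeLimiter = writeLimiter;
  }

  /** Per-file rename: HEAD (1 read) + COPY (1 read + 1 write). */
  public Duration acquireForFileRename() {
    return readLimiter.acquire(2).plus(writeLimiter.acquire(1));
  }

  /** Bulk delete: one write per entry in the delete list. */
  public Duration acquireForBulkDelete(List<String> keys) {
    return writeLimiter.acquire(keys.size());
  }
}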
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Adds an API (pulled from #6596) to allow callers to request IO capacity for a named operation with optional source and dest paths.
The first use of this would be the bulk delete operation of #6494: there'd be some throttling within the S3A code setting the maximum number of writes per bucket, and for a bulk delete the caller would ask for as many write operations as there were entries.
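
For illustration, a caller-side sketch of that bulk-delete use, assuming a signature of the shape shown in the truncated diff above (the full parameter list is an assumption, not confirmed by this excerpt).

import java.time.Duration;
import java.util.List;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical caller-side sketch. The IORateLimiter signature below is an
 * assumption based on the truncated diff; the real interface may differ.
 */
public final class BulkDeleteCapacitySketch {

  /** Assumed shape of the rate-limiter API; not confirmed by this excerpt. */
  public interface IORateLimiter {
    Duration acquireIOCapacity(String operation, Path source, Path dest, int capacity);
  }

  /**
   * Before issuing a bulk delete, request one write's worth of capacity per
   * entry; the returned Duration is how long the caller was blocked waiting.
   */
  public static Duration acquireForBulkDelete(
      IORateLimiter limiter, Path base, List<Path> paths) {
    return limiter.acquireIOCapacity("delete_bulk", base, null, paths.size());
  }
}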
Added new store operations for delete_bulk and delete_dir
How was this patch tested?
New tests.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?