HADOOP-18679. Add API for bulk/paged object deletion #5993
Conversation
💔 -1 overall

This message was automatically generated.
Writing up the spec made me decide we should have a `.opt()` to indicate when a bulk delete is a "background" operation, which may be executed at a rate that interferes less with live queries, e.g. smaller pages, rate-limited buildup of pages, a different throttle retry policy.
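As a rough illustration of what such a background opt might look like, here is a minimal sketch. The option name `fs.option.bulkdelete.background` appears in a later commit in this PR; the builder class and method shapes here are hypothetical, loosely following the `.opt()` pattern of the existing `openFile()` builder:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bulk-delete builder carrying string options. */
class BulkDeleteBuilder {
    private final Map<String, String> options = new HashMap<>();

    /** Set an option; mirrors the .opt() pattern used by openFile(). */
    BulkDeleteBuilder opt(String key, String value) {
        options.put(key, value);
        return this;
    }

    /** True if the caller marked this delete as a background cleanup,
     *  letting the store pick smaller pages or a gentler retry policy. */
    boolean isBackground() {
        return Boolean.parseBoolean(
            options.getOrDefault("fs.option.bulkdelete.background", "false"));
    }
}
```

A caller would then build with something like `.opt("fs.option.bulkdelete.background", "true")` to signal lower priority.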
I am struggling a bit to understand how this all comes together. This is what I think is happening, but I don't really get it:
- Implement an iterator; the iterator has a method which implements the actual bulk delete calls to the store.
- Implement the BulkDelete and Builder interfaces. The build() method will iterate through the iterator and call a bulkDelete method on the iterator. Does bulkDelete() just create the builder and return?
- On each iteration, call DeleteProgress to update progress. If using the FAIL_FAST implementation and there are any failures, it returns false.
- If false, call abort(). (Where is abort() to be implemented?)
- Once complete, return the Outcome object.
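The steps the reviewer lists above can be sketched as a single loop. This is only an illustration of the control flow under discussion, not the PR's actual implementation; the names DeleteProgress and Outcome come from the thread, while the method signatures and page handling are guesses:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch of the paged-delete control flow described in the review thread. */
class PagedDeleteSketch {

    /** Progress callback, invoked once per deleted page.
     *  Returning false asks the operation to abort (the FAIL_FAST case). */
    interface DeleteProgress {
        boolean pageDeleted(List<String> page);
    }

    /** Hypothetical outcome object: whether all pages were deleted. */
    static final class Outcome {
        final boolean successful;
        Outcome(boolean successful) { this.successful = successful; }
    }

    /** Batch paths into pages, "delete" each page, report progress,
     *  and stop early if the callback returns false. */
    static Outcome execute(Iterator<String> paths, int pageSize,
                           DeleteProgress progress) {
        List<String> page = new ArrayList<>(pageSize);
        while (paths.hasNext()) {
            page.add(paths.next());
            if (page.size() == pageSize || !paths.hasNext()) {
                // a real store would issue the bulk delete call here
                if (!progress.pageDeleted(new ArrayList<>(page))) {
                    return new Outcome(false);   // aborted mid-operation
                }
                page.clear();
            }
        }
        return new Outcome(true);
    }
}
```

Under this reading, abort() would simply be the early return taken when the progress callback reports failure.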
* will be batched into pages and submitted to the remote filesystem/store
* for bulk deletion, possibly in parallel.
* <p>
* A remote iterator provides the list of paths to delete; all must be under
Why is the base path a requirement? To ensure things are in the same bucket (for S3), or something else?
It's for multiple mounted filesystems (viewfs) to direct to the final fs.
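To illustrate the point: if every path in the batch is required to sit under one base path, a mount table like viewfs can resolve the whole batch to a single target filesystem up front. A minimal, purely illustrative check (string-based; the real code would work on Path/URI objects):

```java
/** Illustrative check that a candidate path lies under a base path,
 *  so a mounted filesystem can route the whole batch to one target fs. */
class BasePathCheck {
    static boolean underBase(String base, String path) {
        // normalize: treat the base as a directory prefix
        String prefix = base.endsWith("/") ? base : base + "/";
        return path.equals(base) || path.startsWith(prefix);
    }
}
```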
private final boolean successful;

/**
 * Wast the operation aborted?
nit: typo, Was
Force-pushed 9317477 to d69fac0 (compare)
💔 -1 overall

This message was automatically generated.
Initial pass at writing an API for bulk deletes, targeting S3 and any store with paged delete support. Minimal design of a RemoteIterator to provide the list of paths to delete; a progress report will be issued after each page is deleted, giving an update of the files deleted and a way for the application code to abort an ongoing delete, such as after a failure.

Change-Id: I3dcbb144232d76b5d4ebf7ad080d187edd6e93e4

Including option "fs.option.bulkdelete.background" to indicate this is a background cleanup and so can be lower priority (somehow).

Change-Id: Idb55ebf2a6664fb23e3dbacd3e0ade45cb4936e1

Change-Id: I1b053f3b6573dfb53ade78073d0cdf948a0c207d
Force-pushed d69fac0 to c7b4e99 (compare)
💔 -1 overall

This message was automatically generated.
Initial pass at writing an API for bulk deletes, targeting S3 and any store with paged delete support.

Minimal design of a RemoteIterator to provide the list of paths to delete; a progress report will be issued after each page is deleted, giving an update of the files deleted and a way for the application code to abort an ongoing delete, such as after a failure.
Aspects of the implementation to make clear in the markdown spec.
How was this patch tested?
No tests yet; working on API first.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?