S3 - Delete all files under a folder (recursive delete) #529

Open
ZelCloud opened this issue May 4, 2022 · 4 comments
Labels
feature-request A feature should be added or improved. high-level-library p2 This is a standard priority issue

Comments


ZelCloud commented May 4, 2022

Describe the feature

Delete all files underneath a folder (recursive delete) using a prefix.

Use Case

It would be nice to be able to give a prefix and delete everything under it, rather than iterating through all the objects and deleting them individually.

ex. Bucket structure

  • folder1
    • file 1
    • file 2
    • file 3
  • folder2
    • file 4
  • folder3

Delete everything under "folder1/"
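For comparison, the AWS CLI already exposes this shape of operation through its high-level `s3` commands; the bucket name below is just the one from the example above:

```shell
# Delete every object under the folder1/ prefix (bucket name is illustrative)
aws s3 rm s3://my-bucket/folder1/ --recursive
```

Under the hood the CLI does the same list-then-delete loop requested here.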

Proposed Solution

Ideally, delete_objects would accept a prefix argument on its builder.

ex.

// Deletes all files in folder1
let bucket = "my-bucket";
let prefix = "folder1/";
let s3res = s3.delete_objects()
        .bucket(bucket)
        .prefix(prefix)
        .send()
        .await;

A possible workaround for now might be to iterate through all the objects and then build the delete_objects vec from the paginated keys. Though I'm not sure if there's any gotchas or issues with this approach.

use aws_sdk_s3::model::{Delete, ObjectIdentifier};
use std::error::Error;
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let shared_config = aws_config::load_from_env().await;
    let s3 = aws_sdk_s3::Client::new(&shared_config);
    let prefix = "prefix";
    let bucket = "bucket";
    let mut pages = s3.list_objects_v2()
                      .bucket(bucket)
                      .prefix(prefix)
                      .into_paginator()
                      .send();

    // Each page is a ListObjectsV2Output; collect the key of every object in it.
    let mut delete_objects: Vec<ObjectIdentifier> = vec![];
    while let Some(page) = pages.next().await {
        for obj in page?.contents().unwrap_or_default() {
            let obj_id = ObjectIdentifier::builder()
                .set_key(obj.key().map(String::from))
                .build();
            delete_objects.push(obj_id);
        }
    }
    
    let delete = Delete::builder().set_objects(Some(delete_objects)).build();
    
    s3.delete_objects()
      .bucket(bucket)
      .delete(delete)
      .send()
      .await?;

    println!("Objects deleted.");
    
    Ok(())
}

Other Information

Possible temporary workaround provided in proposed solution.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

A note for the community

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue, please leave a comment
@ZelCloud ZelCloud added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels May 4, 2022
@rcoh rcoh added high-level-library and removed needs-triage This issue or PR still needs to be triaged. labels May 4, 2022

rcoh commented May 4, 2022

Makes sense! This isn't something that will be included in the SDK itself, but it's a good candidate for a high-level S3 library built on top of the AWS SDK.


Velfi commented May 4, 2022

This could also be a helpful example


phyber commented May 4, 2022

Though I'm not sure if there's any gotchas or issues with this approach.

With a sufficiently large list of objects, you may run out of RAM with this approach, since you're building up a large Vec of all of the objects. The basic approach itself is fine, and the issue can be avoided with a little tweaking, perhaps by taking the paginated objects a few thousand at a time and deleting those, ensuring you never have to deal with a huge Vec.
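The batching idea above can be sketched independently of the SDK. This is a minimal illustration, not real deletion code: `batch_keys` is a made-up helper that records the size of each delete request it would have issued, and the 1000-key flush threshold matches the documented per-request cap of the DeleteObjects API.

```rust
// Flush a delete request every time the batch reaches the DeleteObjects
// limit of 1000 keys, so memory stays bounded regardless of bucket size.
const MAX_KEYS_PER_DELETE: usize = 1000;

// Returns the size of each delete request that would have been issued.
// In real code, each flush point would call s3.delete_objects(...).await.
fn batch_keys(pages: Vec<Vec<String>>) -> Vec<usize> {
    let mut batch: Vec<String> = Vec::with_capacity(MAX_KEYS_PER_DELETE);
    let mut request_sizes = Vec::new();
    for page in pages {
        for key in page {
            batch.push(key);
            if batch.len() == MAX_KEYS_PER_DELETE {
                request_sizes.push(batch.len()); // delete_objects(batch) here
                batch.clear();
            }
        }
    }
    if !batch.is_empty() {
        request_sizes.push(batch.len()); // final partial delete_objects call
    }
    request_sizes
}

fn main() {
    // Pretend the paginator produced three pages totalling 2100 keys.
    let pages: Vec<Vec<String>> = vec![
        (0..800).map(|i| format!("folder1/file-{i}")).collect(),
        (800..1600).map(|i| format!("folder1/file-{i}")).collect(),
        (1600..2100).map(|i| format!("folder1/file-{i}")).collect(),
    ];
    let sizes = batch_keys(pages);
    println!("{sizes:?}"); // [1000, 1000, 100]
    assert_eq!(sizes, vec![1000, 1000, 100]);
}
```

At no point does more than one batch of 1000 keys live in memory, which addresses the RAM concern.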

@benmanns

I think you want to do something like:

  • list objects/versions for your target prefix with pagination
  • take paginated results in chunks of 1000
  • invoke the DeleteObjects API for each chunk

Based on the API examples, it looks like you can delete objects while holding a pagination cursor. This method avoids building a big vector when operating on large buckets.

@jmklix jmklix added the p2 This is a standard priority issue label Nov 28, 2022