Multipart upload can leave futures unpolled, leading to timeout #5366

wjones127 · 2024-02-06T04:33:05Z

Describe the bug

Instead of waiting until the data passed to the writer is uploaded to return ready, we buffer it until there is enough data and then put the request future in FuturesUnordered. I thought this was pretty clever when I originally wrote it, but it has a big flaw: If a lot of time passes between write calls, the request futures can go all that time without being polled. This can cause timeouts such as:

Generic S3 error: Error after 0 retries in 67.379407043s, max_retries:10, retry_timeout:180s, source:error sending request for url (s3://...): operation timed out

To Reproduce

I discovered this downstream in some code that uses multi part uploads: lancedb/lance#1878 (comment)

Expected behavior

I think the best solution is to run the upload tasks inside of a background task created with tokio::task::spawn().

Additional context

It's quite possible this is the same underlying cause of #5106

The text was updated successfully, but these errors were encountered:

tustvold · 2024-02-06T12:15:24Z

Perhaps I am missing something, why would the future not be polled if there is work to do? This sounds like the issue is the user is multiplexing cpu bound tasks on the same threadpool/task, and therefore starving their write tasks.

The conclusion of prior efforts in this space, e.g. #4040, is this is a user problem

wjones127 · 2024-02-06T16:12:45Z

Okay, let me try creating a reproduction in Rust.

wjones127 · 2024-02-06T16:37:34Z

Here is the reproduction:

Create a Multi-part upload
Write 10 MB (trigger a request)
Sleep for 30 seconds
Try writing more data (this will fail with timeout)

I think we would agree that it might be expected behavior if the sleep in step (3) was synchronous/blocking. However, this will fail even if the sleep is async, which makes this feel like a real foot gun.

use object_store::{aws::AmazonS3Builder, path::Path, ObjectStore};
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() {
    // Create an S3 store configured from environment variables
    let store = AmazonS3Builder::from_env().with_bucket_name("lance-performance-testing").build().unwrap();

    let (_id, mut writer) = store.put_multipart(&Path::from("test")).await.unwrap();

    // Upload 10 MB of data
    let data_chunk = vec![0u8; 1024  * 1024];
    for _ in 0..10 {
        writer.write_all(&data_chunk).await.unwrap();
    }

    // Sleep for 30 seconds
    tokio::time::sleep(tokio::time::Duration::from_secs(30)).await;
    
    // try to upload 10 more (should fail here)
    for _ in 0..10 {
        writer.write_all(&data_chunk).await.unwrap();
    }

    // finish.
    writer.shutdown().await.unwrap();
}

called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: Generic { store: "S3", source: Reqwest { retries: 0, max_retries: 10, elapsed: 30.00097806s, retry_timeout: 180s, source: reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("s3.us-east-1.amazonaws.com")), port: None, path: "/lance-performance-testing/test", query: Some("partNumber=1&uploadId=mOQTaVt2JtGlMwB2IceAR2olH5FyBgk_oJVbZ_zh9XRDU6VbbIRWmx0KKk3MM6vVb2vCA93jP4xIiY9lAo0BHxULl5jNV0issWUnqVyFk.dA.xt3mmaz5U_GK_g1OqYCn4ZMMZ61xIn22FzDWuU.PQ--"), fragment: None }, source: TimedOut } } } }

tustvold · 2024-02-06T18:03:23Z

Hmm... I wonder if the issue here is that write_all is not waiting for in-progress writes to finish? Is it possible the poll_flush implementation isn't quite right? I would expect the above to work, without needing to spawn work.

tustvold · 2024-02-06T18:14:59Z

It looks like this is actually an issue with the underlying IO traits - tokio-rs/tokio#4296

TLDR is the user must explicitly flush when they've finished writing. I agree this is an unfortunate footgun

wjones127 · 2024-02-06T18:24:50Z

I wonder if the issue here is that write_all is not waiting for in-progress writes to finish?

That is correct, but it's due to the design of the multipart write implementation. By design, when I wrote it, poll_write() will return Poll::Ready without waiting for all write requests to complete. If we changed it so that poll_write only returned Ready until the underlying FuturesUnordered is empty again, that would also fix this. Then you would only get concurrency if your call write() with chunks larger than 10MB. Otherwise it will always run one upload at a time. Does this seem like a preferable tradeoff over using spawn? Or do you see another way?

wjones127 · 2024-02-06T18:28:25Z

Oh I see what you mean about flush. Basically, anytime a user expects there will be a significant gap in time between write calls, they must call flush. I think that could work.

tustvold · 2024-02-06T18:29:23Z

By design, when I wrote it, poll_write() will return Poll::Ready without waiting for all write requests to complete.

Yes this is an unfortunate property of the AsyncWriteExt trait, I agree it is unfortunate and the method is definitely confusing, we should definitely document this footgun and link back to the tokio ticket.

tustvold · 2024-03-01T06:28:08Z

label_issue.py automatically added labels {'object-store'} from #5369

ion-elgreco · 2024-03-22T08:17:14Z

@tustvold does this affect only MultiPart?

Because I'm seeing since object store 0.9 quite some timeouts in ADLS and S3 with Delta-RS

tustvold · 2024-03-22T13:28:16Z

This ticket is specifically multipart.

Because I'm seeing since object store 0.9 quite some timeouts in ADLS and S3 with Delta-RS

What version were you running before, 0.8 set some default timeouts (previously they were unlimited).

wjones127 added the bug label Feb 6, 2024

wjones127 self-assigned this Feb 6, 2024

wjones127 mentioned this issue Feb 6, 2024

OUTDATED fix: prevent timeouts while writing to object storage lancedb/lance#1916

Closed

wjones127 mentioned this issue Feb 6, 2024

docs(object-store): add warning to flush #5369

Merged

tustvold closed this as completed in #5369 Feb 7, 2024

tustvold added the object-store Object Store Interface label Mar 1, 2024

tustvold mentioned this issue Mar 3, 2024

Revisit Design of ObjectStore::put_multipart #5458

Closed

This was referenced Mar 14, 2024

Move parallel parquet serialization to blocking threads apache/datafusion#9605

Closed

Enable retries on connection reset by peer when doing scan_parquet against an object store pola-rs/polars#14384

Closed

tustvold mentioned this issue Mar 23, 2024

[object_store] Should object_store retry on broken pipe errors when calling put? #5545

Closed

thomasfrederikhoeck mentioned this issue Apr 3, 2024

S3 requests after the hyper s3 timeout connection duration fail over large table operations delta-io/delta-rs#2301

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multipart upload can leave futures unpolled, leading to timeout #5366

Multipart upload can leave futures unpolled, leading to timeout #5366

wjones127 commented Feb 6, 2024

tustvold commented Feb 6, 2024 •

edited

Loading

wjones127 commented Feb 6, 2024

wjones127 commented Feb 6, 2024

tustvold commented Feb 6, 2024 •

edited

Loading

tustvold commented Feb 6, 2024 •

edited

Loading

wjones127 commented Feb 6, 2024

wjones127 commented Feb 6, 2024

tustvold commented Feb 6, 2024 •

edited

Loading

tustvold commented Mar 1, 2024

ion-elgreco commented Mar 22, 2024

tustvold commented Mar 22, 2024 •

edited

Loading

Multipart upload can leave futures unpolled, leading to timeout #5366

Multipart upload can leave futures unpolled, leading to timeout #5366

Comments

wjones127 commented Feb 6, 2024

tustvold commented Feb 6, 2024 • edited Loading

wjones127 commented Feb 6, 2024

wjones127 commented Feb 6, 2024

tustvold commented Feb 6, 2024 • edited Loading

tustvold commented Feb 6, 2024 • edited Loading

wjones127 commented Feb 6, 2024

wjones127 commented Feb 6, 2024

tustvold commented Feb 6, 2024 • edited Loading

tustvold commented Mar 1, 2024

ion-elgreco commented Mar 22, 2024

tustvold commented Mar 22, 2024 • edited Loading

tustvold commented Feb 6, 2024 •

edited

Loading

tustvold commented Feb 6, 2024 •

edited

Loading

tustvold commented Feb 6, 2024 •

edited

Loading

tustvold commented Feb 6, 2024 •

edited

Loading

tustvold commented Mar 22, 2024 •

edited

Loading