
Feature: Multipart S3 Uploads #227

Open
ErikMichelson opened this issue Dec 18, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@ErikMichelson

Hi, thanks for developing Gokapi.

Based on a brief look into the code, it seems that uploads to S3 are performed by storing the chunks of an upload on disk and then uploading the complete stored file using the aws-sdk Uploader.
This approach has the disadvantage that the complete file needs to be stored on disk at least once, thus raising the disk space requirements for a Gokapi deployment.

S3 allows performing Multipart Uploads (see link 1), which are chunk-based as well. I took a quick look at the AWS SDK for Go which you're using, and it should be possible to use the multipart upload API directly (see link 2).

My idea would be to initialize a multipart upload when an upload to Gokapi is started and then upload each received chunk as a single part to S3. This would reduce the required disk space and might also resolve the problem of S3 uploads "hanging" at 99 % of the progress bar for several minutes.

1: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpu-process
2: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#S3.CreateMultipartUpload
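The proposed flow can be sketched as follows. This is a minimal illustration, not Gokapi code: the `multipartStore` interface is a hypothetical stand-in for the narrow slice of the S3 client that the flow needs (the real aws-sdk-go client exposes the corresponding `CreateMultipartUpload`, `UploadPart`, and `CompleteMultipartUpload` calls), and an in-memory fake is used so the sketch is self-contained.

```go
// Sketch: create a multipart upload, push each received chunk as one
// part, then complete the upload. All names are illustrative.
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// multipartStore is a hypothetical stand-in for the S3 multipart API.
type multipartStore interface {
	CreateMultipartUpload(key string) (uploadID string, err error)
	UploadPart(key, uploadID string, partNumber int, body []byte) error
	CompleteMultipartUpload(key, uploadID string) error
}

// fakeS3 collects parts in memory for demonstration purposes.
type fakeS3 struct {
	parts  map[int][]byte
	object []byte
}

func (f *fakeS3) CreateMultipartUpload(key string) (string, error) {
	f.parts = make(map[int][]byte)
	return "upload-1", nil
}

func (f *fakeS3) UploadPart(key, uploadID string, partNumber int, body []byte) error {
	f.parts[partNumber] = append([]byte(nil), body...)
	return nil
}

func (f *fakeS3) CompleteMultipartUpload(key, uploadID string) error {
	// S3 assembles the final object from the parts ordered by part number.
	numbers := make([]int, 0, len(f.parts))
	for n := range f.parts {
		numbers = append(numbers, n)
	}
	sort.Ints(numbers)
	var buf bytes.Buffer
	for _, n := range numbers {
		buf.Write(f.parts[n])
	}
	f.object = buf.Bytes()
	return nil
}

// uploadChunks streams chunks to the store as they arrive; chunk index i
// becomes part number i+1, since S3 part numbers start at 1.
func uploadChunks(store multipartStore, key string, chunks [][]byte) error {
	uploadID, err := store.CreateMultipartUpload(key)
	if err != nil {
		return err
	}
	for i, chunk := range chunks {
		if err := store.UploadPart(key, uploadID, i+1, chunk); err != nil {
			return err
		}
	}
	return store.CompleteMultipartUpload(key, uploadID)
}

func main() {
	store := &fakeS3{}
	chunks := [][]byte{[]byte("hello "), []byte("multipart "), []byte("world")}
	if err := uploadChunks(store, "file.bin", chunks); err != nil {
		panic(err)
	}
	fmt.Println(string(store.object)) // prints "hello multipart world"
}
```

Note that no chunk ever has to touch the local disk in this flow; each part is handed to S3 as soon as it arrives.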

@Forceu
Owner

Forceu commented Dec 19, 2024

Thanks for the feedback!
Unfortunately there are two problems with the approach:

  • Deduplication would only be possible if the chunks are sent in the correct order, because the hash has to be computed over the chunks in sequence (S3 itself does not provide such hashing). Also, this would always upload the file first, even if it already existed, which leads to additional cost with AWS.
  • Encryption would only be possible if end-to-end encryption is used or the chunks are sent in the correct order.

Although it would be possible to always send the chunks in the correct order, this also means that in the worst-case scenario, where one of the first chunks has not been sent yet, the whole file needs to be stored temporarily. This scenario is unlikely, but might happen.

Such an implementation would require a lot of rewriting and has some pitfalls. The easier and more robust variant would be to send the chunks in random order and not offer deduplication or server-side encryption, but I am not sure whether that is a viable solution to the problem.

@Forceu Forceu added the enhancement New feature or request label Dec 19, 2024