
Feature: Multipart S3 Uploads #227

Open
ErikMichelson opened this issue Dec 18, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@ErikMichelson

Hi, thanks for developing Gokapi.

Based on a brief look into the code, it seems that uploads to S3 are performed by storing the chunks of an upload on disk and then uploading the complete stored file using the aws-sdk Uploader.
This approach has the disadvantage that the complete file needs to be stored on disk at least once, thus raising the disk space requirements for a Gokapi deployment.

S3 allows performing Multipart Uploads (see link 1), which are chunk-based as well. I took a quick look at the AWS SDK for Go which you're using, and it should be possible to use the multipart upload API directly (see link 2).

My idea would be to initialize a multipart upload when an upload to Gokapi is started and then upload each received chunk as a single part to S3. This would reduce the required disk space and might also resolve the problem of S3 uploads "hanging" at 99 % of the progress bar for several minutes.

1: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpu-process
2: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#S3.CreateMultipartUpload
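The proposed flow can be sketched as follows. This is a minimal illustration, not Gokapi code: the `multipartStore` interface is a hypothetical stand-in for the narrow slice of the S3 client that the flow needs (the real aws-sdk-go client exposes the corresponding `CreateMultipartUpload`, `UploadPart`, and `CompleteMultipartUpload` calls), and an in-memory fake is used so the sketch is self-contained.

```go
// Sketch: create a multipart upload, push each received chunk as one
// part, then complete the upload. All names are illustrative.
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// multipartStore is a hypothetical stand-in for the S3 multipart API.
type multipartStore interface {
	CreateMultipartUpload(key string) (uploadID string, err error)
	UploadPart(key, uploadID string, partNumber int, body []byte) error
	CompleteMultipartUpload(key, uploadID string) error
}

// fakeS3 collects parts in memory for demonstration purposes.
type fakeS3 struct {
	parts  map[int][]byte
	object []byte
}

func (f *fakeS3) CreateMultipartUpload(key string) (string, error) {
	f.parts = make(map[int][]byte)
	return "upload-1", nil
}

func (f *fakeS3) UploadPart(key, uploadID string, partNumber int, body []byte) error {
	f.parts[partNumber] = append([]byte(nil), body...)
	return nil
}

func (f *fakeS3) CompleteMultipartUpload(key, uploadID string) error {
	// S3 assembles the final object from the parts ordered by part number.
	numbers := make([]int, 0, len(f.parts))
	for n := range f.parts {
		numbers = append(numbers, n)
	}
	sort.Ints(numbers)
	var buf bytes.Buffer
	for _, n := range numbers {
		buf.Write(f.parts[n])
	}
	f.object = buf.Bytes()
	return nil
}

// uploadChunks streams chunks to the store as they arrive; chunk index i
// becomes part number i+1, since S3 part numbers start at 1.
func uploadChunks(store multipartStore, key string, chunks [][]byte) error {
	uploadID, err := store.CreateMultipartUpload(key)
	if err != nil {
		return err
	}
	for i, chunk := range chunks {
		if err := store.UploadPart(key, uploadID, i+1, chunk); err != nil {
			return err
		}
	}
	return store.CompleteMultipartUpload(key, uploadID)
}

func main() {
	store := &fakeS3{}
	chunks := [][]byte{[]byte("hello "), []byte("multipart "), []byte("world")}
	if err := uploadChunks(store, "file.bin", chunks); err != nil {
		panic(err)
	}
	fmt.Println(string(store.object)) // prints "hello multipart world"
}
```

Note that no chunk ever has to touch the local disk in this flow; each part is handed to S3 as soon as it arrives.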

@Forceu
Owner

Forceu commented Dec 19, 2024

Thanks for the feedback!
Unfortunately there are two problems with the approach:

  • Deduplication would only be possible if the chunks are sent in the correct order, because the hash has to be computed over the chunks in sequence (S3 itself does not provide such hashing). Also, this would always upload the file first, even if it already existed, which leads to additional cost with AWS.
  • Encryption would only be possible if end-to-end encryption is used or the chunks are sent in the correct order.

Although it would be possible to always send the chunks in the correct order, this also means that in the worst-case scenario, where one of the first chunks has not been sent yet, the whole file needs to be stored temporarily. This scenario is unlikely, but might happen.

Such an implementation would require a lot of rewriting and has some pitfalls. The easier and more robust variant would be to send the chunks in random order and not offer deduplication or server-side encryption, but I am not sure whether that is a viable solution to the problem.

@Forceu Forceu added the enhancement New feature or request label Dec 19, 2024