Hi, thanks for developing Gokapi.
Based on a brief look into the code, it seems that uploads to S3 are performed by storing the chunks of an upload on disk and then uploading the complete stored file using the aws-sdk Uploader.
This approach has the disadvantage that the complete file needs to be stored at least once on disk, thus raising the disk space requirements for a Gokapi deployment.
S3 supports Multipart Uploads (see link 1), which are based on chunks as well. I took a quick look at the AWS SDK for Go which you're using, and it should be possible to use the Multipart Upload API directly (see link 2).
My idea would be to initialize a Multipart Upload when an upload to Gokapi is started, and then upload each received chunk as a single part to S3. This would reduce the required amount of disk space and might also resolve the problem of S3 uploads "hanging" at 99 % of the progress bar for several minutes.
1: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpu-process
2: https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#S3.CreateMultipartUpload
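For illustration, here is a minimal sketch of how this could look with the AWS SDK for Go (v1, which the docs above refer to). The function name and the chunk channel are assumptions for this example, not Gokapi's actual code, and retries/abort handling are omitted:

```go
// Sketch: streaming received upload chunks to S3 as multipart upload parts.
package s3upload

import (
	"bytes"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// uploadChunks is a hypothetical helper: it starts a multipart upload when
// the Gokapi upload begins and uploads each received chunk as one part.
func uploadChunks(bucket, key string, chunks <-chan []byte) error {
	svc := s3.New(session.Must(session.NewSession()))

	// Initialize the multipart upload.
	mpu, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return err
	}

	var completed []*s3.CompletedPart
	partNumber := int64(1)

	// Upload each chunk as one part. Note that S3 requires every part
	// except the last one to be at least 5 MiB.
	for chunk := range chunks {
		part, err := svc.UploadPart(&s3.UploadPartInput{
			Bucket:     aws.String(bucket),
			Key:        aws.String(key),
			UploadId:   mpu.UploadId,
			PartNumber: aws.Int64(partNumber),
			Body:       bytes.NewReader(chunk),
		})
		if err != nil {
			return err
		}
		completed = append(completed, &s3.CompletedPart{
			ETag:       part.ETag,
			PartNumber: aws.Int64(partNumber),
		})
		partNumber++
	}

	// Finish the upload; S3 assembles the parts into the final object.
	_, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
		Bucket:          aws.String(bucket),
		Key:             aws.String(key),
		UploadId:        mpu.UploadId,
		MultipartUpload: &s3.CompletedMultipartUpload{Parts: completed},
	})
	return err
}
```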
Thanks for the feedback!
Unfortunately, there are two problems with this approach:
Deduplication would only be possible if the chunks are sent in the correct order (otherwise the content cannot be hashed while uploading). It would also mean the file is always uploaded first, even if it already exists, which leads to additional cost with AWS.
Encryption would only be possible if end-to-end encryption is used or the chunks are sent in the correct order.
Although it would be possible to always send the chunks in the correct order, this also means that in the worst case, where one of the first chunks has not been sent yet, the whole file needs to be stored temporarily. This scenario is unlikely, but might happen.
Such an implementation would require a lot of rewriting and has some pitfalls. The easier and more robust variant would be to send the chunks in random order and not offer deduplication or server-side encryption, but I am not sure if that is a viable solution to the problem.
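To make the ordering constraint concrete, here is a hypothetical sketch (not Gokapi code) of why in-order processing matters: an incremental hash (and, analogously, streaming encryption) can only consume a contiguous prefix of the file, so any chunk arriving ahead of a gap has to be buffered, in the worst case nearly the whole file:

```go
// Sketch: incremental hashing of chunks that may arrive out of order.
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
)

type orderedHasher struct {
	h       hash.Hash
	next    int            // index of the next chunk that can be hashed
	pending map[int][]byte // out-of-order chunks waiting on earlier ones
}

func newOrderedHasher() *orderedHasher {
	return &orderedHasher{h: sha256.New(), pending: map[int][]byte{}}
}

// Add accepts a chunk by index and hashes as far as the contiguous
// prefix allows; everything past a gap stays buffered until it closes.
func (o *orderedHasher) Add(index int, chunk []byte) {
	o.pending[index] = chunk
	for {
		c, ok := o.pending[o.next]
		if !ok {
			return // gap: chunk o.next has not arrived yet
		}
		o.h.Write(c)
		delete(o.pending, o.next)
		o.next++
	}
}

func main() {
	o := newOrderedHasher()
	o.Add(1, []byte("world")) // arrives early: buffered, not hashed yet
	o.Add(0, []byte("hello")) // fills the gap: both chunks hashed in order
	fmt.Printf("%x\n", o.h.Sum(nil))
}
```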