CopyObjectAsync is much slower than mc tool #372

Closed
ngbrown opened this issue Feb 21, 2020 · 6 comments

ngbrown (Contributor) commented Feb 21, 2020

In CalculateMultiPartSize, which is triggered when the source object is over 5 GiB, the copy appears to maximize the number of parts (up to the 10,000-part limit), bounded below by the minimum part size (5 MiB).

public static object CalculateMultiPartSize(long size)

The mc command-line tool, on the other hand, copies with far fewer parts. It appears to target a much larger part size (500 MiB), 100 times the minimum part size.
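
For a rough sense of the difference, here is a back-of-envelope part count for a 7 GiB object at each part size (my own illustration, not the SDK's actual code):

```csharp
using System;

// Back-of-envelope only; not the SDK's CalculateMultiPartSize implementation.
const long MiB = 1024L * 1024;
const long GiB = 1024L * MiB;
long objectSize = 7 * GiB;

// Ceiling division: how many parts of a given size cover the object.
long PartsFor(long partSize) => (objectSize + partSize - 1) / partSize;

Console.WriteLine(PartsFor(5 * MiB));   // 1434 parts at the 5 MiB minimum
Console.WriteLine(PartsFor(500 * MiB)); // 15 parts at a 500 MiB part size
```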

The speed difference is significant. A 7 GiB file copies between buckets on the command line in 1 minute. With the .NET client, I canceled after 25 minutes, and it was only 56% done.

Is there any way to tune this?

kannappanr (Contributor) commented Feb 21, 2020

@ngbrown Thanks for filing this issue. As far as CopyObjectPart is concerned, I agree with you: since the data is already in the backend, we can use a 5 GB part size. We do not even have to calculate the part size here.
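
To put a number on that (an illustration, not SDK code): at the S3 maximum part size of 5 GiB, a 7 GiB server-side copy needs only two copy-part ranges.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: split a server-side copy at the 5 GiB maximum part size.
const long GiB = 1024L * 1024 * 1024;
const long maxPartSize = 5 * GiB;
long objectSize = 7 * GiB;

var ranges = new List<(long Start, long End)>();
for (long offset = 0; offset < objectSize; offset += maxPartSize)
{
    long end = Math.Min(offset + maxPartSize, objectSize) - 1;
    ranges.Add((offset, end)); // each range maps to one copy-part request
}

Console.WriteLine(ranges.Count); // 2
```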

ngbrown (Contributor, Author) commented Feb 21, 2020

The Go client seems to do this quite differently, computing the number of parts first:

https://github.com/minio/minio-go/blob/8e3928466d18a04caf35a76a0f4a41d65ab1c9fc/api-compose-object.go#L525

I did start a branch to be able to use different minimum part sizes. It's not tunable like I suggested, but it is a lot faster than 5 MiB parts. I started off with 512 MiB parts for copying between servers and 64 MiB for uploading.

ngbrown@7f2acd1
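
Roughly, the shape of the change is a calculation that takes the minimum part size as a parameter instead of hard-coding 5 MiB (a sketch only, not the exact code in that commit):

```csharp
using System;

// Sketch only, not the exact code in the branch.
public static class MultiPartSizeSketch
{
    // Grow the part size above the caller-supplied minimum only when needed
    // to stay within the 10,000-part limit.
    public static (long PartSize, long PartCount, long LastPartSize)
        Calculate(long size, long minimumPartSize)
    {
        const long maximumParts = 10000;

        long partSize = Math.Max(minimumPartSize,
            (long)Math.Ceiling((double)size / maximumParts));

        long partCount = (long)Math.Ceiling((double)size / partSize);
        long lastPartSize = size - (partCount - 1) * partSize;
        return (partSize, partCount, lastPartSize);
    }
}
```

With a 512 MiB minimum, a 7 GiB copy comes out to 14 parts instead of roughly 1,400.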

If you have a suggestion on which way to go, I can update my branch and start a pull request.

poornas (Contributor) commented Feb 21, 2020

@ngbrown, the reason mc is much faster than the .NET SDK here is that the minimum part size used by mc is 128 MiB, versus the much lower 5 MiB in the .NET SDK. We can make the part size configurable so that you can optionally bump up the minimum part size if your network can handle it.

kannappanr (Contributor) commented

@ngbrown, we can set the part size in the case of CopyObjectPart to 5 GiB, and PutObjectPart can stay at 5 MiB. In another PR we can set the default to 64 MiB, but also allow clients to override that value. Also, please use the already defined constants MaximumPartSize and MinimumPartSize for these numbers.
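
A sketch of how that split might look, reusing those constant names (illustrative only; the helper and its placement are assumptions, not a final implementation):

```csharp
// Sketch only: pick the part-size floor per operation, reusing the existing
// constants rather than introducing new magic numbers.
public static class PartSizeDefaults
{
    public const long MinimumPartSize = 5L * 1024 * 1024;        // 5 MiB (S3 minimum)
    public const long MaximumPartSize = 5L * 1024 * 1024 * 1024; // 5 GiB (S3 maximum)

    // Server-side copy parts never pass through the client, so they can use the
    // maximum part size; regular uploads keep the 5 MiB floor for now.
    public static long PartSizeFor(bool isServerSideCopy) =>
        isServerSideCopy ? MaximumPartSize : MinimumPartSize;
}
```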

ngbrown (Contributor, Author) commented Feb 24, 2020

Regarding setting the part size to 5 GiB, I used 512 MiB because part completion appears to be the only way any progress can be reported on server-side copies.

I realize there's no progress reported in the .NET client right now, but IProgress<T> is a globally available interface for exactly that sort of thing.

@kannappanr, you mentioned using the existing constants MaximumPartSize and MinimumPartSize. These values are defined by the S3 API, so I'm not sure where I would have used them in my code changes. Is the idea that they would be set directly as the means of configuration? Currently the constants class is internal.

I would suggest adding two new constants, DefaultUploadPartSize and DefaultServerTransferPartSize, and then using those as the defaults for configuration. Passing configuration through a constructor options object seems to be the standard approach in .NET libraries. I would add a MinioClient constructor overload that takes a new configuration object and obsolete the current one.
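
For example, the options object might look something like this (a sketch; the type name, property names, and default values are placeholders for discussion):

```csharp
// Sketch only: names and defaults are placeholders, not a final API.
public class MinioClientOptions
{
    public const long DefaultUploadPartSize = 64L * 1024 * 1024;          // 64 MiB
    public const long DefaultServerTransferPartSize = 512L * 1024 * 1024; // 512 MiB

    public string Endpoint { get; set; }
    public string AccessKey { get; set; }
    public string SecretKey { get; set; }

    // Defaults that callers can bump up if their network can handle it.
    public long UploadPartSize { get; set; } = DefaultUploadPartSize;
    public long ServerTransferPartSize { get; set; } = DefaultServerTransferPartSize;
}

// A new MinioClient constructor overload would accept the options object, and
// the existing constructor could be marked [Obsolete] and forward to it.
```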

BigUstad (Contributor) commented

PR #530 has the fix to speed up PUT & COPY operations.
