
AzCopy command taking a lot of time while copying a large number of small files. #2739

Open
divyn10 opened this issue Jun 27, 2024 · 3 comments


@divyn10

divyn10 commented Jun 27, 2024

Which version of the AzCopy was used?

10.24.0

Which platform are you using? (ex: Windows, Mac, Linux)

Linux (running on a k8s pod having image peterdavehello/azcopy)

What command did you run?

azcopy copy [src blob with sas] [destination blob with sas] --overwrite=prompt --from-to=BlobBlob --s2s-preserve-access-tier=false --check-length=false --include-directory-stub=false --s2s-preserve-blob-tags=true --recursive=true --log-level=ERROR

What problem was encountered?

I am trying to copy across regions and subscriptions. One container holds a very large number of folders, each containing a single file.
Each file is small (around 2-3 KB), but there are more than 4-5 million such folders.

I am also using export AZCOPY_CONCURRENCY_VALUE=2000, as suggested here.

It is taking a lot of time. Is there any way to speed this up?
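For reference, a minimal sketch of the invocation described above (account names, container, and SAS tokens are placeholders):

```
# Raise request parallelism as suggested in the performance docs
export AZCOPY_CONCURRENCY_VALUE=2000

# Server-to-server blob copy across regions/subscriptions
azcopy copy \
    "https://<src-account>.blob.core.windows.net/<container>?<sas>" \
    "https://<dst-account>.blob.core.windows.net/<container>?<sas>" \
    --overwrite=prompt --from-to=BlobBlob --s2s-preserve-access-tier=false \
    --check-length=false --include-directory-stub=false \
    --s2s-preserve-blob-tags=true --recursive=true --log-level=ERROR
```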

How can we reproduce the problem in the simplest way?

Have you found a mitigation/solution?

@ashruti-msft
Collaborator

Speeding up AzCopy when transferring a large number of small files across regions and subscriptions can be challenging, given the per-file overhead and the limits of network latency and bandwidth. Can you check whether AzCopy is able to utilize the available bandwidth effectively?
Also, try upgrading to the latest version and see if that improves performance.
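One way to check is AzCopy's built-in benchmark command, which uploads auto-generated test data to a container and reports the achieved throughput. A minimal sketch, assuming a scratch container you can write to (account, container, and SAS are placeholders), using small files to mimic this workload:

```
# Benchmark with many small files; test data is deleted afterwards by default
azcopy bench "https://<account>.blob.core.windows.net/<scratch-container>?<sas>" \
    --file-count 50000 --size-per-file 4K
```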

@divyn10
Author

divyn10 commented Jul 5, 2024

@ashruti-msft Tried with the upgraded version as well; it is still taking a lot of time.

Below is the data on the container I am trying to migrate.

Active blobs: 13,562,799 blobs, 116.62 GiB (125,221,568,871 bytes).
Snapshots: 0 blobs, 0 B (0 bytes). Versions: 30,491,482 blobs, 351.84 GiB (377,783,994,902 bytes).
Deleted blobs: 4,261,418 blobs, 82.43 GiB (88,509,941,622 bytes).
Total: 48,315,699 items, 550.89 GiB (591,515,505,395 bytes).

@tanyasethi-msft
Member

Thanks @divyn10 for your response. You can take the following measures to optimize performance:

  1. Ensure that each job transfers fewer than a million files; AzCopy's job-tracking mechanism incurs significant overhead per file (see the sketch after this list for one way to split the transfer).
  2. Consider setting the --log-level parameter of your copy, sync, or remove command to ERROR.
  3. Set AZCOPY_CONCURRENT_SCAN to a higher number (Linux only).
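As a sketch of point 1, one way to keep each job under the million-file mark is to run a separate copy per top-level virtual-directory prefix. The prefix names, URLs, and SAS tokens below are placeholders; adjust them to the container's actual layout:

```
# Higher scanning parallelism (Linux only)
export AZCOPY_CONCURRENT_SCAN=64

SRC="https://<src-account>.blob.core.windows.net/<container>"
DST="https://<dst-account>.blob.core.windows.net/<container>"

# One AzCopy job per prefix keeps each job's plan files small;
# prefix-a, prefix-b, prefix-c stand in for real top-level folders
for prefix in prefix-a prefix-b prefix-c; do
  azcopy copy "${SRC}/${prefix}?<src-sas>" "${DST}/${prefix}?<dst-sas>" \
      --from-to=BlobBlob --recursive=true --log-level=ERROR \
      --s2s-preserve-access-tier=false --s2s-preserve-blob-tags=true
done
```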
