
Cost of data transfers

Ze Qian Zhang edited this page Sep 1, 2021 · 6 revisions
  • AzCopy v10 by default uses 8MB block sizes. Your files are uploaded to the service in 8MB chunks.
  • The block size is configurable through the block-size flag, and can be increased up to 100MB.

Ingress

  • You will be charged transaction costs.
  • Uploading 1TB of data requires at least 131,072 REST API requests when using 8MB blocks. Network failures or throttling errors may trigger retries of these requests, but the extra cost is likely negligible overall.
  • Based on the pricing info here, for an LRS Hot account in West US 2 the ingress will cost 85 cents.
  • Use the pricing calculator for estimates.
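The request count and ingress figure above can be reproduced with a quick back-of-the-envelope calculation. The per-10,000-write-operations rate below is an assumption chosen to match the 85-cent figure quoted for a Hot LRS account in West US 2; check current pricing before relying on it.

```python
DATA_BYTES = 1024 ** 4          # 1 TiB of data to upload
BLOCK_BYTES = 8 * 1024 ** 2     # AzCopy's default 8 MiB block size
PRICE_PER_10K_WRITES = 0.065    # USD, assumed rate for Hot LRS / West US 2

requests = DATA_BYTES // BLOCK_BYTES            # one Put Block call per block
cost = requests / 10_000 * PRICE_PER_10K_WRITES
print(requests, round(cost, 2))                 # → 131072 0.85
```

Raising the block size with the block-size flag reduces the request count (and transaction cost) proportionally.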

Egress

  • In addition to the above, for egress AzCopy needs to enumerate blobs using List Blobs calls. 10,000 List Blobs calls cost only 5 cents, and each List Blobs call returns up to 5,000 blob items.
  • Performing service-to-service transfers counts towards egress on the source storage account. Based on the bandwidth pricing details here, within the first 10 TB of egress each month (past the first free 5GB), you'll pay $0.112 per GB transferred out of the source storage account.
    • Note that S2S transfers within the same region do not count towards egress.
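Putting the listing and bandwidth rates above together, a rough egress estimate for 1 TiB spread across 131,072 blobs looks like the following. The prices are the illustrative figures quoted on this page, not current rates.

```python
import math

num_blobs = 131_072
list_calls = math.ceil(num_blobs / 5_000)        # up to 5,000 items per call
list_cost = list_calls / 10_000 * 0.05           # $0.05 per 10,000 calls

egress_gb = 1024                                 # roughly 1 TiB
bandwidth_cost = max(egress_gb - 5, 0) * 0.112   # 5 GB free each month
print(list_calls, round(list_cost, 6), round(bandwidth_cost, 2))
```

The listing cost is a fraction of a cent; the bandwidth charge dominates, which is why same-region S2S transfers (no egress charge) are so much cheaper.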

Parallel Scanning

By default AzCopy uses parallel hierarchical listing for the Blob endpoint in order to speed up the listing process. For example, given a virtual directory structure that looks like the following:

```
/dir1
/dir1/dir2
/dir1/dir3
/dir1/dir4
```

AzCopy first issues a list call at the /dir1 level, which returns all the sub-directories in addition to the blobs directly under /dir1. Subsequent listing calls are then issued for /dir1/dir2, /dir1/dir3, and /dir1/dir4 in parallel. This access pattern is optimized for the common case, where many virtual directories are present and each contains some number of files and sub-directories. For smaller blobs, parallel listing can also boost throughput by ensuring the scanning process is not the bottleneck.
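The access pattern described above can be sketched as follows. The in-memory `TREE`, `list_prefix`, and worker count are all illustrative stand-ins for the real Blob List API, not AzCopy's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the service: each virtual directory prefix
# maps to (sub-directory prefixes, blobs directly under it).
TREE = {
    "/dir1": (["/dir1/dir2", "/dir1/dir3", "/dir1/dir4"], ["/dir1/a.txt"]),
    "/dir1/dir2": ([], ["/dir1/dir2/b.txt"]),
    "/dir1/dir3": ([], ["/dir1/dir3/c.txt"]),
    "/dir1/dir4": ([], []),
}

def list_prefix(prefix):
    """One non-recursive List Blobs call at `prefix`."""
    return TREE[prefix]

def parallel_hierarchical_scan(root, workers=4):
    """Issue one listing call per directory; sibling directories are listed in parallel."""
    blobs = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = [pool.submit(list_prefix, root)]
        while pending:
            subdirs, found = pending.pop().result()
            blobs.extend(found)
            # Sibling sub-directories are submitted together, so their
            # listing calls run concurrently on the worker threads.
            pending.extend(pool.submit(list_prefix, d) for d in subdirs)
    return blobs

print(sorted(parallel_hierarchical_scan("/dir1")))
# → ['/dir1/a.txt', '/dir1/dir2/b.txt', '/dir1/dir3/c.txt']
```

A flat listing, by contrast, would make a single recursive call at the root, i.e. one IO instead of one per directory.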

The downside is that more IOs are used for listing compared to a flat listing, which issues a single call at the /dir1 level and gets back all the blobs under it recursively. The flat listing pattern is the better fit when most blobs sit at the same level, e.g. directly under /dir1.

To reduce listing IOs and cost, or to optimize for a flat structure, disable parallel hierarchical listing by setting the environment variable AZCOPY_DISABLE_HIERARCHICAL_SCAN to true.
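For example, in a POSIX shell (the environment variable name is from this page; the account, container, and destination path in the copy command are placeholders):

```shell
# Fall back to a single flat (recursive) listing instead of one call per directory
export AZCOPY_DISABLE_HIERARCHICAL_SCAN=true

# Illustrative download command; replace the URL and local path with your own
azcopy copy "https://<account>.blob.core.windows.net/<container>" "/local/path" --recursive
```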