azcopy list - show only top level directories #858

Closed
vmazalov opened this issue Jan 31, 2020 · 16 comments

Comments

@vmazalov

Which version of the AzCopy was used?

Note: The version is visible when running AzCopy without any argument

10.3.4

Which platform are you using? (ex: Windows, Mac, Linux)

Windows

What command did you run?

Note: Please remove the SAS to avoid exposing your credentials. If you cannot remember the exact command, please retrieve it from the beginning of the log file.

azcopy list

What problem was encountered?

It lists all files recursively, but we want to list only the directories at the top level.

How can we reproduce the problem in the simplest way?

Have you found a mitigation/solution?

@gtiao

gtiao commented Apr 6, 2021

I'd also like to have this feature. Our directories have hundreds of thousands of files in subdirectories, which makes the output of azcopy ls uninformative.

@dannyman

👍🏻 it is weird that I can only list recursively. I just want a quick list at the top level.

@dinith95

Yes, I would like to have this feature. I have a set of subdirectories with a large number of files (thousands, maybe millions), so the plain azcopy list command becomes very time-consuming and difficult to deal with.

@joeshull

+1 here

@delgadom

bash workaround...

# list unique top-level entries (first path component of each output line)
azcopy ls "https://{store}.blob.core.windows.net/{dir}?SAS" | cut -d/ -f 1 | awk '!a[$0]++'

# list unique paths truncated to depth N
N=3
azcopy ls "https://{store}.blob.core.windows.net/{dir}?SAS" | cut -d/ -f "1-${N}" | awk '!a[$0]++'
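
For anyone on Windows without cut and awk, the same filtering can be sketched in Python. This is only a rough equivalent of the pipeline above, not anything azcopy provides: the function name `unique_path_prefixes` is made up for illustration, and like the shell version it simply truncates each line at the N-th "/" and leaves any other text azcopy prints on the line untouched.

```python
from typing import Iterable, List


def unique_path_prefixes(lines: Iterable[str], depth: int = 1, sep: str = "/") -> List[str]:
    """Mimic `cut -d/ -f 1-N | awk '!a[$0]++'` on a sequence of lines.

    Hypothetical helper: keeps the first `depth` separator-delimited fields
    of every line and drops repeats while preserving first-seen order.
    """
    seen = set()
    result = []
    for line in lines:
        line = line.rstrip("\n")
        # Lines without the separator pass through whole, as with `cut`.
        key = sep.join(line.split(sep)[:depth])
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    import sys

    # Example: azcopy ls "<url>" | python this_script.py
    for entry in unique_path_prefixes(sys.stdin, depth=1):
        print(entry)
```

Note that this still has to read the full recursive listing, so it is no faster than the shell pipeline.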

@vasanth-bk

+1 here

@KingLTS

KingLTS commented Aug 24, 2022

Same here; I sync many thousands of files with azcopy. The workaround that cuts the output down to the top-level directories works, but it is very slow.

I recommend an optional parameter like "--depth" to determine the required level. But a parameter "--top-level-only" would also work for me.

@KrisJanssen

Yes, this is sorely needed...

@marc-hb

marc-hb commented Dec 3, 2022

sftp:// is the best workaround if you can afford it.
https://learn.microsoft.com/en-us/azure/data-factory/connector-sftp

I recommend the lftp client, it has been working great for us. lftp has a mirror command that has been working much, much better than azcopy sync.

You can of course use any other sftp:// client at the same time; they're not mutually exclusive. And keep using azcopy too if you can find a reason for it (we could not).

Here's a list of other azcopy bugs that lftp + sftp:// does NOT have:

@matthai

matthai commented Mar 18, 2023

+1

@gabrielwebster

+1

@marc-hb

marc-hb commented Jul 15, 2023

Please delete "+1" comments and use the "thumbs up" button at the top. Unlike "+1" comments, the latter is usable by both machines and humans.

@sizhky

sizhky commented Jul 28, 2023

This is a workaround using Python's azure-storage-blob SDK:

from azure.storage.blob import BlobPrefix, BlobServiceClient

# Replace these with your actual connection string and container name
connection_string = "your_connection_string"
container_name = "your_container_name"

# Create a BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# Get a reference to the container
container_client = blob_service_client.get_container_client(container_name)

# Collect the virtual directories directly under the prefix
directories = set()

# walk_blobs with delimiter='/' yields a BlobPrefix for each virtual directory
# and a regular blob entry for each blob at this level, so filter on BlobPrefix
for blob in container_client.walk_blobs(name_starts_with='folder/where/you/want/to/ls/', delimiter='/'):
    if isinstance(blob, BlobPrefix):
        directories.add(blob.name[:-1])  # Remove trailing '/' from the name

# Print the directories in sorted order
for directory in sorted(directories):
    print(directory)
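
The "--depth" idea suggested earlier in the thread could be approximated on top of the same `walk_blobs` call by recursing into each prefix. The following is a minimal sketch under stated assumptions, not part of azcopy or the SDK: `walk_dirs` is a hypothetical helper, and it treats any returned item whose name ends with the delimiter as a virtual directory (the snippet above relies on the same trailing '/'). With the real SDK, `isinstance(item, BlobPrefix)` is the more robust check.

```python
from typing import Iterator


def walk_dirs(container_client, prefix: str = "", depth: int = 1,
              delimiter: str = "/") -> Iterator[str]:
    """Yield virtual directory names under `prefix`, at most `depth` levels deep.

    Hypothetical helper: `container_client` is anything exposing
    walk_blobs(name_starts_with=..., delimiter=...), such as ContainerClient.
    """
    if depth < 1:
        return
    for item in container_client.walk_blobs(name_starts_with=prefix, delimiter=delimiter):
        name = item.name
        # Assumption: only prefixes end with the delimiter. Skip an entry equal
        # to the prefix itself so a folder-marker blob cannot cause a loop.
        if not name.endswith(delimiter) or name == prefix:
            continue
        yield name[: -len(delimiter)]
        # Descend one level, spending one unit of the depth budget.
        yield from walk_dirs(container_client, name, depth - 1, delimiter)
```

Called as `list(walk_dirs(container_client, depth=2))`, this issues one listing request per directory visited rather than enumerating every blob, which is where the time saving over a full recursive listing would come from.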

@seanmcc-msft
Member

Unfortunately, we are not going to add this functionality. We are focused on documentation, testing, and quality, and this feature would represent a large increase in scope for the product.

@KrisJanssen

KrisJanssen commented Aug 9, 2024

‘Completed’

@marc-hb

marc-hb commented Aug 10, 2024
