Cannot delete empty files and folders containing them #2466
Comments
Do you see any folders with consecutive / characters in the resource path you are trying to delete? If the two levels (below) look exactly the same to you, this is very likely the case, since that is how our navigation bug behaves on these kinds of resource paths. path (<- empty file) Neither AzCopy nor we support resources with "" names. Although it seemed to work before, the old code actually had undefined behavior for some operations on those kinds of resources. Even though the name "" is allowed, it is not recommended. If possible, modify the resource path that your Spark program writes to in order to prevent this. See the Azure Storage Blob naming conventions for more information.
Thank you for the response, but honestly I didn't get the point. I checked the blob naming conventions from the link you suggested; one blob name example there is "/a/b.txt", and the page says: "You can take advantage of the delimiter character when enumerating blobs". So it explicitly states that a hierarchical structure can be simulated using the "/" character. var df = spark.range(1,2000) This is the simulated hierarchical structure that is created: my (<- empty file) I tried to delete the root directory "my" with Azure Storage Explorer, and it deletes only the files _SUCCESS, part-00000-37923afa-7ac0-4e61-a68e-e6385e0aa466-c000.csv, and part-00001-37923afa-7ac0-4e61-a68e-e6385e0aa466-c000.csv; the empty files (and therefore all the directories containing them) remained. I tried to use azure-cli with the following command:
Sorry for being unclear. For example (see the screenshot), suppose in my blob container
Due to some improper path handling, we incorrectly show a file named |
The point here is that we must use Spark to create output files in a hierarchical structure. As I explained with the previous code example, we are simply using the Spark API to create output files, passing a URL with a hierarchical structure (for example "wasbs://[email protected]/my/test/out/path"), nothing more.
We have recently discovered a problem with deleting blobs that have the metadata hdi_isFolder=True. This can occur when you use automated tools to upload data to your blob container. @sdecri Could you please verify whether the data with the problem has this metadata set? See #2665 for a detailed explanation.
I can confirm that the blob representing the directory (and also the empty file described above) has the metadata hdi_isFolder=true.
I am uploading blobs to storage from a Databricks notebook and observing the same behaviour. For every directory in the output path, an empty blob (a "directory marker" with metadata hdi_isfolder="true") is created.
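To make the "directory marker" description above concrete, here is a minimal Python sketch of how such blobs could be picked out of a listing. It is not Storage Explorer's or AzCopy's logic. The record shape (a dict with `name`, `size`, and `metadata` keys) and the function name `is_directory_marker` are hypothetical, chosen only for illustration. The metadata key is compared case-insensitively because this thread shows both `hdi_isFolder=True` and `hdi_isfolder="true"`.

```python
def is_directory_marker(blob):
    """Return True if a blob record looks like a directory marker.

    `blob` is a hypothetical dict with 'size' (int) and 'metadata'
    (dict of str -> str); this is not an Azure SDK type.
    """
    # Normalise metadata keys so hdi_isFolder and hdi_isfolder both match.
    metadata = {key.lower(): value for key, value in blob.get("metadata", {}).items()}
    is_empty = blob.get("size", 0) == 0
    flagged = metadata.get("hdi_isfolder", "").lower() == "true"
    return is_empty and flagged


# Illustrative listing; names follow the example path used in this thread.
listing = [
    {"name": "my", "size": 0, "metadata": {"hdi_isfolder": "true"}},
    {"name": "my/test", "size": 0, "metadata": {"hdi_isFolder": "True"}},
    {"name": "my/test/out/path/_SUCCESS", "size": 0, "metadata": {}},
    {"name": "my/test/out/path/part-0000.csv", "size": 1234, "metadata": {}},
]

markers = [blob["name"] for blob in listing if is_directory_marker(blob)]
print(markers)
```

Checking the metadata as well as the size matters here: `_SUCCESS` is also an empty blob, but it carries no folder flag and is not a directory marker.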
For the record, anyone blocked by this can try:
This has been fixed in AzCopy 10.5. We'll be updating the integrated version to that in release 1.14.1. The integration update has been merged. |
Going to reopen this and put it in 1.15. AzCopy's fix appears to work only if that file is the only thing you're deleting. We will add that to our known issues, go ahead and ship, and pick up AzCopy's fix for that in the future.
Storage Explorer Version: 1.11.2
Build Number: 20191217.4
Platform/OS: Ubuntu - Windows 10
Architecture: x64
Regression From: Yes, it worked in a previous version, but I don't remember which one. I think it worked before the introduction of AzCopy.
Bug Description
When I create folders with the Spark API, an empty file is created for each folder and subfolder of the hierarchy. If I try to delete these files using Azure Storage Explorer, an error occurs with the message "failed to perform remove command due to error: nothing found to remove", and the file is not removed. The same error also occurs if I try to delete a folder containing an empty file.
Steps to Reproduce
path (<- empty file)
path (<- directory)
_|- to (<- empty file)
_|- to (<- directory)
___|- output (<- empty file)
___|- output (<- directory)
______|- part-0000.csv (<- content of one dataframe partition)
______|- .... (<- content of other dataframe partitions)
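The listing above shows each level appearing twice, once as an empty file and once as a directory. As a rough illustration of why that happens in a flat blob namespace, the following Python sketch finds names that are at the same time a blob and a virtual-directory prefix of other blobs. The function name `find_name_collisions` and the sample names are hypothetical, and the "/" delimiter is the one discussed in this thread.

```python
def find_name_collisions(blob_names, delimiter="/"):
    """Return names that are both a blob and a virtual-directory prefix."""
    names = set(blob_names)
    prefixes = set()
    for name in names:
        parts = name.split(delimiter)
        # Every proper leading run of segments acts as a virtual directory.
        for i in range(1, len(parts)):
            prefixes.add(delimiter.join(parts[:i]))
    return sorted(names & prefixes)


# Flat blob names mirroring the tree in the steps above.
blobs = [
    "path",
    "path/to",
    "path/to/output",
    "path/to/output/part-0000.csv",
]

collisions = find_name_collisions(blobs)
print(collisions)  # ['path', 'path/to', 'path/to/output']
```

Each name in the result is the kind of entry that a tree view shows twice: once as the empty blob itself and once as the folder derived from the longer names.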
Expected Experience
I want to delete these empty files, and the folders containing them, using Azure Storage Explorer as I did in previous versions.
Actual Experience
Empty files and folders containing them cannot be deleted
Additional Context
From the Azure portal it is (fortunately) possible to delete these empty files, but not the folders containing them.