Subfolder with no name under /data folder #12065

Open
umiddey opened this issue Jan 23, 2025 · 2 comments

Comments

@umiddey

umiddey commented Jan 23, 2025

Hello,

We have some processes on Glue that create or merge into Iceberg tables. We noticed that a folder with no name (shown as a single /) has been created under the data subfolder of the warehouse; screenshot attached here:

Image

Because of this, we are running into bad paths (a double slash after data, as in data//) when trying to read the snapshot with the Python library pyiceberg. We need this because we are building a custom tool that tests the latest Iceberg snapshots for data quality.

Some guidance on how to solve this or why this folder is created would be nice.

EDIT 1: The unnamed subfolder has data files in it.

EDIT 2: Some more context: I see this pattern only with tables that have the merge-on-read property.
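To narrow down which files sit under the unnamed folder, one option is to list the file paths referenced by the current snapshot and flag any whose object key contains an empty segment (the `data//` pattern described above). The following is a minimal sketch, not a confirmed diagnosis: the helper names `has_empty_folder`, `snapshot_file_paths`, and `suspicious_paths` are hypothetical, and `table` is assumed to be a pyiceberg `Table` whose scan tasks expose `file.file_path` and `delete_files`.

```python
from typing import Iterable, List


def has_empty_folder(uri: str) -> bool:
    """True if the object key has an empty "folder" segment, e.g. data//f.parquet.

    Expects a full URI such as s3://bucket/key; the scheme and bucket are ignored.
    """
    # Drop the scheme and bucket, keeping only the object key.
    _, _, rest = uri.partition("://")
    _, _, key = rest.partition("/")
    # Every segment except the last one (the file name) acts as a folder.
    return "" in key.split("/")[:-1]


def snapshot_file_paths(table) -> List[str]:
    """Collect data and delete file paths referenced by the current snapshot.

    `table` is assumed to be a pyiceberg.table.Table.
    """
    paths = set()
    for task in table.scan().plan_files():
        paths.add(task.file.file_path)
        # Merge-on-read tables may also reference delete files.
        paths.update(d.file_path for d in task.delete_files)
    return sorted(paths)


def suspicious_paths(paths: Iterable[str]) -> List[str]:
    """Keep only the paths that would show up under a folder with no name."""
    return [p for p in paths if has_empty_folder(p)]
```

In practice `table` might come from something like `load_catalog(...).load_table(...)` in pyiceberg, with catalog and table names specific to your setup; calling `suspicious_paths(snapshot_file_paths(table))` would then show whether the affected files are data files, delete files, or both.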

@RussellSpitzer
Member

Are you using the HadoopFileIO or the S3FileIO? I know HadoopFileIO has some quirks with S3 where it sometimes creates pseudo-directories to mimic POSIX.

@umiddey
Author

umiddey commented Jan 23, 2025

> Are you using the HadoopFileIO or the S3FileIO? I know HadoopFileIO has some quirks with S3 where it sometimes creates pseudo-directories to mimic POSIX.

I am using the S3FileIO, not HadoopFileIO. So it seems the issue is not exclusive to HadoopFileIO. :(
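For readers checking which FileIO their own job uses: in Spark-based jobs it is typically selected by the catalog's `io-impl` property. The sketch below builds the relevant Spark properties for a Glue-backed Iceberg catalog. The catalog name `glue_catalog`, the warehouse path `s3://my-bucket/warehouse`, and the helper name `iceberg_glue_catalog_conf` are placeholders for illustration, not values from this issue.

```python
from typing import Dict


def iceberg_glue_catalog_conf(catalog_name: str, warehouse: str) -> Dict[str, str]:
    """Spark properties for a Glue-backed Iceberg catalog that uses S3FileIO."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # The io-impl property selects the FileIO implementation;
        # org.apache.iceberg.hadoop.HadoopFileIO would be the Hadoop-based one.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


# Placeholder catalog name and warehouse path.
conf = iceberg_glue_catalog_conf("glue_catalog", "s3://my-bucket/warehouse")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

Comparing the printed properties against the job's actual configuration is one way to confirm that `io-impl` points at S3FileIO rather than falling back to a default.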
